From 02e444ec7ca5ba32979e42990d0f75084d0ae679 Mon Sep 17 00:00:00 2001
From: Ines Montani <ines@ines.io>
Date: Thu, 25 Jul 2019 14:25:03 +0200
Subject: [PATCH] Add section on special tokenizer component [ci skip]

---
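A minimal sketch of the behavior the added section describes. It assumes spaCy is installed and uses a blank English pipeline. `WhitespaceTokenizer` is a hypothetical class written for illustration and is not part of this patch or of spaCy itself.

```python
import spacy
from spacy.tokens import Doc


class WhitespaceTokenizer:
    """Hypothetical custom tokenizer that splits on whitespace only."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        # Unlike other components, the tokenizer takes a string of text
        # and returns a Doc.
        words = text.split()
        return Doc(self.vocab, words=words)


# A blank pipeline has a tokenizer but no regular pipeline components.
nlp = spacy.blank("en")

# The tokenizer is not listed among the pipeline component names.
assert "tokenizer" not in nlp.pipe_names

# nlp.tokenizer is writable, so it can be replaced with any callable
# that takes text and returns a Doc.
nlp.tokenizer = WhitespaceTokenizer(nlp.vocab)
doc = nlp("Hello, world! It's me.")
tokens = [token.text for token in doc]
```

With the whitespace tokenizer swapped in, punctuation stays attached to the neighboring word instead of being split off, which makes the effect of the replacement easy to see.
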
 website/docs/usage/101/_pipelines.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
diff --git a/website/docs/usage/101/_pipelines.md b/website/docs/usage/101/_pipelines.md
index 64c2f6c98..68308a381 100644
--- a/website/docs/usage/101/_pipelines.md
+++ b/website/docs/usage/101/_pipelines.md
@@ -52,4 +52,18 @@ entities into account when making predictions.
 
 </Accordion>
 
+<Accordion title="Why is the tokenizer special?" id="pipeline-components-tokenizer">
+
+The tokenizer is a "special" component and isn't part of the regular pipeline,
+so it doesn't show up in `nlp.pipe_names`. There can only be one tokenizer,
+and while all other pipeline components take a `Doc` and return it, the
+tokenizer takes a **string of text** and turns it into a `Doc`. You can still
+customize the tokenizer, though: `nlp.tokenizer` is writable, so you can
+either create your own
+[`Tokenizer` class from scratch](/usage/linguistic-features#native-tokenizers)
+or replace it with an
+[entirely custom function](/usage/linguistic-features#custom-tokenizer).
+
+</Accordion>
+
 ---