Add section on special tokenizer component [ci skip]

2019-07-25 14:25:03 +02:00 · 02e444ec7c
parent 1fa6d6ba55
commit 02e444ec7c
1 changed file with 14 additions and 0 deletions
--- a/website/docs/usage/101/_pipelines.md
+++ b/website/docs/usage/101/_pipelines.md
@@ -52,4 +52,18 @@ entities into account when making predictions.
 </Accordion>
 <Accordion title="Why is the tokenizer special?" id="pipeline-components-tokenizer">
 The tokenizer is a "special" component and isn't part of the regular pipeline,
 so it doesn't show up in `nlp.pipe_names`. The reason is that a pipeline can
 only have one tokenizer, and while all other pipeline components take a `Doc`
 and return it, the tokenizer takes a **string of text** and turns it into a
 `Doc`. You can still customize the tokenizer, though: `nlp.tokenizer` is
 writable, so you can either create your own
 [`Tokenizer` class from scratch](/usage/linguistic-features#native-tokenizers)
 or replace it with an
 [entirely custom function](/usage/linguistic-features#custom-tokenizer).
 </Accordion>
 ---
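
The behaviour described in the added accordion can be sketched roughly as follows. This is a minimal illustration, not part of the commit: `WhitespaceTokenizer` is a hypothetical class name, and the example assumes spaCy is installed and uses a blank English pipeline so that no trained model is needed.

```python
import spacy
from spacy.tokens import Doc


class WhitespaceTokenizer:
    """Hypothetical custom tokenizer that splits on whitespace only."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        # Unlike regular pipeline components, this receives a string
        # and has to return a Doc.
        words = text.split()
        return Doc(self.vocab, words=words)


# A blank pipeline has a tokenizer but no regular pipeline components.
nlp = spacy.blank("en")
pipe_names = nlp.pipe_names  # the tokenizer is not listed here

# nlp.tokenizer is writable, so it can be swapped for any callable
# that takes a string and returns a Doc.
nlp.tokenizer = WhitespaceTokenizer(nlp.vocab)
doc = nlp("What's happened to me? he thought.")
tokens = [token.text for token in doc]
```

With the default English tokenizer, `What's` and `me?` would each be split into several tokens; the whitespace-only replacement keeps them whole, which shows that the swapped-in callable is the one producing the `Doc`.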