Update 101 and add note on pipeline order and tensors

ines 2017-05-29 11:45:32 +02:00
parent 17b635eaab
commit a2134951f2
2 changed files with 14 additions and 1 deletions

@@ -42,7 +42,7 @@
},
"spacy-101": {
"title": "spaCy 101",
"title": "spaCy 101 Everything you need to know",
"next": "lightning-tour",
"quickstart": true
},

@@ -63,3 +63,16 @@ p
+code(false, "json").
"pipeline": ["tensorizer", "tagger", "parser", "ner"]
p
| Although you can mix and match pipeline components, their
| #[strong order and combination] is usually important. Some components
| require modifications to the #[code Doc], made by earlier components,
| before they can process it. For example, the default pipeline first
| applies the tensorizer, which pre-processes the #[code Doc] and encodes
| its internal #[strong meaning representations] as an array of floats,
| also called a #[strong tensor]. This tensor encodes the tokens and their
| context, which the next component, the tagger, needs in order to predict
| part-of-speech tags. Because spaCy's models are neural network models,
| they only "speak" tensors and expect the input #[code Doc] to have
| a #[code tensor].
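The ordering constraint described in the note can be illustrated with a toy pipeline. This is a minimal sketch, not spaCy's API: `Doc`, `tensorizer`, `tagger` and `run_pipeline` are hypothetical stand-ins, the "tensor" is just one row of floats per token built from word lengths of the token and its neighbours, and the tagger is a trivial rule standing in for a neural model. What it shows is the dependency: the tagger reads `doc.tensor`, so it fails if it runs before the tensorizer has set it.

```python
# Toy illustration only: all names here are hypothetical, not spaCy's real classes.
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Doc:
    words: List[str]
    tensor: Optional[List[List[float]]] = None  # one row of floats per token
    tags: List[str] = field(default_factory=list)


def tensorizer(doc: Doc) -> Doc:
    """Encode each token together with its neighbours as a row of floats."""
    def feat(word: Optional[str]) -> float:
        # Toy feature: word length, 0.0 when there is no neighbour.
        return float(len(word)) if word is not None else 0.0

    rows = []
    for i, word in enumerate(doc.words):
        prev_w = doc.words[i - 1] if i > 0 else None
        next_w = doc.words[i + 1] if i + 1 < len(doc.words) else None
        rows.append([feat(prev_w), feat(word), feat(next_w)])
    doc.tensor = rows
    return doc


def tagger(doc: Doc) -> Doc:
    """Assign a toy tag per token; it can only read the tensor, not the words."""
    if doc.tensor is None:
        raise ValueError("tagger requires doc.tensor; run the tensorizer first")
    doc.tags = ["LONG" if row[1] > 3 else "SHORT" for row in doc.tensor]
    return doc


def run_pipeline(doc: Doc, components: List[Callable[[Doc], Doc]]) -> Doc:
    """Apply components in the given order, passing the Doc along."""
    for component in components:
        doc = component(doc)
    return doc


if __name__ == "__main__":
    doc = run_pipeline(Doc(["I", "like", "green", "eggs"]), [tensorizer, tagger])
    print(doc.tensor)
    print(doc.tags)
    try:
        run_pipeline(Doc(["eggs"]), [tagger, tensorizer])  # wrong order
    except ValueError as err:
        print("error:", err)
```

Swapping the order to `[tagger, tensorizer]` raises the `ValueError`, which mirrors why the default `"pipeline"` setting lists the tensorizer before the tagger.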