Update 101 and add note on pipeline order and tensors

2017-05-29 11:45:32 +02:00 · a2134951f2
parent 17b635eaab
commit a2134951f2
2 changed files with 14 additions and 1 deletions
--- a/website/docs/usage/_data.json
+++ b/website/docs/usage/_data.json
@@ -42,7 +42,7 @@
    },

    "spacy-101": {
-        "title": "spaCy 101",
+        "title": "spaCy 101 – Everything you need to know",
        "next": "lightning-tour",
        "quickstart": true
    },
--- a/website/docs/usage/_spacy-101/_pipelines.jade
+++ b/website/docs/usage/_spacy-101/_pipelines.jade
@@ -63,3 +63,16 @@ p

 +code(false, "json").
    "pipeline": ["tensorizer", "tagger", "parser", "ner"]
+
+p
+    |  Although you can mix and match pipeline components, their
+    |  #[strong order and combination] usually matter. Some components may
+    |  require earlier components to have modified the #[code Doc] first. For
+    |  example, the default pipeline first applies the tensorizer, which
+    |  pre-processes the doc and encodes its internal
+    |  #[strong meaning representations] as an array of floats, also called a
+    |  #[strong tensor]. This includes the tokens and their context, which is
+    |  required for the next component, the tagger, to predict the
+    |  part-of-speech tags. Because spaCy's models are neural network models,
+    |  they only "speak" tensors and expect the input #[code Doc] to have
+    |  a #[code tensor].
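
Note on the added paragraph: the ordering dependency it describes can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not spaCy's implementation. The `Doc` class, the `tensorizer` and `tagger` functions, the `run_pipeline` helper, and the length-based "tags" are all hypothetical stand-ins, chosen only to show why the tagger needs the tensorizer to have run first.

```python
# Toy illustration only: every name here is a hypothetical stand-in,
# not spaCy's actual API or implementation.


class Doc:
    def __init__(self, words):
        self.words = words
        self.tensor = None  # filled in by the tensorizer
        self.tags = None    # filled in by the tagger


def tensorizer(doc):
    # Encode each token as a small list of floats (a made-up encoding).
    doc.tensor = [[float(len(w)), float(i)] for i, w in enumerate(doc.words)]
    return doc


def tagger(doc):
    # The tagger reads the tensor, so it must run after the tensorizer.
    if doc.tensor is None:
        raise ValueError("tagger requires doc.tensor; run the tensorizer first")
    doc.tags = ["LONG" if row[0] > 3 else "SHORT" for row in doc.tensor]
    return doc


def run_pipeline(words, pipeline):
    # Apply each component in order; each one receives and returns the doc.
    doc = Doc(words)
    for component in pipeline:
        doc = component(doc)
    return doc


doc = run_pipeline(["spaCy", "is", "fast"], [tensorizer, tagger])
print(doc.tags)
```

Swapping the order to `[tagger, tensorizer]` makes the toy tagger raise a `ValueError`, which mirrors the point that a component may depend on changes made by an earlier one.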