Add docs on serializing the pipeline (see #3289) [ci skip]

2019-02-18 14:13:29 +01:00 · 2019-02-18 14:13:29 +01:00 · 57ae71ea95
parent 91f260f2c4
commit 57ae71ea95
1 changed files with 37 additions and 0 deletions
--- a/website/docs/usage/saving-loading.md
+++ b/website/docs/usage/saving-loading.md
@@ -22,6 +22,43 @@ the changes, see [this table](/usage/v2#incompat) and the notes on

 </Infobox>

+### Serializing the pipeline
+
+When you serialize the pipeline, spaCy only saves the **binary data for the
+individual components**, which is what it needs to restore them, and not the
+entire objects. This is a good thing, because it makes serialization safe. But
+it also means that you have to store the language name and the pipeline
+component names yourself, and recreate the language class and components
+before you can load in the data.
+
+> #### Saving the model meta
+>
+> The `nlp.meta` attribute is a JSON-serializable dictionary and contains all
+> model meta information, like the language and pipeline, but also author and
+> license information.
+
+```python
+### Serialize
+bytes_data = nlp.to_bytes()
+lang = nlp.meta["lang"]  # "en"
+pipeline = nlp.meta["pipeline"]  # ["tagger", "parser", "ner"]
+```
+
+```python
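+### Deserialize
+import spacy
+
+nlp = spacy.blank(lang)  # initialize the language class
+for pipe_name in pipeline:
+    pipe = nlp.create_pipe(pipe_name)  # create the (empty) component
+    nlp.add_pipe(pipe)  # add it in the original order
+nlp.from_bytes(bytes_data)  # finally, load in the binary data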
+```
+
+This is also how spaCy does it under the hood when loading a model: it loads the
+model's `meta.json` containing the language and pipeline information,
+initializes the language class, creates and adds the pipeline components and
+_then_ loads in the binary data. You can read more about this process
+[here](/usage/processing-pipelines#pipelines).
+
 ### Using Pickle {#pickle}

 > #### Example