diff --git a/website/docs/usage/saving-loading.md b/website/docs/usage/saving-loading.md
index b7a8476a8..55755246f 100644
--- a/website/docs/usage/saving-loading.md
+++ b/website/docs/usage/saving-loading.md
@@ -22,6 +22,43 @@ the changes, see [this table](/usage/v2#incompat) and the notes on
+### Serializing the pipeline
+
+When serializing the pipeline, keep in mind that this will only save out the
+**binary data for the individual components** to allow spaCy to restore them –
+not the entire objects. This is a good thing, because it makes serialization
+safe. But it also means that you have to take care of storing the language name
+and pipeline component names as well, and restoring them separately before you
+can load in the data.
+
+> #### Saving the model meta
+>
+> The `nlp.meta` attribute is a JSON-serializable dictionary and contains all
+> model meta information, like the language and pipeline, but also author and
+> license information.
+
+```python
+### Serialize
+bytes_data = nlp.to_bytes()
+lang = nlp.meta["lang"]  # "en"
+pipeline = nlp.meta["pipeline"]  # ["tagger", "parser", "ner"]
+```
+
+```python
+### Deserialize
+nlp = spacy.blank(lang)
+for pipe_name in pipeline:
+    pipe = nlp.create_pipe(pipe_name)
+    nlp.add_pipe(pipe)
+nlp.from_bytes(bytes_data)
+```
+
+This is also how spaCy does it under the hood when loading a model: it loads
+the model's `meta.json` containing the language and pipeline information,
+initializes the language class, creates and adds the pipeline components, and
+_then_ loads in the binary data. You can read more about this process
+[here](/usage/processing-pipelines#pipelines).
+
 ### Using Pickle {#pickle}
 
 > #### Example