Add serialization 101

2017-05-24 19:24:40 +02:00 · 2017-05-24 19:24:40 +02:00 · 54885b5e88
parent b546bcb05f
commit 54885b5e88
3 changed files with 49 additions and 0 deletions
--- a/website/docs/usage/_spacy-101/_serialization.jade
+++ b/website/docs/usage/_spacy-101/_serialization.jade
@@ -0,0 +1,35 @@
 //- 💫 DOCS > USAGE > SPACY 101 > SERIALIZATION
 p
    |  If you've been modifying the pipeline, vocabulary, vectors and entities, or made
    |  updates to the model, you'll eventually want
    |  to #[strong save your progress] – for example, everything that's in your #[code nlp]
    |  object. This means you'll have to translate its contents and structure
    |  into a format that can be saved, like a file or a byte string. This
    |  process is called serialization. spaCy comes with
    |  #[strong built-in serialization methods] and supports the
    |  #[+a("http://www.diveintopython3.net/serializing.html#dump") Pickle protocol].
 +aside("What's pickle?")
    |  Pickle is Python's built-in object persistence system. It lets you
    |  transfer arbitrary Python objects between processes. This is usually used
    |  to save an object to disk and load it back, but it's also used for distributed
    |  computing, e.g. with
    |  #[+a("https://spark.apache.org/docs/0.9.0/python-programming-guide.html") PySpark]
    |  or #[+a("http://dask.pydata.org/en/latest/") Dask]. When you unpickle an
    |  object, you're agreeing to execute whatever code it contains. It's like
    |  calling #[code eval()] on a string – so don't unpickle objects from
    |  untrusted sources.
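 p
    |  As a minimal sketch of the point above, using only the standard library
    |  #[code pickle] module: the first half round-trips a plain dictionary,
    |  the second half shows that loading a pickle can run code. The
    |  #[code settings] dictionary and the #[code Malicious] class are
    |  hypothetical and exist only for illustration.

```python
import pickle

# Hypothetical stand-in for any picklable Python object.
settings = {"lang": "en", "pipeline": ["tagger", "parser"]}

# Serialize to a byte string, then restore an equal but distinct object.
data = pickle.dumps(settings)
restored = pickle.loads(data)


class Malicious:
    """Hypothetical object showing why untrusted pickles are dangerous."""

    def __reduce__(self):
        # On load, pickle calls the returned callable with these arguments.
        return (print, ("this ran during unpickling",))


payload = pickle.dumps(Malicious())
# Loading runs print(...); the result is whatever the callable returned.
result = pickle.loads(payload)
```

 p
    |  Here the callable is a harmless #[code print], but it could be any
    |  importable function, which is why pickles from untrusted sources
    |  should never be loaded.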
 p
    |  All container classes and pipeline components, i.e.
    for cls in ["Doc", "Language", "Tokenizer", "Tagger", "DependencyParser", "EntityRecognizer", "Vocab", "StringStore"]
        |  #[+api(cls.toLowerCase()) #[code=cls]],
    |  have the following methods available:
 +table(["Method", "Returns", "Example"])
    - style = [1, 0, 1]
    +annotation-row(["to_bytes", "bytes", "nlp.to_bytes()"], style)
    +annotation-row(["from_bytes", "object", "nlp.from_bytes(bytes_data)"], style)
    +annotation-row(["to_disk", "-", "nlp.to_disk('/path')"], style)
    +annotation-row(["from_disk", "object", "nlp.from_disk('/path')"], style)
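 p
    |  To make the four-method pattern in the table concrete, here is a rough
    |  sketch of a toy container that follows the same interface.
    |  #[code TinyStore] is hypothetical and is not a spaCy class. It uses JSON
    |  for simplicity and says nothing about how spaCy serializes its own
    |  objects. It only mirrors the method names and return values listed above.

```python
import json
import tempfile
from pathlib import Path


class TinyStore:
    """Hypothetical container, NOT part of spaCy; mirrors the four method names only."""

    def __init__(self, strings=None):
        self.strings = list(strings or [])

    def to_bytes(self):
        # Translate the object's state into a byte string.
        return json.dumps(self.strings).encode("utf8")

    def from_bytes(self, data):
        # Restore the state in place and return the object itself.
        self.strings = json.loads(data.decode("utf8"))
        return self

    def to_disk(self, path):
        # Write the same byte string to a file; nothing is returned.
        Path(path).write_bytes(self.to_bytes())

    def from_disk(self, path):
        # Read the file back and reuse the byte-string loader.
        return self.from_bytes(Path(path).read_bytes())


store = TinyStore(["apple", "banana"])

# Round trip through a byte string.
copy_from_bytes = TinyStore().from_bytes(store.to_bytes())

# Round trip through a file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "store.json"
    saved = store.to_disk(path)
    copy_from_disk = TinyStore().from_disk(path)
```

 p
    |  Returning the object from the two loader methods allows creating and
    |  loading in a single expression, as in the
    |  #[code TinyStore().from_bytes(...)] call above.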
--- a/website/docs/usage/saving-loading.jade
+++ b/website/docs/usage/saving-loading.jade
@@ -1,5 +1,15 @@
 include ../../_includes/_mixins
 +h(2, "101") Serialization 101
 include _spacy-101/_serialization
 +infobox("Important note")
    |  In spaCy v2.0, the API for saving and loading has changed to only use the
    |  four methods listed above consistently across objects and classes. For an
    |  overview of the changes, see #[+a("/docs/usage/v2#incompat") this table]
    |  and the notes on #[+a("/docs/usage/v2#migrating-saving-loading") migrating].
 +h(2, "models") Saving models
--- a/website/docs/usage/spacy-101.jade
+++ b/website/docs/usage/spacy-101.jade
@@ -105,6 +105,10 @@ include _spacy-101/_word-vectors
 +h(2, "pipelines") Pipelines
 +h(2, "serialization") Serialization
 include _spacy-101/_serialization
 +h(2, "architecture") Architecture
 +image