mirror of https://github.com/explosion/spaCy.git
36 lines
1.9 KiB
Plaintext
//- 💫 DOCS > USAGE > SPACY 101 > SERIALIZATION
|
p
    | If you've been modifying the pipeline, vocabulary, vectors and entities,
    | or made updates to the model, you'll eventually want to
    | #[strong save your progress] – for example, everything that's in your
    | #[code nlp] object. This means you'll have to translate its contents and
    | structure into a format that can be saved, like a file or a byte string.
    | This process is called serialization. spaCy comes with
    | #[strong built-in serialization methods] and supports the
    | #[+a("http://www.diveintopython3.net/serializing.html#dump") Pickle protocol].
|
+aside("What's pickle?")
    | Pickle is Python's built-in object persistence system. It lets you
    | transfer arbitrary Python objects between processes. This is usually used
    | to save an object to disk and load it back, but it's also used for
    | distributed computing, e.g. with
    | #[+a("https://spark.apache.org/docs/0.9.0/python-programming-guide.html") PySpark]
    | or #[+a("http://dask.pydata.org/en/latest/") Dask]. When you unpickle an
    | object, you're agreeing to execute whatever code it contains. It's like
    | calling #[code eval()] on a string – so don't unpickle objects from
    | untrusted sources.
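p
    | To make the pickle warning concrete, here is a minimal sketch that uses
    | only Python's standard #[code pickle] module. The #[code Sneaky] class is
    | a hypothetical name invented for this illustration and is not part of
    | spaCy. It shows a plain round trip first, then a payload that runs
    | #[code eval()] when it is unpickled.

```python
import pickle

# A harmless round trip: object -> byte string -> equal object.
data = {"text": "hello world", "ids": [1, 2, 3]}
payload = pickle.dumps(data)
restored = pickle.loads(payload)


class Sneaky:
    """Hypothetical class, only here to illustrate the risk."""

    def __reduce__(self):
        # Tells pickle how to "rebuild" the object: call eval("6 * 7").
        # A hostile payload could name any importable callable here.
        return (eval, ("6 * 7",))


# Unpickling calls eval() instead of recreating a Sneaky instance.
result = pickle.loads(pickle.dumps(Sneaky()))
print(type(result).__name__, result)
```

p
    | The expression here is harmless, but the same mechanism lets a pickle
    | file run arbitrary code, which is why untrusted pickles should never be
    | loaded.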
|
p
    | All container classes and pipeline components, i.e.
    for cls in ["Doc", "Language", "Tokenizer", "Tagger", "DependencyParser", "EntityRecognizer", "Vocab", "StringStore"]
        | #[+api(cls.toLowerCase()) #[code=cls]],
    | have the following methods available:
|
+table(["Method", "Returns", "Example"])
    - style = [1, 0, 1]
    +annotation-row(["to_bytes", "bytes", "nlp.to_bytes()"], style)
    +annotation-row(["from_bytes", "object", "nlp.from_bytes(bytes)"], style)
    +annotation-row(["to_disk", "-", "nlp.to_disk('/path')"], style)
    +annotation-row(["from_disk", "object", "nlp.from_disk('/path')"], style)
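p
    | The four methods in the table follow a simple contract, sketched below
    | with a toy class. #[code ToyStore] and its #[code words.json] file are
    | hypothetical and use only the standard library; this is not how spaCy
    | implements serialization internally. The sketch assumes, in line with
    | the table, that the #[code to_*] methods export state and the
    | #[code from_*] methods load state into an existing object and return it.

```python
import json
import tempfile
from pathlib import Path


class ToyStore:
    """Hypothetical stand-in for a serializable container (not a spaCy class)."""

    def __init__(self, words=None):
        self.words = list(words or [])

    def to_bytes(self):
        # Encode the object's state as a byte string.
        return json.dumps(self.words).encode("utf8")

    def from_bytes(self, data):
        # Load state into this object and return the object itself.
        self.words = json.loads(data.decode("utf8"))
        return self

    def to_disk(self, path):
        # Write the state into a directory; returns nothing.
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "words.json").write_bytes(self.to_bytes())

    def from_disk(self, path):
        # Read the state back from the directory and return the object.
        return self.from_bytes((Path(path) / "words.json").read_bytes())


store = ToyStore(["apple", "banana"])

# Byte string round trip.
copy_from_bytes = ToyStore().from_bytes(store.to_bytes())

# Disk round trip, using a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    store.to_disk(tmp_dir)
    copy_from_disk = ToyStore().from_disk(tmp_dir)
```

p
    | Because the #[code from_*] methods return the object, loading can be
    | chained onto construction, as in
    | #[code ToyStore().from_bytes(data)] above.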