diff --git a/website/docs/usage/v2-2.md b/website/docs/usage/v2-2.md
index 8f339cb9b..b6bd308e0 100644
--- a/website/docs/usage/v2-2.md
+++ b/website/docs/usage/v2-2.md
@@ -118,20 +118,20 @@
 classification.
 
 
-### New DocPallet class to efficiently Doc collections
+### New DocBin class to efficiently serialize Doc collections
 
 > #### Example
 >
 > ```python
-> from spacy.tokens import DocPallet
-> pallet = DocPallet(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"], store_user_data=False)
+> from spacy.tokens import DocBin
+> doc_bin = DocBin(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"], store_user_data=False)
 > for doc in nlp.pipe(texts):
->     pallet.add(doc)
-> byte_data = pallet.to_bytes()
+>     doc_bin.add(doc)
+> byte_data = doc_bin.to_bytes()
 > # Deserialize later, e.g. in a new process
 > nlp = spacy.blank("en")
-> pallet = DocPallet()
-> docs = list(pallet.get_docs(nlp.vocab))
+> doc_bin = DocBin().from_bytes(byte_data)
+> docs = list(doc_bin.get_docs(nlp.vocab))
 > ```
 
 If you're working with lots of data, you'll probably need to pass analyses
@@ -140,7 +140,7 @@
 save out work to disk. Often it's sufficient to use the doc.to_array()
 functionality for this, and just serialize the numpy arrays --- but other times
 you want a more general way to save and restore `Doc` objects.
-The new `DocPallet` class makes it easy to serialize and deserialize
+The new `DocBin` class makes it easy to serialize and deserialize
 a collection of `Doc` objects together, and is much more efficient than calling
 `doc.to_bytes()` on each individual `Doc` object. You can also control what data
 gets saved, and you can merge pallets together for easy