From e65dffd80b9bbed1897855342cf431f8a1e66ff0 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Sat, 5 Oct 2019 11:58:00 +0200
Subject: [PATCH] Clarify serialization of extension attributes (closes #4377) [ci skip]

---
 website/docs/api/docbin.md           |  2 +-
 website/docs/usage/saving-loading.md | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/website/docs/api/docbin.md b/website/docs/api/docbin.md
index a4525906e..41ebb6075 100644
--- a/website/docs/api/docbin.md
+++ b/website/docs/api/docbin.md
@@ -46,7 +46,7 @@ Create a `DocBin` object to hold serialized annotations.
 | Argument | Type | Description |
 | ----------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `attrs` | list | List of attributes to serialize. `orth` (hash of token text) and `spacy` (whether the token is followed by whitespace) are always serialized, so they're not required. Defaults to `None`. |
-| `store_user_data` | bool | Whether to include the `Doc.user_data`. Defaults to `False`. |
+| `store_user_data` | bool | Whether to include the `Doc.user_data` and the values of custom extension attributes. Defaults to `False`. |
 | **RETURNS** | `DocBin` | The newly constructed object. |
 
 ## DocBin.\_\_len\_\_ {#len tag="method"}
diff --git a/website/docs/usage/saving-loading.md b/website/docs/usage/saving-loading.md
index c7578a8df..70983198f 100644
--- a/website/docs/usage/saving-loading.md
+++ b/website/docs/usage/saving-loading.md
@@ -92,6 +92,25 @@ doc_bin = DocBin().from_bytes(bytes_data)
 docs = list(doc_bin.get_docs(nlp.vocab))
 ```
 
+If `store_user_data` is set to `True`, the `Doc.user_data` will be serialized as
+well, which includes the values of
+[extension attributes](/processing-pipelines#custom-components-attributes) (if
+they're serializable with msgpack).
+
+
+
+Including the `Doc.user_data` and extension attributes will only serialize the
+**values** of the attributes. To restore the values and access them via the
+`doc._.` property, you need to register the global attribute on the `Doc` again.
+
+```python
+docs = list(doc_bin.get_docs(nlp.vocab))
+Doc.set_extension("my_custom_attr", default=None)
+print([doc._.my_custom_attr for doc in docs])
+```
+
+
+
 ### Using Pickle {#pickle}
 
 > #### Example