From 559b65f2e08ca3d4ed3c04bc0c58e241aef2b1a6 Mon Sep 17 00:00:00 2001
From: svlandeg <sofie.vanlandeghem@gmail.com>
Date: Thu, 27 Aug 2020 09:43:32 +0200
Subject: [PATCH] adjust references to null_annotation_setter to trfdata_setter

---
 website/docs/api/transformer.md               | 59 ++++++++++---------
 website/docs/usage/embeddings-transformers.md |  6 +-
 2 files changed, 34 insertions(+), 31 deletions(-)
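
The following is a minimal, dependency-free Python sketch of the annotation-setter contract that this patch documents. A function registered under a name is a factory that returns `setter(docs, trf_data)`. The new default copies one entry of `trf_data.doc_data` onto each `Doc`. `Doc`, `FullTransformerBatch`, `register` and `ANNOTATION_SETTERS` below are hypothetical stand-ins, not the spaCy or spacy-transformers classes. Only the registered names and the setter bodies follow the patch.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


# Hypothetical stand-in for spaCy's Doc: only the `._` extension namespace
# that an annotation setter writes to is modelled here.
class Doc:
    def __init__(self, text: str) -> None:
        self.text = text
        self._ = SimpleNamespace(trf_data=None)


# Hypothetical stand-in for FullTransformerBatch: `doc_data` holds one
# per-document output, which is all the setters below rely on.
class FullTransformerBatch:
    def __init__(self, per_doc: List[Any]) -> None:
        self.doc_data = list(per_doc)


Setter = Callable[[List[Doc], FullTransformerBatch], None]

# Hypothetical dict-based registry mimicking @registry.annotation_setters.
ANNOTATION_SETTERS: Dict[str, Callable[[], Setter]] = {}


def register(name: str) -> Callable[[Callable[[], Setter]], Callable[[], Setter]]:
    def decorator(factory: Callable[[], Setter]) -> Callable[[], Setter]:
        ANNOTATION_SETTERS[name] = factory
        return factory

    return decorator


@register("spacy-transformers.null_annotation_setter.v1")
def configure_null_annotation_setter() -> Setter:
    def setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
        pass  # previous default: leave the docs untouched

    return setter


@register("spacy-transformers.trfdata_setter.v1")
def configure_trfdata_setter() -> Setter:
    def setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
        # New default: store each document's slice of the batch on its Doc.
        for doc, data in zip(docs, list(trf_data.doc_data)):
            doc._.trf_data = data

    return setter


docs = [Doc("Hello world"), Doc("Another doc")]
batch = FullTransformerBatch(["output-0", "output-1"])
# The registered function is a factory, so it is called to obtain the setter.
setter = ANNOTATION_SETTERS["spacy-transformers.trfdata_setter.v1"]()
setter(docs, batch)
print([doc._.trf_data for doc in docs])  # ['output-0', 'output-1']
```

The factory call at the end mirrors the updated usage example, which passes `configure_trfdata_setter()` with parentheses. The parentheses are needed because the registered function builds and returns the setter rather than being the setter itself.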
diff --git a/website/docs/api/transformer.md b/website/docs/api/transformer.md
index c32651e02..0b51487ed 100644
--- a/website/docs/api/transformer.md
+++ b/website/docs/api/transformer.md
@@ -25,24 +25,23 @@ work out-of-the-box.
 
 </Infobox>
 
-This pipeline component lets you use transformer models in your pipeline.
-Supports all models that are available via the
+This pipeline component lets you use transformer models in your pipeline. It
+supports all models that are available via the
 [HuggingFace `transformers`](https://huggingface.co/transformers) library.
 Usually you will connect subsequent components to the shared transformer using
 the [TransformerListener](/api/architectures#TransformerListener) layer. This
 works similarly to spaCy's [Tok2Vec](/api/tok2vec) component and
 [Tok2VecListener](/api/architectures/Tok2VecListener) sublayer.
 
-The component assigns the output of the transformer to the `Doc`'s extension
-attributes. We also calculate an alignment between the word-piece tokens and the
-spaCy tokenization, so that we can use the last hidden states to set the
-`Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy
-token, the spaCy token receives the sum of their values. To access the values,
-you can use the custom [`Doc._.trf_data`](#custom-attributes) attribute. The
-package also adds the function registries [`@span_getters`](#span_getters) and
-[`@annotation_setters`](#annotation_setters) with several built-in registered
-functions. For more details, see the
-[usage documentation](/usage/embeddings-transformers).
+We calculate an alignment between the word-piece tokens and the spaCy
+tokenization, so that we can use the last hidden states to store the information
+on the `Doc`. When multiple word-piece tokens align to the same spaCy token, the
+spaCy token receives the sum of their values. By default, the information is
+written to the [`Doc._.trf_data`](#custom-attributes) extension attribute, but
+you can register a custom function in the [`@annotation_setters`](#annotation_setters)
+registry to change this behaviour. The package also adds the function registry
+[`@span_getters`](#span_getters) with several built-in registered functions. For
+more details, see the [usage documentation](/usage/embeddings-transformers).
 
 ## Config and implementation {#config}
 
@@ -61,11 +60,11 @@ architectures and their arguments and hyperparameters.
 > nlp.add_pipe("transformer", config=DEFAULT_CONFIG)
 > ```
 
-| Setting             | Description                                                                                                                                                                                                                                                                                                            |
-| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `max_batch_items`   | Maximum size of a padded batch. Defaults to `4096`. ~~int~~                                                                                                                                                                                                                                                            |
-| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs can set additional annotations on the `Doc`. The `Doc._.transformer_data` attribute is set prior to calling the callback. Defaults to `null_annotation_setter` (no additional annotations). ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
-| `model`             | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~                                                                                                                         |
+| Setting             | Description                                                                                                                                                                                                                                       |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `max_batch_items`   | Maximum size of a padded batch. Defaults to `4096`. ~~int~~                                                                                                                                                                                       |
+| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs to store the annotations on the `Doc`. Defaults to `trfdata_setter`, which sets the `Doc._.trf_data` attribute. ~~Callable[[List[Doc], FullTransformerBatch], None]~~        |
+| `model`             | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~                                                    |
 
 ```python
 https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py
@@ -518,19 +517,23 @@ right context.
 
 ## Annotation setters {#annotation_setters tag="registered functions" source="github.com/explosion/spacy-transformers/blob/master/spacy_transformers/annotation_setters.py"}
 
-Annotation setters are functions that that take a batch of `Doc` objects and a
-[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set
-additional annotations on the `Doc`, e.g. to set custom or built-in attributes.
-You can register custom annotation setters using the
-`@registry.annotation_setters` decorator.
+Annotation setters are functions that take a batch of `Doc` objects and a
+[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and store the
+annotations on the `Doc`, e.g. to set custom or built-in attributes. You can
+register custom annotation setters using the `@registry.annotation_setters`
+decorator. The default annotation setter used by the `Transformer` pipeline
+component is `trfdata_setter`, which sets the custom `Doc._.trf_data`
+attribute.
 
 > #### Example
 >
 > ```python
-> @registry.annotation_setters("spacy-transformers.null_annotation_setter.v1")
-> def configure_null_annotation_setter() -> Callable:
+> @registry.annotation_setters("spacy-transformers.trfdata_setter.v1")
+> def configure_trfdata_setter() -> Callable:
 >     def setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
->         pass
+>         doc_data = list(trf_data.doc_data)
+>         for doc, data in zip(docs, doc_data):
+>             doc._.trf_data = data
 >
 >     return setter
 > ```
@@ -542,9 +545,9 @@ You can register custom annotation setters using the
 
 The following built-in functions are available:
 
-| Name                                           | Description                           |
-| ---------------------------------------------- | ------------------------------------- |
-| `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. |
+| Name                                   | Description                                                     |
+| -------------------------------------- | --------------------------------------------------------------- |
+| `spacy-transformers.trfdata_setter.v1` | Store the annotations in the custom attribute `Doc._.trf_data`. |
 
 ## Custom attributes {#custom-attributes}
 
diff --git a/website/docs/usage/embeddings-transformers.md b/website/docs/usage/embeddings-transformers.md
index 62336a826..fbae1da82 100644
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -299,7 +299,7 @@ component:
 >
 > ```python
 > from spacy_transformers import Transformer, TransformerModel
-> from spacy_transformers.annotation_setters import null_annotation_setter
+> from spacy_transformers.annotation_setters import configure_trfdata_setter
 > from spacy_transformers.span_getters import get_doc_spans
 >
 > trf = Transformer(
@@ -309,7 +309,7 @@ component:
 >         get_spans=get_doc_spans,
 >         tokenizer_config={"use_fast": True},
 >     ),
->     annotation_setter=null_annotation_setter,
+>     annotation_setter=configure_trfdata_setter(),
 >     max_batch_items=4096,
 > )
 > ```
@@ -329,7 +329,7 @@ tokenizer_config = {"use_fast": true}
 @span_getters = "doc_spans.v1"

 [components.transformer.annotation_setter]
-@annotation_setters = "spacy-transformers.null_annotation_setter.v1"
+@annotation_setters = "spacy-transformers.trfdata_setter.v1"

 ```