mirror of https://github.com/explosion/spaCy.git
Dev docs: listeners (#9061)
* Start Listeners documentation * intro tabel of different architectures * initialization, linking, dim inference * internal comm (WIP) * expand internal comm section * frozen components and replacing listeners * various small fixes * fix content table * fix link
# Listeners

1. [Overview](#1-overview)
2. [Initialization](#2-initialization)
   - [A. Linking listeners to the embedding component](#2a-linking-listeners-to-the-embedding-component)
   - [B. Shape inference](#2b-shape-inference)
3. [Internal communication](#3-internal-communication)
   - [A. During prediction](#3a-during-prediction)
   - [B. During training](#3b-during-training)
   - [C. Frozen components](#3c-frozen-components)
4. [Replacing listener with standalone](#4-replacing-listener-with-standalone)

## 1. Overview

Trainable spaCy components typically use some sort of `tok2vec` layer as part of the `model` definition.
This `tok2vec` layer produces embeddings and is either a standard `Tok2Vec` layer or a Transformer-based one.
Both versions can be used either inline/standalone, which means that they are defined and used
by only one specific component (e.g. NER), or
[shared](https://spacy.io/usage/embeddings-transformers#embedding-layers),
in which case the embedding functionality becomes a separate component that can
feed embeddings to multiple components downstream, using a listener pattern.

| Type          | Usage      | Model Architecture                                                                                  |
| ------------- | ---------- | --------------------------------------------------------------------------------------------------- |
| `Tok2Vec`     | standalone | [`spacy.Tok2Vec`](https://spacy.io/api/architectures#Tok2Vec)                                        |
| `Tok2Vec`     | listener   | [`spacy.Tok2VecListener`](https://spacy.io/api/architectures#Tok2VecListener)                        |
| `Transformer` | standalone | [`spacy-transformers.Tok2VecTransformer`](https://spacy.io/api/architectures#Tok2VecTransformer)     |
| `Transformer` | listener   | [`spacy-transformers.TransformerListener`](https://spacy.io/api/architectures#TransformerListener)   |

Here we discuss the listener pattern and its code implementation in more detail.

## 2. Initialization

### 2A. Linking listeners to the embedding component

To allow sharing a `tok2vec` layer, a separate `tok2vec` component needs to be defined in the config:

```
[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"
```

A listener can then be set up by making sure the correct `upstream` name is defined, referring to the
name of the `tok2vec` component (which equals the factory name by default), or `*` as a wildcard:

```
[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
upstream = "tok2vec"
```

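For example, a listener that should bind to whichever embedding component is available could use the wildcard instead of a specific name (a sketch; the `ner` block is the same one as above):

```
[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
upstream = "*"
```
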
When an [`nlp`](https://github.com/explosion/spaCy/blob/master/extra/DEVELOPER_DOCS/Language.md) object is
initialized or deserialized, it will make sure to link each `tok2vec` component to its listeners. This is
implemented in the method `nlp._link_components()`, which loops over each
component in the pipeline and calls `find_listeners()` on a component if it's defined.
The [`tok2vec` component](https://github.com/explosion/spaCy/blob/master/spacy/pipeline/tok2vec.py)'s implementation
of this `find_listeners()` method will specifically identify sublayers of a model definition that are of type
`Tok2VecListener` with a matching upstream name and will then add that listener to the internal `self.listener_map`.

If it's a Transformer-based pipeline, a
[`transformer` component](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py)
has a similar implementation, but its `find_listeners()` function will specifically look for `TransformerListener`
sublayers of downstream components.

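The matching and registration logic can be pictured with a small standalone sketch. This is toy code, not spaCy's actual classes; only the upstream-name comparison (exact match or `*` wildcard) mirrors the behavior described above:

```
# Toy sketch of how an embedding component collects its listeners.
# ToyListener / ToyTok2Vec are made-up names for illustration.

class ToyListener:
    def __init__(self, upstream_name):
        self.upstream_name = upstream_name

class ToyTok2Vec:
    def __init__(self, name):
        self.name = name
        self.listener_map = {}  # downstream component name -> listeners

    def find_listeners(self, component_name, sublayers):
        # A listener matches if it refers to this component by name,
        # or uses "*" as a wildcard.
        for layer in sublayers:
            if isinstance(layer, ToyListener) and layer.upstream_name in (self.name, "*"):
                self.listener_map.setdefault(component_name, []).append(layer)

tok2vec = ToyTok2Vec("tok2vec")
tok2vec.find_listeners("ner", [ToyListener("tok2vec")])       # exact match
tok2vec.find_listeners("tagger", [ToyListener("*")])          # wildcard match
tok2vec.find_listeners("parser", [ToyListener("other_t2v")])  # no match
assert sorted(tok2vec.listener_map) == ["ner", "tagger"]
```
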
### 2B. Shape inference

Typically, the output dimension `nO` of a listener's model equals the `nO` (or `width`) of the upstream embedding layer.
For a standard `Tok2Vec`-based component, this is typically known up-front and defined as such in the config:

```
[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
```

A `transformer` component, however, only knows its `nO` dimension after the HuggingFace transformer
is set with the function `model.attrs["set_transformer"]`,
[implemented](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/transformer_model.py)
by `set_pytorch_transformer`.
This is why, upon linking of the transformer listeners, the `transformer` component also makes sure to set
the listener's output dimension correctly.

This shape inference mechanism also needs to happen with resumed/frozen components, which means that for some CLI
commands (`assemble` and `train`) we need to call `nlp._link_components` even before initializing the `nlp`
object. To cover all use cases and avoid negative side effects, the code base ensures that performing the
linking twice is not harmful.

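The deferred dimension inference boils down to the following pattern (a toy sketch; in the real implementation the width is propagated through Thinc model dimensions, not plain attributes):

```
# Toy sketch: a listener whose output dimension is unknown until the
# upstream component learns its own width and propagates it on linking.

class ToyListener:
    def __init__(self):
        self.nO = None  # unknown until the upstream width is known

class ToyTransformer:
    def __init__(self):
        self.nO = None
        self.listeners = []

    def set_transformer(self, hidden_width):
        # Only once the HuggingFace model is loaded do we know nO.
        self.nO = hidden_width

    def link_listener(self, listener):
        self.listeners.append(listener)
        if self.nO is not None:
            listener.nO = self.nO  # propagate the width to the listener

trf = ToyTransformer()
trf.set_transformer(768)
listener = ToyListener()
trf.link_listener(listener)
assert listener.nO == 768
```
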
## 3. Internal communication

The internal communication between a listener and its downstream components is organized by sending and
receiving information across the components, either directly or implicitly.
The details differ depending on whether the pipeline is currently training or predicting.
Either way, the `tok2vec` or `transformer` component always needs to run before the listener.

### 3A. During prediction

When the `Tok2Vec` pipeline component is called, its `predict()` method is executed to produce the results,
which are then stored by `set_annotations()` in the `doc.tensor` field of the document(s).
Similarly, the `Transformer` component stores the produced embeddings
in `doc._.trf_data`. Next, the `forward` pass of a
[`Tok2VecListener`](https://github.com/explosion/spaCy/blob/master/spacy/pipeline/tok2vec.py)
or a
[`TransformerListener`](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/listener.py)
accesses these fields on the `Doc` directly. Both listener implementations have a fallback mechanism for when these
properties were not set on the `Doc`: in that case an all-zero tensor is produced and returned.
We need this fallback mechanism to enable shape inference methods in Thinc, but the code
is slightly risky and at times might hide another bug, so it's a good spot to be aware of.

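The fallback behavior can be sketched as follows (toy code with plain lists standing in for real tensors; `get_embedding_output` is a hypothetical name, not a spaCy function):

```
# Toy sketch of the listener's prediction-time behavior: use the stored
# embedding if the upstream component set it, otherwise fall back to an
# all-zero "tensor" of the right shape.

def get_embedding_output(doc_tensor, n_tokens, width):
    if doc_tensor is not None:
        return doc_tensor
    # Fallback: the upstream component never ran on this doc. Returning
    # zeros keeps Thinc's shape inference working, but can hide bugs.
    return [[0.0] * width for _ in range(n_tokens)]

# Upstream ran: the stored tensor is passed through unchanged.
stored = [[0.1, 0.2], [0.3, 0.4]]
assert get_embedding_output(stored, 2, 2) == stored

# Upstream did not run: an all-zero tensor is produced instead.
assert get_embedding_output(None, 3, 2) == [[0.0, 0.0]] * 3
```
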
### 3B. During training

During training, the `update()` methods of the `Tok2Vec` & `Transformer` components don't necessarily set the
annotations on the `Doc` (though since 3.1 they can if they are part of the `annotating_components` list in the config).
Instead, we rely on a caching mechanism between the original embedding component and its listener.
Specifically, the produced embeddings are sent to the listeners by calling `listener.receive()` and uniquely
identifying the batch of documents with a `batch_id`. This `receive()` call also sends the appropriate `backprop`
call to ensure that gradients from the downstream component flow back to the trainable `Tok2Vec` or `Transformer`
network.

We rely on the `nlp` object properly batching the data and sending each batch through the pipeline in sequence,
which means that only one such batch needs to be kept in memory for each listener.
When the downstream component runs and the listener should produce embeddings, it accesses the batch in memory,
runs the backpropagation, and returns the results and the gradients.

There are two ways in which this mechanism can fail; both are detected by `verify_inputs()`:

- `E953` if a different batch is in memory than the requested one, signaling some kind of out-of-sync state of the
  training pipeline.
- `E954` if no batch is in memory at all, signaling that the pipeline is probably not set up correctly.

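Putting the pieces of this section together, a minimal sketch of the caching handshake (toy code; the real implementation lives in the listener layers, and the error codes are raised here as plain `ValueError`s for illustration):

```
# Toy sketch of the training-time handshake between an embedding
# component and its listener, including the two failure modes.

class ToyListener:
    def __init__(self):
        self._batch_id = None
        self._outputs = None
        self._backprop = None

    def receive(self, batch_id, outputs, backprop):
        # Called by the upstream Tok2Vec/Transformer during update().
        self._batch_id = batch_id
        self._outputs = outputs
        self._backprop = backprop

    def verify_inputs(self, batch_id):
        if self._batch_id is None:
            raise ValueError("E954: no batch in memory")  # pipeline mis-configured
        if self._batch_id != batch_id:
            raise ValueError("E953: out-of-sync batch")   # wrong batch cached

    def forward(self, batch_id):
        # Called when the downstream component needs the embeddings.
        self.verify_inputs(batch_id)
        return self._outputs, self._backprop

listener = ToyListener()
try:
    listener.forward(batch_id=1)  # nothing cached yet
except ValueError as err:
    assert "E954" in str(err)

# The upstream component produces embeddings and caches them on the listener.
listener.receive(batch_id=1, outputs=[[0.5, 0.5]], backprop=lambda grads: grads)
outputs, backprop = listener.forward(batch_id=1)
assert outputs == [[0.5, 0.5]]
```
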
#### Training with multiple listeners

One `Tok2Vec` or `Transformer` component may be listened to by several downstream components, e.g.
a tagger and a parser could be sharing the same embeddings. In this case, we need to be careful about how we do
the backpropagation. When the `Tok2Vec` or `Transformer` sends out data to the listeners with `receive()`, it will
send an `accumulate_gradient` function call to all listeners except the last one. This function will keep track
of the gradients received so far. Only the final listener in the pipeline will get an actual `backprop` call that
will initiate the backpropagation of the `tok2vec` or `transformer` model with the accumulated gradients.

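This accumulation scheme can be sketched with plain numbers standing in for gradient arrays (toy code; the listener names are illustrative):

```
# Toy sketch of gradient accumulation with several listeners sharing one
# embedding component: all listeners but the last only accumulate, and
# the last one triggers the actual backprop with the summed gradients.

accumulated = []
backprop_calls = []

def accumulate_gradient(grads):
    # Keep track of the gradients received so far.
    accumulated.extend(grads)

def backprop(grads):
    # The final listener also accumulates, then updates the shared model.
    accumulate_gradient(grads)
    backprop_calls.append(sum(accumulated))

listeners = ["tagger", "parser", "ner"]
callbacks = {}
for name in listeners[:-1]:
    callbacks[name] = accumulate_gradient  # all but the last accumulate
callbacks[listeners[-1]] = backprop        # only the last one backprops

# Each downstream component sends its gradient for the shared batch.
callbacks["tagger"]([1.0])
callbacks["parser"]([2.0])
callbacks["ner"]([3.0])

assert accumulated == [1.0, 2.0, 3.0]
assert backprop_calls == [6.0]  # a single update with the summed gradients
```
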
### 3C. Frozen components

The listener pattern can get particularly tricky in combination with frozen components. To detect components
with listeners that are not frozen consistently, `init_nlp()` (which is called by `spacy train`) goes through
the listeners and their upstream components and warns in two scenarios.

#### The Tok2Vec or Transformer is frozen

If the `Tok2Vec` or `Transformer` was already trained,
e.g. by [pretraining](https://spacy.io/usage/embeddings-transformers#pretraining),
it could be a valid use case to freeze the embedding architecture and only train downstream components such
as a tagger or a parser. This used to be impossible before 3.1, but is supported since then by putting the
embedding component in the [`annotating_components`](https://spacy.io/usage/training#annotating-components)
list of the config. This works like any other "annotating component" because it relies on the `Doc` attributes.

However, if the `Tok2Vec` or `Transformer` is frozen, not present in `annotating_components`, and a related
listener isn't frozen, then a `W086` warning is shown and further training of the pipeline will likely end with `E954`.

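A config sketch for this setup, freezing the embedding component while still letting it annotate the `Doc` objects (the component name `tok2vec` is illustrative):

```
[training]
frozen_components = ["tok2vec"]
annotating_components = ["tok2vec"]
```
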
#### The upstream component is frozen

If an upstream component is frozen but the underlying `Tok2Vec` or `Transformer` isn't, the performance of
the upstream component will be degraded after training. In this case, a `W087` warning is shown, explaining
how to use the `replace_listeners` functionality to prevent this problem.

## 4. Replacing listener with standalone

The [`replace_listeners`](https://spacy.io/api/language#replace_listeners) functionality changes the architecture
of a downstream component from using a listener pattern to a standalone `tok2vec` or `transformer` layer,
effectively making the downstream component independent of any other components in the pipeline.
It is implemented by `nlp.replace_listeners()` and typically executed by `nlp.from_config()`.
First, it fetches the `Model` of the original component that creates the embeddings:

```
tok2vec = self.get_pipe(tok2vec_name)
tok2vec_model = tok2vec.model
```

Which is either a [`Tok2Vec` model](https://github.com/explosion/spaCy/blob/master/spacy/ml/models/tok2vec.py) or a
[`TransformerModel`](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/transformer_model.py).

In the case of the `tok2vec`, this model can be copied as-is into the configuration and architecture of the
downstream component. However, for the `transformer`, this doesn't work.
The reason is that the `TransformerListener` architecture chains the listener with
[`trfs2arrays`](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/trfs2arrays.py):

```
model = chain(
    TransformerListener(upstream_name=upstream),
    trfs2arrays(pooling, grad_factor),
)
```

but the standalone `Tok2VecTransformer` has an additional `split_trf_batch` chained in between the model
and `trfs2arrays`:

```
model = chain(
    TransformerModel(name, get_spans, tokenizer_config),
    split_trf_batch(),
    trfs2arrays(pooling, grad_factor),
)
```

So you can't just take the model from the listener and drop it into the component internally. You need to
adjust the model and the config. To facilitate this, `nlp.replace_listeners()` will check whether additional
[functions](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/_util.py) are
[defined](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/transformer_model.py)
in `model.attrs`, and if so, it will essentially call these to make the appropriate changes:

```
replace_func = tok2vec_model.attrs["replace_listener_cfg"]
new_config = replace_func(tok2vec_cfg["model"], pipe_cfg["model"]["tok2vec"])
...
new_model = tok2vec_model.attrs["replace_listener"](new_model)
```

The new config and model are then properly stored on the `nlp` object.
Note that this functionality (running the replacement for a transformer listener) was broken prior to
`spacy-transformers` 1.0.5.
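The attrs-based hook mechanism boils down to the following pattern (a toy sketch; the real hooks rewrite Thinc models and config blocks, and `toy_replace_listeners` is a made-up name, not the actual `nlp.replace_listeners()` signature):

```
# Toy sketch of the attrs-based hooks: the embedding model carries
# optional callbacks that the replacement logic invokes to adapt the
# copied model and config. These toy hooks just tag their inputs.

class ToyModel:
    def __init__(self, attrs=None):
        self.attrs = attrs or {}

def toy_replace_listeners(tok2vec_model, listener_model, listener_cfg):
    new_model, new_cfg = listener_model, dict(listener_cfg)
    # Only transformer-based models define these hooks; a plain tok2vec
    # model can be copied as-is without any adjustment.
    if "replace_listener_cfg" in tok2vec_model.attrs:
        new_cfg = tok2vec_model.attrs["replace_listener_cfg"](new_cfg)
    if "replace_listener" in tok2vec_model.attrs:
        new_model = tok2vec_model.attrs["replace_listener"](new_model)
    return new_model, new_cfg

trf = ToyModel(attrs={
    # Pretend-hooks: mark the config as standalone and wrap the model,
    # standing in for the real split_trf_batch adjustment.
    "replace_listener_cfg": lambda cfg: {**cfg, "standalone": True},
    "replace_listener": lambda model: ("split_trf_batch", model),
})
model, cfg = toy_replace_listeners(trf, "listener", {"@architectures": "TransformerListener"})
assert cfg["standalone"] is True
assert model == ("split_trf_batch", "listener")
```

In user configs, the same mechanism is typically triggered by sourcing a component with `replace_listeners = ["model.tok2vec"]` set in its config block.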