Update docs [ci skip]

Ines Montani 2020-08-22 17:15:05 +02:00
parent 8dfc4cbfe7
commit 98a9e063b6
2 changed files with 26 additions and 12 deletions


@@ -71,10 +71,10 @@ of performance.

## Shared embedding layers {#embedding-layers}

spaCy lets you share a single transformer or other token-to-vector ("tok2vec")
embedding layer between multiple components. You can even update the shared
layer, performing **multi-task learning**. Reusing the tok2vec layer between
components can make your pipeline run a lot faster and result in much smaller
models. However, it can make the pipeline less modular and make it more
difficult to swap components or retrain parts of the pipeline. Multi-task
learning can affect your accuracy (either positively or negatively), and may
require some retuning of your hyper-parameters.
@@ -87,11 +87,11 @@ require some retuning of your hyper-parameters.

| ✅ **faster:** embed the documents once for your whole pipeline | ❌ **slower:** rerun the embedding for each component |
| ❌ **less composable:** all components require the same embedding component in the pipeline | ✅ **modular:** components can be moved and swapped freely |

You can share a single transformer or other tok2vec model between multiple
components by adding a [`Transformer`](/api/transformer) or
[`Tok2Vec`](/api/tok2vec) component near the start of your pipeline. Components
later in the pipeline can "connect" to it by including a **listener layer** like
[Tok2VecListener](/api/architectures#Tok2VecListener) within their model.

![Pipeline components listening to shared embedding component](../images/tok2vec-listener.svg)
@@ -103,8 +103,9 @@ eventually called. A similar mechanism is used to pass gradients from the

listeners back to the model. The [`Transformer`](/api/transformer) component and
[TransformerListener](/api/architectures#TransformerListener) layer do the same
thing for transformer models, but the `Transformer` component will also save the
transformer outputs to the
[`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
giving you access to them after the pipeline has finished running.

<!-- TODO: show example of implementation via config, side by side -->
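As a rough illustration of the config pattern described above, a shared embedding component and a downstream listener might be wired up like this. This is an abridged, hypothetical sketch: the architecture names and versions, the variable reference for `width`, and the omitted `embed`/`encode` sublayers and parser settings are assumptions, not a complete working config.

```ini
# Shared embedding component near the start of the pipeline
[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"
# embed and encode sublayers omitted for brevity

# A downstream component whose model "listens" to the shared tok2vec
[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
# other parser settings omitted

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
```

The key point is the listener layer slotted into the downstream component's model, which receives its input (and passes gradients back) through the shared `tok2vec` component rather than embedding the documents itself.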


@@ -170,3 +170,16 @@ to the host device unnecessarily.

- Interaction with `predict`, `get_loss` and `set_annotations`
- Initialization life-cycle with `begin_training`.
- Link to relation extraction notebook.
```python
def update(self, examples):
    # During training, run the model over the predicted docs and
    # backpropagate the gradient of the loss through the model.
    docs = [ex.predicted for ex in examples]
    refs = [ex.reference for ex in examples]
    predictions, backprop = self.model.begin_update(docs)
    gradient = self.get_loss(predictions, refs)
    backprop(gradient)

def __call__(self, doc):
    # At runtime, predict and write the results back to the doc.
    predictions = self.model([doc])
    self.set_annotations(predictions)
    return doc
```
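The `get_loss` call in the sketch above produces the gradient that is fed to `backprop`. As a minimal, hypothetical illustration of what such a method might compute, here is a standalone squared-error version (an assumption for illustration: it treats predictions and references as plain numeric arrays, whereas a real component computes a task-specific loss over its model's output type):

```python
import numpy as np

def get_loss(predictions, refs):
    # Hypothetical squared-error objective: L = 0.5 * sum((pred - ref)^2).
    # The gradient of L with respect to the predictions is (pred - ref),
    # which is what the backprop callback expects to receive.
    predictions = np.asarray(predictions, dtype="float64")
    refs = np.asarray(refs, dtype="float64")
    gradient = predictions - refs
    return gradient
```

Note that the gradient points from the reference values toward the predictions, so stepping against it moves the model's output toward the references.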