From 52b8c2d2e0241e1c515131c5e5f576d5dad65059 Mon Sep 17 00:00:00 2001 From: Paul O'Leary McCann Date: Mon, 22 Nov 2021 10:06:07 +0000 Subject: [PATCH] Add note on batch contract for listeners (#9691) * Add note on batch contract Using listeners requires batches to be consistent. This is obvious if you understand how the listener works, but it wasn't clearly stated in the Docs, and was subtle enough that the EntityLinker missed it. There is probably a clearer way to explain what the actual requirement is, but I figure this is a good start. * Rewrite to clarify role of caching --- website/docs/api/architectures.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/website/docs/api/architectures.md b/website/docs/api/architectures.md index 01ca4540b..44ba94d9e 100644 --- a/website/docs/api/architectures.md +++ b/website/docs/api/architectures.md @@ -124,6 +124,14 @@ Instead of defining its own `Tok2Vec` instance, a model architecture like [Tagger](/api/architectures#tagger) can define a listener as its `tok2vec` argument that connects to the shared `tok2vec` component in the pipeline. +Listeners work by caching the `Tok2Vec` output for a given batch of `Doc`s. This +means that in order for a component to work with the listener, the batch of +`Doc`s passed to the listener must be the same as the batch of `Doc`s passed to +the `Tok2Vec`. As a result, any manipulation of the `Doc`s which would affect +`Tok2Vec` output, such as to create special contexts or remove `Doc`s for which +no prediction can be made, must happen inside the model, **after** the call to +the `Tok2Vec` component. 
+ | Name | Description | | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `width` | The width of the vectors produced by the "upstream" [`Tok2Vec`](/api/tok2vec) component. ~~int~~ |
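
The batch contract described in the added paragraph can be illustrated with a small toy sketch. It is not spaCy's implementation: `ToyTok2Vec`, `ToyListener`, `batch_key` and `downstream_model` are hypothetical names invented for this example, plain strings stand in for `Doc`s, and the "vectors" are just token lengths. The only point is the ordering: the listener is called with the full batch first, and any filtering happens afterwards.

```python
from typing import List, Optional, Sequence, Tuple

Vectors = List[List[float]]


def batch_key(docs: Sequence[str]) -> Tuple[str, ...]:
    """Hypothetical content-based identifier for a batch of docs."""
    return tuple(docs)


class ToyListener:
    """Hands back whatever the upstream component cached for this batch."""

    def __init__(self) -> None:
        self._key: Optional[Tuple[str, ...]] = None
        self._outputs: Optional[Vectors] = None

    def receive(self, key: Tuple[str, ...], outputs: Vectors) -> None:
        # Called by the upstream component: cache its output for one batch.
        self._key, self._outputs = key, outputs

    def __call__(self, docs: Sequence[str]) -> Vectors:
        # The cache is only valid for the exact batch the upstream saw.
        if self._outputs is None or self._key != batch_key(docs):
            raise ValueError("listener got a different batch than upstream")
        return self._outputs


class ToyTok2Vec:
    """Stands in for the shared upstream component (assumption, not spaCy)."""

    def __init__(self, listeners: List[ToyListener]) -> None:
        self.listeners = listeners

    def predict(self, docs: Sequence[str]) -> Vectors:
        # One number per whitespace token (its length), purely illustrative.
        outputs = [[float(len(tok)) for tok in doc.split()] for doc in docs]
        for listener in self.listeners:
            listener.receive(batch_key(docs), outputs)
        return outputs


def downstream_model(listener: ToyListener, docs: Sequence[str]) -> Vectors:
    vectors = listener(docs)  # pass the full, unmodified batch first
    # Only now drop docs for which no prediction can be made (empty ones).
    return [vecs for doc, vecs in zip(docs, vectors) if doc.strip()]


listener = ToyListener()
tok2vec = ToyTok2Vec([listener])
docs = ["a bb", "", "ccc"]
tok2vec.predict(docs)
kept = downstream_model(listener, docs)  # same batch as upstream: works

try:
    listener([d for d in docs if d])  # batch filtered *before* the listener
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

In this sketch, filtering the batch before calling the listener changes the batch key, so the cached output no longer applies and the call fails. Filtering inside `downstream_model`, after the listener call, keeps the cached vectors aligned with the docs they were computed for.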