From 5bbdd7dc4c47f153e6e0de5ee4156cf52ab2695f Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Tue, 6 Apr 2021 14:13:22 +1000
Subject: [PATCH] Update pipeline design docs [ci skip]

---
 website/docs/images/pipeline-design.svg | 49 +++++++++++++++++++++++++
 website/docs/models/index.md            | 14 +++----
 2 files changed, 56 insertions(+), 7 deletions(-)
 create mode 100644 website/docs/images/pipeline-design.svg

diff --git a/website/docs/images/pipeline-design.svg b/website/docs/images/pipeline-design.svg
new file mode 100644
index 000000000..88ccdab99
--- /dev/null
+++ b/website/docs/images/pipeline-design.svg
@@ -0,0 +1,49 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/website/docs/models/index.md b/website/docs/models/index.md
index d37e9471d..69ef5fcf2 100644
--- a/website/docs/models/index.md
+++ b/website/docs/models/index.md
@@ -55,15 +55,15 @@ For a detailed compatibility overview, see the
 This is also the source of spaCy's internal compatibility check, performed when
 you run the [`download`](/api/cli#download) command.
 
-## Pretrained pipeline design {#design}
+## Trained pipeline design {#design}
 
-The spaCy v3 pretrained pipelines are designed to be efficient and configurable.
+The spaCy v3 trained pipelines are designed to be efficient and configurable.
 For example, multiple components can share a common "token-to-vector" model and
 it's easy to swap out or disable the lemmatizer. The pipelines are designed to
 be efficient in terms of speed and size and work well when the pipeline is run
 in full.
 
-When modifying a pretrained pipeline, it's important to understand how the
+When modifying a trained pipeline, it's important to understand how the
 components **depend on** each other. Unlike spaCy v2, where the `tagger`,
 `parser` and `ner` components were all independent, some v3 components depend on
 earlier components in the pipeline. As a result, disabling or reordering
@@ -84,6 +84,8 @@ Main changes from spaCy v2 models:
 
 ### CNN/CPU pipeline design
 
+![Components and their dependencies in the CNN pipelines](../images/pipeline-design.svg)
+
 In the `sm`/`md`/`lg` models:
 
 - The `tagger`, `morphologizer` and `parser` components listen to the `tok2vec`
@@ -99,11 +101,9 @@ In the `sm`/`md`/`lg` models:
   `tagger`+`attribute_ruler` or `morphologizer`.
 - The `ner` component is independent with its own internal tok2vec layer.
 
-
-
 ### Transformer pipeline design
 
-In the tranformer (`trf`) models, the `tagger`, `parser` and `ner` (if present)
+In the transformer (`trf`) models, the `tagger`, `parser` and `ner` (if present)
 all listen to the `transformer` component. The `attribute_ruler` and
 `lemmatizer` have the same configuration as in the CNN models.
 
@@ -112,7 +112,7 @@ all listen to the `transformer` component. The `attribute_ruler` and
 ### Modifying the default pipeline
 
 For faster processing, you may only want to run a subset of the components in a
-pretrained pipeline. The `disable` and `exclude` arguments to
+trained pipeline. The `disable` and `exclude` arguments to
 [`spacy.load`](/api/top-level#spacy.load) let you control which components are
 loaded and run. Disabled components are loaded in the background so it's
 possible to reenable them in the same pipeline in the future with