Various docs updates for v3.1 (#8406)

* Update for Catalan/Italian lemmatizer changes

* Add warning about relevance of section
Adriane Boyd 2021-06-21 09:33:50 +02:00 committed by GitHub
parent 7abfa25035
commit e39d1bd4ab
2 changed files with 15 additions and 6 deletions

@@ -64,11 +64,13 @@ libraries (`pymorphy2`).
 | Language | Default Mode |
 | -------- | ------------ |
 | `bn` | `rule` |
+| `ca` | `pos_lookup` |
 | `el` | `rule` |
 | `en` | `rule` |
 | `es` | `rule` |
 | `fa` | `rule` |
 | `fr` | `rule` |
+| `it` | `pos_lookup` |
 | `mk` | `rule` |
 | `nb` | `rule` |
 | `nl` | `rule` |

@@ -97,9 +97,10 @@ In the `sm`/`md`/`lg` models:
 tagger. For English, the attribute ruler can improve its mapping from
 `token.tag` to `token.pos` if dependency parses from a `parser` are present,
 but the parser is not required.
-- The `lemmatizer` component for many languages (Dutch, English, French, Greek,
-Macedonian, Norwegian, Polish and Spanish) requires `token.pos` annotation
-from either `tagger`+`attribute_ruler` or `morphologizer`.
+- The `lemmatizer` component for many languages (Catalan, Dutch, English,
+French, Greek, Italian, Macedonian, Norwegian, Polish and Spanish) requires
+`token.pos` annotation from either `tagger`+`attribute_ruler` or
+`morphologizer`.
 - The `ner` component is independent with its own internal tok2vec layer.
 ### Transformer pipeline design {#design-trf}
@@ -133,9 +134,9 @@ nlp = spacy.load("en_core_web_trf", disable=["tagger", "attribute_ruler", "lemma
 Token.pos">
 The lemmatizer depends on `tagger`+`attribute_ruler` or `morphologizer` for
-Dutch, English, French, Greek, Macedonian, Norwegian, Polish and Spanish. If you
-disable any of these components, you'll see lemmatizer warnings unless the
-lemmatizer is also disabled.
+Catalan, Dutch, English, French, Greek, Italian, Macedonian, Norwegian, Polish
+and Spanish. If you disable any of these components, you'll see lemmatizer
+warnings unless the lemmatizer is also disabled.
 </Infobox>
@@ -184,6 +185,12 @@ nlp = spacy.load("en_core_web_trf", disable=["tagger", "parser", "attribute_rule
 #### Move NER to the end of the pipeline
+<Infobox title="For v3.0.x models only" variant="warning">
+As of v3.1, the NER component is at the end of the pipeline by default.
+</Infobox>
 For access to `POS` and `LEMMA` features in an `entity_ruler`, move `ner` to the
 end of the pipeline after `attribute_ruler` and `lemmatizer`:
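
As a rough illustration of the dependency rule this change documents, the sketch below models it in plain Python. It does not call spaCy. `POS_DEPENDENT_LANGS`, `has_pos_source` and `expect_lemmatizer_warning` are hypothetical names introduced here, and the language codes are assumed to correspond to the languages listed in the updated infobox. Treating any language outside that list as unaffected is also an assumption of this sketch.

```python
# Hypothetical helper, not part of spaCy: it only inspects component names.

# Assumed language codes for the languages named in the updated docs text:
# Catalan, Dutch, English, French, Greek, Italian, Macedonian, Norwegian,
# Polish and Spanish.
POS_DEPENDENT_LANGS = {"ca", "nl", "en", "fr", "el", "it", "mk", "nb", "pl", "es"}


def has_pos_source(pipe_names):
    """Return True if the enabled components can supply token.pos.

    Per the docs text, token.pos comes from either
    tagger + attribute_ruler, or from morphologizer.
    """
    enabled = set(pipe_names)
    return "morphologizer" in enabled or {"tagger", "attribute_ruler"} <= enabled


def expect_lemmatizer_warning(lang, pipe_names):
    """Return True if the documented lemmatizer warning would be expected."""
    if "lemmatizer" not in pipe_names:
        # Disabling the lemmatizer as well avoids the warning.
        return False
    if lang not in POS_DEPENDENT_LANGS:
        # Assumption of this sketch: other languages are unaffected.
        return False
    return not has_pos_source(pipe_names)


if __name__ == "__main__":
    # tagger and attribute_ruler disabled, lemmatizer still enabled
    print(expect_lemmatizer_warning("it", ["tok2vec", "parser", "lemmatizer", "ner"]))
    # lemmatizer disabled together with its dependencies
    print(expect_lemmatizer_warning("it", ["tok2vec", "parser", "ner"]))
```

The check only looks at which component names are enabled. It says nothing about component order, which is what moving `ner` to the end of the pipeline addresses.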