From dfb23a419ee0410f5ef0ce8ebd8b031cd5790e2d Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Sat, 6 Mar 2021 17:38:54 +1100
Subject: [PATCH] =?UTF-8?q?Merge=20branch=20'spacy.io'=C2=A0[ci=20skip]?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 website/docs/usage/v2-1.md |  2 +-
 website/docs/usage/v2-3.md | 33 ++++++++++++++++-----------------
 website/docs/usage/v2.md   |  2 +-
 3 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/website/docs/usage/v2-1.md b/website/docs/usage/v2-1.md
index 8d310f1a4..500e43803 100644
--- a/website/docs/usage/v2-1.md
+++ b/website/docs/usage/v2-1.md
@@ -180,7 +180,7 @@ entirely **in Markdown**, without having to compromise on easy-to-use custom
 UI components. We're hoping that the Markdown source will make it even easier
 to contribute to the documentation. For more details, check out the
 [styleguide](/styleguide) and
-[source](https://github.com/explosion/spaCy/tree/master/website). While
+[source](https://github.com/explosion/spacy/tree/v2.x/website). While
 converting the pages to Markdown, we've also fixed a bunch of typos, improved
 the existing pages and added some new content:
 
diff --git a/website/docs/usage/v2-3.md b/website/docs/usage/v2-3.md
index b6c4d7dfb..075e1ce81 100644
--- a/website/docs/usage/v2-3.md
+++ b/website/docs/usage/v2-3.md
@@ -161,8 +161,8 @@ debugging your tokenizer configuration.
 
 spaCy's custom warnings have been replaced with native Python
 [`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
-setting `SPACY_WARNING_IGNORE`, use the [`warnings`
-filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
+setting `SPACY_WARNING_IGNORE`, use the
+[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
 to manage warnings.
 
 ```diff
@@ -176,7 +176,7 @@ import spacy
 #### Normalization tables
 
 The normalization tables have moved from the language data in
-[`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang) to the
+[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang) to the
 package [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data).
 If you're adding data for a new language, the normalization table should be
 added to `spacy-lookups-data`. See
@@ -190,8 +190,8 @@ lexemes will be added to the vocab automatically, just as in small models
 without vectors.
 
 To see the number of unique vectors and number of words with vectors, see
-`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000`
-unique vectors and `684830` words with vectors:
+`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000` unique
+vectors and `684830` words with vectors:
 
 ```python
 {
@@ -210,8 +210,8 @@ for orth in nlp.vocab.vectors:
     _ = nlp.vocab[orth]
 ```
 
-If your workflow previously iterated over `nlp.vocab`, a similar alternative
-is to iterate over words with vectors instead:
+If your workflow previously iterated over `nlp.vocab`, a similar alternative is
+to iterate over words with vectors instead:
 
 ```diff
 - lexemes = [w for w in nlp.vocab]
@@ -220,9 +220,9 @@ is to iterate over words with vectors instead:
 
 Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
 the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
-provided lexemes but only 685K words with vectors. The vectors have been
-updated for most languages in v2.2, but the English models contain the same
-vectors for both v2.2 and v2.3.
+provided lexemes but only 685K words with vectors. The vectors have been updated
+for most languages in v2.2, but the English models contain the same vectors for
+both v2.2 and v2.3.
 
 #### Lexeme.is_oov and Token.is_oov
 
@@ -234,8 +234,7 @@ fixed in the next patch release v2.3.1.
 </Infobox>
 
 In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
-have a word vector. This is equivalent to `token.orth not in
-nlp.vocab.vectors`.
+have a word vector. This is equivalent to `token.orth not in nlp.vocab.vectors`.
 
 Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
 probability and cluster features. The probability and cluster features are no
@@ -270,8 +269,8 @@ as part of the model vocab.
 
 To load the probability table into a provided model, first make sure you have
 `spacy-lookups-data` installed. To load the table, remove the empty provided
-`lexeme_prob` table and then access `Lexeme.prob` for any word to load the
-table from `spacy-lookups-data`:
+`lexeme_prob` table and then access `Lexeme.prob` for any word to load the table
+from `spacy-lookups-data`:
 
 ```diff
 + # prerequisite: pip install spacy-lookups-data
@@ -321,9 +320,9 @@ the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to
 provide in the tag map as a JSON dict.
 
 If you want to export a tag map from a provided model for use with the train
-CLI, you can save it as a JSON dict. To only use string keys as required by
-JSON and to make it easier to read and edit, any internal integer IDs need to
-be converted back to strings:
+CLI, you can save it as a JSON dict. To only use string keys as required by JSON
+and to make it easier to read and edit, any internal integer IDs need to be
+converted back to strings:
 
 ```python
 import spacy
diff --git a/website/docs/usage/v2.md b/website/docs/usage/v2.md
index aee3c24a6..210565c11 100644
--- a/website/docs/usage/v2.md
+++ b/website/docs/usage/v2.md
@@ -303,7 +303,7 @@ lookup-based lemmatization – and **many new languages**!
 <Infobox>
 
 **API:** [`Language`](/api/language) **Code:**
-[`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang)
+[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang)
 **Usage:** [Adding languages](/usage/adding-languages)
 
 </Infobox>