Merge branch 'spacy.io' [ci skip]

Ines Montani 2021-03-06 17:38:54 +11:00
parent 23eef78a4a
commit dfb23a419e
3 changed files with 18 additions and 19 deletions

@@ -180,7 +180,7 @@ entirely **in Markdown**, without having to compromise on easy-to-use custom UI
components. We're hoping that the Markdown source will make it even easier to
contribute to the documentation. For more details, check out the
[styleguide](/styleguide) and
-[source](https://github.com/explosion/spaCy/tree/master/website). While
+[source](https://github.com/explosion/spacy/tree/v2.x/website). While
converting the pages to Markdown, we've also fixed a bunch of typos, improved
the existing pages and added some new content:

@@ -161,8 +161,8 @@ debugging your tokenizer configuration.
spaCy's custom warnings have been replaced with native Python
[`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
-setting `SPACY_WARNING_IGNORE`, use the [`warnings`
-filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
+setting `SPACY_WARNING_IGNORE`, use the
+[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
to manage warnings.
```diff
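# (content truncated in this diff view; the lines below are an
# illustrative sketch, not the original hunk — the warning code W007
# and the old environment-variable usage are assumptions)
import spacy
+ import warnings

+ # instead of setting SPACY_WARNING_IGNORE="W007" in the environment:
+ warnings.filterwarnings("ignore", message=r"\[W007\]")
```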
@@ -176,7 +176,7 @@ import spacy
#### Normalization tables
The normalization tables have moved from the language data in
-[`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang) to the
+[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang) to the
package [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data).
If you're adding data for a new language, the normalization table should be
added to `spacy-lookups-data`. See
@@ -190,8 +190,8 @@ lexemes will be added to the vocab automatically, just as in small models
without vectors.
To see the number of unique vectors and number of words with vectors, see
-`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000`
-unique vectors and `684830` words with vectors:
+`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000` unique
+vectors and `684830` words with vectors:
```python
{
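    # (truncated in this diff view; for en_core_web_md the "vectors"
    # meta reports 20000 unique vectors covering 684830 words)
}
```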
@@ -210,8 +210,8 @@ for orth in nlp.vocab.vectors:
_ = nlp.vocab[orth]
```
-If your workflow previously iterated over `nlp.vocab`, a similar alternative
-is to iterate over words with vectors instead:
+If your workflow previously iterated over `nlp.vocab`, a similar alternative is
+to iterate over words with vectors instead:
```diff
- lexemes = [w for w in nlp.vocab]
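# (truncated in this diff view; an illustrative sketch of the
# replacement, assuming the new list is built from words with vectors)
+ lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
```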
@@ -220,9 +220,9 @@ is to iterate over words with vectors instead:
Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
-provided lexemes but only 685K words with vectors. The vectors have been
-updated for most languages in v2.2, but the English models contain the same
-vectors for both v2.2 and v2.3.
+provided lexemes but only 685K words with vectors. The vectors have been updated
+for most languages in v2.2, but the English models contain the same vectors for
+both v2.2 and v2.3.
#### Lexeme.is_oov and Token.is_oov
@@ -234,8 +234,7 @@ fixed in the next patch release v2.3.1.
</Infobox>
In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
-have a word vector. This is equivalent to `token.orth not in
-nlp.vocab.vectors`.
+have a word vector. This is equivalent to `token.orth not in nlp.vocab.vectors`.
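
A minimal sketch of checking that equivalence directly (it assumes a
pipeline with vectors, e.g. `en_core_web_md`):

```python
import spacy

nlp = spacy.load("en_core_web_md")
token = nlp("hello")[0]

# in v2.3, is_oov mirrors membership in the vectors table
assert token.is_oov == (token.orth not in nlp.vocab.vectors)
```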
Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
probability and cluster features. The probability and cluster features are no
@@ -270,8 +269,8 @@ as part of the model vocab.
To load the probability table into a provided model, first make sure you have
`spacy-lookups-data` installed. To load the table, remove the empty provided
-`lexeme_prob` table and then access `Lexeme.prob` for any word to load the
-table from `spacy-lookups-data`:
+`lexeme_prob` table and then access `Lexeme.prob` for any word to load the table
+from `spacy-lookups-data`:
```diff
+ # prerequisite: pip install spacy-lookups-data
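# (truncated in this diff view; an illustrative sketch of the remaining
# steps — it assumes the placeholder table sits on the vocab's lookups)
nlp = spacy.load("en_core_web_md")
+ if nlp.vocab.lookups.has_table("lexeme_prob"):
+     nlp.vocab.lookups.remove_table("lexeme_prob")
+ # the first access to .prob loads the full table from spacy-lookups-data
+ print(nlp.vocab["the"].prob)
```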
@@ -321,9 +320,9 @@ the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to
provide the tag map as a JSON dict.
If you want to export a tag map from a provided model for use with the train
-CLI, you can save it as a JSON dict. To only use string keys as required by
-JSON and to make it easier to read and edit, any internal integer IDs need to
-be converted back to strings:
+CLI, you can save it as a JSON dict. To only use string keys as required by JSON
+and to make it easier to read and edit, any internal integer IDs need to be
+converted back to strings:
```python
import spacy
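import srsly

# (truncated in this diff view; below is an illustrative sketch of the
# conversion described above, not the original code)
nlp = spacy.load("en_core_web_sm")
tag_map = {}
for tag, morph in nlp.vocab.morphology.tag_map.items():
    tag_map[tag] = {}
    for feat, value in morph.items():
        # swap internal integer IDs back to their string names for JSON
        if type(feat) is int:
            feat = nlp.vocab.strings[feat]
        if type(value) is int:
            value = nlp.vocab.strings[value]
        tag_map[tag][feat] = value
srsly.write_json("tag_map.json", tag_map)
```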

@@ -303,7 +303,7 @@ lookup-based lemmatization and **many new languages**!
<Infobox>
**API:** [`Language`](/api/language) **Code:**
-[`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang)
+[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang)
**Usage:** [Adding languages](/usage/adding-languages)
</Infobox>