mirror of https://github.com/explosion/spaCy.git
Merge branch 'spacy.io' [ci skip]
This commit is contained in:
parent 23eef78a4a
commit dfb23a419e
@@ -180,7 +180,7 @@ entirely **in Markdown**, without having to compromise on easy-to-use custom UI
 components. We're hoping that the Markdown source will make it even easier to
 contribute to the documentation. For more details, check out the
 [styleguide](/styleguide) and
-[source](https://github.com/explosion/spaCy/tree/master/website). While
+[source](https://github.com/explosion/spacy/tree/v2.x/website). While
 converting the pages to Markdown, we've also fixed a bunch of typos, improved
 the existing pages and added some new content:

@@ -161,8 +161,8 @@ debugging your tokenizer configuration.

 spaCy's custom warnings have been replaced with native Python
 [`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
-setting `SPACY_WARNING_IGNORE`, use the [`warnings`
-filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
+setting `SPACY_WARNING_IGNORE`, use the
+[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
 to manage warnings.

 ```diff
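The `diff` snippet above is cut off by the hunk context. For reference, a minimal sketch of the migration this hunk documents, replacing the removed `SPACY_WARNING_IGNORE` environment variable with a standard `warnings` filter; the `W008` warning code is only an illustrative choice, not a required one:

```python
import warnings

import spacy

# Suppress a specific spaCy warning by its message prefix instead of
# setting the old SPACY_WARNING_IGNORE environment variable.
# "[W008]" is an illustrative warning code here.
warnings.filterwarnings("ignore", message=r"\[W008\]")

nlp = spacy.blank("en")
```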
@@ -176,7 +176,7 @@ import spacy
 #### Normalization tables

 The normalization tables have moved from the language data in
-[`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang) to the
+[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang) to the
 package [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data).
 If you're adding data for a new language, the normalization table should be
 added to `spacy-lookups-data`. See
@@ -190,8 +190,8 @@ lexemes will be added to the vocab automatically, just as in small models
 without vectors.

 To see the number of unique vectors and number of words with vectors, see
-`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000`
-unique vectors and `684830` words with vectors:
+`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000` unique
+vectors and `684830` words with vectors:

 ```python
 {
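As a quick illustration of the passage above, the vector stats can be read off the model meta; a sketch, assuming a v2.3 `en_core_web_md` model is installed (the exact dict keys may vary by model version):

```python
import spacy

nlp = spacy.load("en_core_web_md")
# For en_core_web_md this reports 20000 unique vectors for 684830 keys,
# e.g. {"width": 300, "vectors": 20000, "keys": 684830, ...}
print(nlp.meta["vectors"])
```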
@@ -210,8 +210,8 @@ for orth in nlp.vocab.vectors:
     _ = nlp.vocab[orth]
 ```

-If your workflow previously iterated over `nlp.vocab`, a similar alternative
-is to iterate over words with vectors instead:
+If your workflow previously iterated over `nlp.vocab`, a similar alternative is
+to iterate over words with vectors instead:

 ```diff
 - lexemes = [w for w in nlp.vocab]
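A sketch of the suggested replacement, assuming a model with vectors such as `en_core_web_md`:

```python
import spacy

nlp = spacy.load("en_core_web_md")
# Iterate over the words that have vectors rather than the full vocab,
# which is no longer preloaded with all lexemes in v2.3.
lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
```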
@@ -220,9 +220,9 @@ is to iterate over words with vectors instead:

 Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
 the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
-provided lexemes but only 685K words with vectors. The vectors have been
-updated for most languages in v2.2, but the English models contain the same
-vectors for both v2.2 and v2.3.
+provided lexemes but only 685K words with vectors. The vectors have been updated
+for most languages in v2.2, but the English models contain the same vectors for
+both v2.2 and v2.3.

 #### Lexeme.is_oov and Token.is_oov

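To check the two counts this hunk compares, something like the following should work; a sketch, and the exact numbers depend on the model version:

```python
import spacy

nlp = spacy.load("en_core_web_md")
print(len(nlp.vocab))            # preloaded lexemes (~1.3M in v2.2 md/lg)
print(nlp.vocab.vectors.n_keys)  # words with vectors (~685K for English)
```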
@@ -234,8 +234,7 @@ fixed in the next patch release v2.3.1.
 </Infobox>

 In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
-have a word vector. This is equivalent to `token.orth not in
-nlp.vocab.vectors`.
+have a word vector. This is equivalent to `token.orth not in nlp.vocab.vectors`.

 Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
 probability and cluster features. The probability and cluster features are no
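A sketch of the stated equivalence, assuming a v2.3 model with vectors:

```python
import spacy

nlp = spacy.load("en_core_web_md")
token = nlp("apple")[0]
# In v2.3, is_oov is True exactly when the lexeme has no word vector.
assert token.is_oov == (token.orth not in nlp.vocab.vectors)
```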
@@ -270,8 +269,8 @@ as part of the model vocab.

 To load the probability table into a provided model, first make sure you have
 `spacy-lookups-data` installed. To load the table, remove the empty provided
-`lexeme_prob` table and then access `Lexeme.prob` for any word to load the
-table from `spacy-lookups-data`:
+`lexeme_prob` table and then access `Lexeme.prob` for any word to load the table
+from `spacy-lookups-data`:

 ```diff
 + # prerequisite: pip install spacy-lookups-data
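The `diff` snippet above breaks off after its prerequisite line. A fuller sketch of the described workflow, assuming `spacy-lookups-data` is installed and that the placeholder table lives in `Vocab.lookups_extra` as in v2.3:

```python
# prerequisite: pip install spacy-lookups-data
import spacy

nlp = spacy.load("en_core_web_md")
# Remove the empty placeholder table so the full one can be loaded.
if nlp.vocab.lookups_extra.has_table("lexeme_prob"):
    nlp.vocab.lookups_extra.remove_table("lexeme_prob")
# Accessing .prob for any word now loads the table from spacy-lookups-data.
print(nlp.vocab["the"].prob)
```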
@@ -321,9 +320,9 @@ the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to
 provide in the tag map as a JSON dict.

 If you want to export a tag map from a provided model for use with the train
-CLI, you can save it as a JSON dict. To only use string keys as required by
-JSON and to make it easier to read and edit, any internal integer IDs need to
-be converted back to strings:
+CLI, you can save it as a JSON dict. To only use string keys as required by JSON
+and to make it easier to read and edit, any internal integer IDs need to be
+converted back to strings:

 ```python
 import spacy
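The `python` snippet above is cut off after its import by the hunk context. A sketch of the export it describes, assuming `srsly` (spaCy's serialization helper) is available for writing JSON:

```python
import spacy
import srsly

nlp = spacy.load("en_core_web_sm")
tag_map = {}
# Convert internal integer IDs back to strings so the dict is valid JSON
# and easy to read and edit.
for tag, morph in nlp.vocab.morphology.tag_map.items():
    tag_map[tag] = {}
    for feat, val in morph.items():
        if isinstance(feat, int):
            feat = nlp.vocab.strings[feat]
        if isinstance(val, int) and not isinstance(val, bool):
            val = nlp.vocab.strings[val]
        tag_map[tag][feat] = val
srsly.write_json("tag_map.json", tag_map)
```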
@@ -303,7 +303,7 @@ lookup-based lemmatization – and **many new languages**!
 <Infobox>

 **API:** [`Language`](/api/language) **Code:**
-[`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang)
+[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang)
 **Usage:** [Adding languages](/usage/adding-languages)

 </Infobox>