mirror of https://github.com/explosion/spaCy.git
Merge branch 'master' of https://github.com/explosion/spaCy
Commit 667f294627
@@ -334,6 +334,11 @@ check if all of your models are up to date, you can run the
the `Vocab` and serialized with it. This means that serialized objects (`nlp`,
pipeline components, vocab) will now include additional data, and models
written to disk will include additional files.
- The [`Lemmatizer`](/api/lemmatizer) class is now initialized with an instance
  of [`Lookups`](/api/lookups) containing the rules and tables, instead of dicts
  as separate arguments. This makes it easier to share data tables and modify
  them at runtime. This is mostly internals, but if you've been implementing a
  custom `Lemmatizer`, you'll need to update your code.
- The [Dutch model](/models/nl) has been trained on a new NER corpus (custom
  labelled UD instead of WikiNER), so its predictions may be very different
  compared to the previous version. The results should be significantly better
@@ -399,6 +404,29 @@ don't explicitly install the lookups data, that `nlp` object won't have any
lemmatization rules available. spaCy will now show you a warning when you train
a new part-of-speech tagger and the vocab has no lookups available.

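To see whether a given vocab actually has any lookup tables before you start training, you can inspect `Vocab.lookups` directly. A minimal sketch, assuming spaCy v2.2+ (the blank English pipeline is just an example; the optional `spacy-lookups-data` package is what provides the real tables):

```python
# Minimal sketch: check which lookup tables a vocab has before training a
# tagger. A blank model without the optional spacy-lookups-data package
# installed will typically have no lemmatization tables at all.
import spacy

nlp = spacy.blank("en")
print(nlp.vocab.lookups.tables)  # e.g. [] if no lookups data is available

if not nlp.vocab.lookups.has_table("lemma_lookup"):
    print("No lemma_lookup table found - consider installing spacy-lookups-data.")
```
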
#### Lemmatizer initialization

This is mainly internals and should hopefully not affect your code. But if
you've been creating custom [`Lemmatizers`](/api/lemmatizer), you'll need to
update how they're initialized and pass in an instance of
[`Lookups`](/api/lookups) with the (optional) tables `lemma_index`, `lemma_exc`,
`lemma_rules` and `lemma_lookup`.

```diff
  from spacy.lemmatizer import Lemmatizer
+ from spacy.lookups import Lookups

  lemma_index = {"verb": ("cope", "cop")}
  lemma_exc = {"verb": {"coping": ("cope",)}}
  lemma_rules = {"verb": [["ing", ""]]}
- lemmatizer = Lemmatizer(lemma_index, lemma_exc, lemma_rules)
+ lookups = Lookups()
+ lookups.add_table("lemma_index", lemma_index)
+ lookups.add_table("lemma_exc", lemma_exc)
+ lookups.add_table("lemma_rules", lemma_rules)
+ lemmatizer = Lemmatizer(lookups)
```

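Putting it together, a minimal sketch of the new-style initialization and a lemmatizer call, reusing the example tables from the diff above (the printed result assumes the exception table applies):

```python
# Minimal sketch: initialize the Lemmatizer from a Lookups instance and call
# it. The table contents are the ones from the diff above.
from spacy.lemmatizer import Lemmatizer
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("lemma_index", {"verb": ("cope", "cop")})
lookups.add_table("lemma_exc", {"verb": {"coping": ("cope",)}})
lookups.add_table("lemma_rules", {"verb": [["ing", ""]]})

lemmatizer = Lemmatizer(lookups)
# Called with a string and its coarse-grained POS, the lemmatizer returns a
# list of candidate lemmas - here the exception table yields ["cope"].
print(lemmatizer("coping", "VERB"))
```
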
#### Converting entity offsets to BILUO tags

If you've been using the