mirror of https://github.com/explosion/spaCy.git
Merge branch 'master' of https://github.com/explosion/spaCy
Commit 667f294627
@@ -334,6 +334,11 @@ check if all of your models are up to date, you can run the
the `Vocab` and serialized with it. This means that serialized objects (`nlp`,
pipeline components, vocab) will now include additional data, and models
written to disk will include additional files.
- The [`Lemmatizer`](/api/lemmatizer) class is now initialized with an instance
  of [`Lookups`](/api/lookups) containing the rules and tables, instead of dicts
  as separate arguments. This makes it easier to share data tables and modify
  them at runtime. This is mostly internals, but if you've been implementing a
  custom `Lemmatizer`, you'll need to update your code.
- The [Dutch model](/models/nl) has been trained on a new NER corpus (custom
  labelled UD instead of WikiNER), so its predictions may be very different
  compared to the previous version. The results should be significantly better
@@ -399,6 +404,29 @@ don't explicitly install the lookups data, that `nlp` object won't have any
lemmatization rules available. spaCy will now show you a warning when you train
a new part-of-speech tagger and the vocab has no lookups available.

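To see whether a given vocab actually has any lookup tables before you start training, you can inspect `Vocab.lookups` directly. A minimal sketch, assuming spaCy v2.2+ (the blank English pipeline is just an example; the optional `spacy-lookups-data` package is what provides the real tables):

```python
# Minimal sketch: check which lookup tables a vocab has before training a
# tagger. A blank model without the optional spacy-lookups-data package
# installed will typically have no lemmatization tables at all.
import spacy

nlp = spacy.blank("en")
print(nlp.vocab.lookups.tables)  # e.g. [] if no lookups data is available

if not nlp.vocab.lookups.has_table("lemma_lookup"):
    print("No lemma_lookup table found - consider installing spacy-lookups-data.")
```
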
#### Lemmatizer initialization

This is mainly internals and should hopefully not affect your code. But if
you've been creating custom [`Lemmatizers`](/api/lemmatizer), you'll need to
update how they're initialized and pass in an instance of
[`Lookups`](/api/lookups) with the (optional) tables `lemma_index`, `lemma_exc`,
`lemma_rules` and `lemma_lookup`.

```diff
  from spacy.lemmatizer import Lemmatizer
+ from spacy.lookups import Lookups

  lemma_index = {"verb": ("cope", "cop")}
  lemma_exc = {"verb": {"coping": ("cope",)}}
  lemma_rules = {"verb": [["ing", ""]]}
- lemmatizer = Lemmatizer(lemma_index, lemma_exc, lemma_rules)
+ lookups = Lookups()
+ lookups.add_table("lemma_index", lemma_index)
+ lookups.add_table("lemma_exc", lemma_exc)
+ lookups.add_table("lemma_rules", lemma_rules)
+ lemmatizer = Lemmatizer(lookups)
```

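Putting it together, a minimal sketch of the new-style initialization and a lemmatizer call, reusing the example tables from the diff above (the printed result assumes the exception table applies):

```python
# Minimal sketch: initialize the Lemmatizer from a Lookups instance and call
# it. The table contents are the ones from the diff above.
from spacy.lemmatizer import Lemmatizer
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("lemma_index", {"verb": ("cope", "cop")})
lookups.add_table("lemma_exc", {"verb": {"coping": ("cope",)}})
lookups.add_table("lemma_rules", {"verb": [["ing", ""]]})

lemmatizer = Lemmatizer(lookups)
# Called with a string and its coarse-grained POS, the lemmatizer returns a
# list of candidate lemmas - here the exception table yields ["cope"].
print(lemmatizer("coping", "VERB"))
```
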
#### Converting entity offsets to BILUO tags

If you've been using the