From 0dd127bb00c111613bd0a23a39c7220bf17d2b12 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Tue, 1 Oct 2019 21:37:06 +0200
Subject: [PATCH] Update v2-2.md [ci skip]

---
 website/docs/usage/v2-2.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/website/docs/usage/v2-2.md b/website/docs/usage/v2-2.md
index 3941b046c..ef616825a 100644
--- a/website/docs/usage/v2-2.md
+++ b/website/docs/usage/v2-2.md
@@ -334,6 +334,11 @@ check if all of your models are up to date, you can run the
   the `Vocab` and serialized with it. This means that serialized objects
   (`nlp`, pipeline components, vocab) will now include additional data, and
   models written to disk will include additional files.
+- The [`Lemmatizer`](/api/lemmatizer) class is now initialized with an instance
+  of [`Lookups`](/api/lookups) containing the rules and tables, instead of dicts
+  as separate arguments. This makes it easier to share data tables and modify
+  them at runtime. This is mostly internals, but if you've been implementing a
+  custom `Lemmatizer`, you'll need to update your code.
 - The [Dutch model](/models/nl) has been trained on a new NER corpus (custom
   labelled UD instead of WikiNER), so its predictions may be very different
   compared to the previous version. The results should be significantly better
@@ -399,6 +404,29 @@ don't explicitly install the lookups data, that `nlp` object won't have any
 lemmatization rules available. spaCy will now show you a warning when you train
 a new part-of-speech tagger and the vocab has no lookups available.
 
+#### Lemmatizer initialization
+
+This is mainly internals and should hopefully not affect your code. But if
+you've been creating custom [`Lemmatizers`](/api/lemmatizer), you'll need to
+update how they're initialized and pass in an instance of
+[`Lookups`](/api/lookups) with the (optional) tables `lemma_index`, `lemma_exc`,
+`lemma_rules` and `lemma_lookup`.
+
+```diff
+from spacy.lemmatizer import Lemmatizer
++ from spacy.lookups import Lookups
+
+lemma_index = {"verb": ("cope", "cop")}
+lemma_exc = {"verb": {"coping": ("cope",)}}
+lemma_rules = {"verb": [["ing", ""]]}
+- lemmatizer = Lemmatizer(lemma_index, lemma_exc, lemma_rules)
++ lookups = Lookups()
++ lookups.add_table("lemma_index", lemma_index)
++ lookups.add_table("lemma_exc", lemma_exc)
++ lookups.add_table("lemma_rules", lemma_rules)
++ lemmatizer = Lemmatizer(lookups)
+```
+
 #### Converting entity offsets to BILUO tags
 
 If you've been using the