From 0fc908d7a5f82cdc7e1d76a7278077a23f2cc0e3 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Thu, 21 Feb 2019 12:34:18 +0100
Subject: [PATCH] Add note on merging speed in v2.1 (see #3300) [ci skip]

---
 website/docs/usage/v2-1.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/website/docs/usage/v2-1.md b/website/docs/usage/v2-1.md
index 988531e00..bdf0cfa1f 100644
--- a/website/docs/usage/v2-1.md
+++ b/website/docs/usage/v2-1.md
@@ -215,6 +215,22 @@ if all of your models are up to date, you can run the
   means that the `Matcher` in v2.1.x may produce different results compared to
   the `Matcher` in v2.0.x.
 
+- The deprecated [`Doc.merge`](/api/doc#merge) and
+  [`Span.merge`](/api/span#merge) methods still work, but you may notice that
+  they now run slower when merging many objects in a row. That's because the
+  merging engine was rewritten to be more reliable and to support more efficient
+  merging **in bulk**. To take advantage of this, you should rewrite your logic
+  to use the [`Doc.retokenize`](/api/doc#retokenize) context manager and perform
+  as many merges as possible together in the `with` block.
+
+  ```diff
+  - doc[1:5].merge()
+  - doc[6:8].merge()
+  + with doc.retokenize() as retokenizer:
+  +     retokenizer.merge(doc[1:5])
+  +     retokenizer.merge(doc[6:8])
+  ```
+
 - For better compatibility with the Universal Dependencies data, the lemmatizer
   now preserves capitalization, e.g. for proper nouns. See
   [this issue](https://github.com/explosion/spaCy/issues/3256) for details.