mirror of https://github.com/explosion/spaCy.git
Add note on merging speed in v2.1 (see #3300) [ci skip]
parent 236aa94ded
commit 0fc908d7a5
@@ -215,6 +215,22 @@ if all of your models are up to date, you can run the
  means that the `Matcher` in v2.1.x may produce different results compared to
  the `Matcher` in v2.0.x.

- The deprecated [`Doc.merge`](/api/doc#merge) and
  [`Span.merge`](/api/span#merge) methods still work, but you may notice that
  they now run slower when merging many objects in a row. That's because the
  merging engine was rewritten to be more reliable and to support more efficient
  merging **in bulk**. To take advantage of this, you should rewrite your logic
  to use the [`Doc.retokenize`](/api/doc#retokenize) context manager and perform
  as many merges as possible together in the `with` block.

  ```diff
  - doc[1:5].merge()
  - doc[6:8].merge()
  + with doc.retokenize() as retokenizer:
  +     retokenizer.merge(doc[1:5])
  +     retokenizer.merge(doc[6:8])
  ```

- For better compatibility with the Universal Dependencies data, the lemmatizer
  now preserves capitalization, e.g. for proper nouns. See
  [this issue](https://github.com/explosion/spaCy/issues/3256) for details.
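As a rough illustration of the bulk-merging advice in the note on `Doc.retokenize`, the sketch below collects several spans first and merges them in one `with` block. It assumes spaCy v2.1 or later is installed; the helper name `merge_spans_in_bulk` and the example sentence are hypothetical and not part of this commit. A blank English pipeline is used so that no trained model is needed.

```python
import spacy


def merge_spans_in_bulk(doc, spans):
    """Merge several non-overlapping spans in a single retokenize pass.

    Hypothetical helper for illustration; not part of spaCy itself.
    """
    with doc.retokenize() as retokenizer:
        for span in spans:
            # Merges are queued here and applied when the block exits,
            # so every span still refers to the original tokenization.
            retokenizer.merge(span)
    return doc


# Blank pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
doc = nlp("I like New York City and San Francisco a lot")

# Token indices before merging:
# I(0) like(1) New(2) York(3) City(4) and(5) San(6) Francisco(7) a(8) lot(9)
spans = [doc[2:5], doc[6:8]]
merge_spans_in_bulk(doc, spans)

print([token.text for token in doc])
```

Because the merges are only applied when the `with` block exits, both spans can be sliced from the original token indices, and there is no need to recompute offsets after the first merge.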