mirror of https://github.com/explosion/spaCy.git
f2bfaa1b38
The `Matcher` in `merge_subtokens()` returns all possible subsequences of `subtok` tokens, so for sequences of two or more subtoks the matches must be filtered so that the retokenizer merges only the longest matches, with no overlapping spans.

- `__init__.py`
- `entityruler.py`
- `functions.py`
- `hooks.py`
- `morphologizer.pyx`
- `pipes.pyx`
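
The filtering described in the commit message can be illustrated with a minimal sketch in plain Python. It is not the code from this commit: the function name `filter_longest_spans` and the representation of matches as `(start, end)` token offsets with an exclusive end are assumptions made for illustration. The idea is to prefer longer matches and discard any match that shares a token with one already kept. spaCy also provides a `filter_spans` helper in `spacy.util` that applies a similar longest-first rule to `Span` objects.

```python
def filter_longest_spans(matches):
    """Keep the longest matches and drop any that overlap a kept match.

    `matches` is an iterable of (start, end) token offsets, end exclusive.
    This is a hypothetical helper for illustration, not spaCy's own code.
    """
    # Visit the longest matches first; among equal lengths, earlier start wins.
    ordered = sorted(matches, key=lambda m: (m[1] - m[0], -m[0]), reverse=True)
    kept = []
    seen_tokens = set()
    for start, end in ordered:
        # Skip a match if any of its tokens already belongs to a kept match.
        if any(i in seen_tokens for i in range(start, end)):
            continue
        kept.append((start, end))
        seen_tokens.update(range(start, end))
    # Return the surviving matches in document order.
    return sorted(kept)


# Three consecutive subtoks at positions 2, 3 and 4 produce six candidate
# matches; a lone subtok at position 7 produces one.
candidates = [(2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5), (7, 8)]
filtered = filter_longest_spans(candidates)
print(filtered)
```

With these candidates, only the full three-token run and the single-token match survive, so each token would be merged at most once.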