mirror of https://github.com/explosion/spaCy.git
f2bfaa1b38
The `Matcher` in `merge_subtokens()` returns every possible subsequence of consecutive `subtok` tokens, so for runs of two or more subtoks the matches must be filtered so that the retokenizer merges only the longest matches, with no overlapping spans. |
||
---|---|---|
__init__.py | ||
test_analysis.py | ||
test_entity_linker.py | ||
test_entity_ruler.py | ||
test_factories.py | ||
test_functions.py | ||
test_pipe_methods.py | ||
test_sentencizer.py | ||
test_textcat.py |
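
The filtering described in the commit message above can be sketched in plain Python. This is a minimal illustration, not the code from the commit: the function name `filter_longest_spans` and the use of `(start, end)` tuples in place of spaCy `Span` objects are assumptions made for the example. spaCy itself ships a `spacy.util.filter_spans` helper that serves a similar purpose for `Span` objects; whether this commit uses it is not stated here.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for a match: (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def filter_longest_spans(spans: List[Span]) -> List[Span]:
    """Keep the longest spans and drop any span that overlaps a kept one."""
    # Longest spans first; ties are broken in favour of the earliest start.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: List[Span] = []
    seen: Set[int] = set()
    for start, end in ordered:
        # Because longer spans are handled first, checking the two boundary
        # tokens is enough to detect an overlap with an already kept span.
        if start not in seen and end - 1 not in seen:
            kept.append((start, end))
            seen.update(range(start, end))
    return sorted(kept)


# All subsequences a "one or more subtok" pattern could match for a run of
# three subtoks at tokens 2-4 and a run of two subtoks at tokens 7-8.
matches = [(2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5), (7, 8), (7, 9), (8, 9)]
longest = filter_longest_spans(matches)
```

Only the spans left in `longest` would then be passed to the retokenizer, so each run of subtoks is merged once as a whole instead of through several overlapping merges.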