spaCy

Commit Graph

Author	SHA1	Message	Date
adrianeboyd	f2bfaa1b38	Filter subtoken matches in merge_subtokens() (#4539) The `Matcher` in `merge_subtokens()` returns all possible subsequences of `subtok`, so for sequences of two or more subtoks it's necessary to filter the matches so that the retokenizer is only merging the longest matches with no overlapping spans.	2019-10-28 15:40:28 +01:00
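
The filtering step described in the commit message can be illustrated with a minimal, library-free sketch. This is not the code from the commit: `filter_longest_spans` is a hypothetical helper written for illustration, and it assumes each match is a `(start, end)` pair of token offsets with an exclusive end. spaCy itself provides a comparable utility, `spacy.util.filter_spans`, which works on `Span` objects.

```python
def filter_longest_spans(matches):
    """Keep the longest matches and drop any match that overlaps a kept one.

    `matches` is an iterable of (start, end) token offsets, end exclusive.
    Hypothetical helper for illustration only; not taken from spaCy.
    """
    # Longest spans first; among equal lengths, the earlier start wins.
    ordered = sorted(matches, key=lambda m: (m[1] - m[0], -m[0]), reverse=True)
    kept = []
    seen_tokens = set()
    for start, end in ordered:
        # Skip the span if any of its tokens is already covered.
        if not any(i in seen_tokens for i in range(start, end)):
            kept.append((start, end))
            seen_tokens.update(range(start, end))
    # Return the surviving spans in document order.
    return sorted(kept)


# A "one or more" pattern over three consecutive subtok tokens (indices 2-4)
# produces every contiguous subsequence as a separate match.
matches = [(2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
print(filter_longest_spans(matches))  # [(2, 5)]
```

Without this step, a retokenizer would be handed several overlapping spans for the same run of tokens. After filtering, only the single longest span per run remains to be merged.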

1 Commit