mirror of https://github.com/explosion/spaCy.git
2ea9b58006
* Ignore prefix in suffix matches
Ignore the currently matched prefix when looking for suffix matches in
the tokenizer. Otherwise a lookbehind in the suffix pattern may match
incorrectly due the presence of the prefix in the token string.
* Move °[cfkCFK]. to a tokenizer exception
* Adjust exceptions for same tokenization as v3.1
* Also update test accordingly
* Continue to split . after °CFK if ° is not a prefix
* Exclude new ° exceptions for pl
* Switch back to default tokenization of "° C ."
* Revert "Exclude new ° exceptions for pl"
This reverts commit
|
||
---|---|---|
.. | ||
__init__.py | ||
sun.txt | ||
test_exceptions.py | ||
test_explain.py | ||
test_naughty_strings.py | ||
test_tokenizer.py | ||
test_urls.py | ||
test_whitespace.py |