mirror of https://github.com/explosion/spaCy.git
cbc2cee2c8
* Move prefix and suffix detection for URL_PATTERN Move prefix and suffix detection for `URL_PATTERN` into the tokenizer. Remove associated lookahead and lookbehind from `URL_PATTERN`. Fix tokenization for Hungarian given new modified handling of prefixes and suffixes. * Match a wider range of URI schemes |
||
---|---|---|
.. | ||
__init__.py | ||
sun.txt | ||
test_exceptions.py | ||
test_naughty_strings.py | ||
test_tokenizer.py | ||
test_urls.py | ||
test_whitespace.py |