5 Commits

Author SHA1 Message Date
Adriane Boyd 7c98245c0c
Add levenshtein from polyleven (#11418)
Add a simple levenshtein distance function using the implementation from
the polyleven library as `spacy.matcher.levenshtein`.
2022-09-14 17:05:22 +02:00
Ines Montani d94ddd5686
Auto-detect package dependencies in spacy package (#8948)
* Auto-detect package dependencies in spacy package

* Add simple get_third_party_dependencies test

* Import packages_distributions explicitly

* Inline packages_distributions

* Fix docstring [ci skip]

* Relax catalogue requirement

* Move importlib_metadata to spacy.compat with note

* Include license information [ci skip]
2021-08-17 14:05:13 +02:00
Adriane Boyd 1d59fdbd39
Update Vietnamese tokenizer (#8099)
* Adapt tokenization methods from `pyvi` to preserve text encoding and
whitespace
* Add serialization support similar to Chinese and Japanese

Note: as for Chinese and Japanese, some settings are duplicated in
`config.cfg` and `tokenizer/cfg`.
2021-05-17 18:16:20 +10:00
Adriane Boyd 31ec9a906e
Clean up 3rd party license info (#6478)
Move scikit-learn license from `Scorer` to
`licenses/3rd_party_licenses.txt`.
2020-12-02 10:15:23 +01:00
Adriane Boyd caf23462eb
Add 3rd party licenses (#5959)
2020-08-26 15:23:59 +02:00