8 Commits

Author SHA1 Message Date
Grey Murav a9756963e6
Extend list of abbreviations for ru language (#10282)
* Extend list of abbreviations for ru language

Extended the list of abbreviations for the ru language that may influence tokenization.

* black formatting

Co-authored-by: thomashacker <EdwardSchmuhl@web.de>
2022-02-17 15:48:50 +01:00
Ines Montani a624ae0675 Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
Ines Montani b507f61629 Tidy up and move noun_chunks, token_match, url_match 2020-07-22 22:18:46 +02:00
Ines Montani db55577c45
Drop Python 2.7 and 3.5 (#4828)
* Remove unicode declarations

* Remove Python 3.5 and 2.7 from CI

* Don't require pathlib

* Replace compat helpers

* Remove OrderedDict

* Use f-strings

* Set Cython compiler language level

* Fix typo

* Re-add OrderedDict for Table

* Update setup.cfg

* Revert CONTRIBUTING.md

* Revert lookups.md

* Revert top-level.md

* Small adjustments and docs [ci skip]
2019-12-22 01:53:56 +01:00
Ines Montani f580302673 Tidy up and auto-format 2019-08-20 17:36:34 +02:00
Vadim Mazaev cacd859dcd Added tag map, fixed test failures, added more exceptions 2017-11-26 20:54:48 +03:00
Vadim Mazaev 52ee1f9bf9 Updated Russian Language, added lemmatizer, norm exceptions and lex attrs 2017-11-21 11:44:46 +03:00
Vadim Mazaev a0739a06d4 Returned russian support from v1.10 branch 2017-11-17 17:06:15 +03:00