spaCy/spacy/lang
Adriane Boyd 30d31fd335
Update Russian and Ukrainian lemmatizers (#11811)
* pymorph2 issues #11620, #11626, #11625:
- #11620: pymorphy2_lookup
- #11626: handle multiple forms pointing to the same normal form + handling empty POS tag
- #11625: matching DET that are labelled as PRON by pymorhp2

* Move lemmatizer algorithm changes back into RussianLemmatizer

* Fix uk pymorphy3_lookup mode init

* Move and update tests for ru/uk lookup lemmatizer modes

* Fix typo

* Remove traces of previous behavior for uninflected POS

* Refactor to private generic-looking pymorphy methods

* Remove xfailed uk lemmatizer cases

* Update spacy/lang/ru/lemmatizer.py

Co-authored-by: Richard Hudson <richard@explosion.ai>

Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
2022-11-25 11:12:46 +01:00
..
af 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
am Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
ar 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
az 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
bg Handle Cyrillic combining diacritics (#10837) 2022-06-28 15:35:32 +02:00
bn Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
ca Fix lookup usage in French/Catalan (fix #11347) (#11382) 2022-08-29 10:32:38 +02:00
cs 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
da 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
de 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
dsb Auto-format code with black (#10479) 2022-03-11 12:20:23 +01:00
el Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
en Remove English exceptions with mismatched features (#10873) 2022-06-03 09:44:04 +02:00
es Fix some issues in Spanish examples 2022-04-18 22:12:57 +02:00
et 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
eu 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
fa Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
fi Add a noun chunker for Finnish (#10214) 2022-02-08 08:44:11 +01:00
fr Fix lookup usage in French/Catalan (fix #11347) (#11382) 2022-08-29 10:32:38 +02:00
ga Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
grc add punctuation to grc (#11426) 2022-09-27 11:38:56 +02:00
gu 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
he 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
hi fix: Add missing comma to `_eleven_to_beyond` (#10166) 2022-01-30 16:45:06 +09:00
hr 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
hsb Auto-format code with black (#10479) 2022-03-11 12:20:23 +01:00
hu Ignore prefix in suffix matches (#9155) 2021-10-27 13:02:25 +02:00
hy 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
id 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
is 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
it Revert "Bump sudachipy version (#9917)" (#10071) 2022-01-17 10:38:37 +01:00
ja Format (#9630) 2021-11-05 09:56:26 +01:00
kn 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
ko Fix regex invalid escape sequences (#11276) 2022-08-09 10:59:36 +02:00
ky 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
la Auto-format code with black (#11427) 2022-09-02 11:43:20 +02:00
lb 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
lg luganda language extension (#10847) 2022-08-23 13:09:36 +02:00
lij 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
lt 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
lv 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
mk Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
ml 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
mr 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
nb Revert "Bump sudachipy version (#9917)" (#10071) 2022-01-17 10:38:37 +01:00
ne 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
nl Fix Dutch noun chunks to skip overlapping spans (#11275) 2022-08-10 09:49:08 +02:00
pl Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
pt Portuguese noun chunks review (#9559) 2021-11-04 23:55:49 +01:00
ro 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
ru Update Russian and Ukrainian lemmatizers (#11811) 2022-11-25 11:12:46 +01:00
sa 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
si Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
sk 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
sl Updates to Slovenian language (#11162) 2022-08-05 10:10:18 +02:00
sq 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
sr 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
sv Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
ta 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
te 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
th Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
ti Format (#9630) 2021-11-05 09:56:26 +01:00
tl 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
tn 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
tr Fixed typo in Turkish lang. (#10582) 2022-03-30 13:16:08 +02:00
tt 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
uk Update Russian and Ukrainian lemmatizers (#11811) 2022-11-25 11:12:46 +01:00
ur 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
vi Auto-format code with black (#10687) 2022-04-22 11:24:53 +02:00
xx fix: Add missing comma to `examples.py` (#10167) 2022-01-30 16:43:29 +09:00
yo 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
zh Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
__init__.py
char_classes.py add punctuation to grc (#11426) 2022-09-27 11:38:56 +02:00
lex_attrs.py Use tokenizer URL_MATCH pattern in LIKE_URL (#8765) 2021-07-27 12:07:01 +02:00
norm_exceptions.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
punctuation.py Handle Cyrillic combining diacritics (#10837) 2022-06-28 15:35:32 +02:00
tokenizer_exceptions.py Ignore prefix in suffix matches (#9155) 2021-10-27 13:02:25 +02:00