spaCy/spacy
Adriane Boyd 85778dfcf4
Add edit tree lemmatizer (#10231)
* Add edit tree lemmatizer

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Hide edit tree lemmatizer labels

* Use relative imports

* Switch to single quotes in error message

* Type annotation fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Reformat edit_tree_lemmatizer with black

* EditTreeLemmatizer.predict: take Iterable

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Validate edit trees during deserialization

This also changes the serialized representation. Rather than mirroring
the deep C structure, we use a simple flat union of the match and
substitution node types.

* Move edit_trees to _edit_tree_internals

* Fix invalid edit tree format error message

* edit_tree_lemmatizer: remove outdated TODO comment

* Rename factory name to trainable_lemmatizer

* Ignore type instead of casting truths to List[Union[Ints1d, Floats2d, List[int], List[str]]] for thinc v8.0.14

* Switch to Tagger.v2

* Add documentation for EditTreeLemmatizer

* docs: Fix 3.2 -> 3.3 somewhere

* trainable_lemmatizer documentation fixes

* docs: EditTreeLemmatizer is in edit_tree_lemmatizer.py

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-28 11:13:50 +02:00
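
The "Validate edit trees during deserialization" item in the commit message above describes a flat union of match and substitution nodes. The sketch below is one way such a check could look in plain Python. It is an illustration only, not the code from this commit: the key names (`prefix_len`, `suffix_len`, `prefix_tree`, `suffix_tree`, `orig`, `subst`), the use of `None` for "no child", and the helper `validate_edit_trees` are all assumptions made for the example.

```python
from typing import Any, Dict, List

# Hypothetical key sets for the two node kinds in the flat union.
MATCH_KEYS = {"prefix_len", "suffix_len", "prefix_tree", "suffix_tree"}
SUBST_KEYS = {"orig", "subst"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_child_ref(ref: Any, n_trees: int) -> bool:
    # A child is either absent (None, an assumption of this sketch)
    # or an index into the same flat list of nodes.
    return ref is None or (_is_int(ref) and 0 <= ref < n_trees)


def validate_edit_trees(trees: List[Dict[str, Any]]) -> None:
    """Raise ValueError if any node is neither a well-formed match node
    nor a well-formed substitution node."""
    for i, node in enumerate(trees):
        keys = set(node)
        if keys == MATCH_KEYS:
            ok = (
                _is_int(node["prefix_len"]) and node["prefix_len"] >= 0
                and _is_int(node["suffix_len"]) and node["suffix_len"] >= 0
                and _is_child_ref(node["prefix_tree"], len(trees))
                and _is_child_ref(node["suffix_tree"], len(trees))
            )
        elif keys == SUBST_KEYS:
            ok = isinstance(node["orig"], str) and isinstance(node["subst"], str)
        else:
            # Mixed or unknown keys: not a member of the union.
            ok = False
        if not ok:
            raise ValueError(f"Invalid edit tree node at index {i}: {node!r}")


# Example: "walking" -> "walk" as a match node whose suffix is handled
# by a substitution node stored at index 0.
trees = [
    {"orig": "ing", "subst": ""},
    {"prefix_len": 0, "suffix_len": 3, "prefix_tree": None, "suffix_tree": 0},
]
validate_edit_trees(trees)
```

Because every node is a flat dictionary and children are referenced by index, a malformed file can be rejected in a single pass before any nested structure is rebuilt.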
cli Stream large assets on download (#10521) 2022-03-24 11:47:05 +01:00
displacy Add displacy support for overlapping Spans (#10332) 2022-03-16 18:14:34 +01:00
lang Examples for Slovene (#10539) 2022-03-28 10:44:10 +02:00
matcher matcher: remove an undefined behavior (#10537) 2022-03-24 11:48:22 +01:00
ml Tagger: use unnormalized probabilities for inference (#10197) 2022-03-15 14:15:31 +01:00
pipeline Add edit tree lemmatizer (#10231) 2022-03-28 11:13:50 +02:00
tests Add edit tree lemmatizer (#10231) 2022-03-28 11:13:50 +02:00
tokens Maintain support for empty DocBin span groups (#10538) 2022-03-24 11:51:07 +01:00
training Fix get_matching_ents (#10451) 2022-03-07 16:56:57 +01:00
__init__.pxd
__init__.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
__main__.py
about.py Set version to v3.2.2 (#10262) 2022-02-11 11:45:26 +01:00
attrs.pxd
attrs.pyx Intify IOB (#9738) 2022-01-20 13:19:38 +01:00
compat.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
default_config.cfg Add a few docs to the default_config.cfg (#9981) 2022-01-05 09:16:40 +01:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
errors.py Add edit tree lemmatizer (#10231) 2022-03-28 11:13:50 +02:00
glossary.py Token sent attributes more consistent (#10164) 2022-02-08 08:35:37 +01:00
kb.pxd
kb.pyx Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
language.py Bugfixes and test for rehearse (#10347) 2022-02-23 16:10:05 +01:00
lexeme.pxd
lexeme.pyi fix type of lexeme.rank (#9979) 2022-01-04 13:15:25 +01:00
lexeme.pyx Bugfix for similarity return types (#10051) 2022-01-20 11:40:46 +01:00
lookups.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd
parts_of_speech.pyx
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed
schemas.py Add ENT_IOB key to Matcher (#9649) 2022-01-20 13:18:39 +01:00
scorer.py Fix Scorer.score_cats for missing labels (#9443) 2021-12-29 11:04:39 +01:00
strings.pxd Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
strings.pyi 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
strings.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
structs.pxd
symbols.pxd
symbols.pyx
tokenizer.pxd Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
tokenizer.pyx Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
ty.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
typedefs.pxd
typedefs.pyx
util.py hook up meta in load_model_from_config (#10400) 2022-03-04 11:07:45 +01:00
vectors.pyx Save vectors as little endian, load with Ops.asarray (#10201) 2022-03-21 14:24:46 +01:00
vocab.pxd Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00
vocab.pyi Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
vocab.pyx Update docs for Vocab.get_vector (#10486) 2022-03-15 09:10:47 +01:00