spaCy/spacy/pipeline
Paul O'Leary McCann 6be09bbd07
Fix Entity Linker with tokenization mismatches (fix #9575) (#10457)
* Add failing test

* Partial fix for issue

This kind of works. The issue with token length mismatches is gone. The
problem is that when you get empty lists of encodings to compare, it
fails because the sizes are not the same, even though they're both zero:
(0, 3) vs (0,). Not sure why that happens...

* Short circuit on empties

* Remove spurious check

The check here isn't needed now that the short circuit is fixed.

* Update spacy/tests/pipeline/test_entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Use "eg", not "example"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-05-23 20:42:26 +02:00
_edit_tree_internals Refactor error messages to remove hardcoded strings (#10729) 2022-05-02 13:38:46 +02:00
_parser_internals Refactor error messages to remove hardcoded strings (#10729) 2022-05-02 13:38:46 +02:00
legacy Fix entity linker batching (#9669) 2022-03-04 09:17:36 +01:00
__init__.py Add edit tree lemmatizer (#10231) 2022-03-28 11:13:50 +02:00
attributeruler.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
dep_parser.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
edit_tree_lemmatizer.py Add edit tree lemmatizer (#10231) 2022-03-28 11:13:50 +02:00
entity_linker.py Fix Entity Linker with tokenization mismatches (fix #9575) (#10457) 2022-05-23 20:42:26 +02:00
entityruler.py Entity ruler remove pattern (#9685) 2021-12-06 15:32:49 +01:00
functions.py Add doc_cleaner component (#9659) 2021-11-23 15:33:33 +01:00
lemmatizer.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
morphologizer.pyx Tagger: use unnormalized probabilities for inference (#10197) 2022-03-15 14:15:31 +01:00
multitask.pyx Replace negative rows with 0 in StaticVectors (#7674) 2021-04-22 18:04:15 +10:00
ner.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
pipe.pxd TrainablePipe (#6213) 2020-10-08 21:33:49 +02:00
pipe.pyi Add Pipe.hide_labels to omit labels from pipeline meta (#10175) 2022-02-05 17:59:24 +01:00
pipe.pyx Add Pipe.hide_labels to omit labels from pipeline meta (#10175) 2022-02-05 17:59:24 +01:00
sentencizer.pyx Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
senter.pyx Tagger: use unnormalized probabilities for inference (#10197) 2022-03-15 14:15:31 +01:00
spancat.py Save span candidates produced by spancat suggesters (#10413) 2022-03-14 16:46:58 +01:00
tagger.pyx Tagger: use unnormalized probabilities for inference (#10197) 2022-03-15 14:15:31 +01:00
textcat.py Bugfixes and test for rehearse (#10347) 2022-02-23 16:10:05 +01:00
textcat_multilabel.py Fix Scorer.score_cats for missing labels (#9443) 2021-12-29 11:04:39 +01:00
tok2vec.py Fix Tok2Vec for empty batches (#10324) 2022-02-21 10:22:36 +01:00
trainable_pipe.pxd Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
trainable_pipe.pyx Pass excludes when serializing vocab (#8824) 2021-08-03 14:42:44 +02:00
transition_parser.pxd TrainablePipe (#6213) 2020-10-08 21:33:49 +02:00
transition_parser.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00