spaCy/spacy
Paul O'Leary McCann d959603d51
Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246)
* Don't add duplicate patterns (fix #8216)

* Refactor EntityRuler init

This simplifies the EntityRuler init code. This is helpful as prep for
allowing the EntityRuler to reset itself.

* Make EntityRuler.clear reset matchers

Includes a new test for this.

* Tidy PhraseMatcher instantiation

Since the attr can safely be None now, the `if` guard is no longer
required here.

Also renamed the `_validate` attr. Maybe it's not needed?

* Fix NER test

* Add test to make sure patterns aren't increasing

* Move test to regression tests
2021-06-03 09:05:26 +02:00
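
The behaviour described in the commit message above (adding patterns should not re-register earlier ones, and `clear` should reset the matchers along with the stored patterns) can be illustrated with a minimal sketch. This is a toy stand-in written for illustration only: `ToyRuler`, its `matcher` list, and its pattern format are hypothetical and are not spaCy's `EntityRuler` implementation.

```python
from typing import Dict, List, Tuple


class ToyRuler:
    """Hypothetical stand-in for a rule-based matcher wrapper (not spaCy code)."""

    def __init__(self) -> None:
        # Reuse clear() so that init and reset share one code path.
        self.clear()

    def clear(self) -> None:
        # Reset the stored patterns and the matcher index together,
        # so the two can never drift out of sync.
        self.patterns: Dict[str, List[str]] = {}
        self.matcher: List[Tuple[str, str]] = []

    def add_patterns(self, patterns: List[dict]) -> None:
        for entry in patterns:
            label, phrase = entry["label"], entry["pattern"]
            self.patterns.setdefault(label, []).append(phrase)
            # Register only the new pattern; earlier patterns are
            # already in the matcher and must not be added again.
            self.matcher.append((label, phrase))

    def __len__(self) -> int:
        return sum(len(phrases) for phrases in self.patterns.values())


ruler = ToyRuler()
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
ruler.add_patterns([{"label": "GPE", "pattern": "Berlin"}])
# Two calls with one pattern each leave exactly two matcher entries.
assert len(ruler.matcher) == len(ruler) == 2

ruler.clear()
# After clear(), both the patterns and the matcher are empty.
assert len(ruler) == 0 and ruler.matcher == []
```

Routing `__init__` through `clear()` mirrors the refactor the commit describes as preparation for letting the ruler reset itself, though the real code may be structured differently.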
cli Fix other open calls without context managers (#8245) 2021-05-31 19:04:29 +10:00
displacy Also exclude user hooks in displacy conversion (#7419) 2021-03-12 09:41:59 +01:00
lang Update Vietnamese tokenizer (#8099) 2021-05-17 18:16:20 +10:00
matcher Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246) 2021-06-03 09:05:26 +02:00
ml Set up GPU CI testing (#7293) 2021-04-22 14:58:29 +02:00
pipeline Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246) 2021-06-03 09:05:26 +02:00
tests Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246) 2021-06-03 09:05:26 +02:00
tokens Fix/update extension copying in Span.as_doc and Doc.from_docs (#7574) 2021-03-30 09:49:12 +02:00
training Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
__init__.pxd
__init__.py Add vocab kwarg back to spacy.load 2021-03-11 10:58:59 +01:00
__main__.py
about.py Set version to v3.0.6 (#7854) 2021-04-22 16:33:26 +02:00
attrs.pxd
attrs.pyx
compat.py
default_config.cfg Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
default_config_pretraining.cfg pretrain architectures (#6451) 2020-12-08 14:41:03 +08:00
errors.py Add callback to copy vocab/tokenizer from model (#7750) 2021-04-22 12:36:50 +02:00
glossary.py
kb.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
kb.pyx Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
language.py Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
lexeme.pxd
lexeme.pyx reduce memory load when reading all vectors from file (#6945) 2021-02-07 08:05:43 +08:00
lookups.py Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd
parts_of_speech.pyx
pipe_analysis.py
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
scorer.py Extend score_spans for overlapping & non-labeled spans (#7209) 2021-04-08 12:19:17 +02:00
strings.pxd
strings.pyx Make vocab update in get_docs deterministic (#7603) 2021-04-09 11:53:13 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
tokenizer.pyx Fix tokenizer cache flushing (#7836) 2021-04-22 18:14:57 +10:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx
util.py Fix scoring normalization (#7629) 2021-04-26 16:53:38 +02:00
vectors.pyx Fix vectors data on GPU (#7626) 2021-04-19 18:30:03 +10:00
vocab.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
vocab.pyx Fix vectors data on GPU (#7626) 2021-04-19 18:30:03 +10:00