spaCy/spacy
Adriane Boyd d5bbd1f94f
Handle partial entities in Span.as_doc (#8055)
* Handle partial entities in Span.as_doc

In `Span.as_doc`, replace partial entities at the beginning or end of the
span with missing entity annotation.

Fixes a bug where invalid entity annotation (no initial `B`) was
returned for an initial partial entity.

* Check for empty span in ents conversion

Note: `Span.as_doc()` will still fail on an empty span due to failures
in `Span.vector`.
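
The boundary handling described above can be sketched in plain Python. This is an illustration only, not spaCy's implementation: the helper name `mask_partial_entities` is hypothetical, entity labels are ignored, and tags are given as IOB strings, with `""` standing for missing annotation.

```python
from typing import List, Sequence


def mask_partial_entities(doc_iob: Sequence[str], start: int, end: int) -> List[str]:
    """Return IOB tags for doc_iob[start:end], replacing entities that are
    cut off by the span boundaries with "" (missing annotation).

    Hypothetical helper for illustration; assumes doc_iob is a valid IOB
    sequence using only "B", "I", "O" and "".
    """
    tags = list(doc_iob[start:end])
    if not tags:
        # Empty span: nothing to convert.
        return tags

    # Leading partial entity: the span opens on "I", so the entity's "B"
    # lies before the span. Mask the leading run of "I" tags.
    i = 0
    while i < len(tags) and tags[i] == "I":
        tags[i] = ""
        i += 1

    # Trailing partial entity: the first token after the span continues an
    # entity, so the entity that ends the span is incomplete.
    if end < len(doc_iob) and doc_iob[end] == "I":
        j = len(tags) - 1
        while j >= 0 and tags[j] == "I":
            tags[j] = ""
            j -= 1
        if j >= 0 and tags[j] == "B":
            tags[j] = ""

    return tags


# Tokens: I like New York City today
doc_iob = ["O", "O", "B", "I", "I", "O"]

# "York City today" starts inside an entity, so the leading "I" tags are masked.
print(mask_partial_entities(doc_iob, 3, 6))  # ['', '', 'O']

# "I like New York" ends inside an entity, so "B I" at the end is masked.
print(mask_partial_entities(doc_iob, 0, 4))  # ['O', 'O', '', '']
```

Without the masking step, the first span would carry the tags `I I O`, which is the invalid "no initial `B`" annotation the commit message refers to.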
2021-05-11 17:10:16 +02:00
cli Fix 'debug model' for transformers + generalize (#7973) 2021-05-06 18:43:32 +10:00
displacy Also exclude user hooks in displacy conversion (#7419) 2021-03-12 09:41:59 +01:00
lang Fix/fix en ordinals (#8028) 2021-05-07 10:26:42 +02:00
matcher Fix span offsets for Matcher(as_spans) on spans (#7992) 2021-05-06 18:42:44 +10:00
ml make EntityLinker robust for nO=None (#7930) 2021-05-06 18:14:47 +10:00
pipeline Preserve existing ENT_KB_ID annotation in NER (#7988) 2021-05-06 18:49:55 +10:00
tests Handle partial entities in Span.as_doc (#8055) 2021-05-11 17:10:16 +02:00
tokens Handle partial entities in Span.as_doc (#8055) 2021-05-11 17:10:16 +02:00
training Add callback to copy vocab/tokenizer from model (#7750) 2021-04-22 12:36:50 +02:00
__init__.pxd
__init__.py Add vocab kwarg back to spacy.load 2021-03-11 10:58:59 +01:00
__main__.py
about.py Set version to v3.0.6 (#7854) 2021-04-22 16:33:26 +02:00
attrs.pxd
attrs.pyx
compat.py
default_config.cfg Support large/infinite training corpora (#7208) 2021-04-08 18:08:04 +10:00
default_config_pretraining.cfg
errors.py Add callback to copy vocab/tokenizer from model (#7750) 2021-04-22 12:36:50 +02:00
glossary.py Add Chinese PTB tags to glossary (#7993) 2021-05-06 18:43:03 +10:00
kb.pxd
kb.pyx Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
language.py Fix typo in Language docstrings (#7958) 2021-05-03 14:44:09 +02:00
lexeme.pxd
lexeme.pyx reduce memory load when reading all vectors from file (#6945) 2021-02-07 08:05:43 +08:00
lookups.py Update load_lookups return type and docstring (#7907) 2021-04-27 09:13:39 +02:00
morphology.pxd
morphology.pyx
parts_of_speech.pxd
parts_of_speech.pyx
pipe_analysis.py
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Support env vars and CLI overrides for project.yml 2021-02-10 13:45:27 +11:00
scorer.py Extend score_spans for overlapping & non-labeled spans (#7209) 2021-04-08 12:19:17 +02:00
strings.pxd
strings.pyx Make vocab update in get_docs deterministic (#7603) 2021-04-09 11:53:13 +02:00
structs.pxd
symbols.pxd
symbols.pyx
tokenizer.pxd Fix tokenizer cache flushing (#7836) 2021-04-22 18:14:57 +10:00
tokenizer.pyx Fix tokenizer cache flushing (#7836) 2021-04-22 18:14:57 +10:00
typedefs.pxd
typedefs.pyx
util.py Refactor util.to_ternary_int (#7944) 2021-04-29 16:58:54 +02:00
vectors.pyx Fix vectors data on GPU (#7626) 2021-04-19 18:30:03 +10:00
vocab.pxd
vocab.pyx Skip vector ngram backoff if minn is not set (#7925) 2021-05-06 18:34:35 +10:00