spaCy

Matthew Honnibal f9946154d9 Add SpanCategorizer component (#6747)
* Draft spancat model
* Add spancat model
* Add test for extract_spans
* Add extract_spans layer
* Upd extract_spans
* Add spancat model
* Add test for spancat model
* Upd spancat model
* Update spancat component
* Upd spancat
* Update spancat model
* Add quick spancat test
* Import SpanCategorizer
* Fix SpanCategorizer component
* Import SpanGroup
* Fix span extraction
* Fix import
* Fix import
* Upd model
* Update spancat models
* Add scoring, update defaults
* Update and add docs
* Fix type
* Update spacy/ml/extract_spans.py
* Auto-format and fix import
* Fix comment
* Fix type
* Fix type
* Update website/docs/api/spancategorizer.md
* Fix comment Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Better defense Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix labels list Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/extract_spans.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/pipeline/spancat.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Set annotations during update
* Set annotations in spancat
* fix imports in test
* Update spacy/pipeline/spancat.py
* replace MaxoutLogistic with LinearLogistic
* fix config
* various small fixes
* remove set_annotations parameter in update
* use our beloved tupley format with recent support for doc.spans
* bugfix to allow renaming the default span_key (scores weren't showing up)
* use different key in docs example
* change defaults to better-working parameters from project (WIP)
* register spacy.extract_spans.v1 for legacy purposes
* Upd dev version so can build wheel
* layers instead of architectures for smaller building blocks
* Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Include additional scores from overrides in combined score weights
* Parameterize spans key in scoring

  Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so that it's possible to evaluate multiple `spancat` components in the same pipeline.
* Use the (intentionally very short) default spans key `sc` in the `SpanCategorizer`
* Adjust the default score weights to include the default key
* Adjust the scorer to use `spans_{spans_key}` as the prefix for the returned score
* Revert addition of `attr_name` argument to `score_spans` and adjust the key in the `getter` instead.

  Note that for `spancat` components with a custom `span_key`, the score weights currently need to be modified manually in `[training.score_weights]` for them to be available during training. To suppress the default score weights `spans_sc_p/r/f` during training, set them to `null` in `[training.score_weights]`.
* Update website/docs/api/scorer.md
* Fix scorer for spans key containing underscore
* Increment version
* Add Spans to Evaluate CLI (#8439)
* Add Spans to Evaluate CLI
* Change to spans_key
* Add spans per_type output Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Fix spancat GPU issues (#8455)
* Fix GPU issues
* Require thinc >=8.0.6
* Switch to glorot_uniform_init
* Fix and test ngram suggester
* Include final ngram in doc for all sizes
* Fix ngrams for docs of the same length as ngram size
* Handle batches of docs that result in no ngrams
* Add tests

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Nirant <NirantK@users.noreply.github.com>
2021-06-24 12:35:27 +02:00
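
The scoring notes in the commit message above can be illustrated with a small sketch. It uses plain Python only and does not call spaCy. The helper names `span_score_keys` and `score_weights_for` are hypothetical, and the weights (1.0 for the F-score, 0.0 for precision and recall) are illustrative assumptions rather than values taken from the commit. The only facts taken from the commit message are the `spans_{spans_key}` prefix, the default key `sc`, and that `null` (here `None`) suppresses a default weight.

```python
from typing import Dict, List, Optional


def span_score_keys(spans_key: str = "sc") -> List[str]:
    """Build the precision/recall/F-score names for one spans key.

    Follows the `spans_{spans_key}` prefix described in the commit message.
    """
    prefix = f"spans_{spans_key}"
    return [f"{prefix}_{metric}" for metric in ("p", "r", "f")]


def score_weights_for(custom_key: str) -> Dict[str, Optional[float]]:
    """Hypothetical helper: sketch a `[training.score_weights]` mapping.

    The default `spans_sc_*` entries are set to None (written as `null`
    in a config file) and the custom key gets illustrative weights.
    """
    weights: Dict[str, Optional[float]] = {key: None for key in span_score_keys()}
    for key in span_score_keys(custom_key):
        # Assumption: only the F-score contributes to the combined score.
        weights[key] = 1.0 if key.endswith("_f") else 0.0
    return weights


if __name__ == "__main__":
    for name, weight in score_weights_for("my_spans").items():
        print(name, "=", "null" if weight is None else weight)
```

The printed pairs correspond to entries that would be written by hand under `[training.score_weights]` for a `spancat` component with a custom key.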
cli	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
displacy	Also exclude user hooks in displacy conversion (#7419)	2021-03-12 09:41:59 +01:00
lang	Fix non-deterministic deduplication in Greek lemmatizer (#8421)	2021-06-17 09:11:01 +02:00
matcher	Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1	2021-06-15 15:05:17 +02:00
ml	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
pipeline	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
tests	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
tokens	Various fixes for spans in Docs.from_docs (#8487)	2021-06-23 15:51:35 +02:00
training	Fix setting empty entities in Example.from_dict (#8426)	2021-06-18 10:41:50 +02:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	…
__main__.py	…
about.py	Set version to v3.1.0 (#8452)	2021-06-21 10:41:40 +02:00
attrs.pxd	…
attrs.pyx	…
compat.py	…
default_config.cfg	…
default_config_pretraining.cfg	…
errors.py	Use minor version for compatibility check (#8403)	2021-06-21 09:39:22 +02:00
glossary.py	…
kb.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
kb.pyx	…
language.py	Don't use the same vocab for source models (#8388)	2021-06-21 09:33:33 +02:00
lexeme.pxd	…
lexeme.pyx	…
lookups.py	Update load_lookups return type and docstring (#7907)	2021-04-27 09:13:39 +02:00
morphology.pxd	…
morphology.pyx	…
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	…
pipe_analysis.py	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
py.typed	…
schemas.py	…
scorer.py	…
strings.pxd	…
strings.pyx	…
structs.pxd	…
symbols.pxd	…
symbols.pyx	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
tokenizer.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
tokenizer.pyx	Fix tokenizer cache flushing (#7836)	2021-04-22 18:14:57 +10:00
typedefs.pxd	…
typedefs.pyx	…
util.py	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
vectors.pyx	…
vocab.pxd	…
vocab.pyx	Skip vector ngram backoff if minn is not set (#7925)	2021-05-06 18:34:35 +10:00