spaCy/spacy/tests/serialize
adrianeboyd b841d3fe75 Add a tagger-based SentenceRecognizer (#4713)
* Add sent_starts to GoldParse

* Add SentTagger pipeline component

Add `SentTagger` pipeline component as a subclass of `Tagger`.

* Model uses smaller default parameters than `Tagger`, so it is small and fast
* Hard-coded set of two labels:
  * S (1): token at beginning of sentence
  * I (0): all other sentence positions
* Sets `token.sent_start` values
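
The two-label scheme above can be illustrated with a minimal sketch in plain Python. It does not use spaCy, and the helper names `labels_from_sentences` and `sentences_from_labels` are hypothetical, introduced only for illustration. The sketch assumes that each token carries exactly one label and that a sentence runs from one `S` to the token before the next `S`. The values the component actually writes to `token.sent_start` may differ from the label ids shown here.

```python
# Label ids as listed in the commit message: S = 1, I = 0.
LABELS = {"S": 1, "I": 0}


def labels_from_sentences(sentences):
    """Flatten tokenized sentences into (token, label) pairs.

    Hypothetical helper: the first token of each sentence gets "S",
    every other token gets "I".
    """
    pairs = []
    for sent in sentences:
        for i, tok in enumerate(sent):
            pairs.append((tok, "S" if i == 0 else "I"))
    return pairs


def sentences_from_labels(tokens, labels):
    """Rebuild sentences from per-token labels (hypothetical helper)."""
    sents = []
    for tok, label in zip(tokens, labels):
        # Start a new sentence on "S"; also guard against a leading "I".
        if label == "S" or not sents:
            sents.append([])
        sents[-1].append(tok)
    return sents


sents = [["Hello", "world", "."], ["How", "are", "you", "?"]]
pairs = labels_from_sentences(sents)
tokens = [tok for tok, _ in pairs]
labels = [label for _, label in pairs]
```

Because every token is labelled independently, segmentation reduces to an ordinary tagging problem, which is why the component can reuse the `Tagger` machinery.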

* Add sentence segmentation to Scorer

Report `sent_p/r/f` for sentence boundaries, which may be provided by
various pipeline components.
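
As a rough sketch of what `sent_p/r/f` measure, the snippet below computes precision, recall and F-score over sets of sentence-start token indices. This is an assumption about the unit being scored, not a copy of the `Scorer` code, and `sentence_prf` is a hypothetical name.

```python
def sentence_prf(gold_starts, pred_starts):
    """Precision, recall and F1 over sentence-start token indices.

    Hypothetical helper; assumes gold and predicted segmentations
    share the same tokenization.
    """
    gold, pred = set(gold_starts), set(pred_starts)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Gold sentences start at tokens 0, 5 and 9; the prediction gets two right.
p, r, f = sentence_prf({0, 5, 9}, {0, 5, 7})
```

Scoring on boundaries rather than on a specific component's output is what lets the same metric cover any component that sets sentence starts.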

* Add sentence segmentation to CLI evaluate

* Add senttagger metrics/scoring to train CLI

* Rename SentTagger to SentenceRecognizer

* Add SentenceRecognizer to spacy.pipes imports

* Add SentenceRecognizer serialization test

* Shorten component name to sentrec

* Remove duplicates from train CLI output metrics
2019-11-28 11:10:07 +01:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_serialize_doc.py Tidy up and auto-format 2019-10-18 11:27:38 +02:00
test_serialize_extension_attrs.py Revert #4334 2019-09-29 17:32:12 +02:00
test_serialize_kb.py Fix test imports 2019-09-29 17:34:56 +02:00
test_serialize_language.py Revert #4334 2019-09-29 17:32:12 +02:00
test_serialize_pipeline.py Add a tagger-based SentenceRecognizer (#4713) 2019-11-28 11:10:07 +01:00
test_serialize_tokenizer.py Revert #4334 2019-09-29 17:32:12 +02:00
test_serialize_vocab_strings.py Revert #4334 2019-09-29 17:32:12 +02:00