Commit Graph

12 Commits

Author SHA1 Message Date
adrianeboyd b841d3fe75 Add a tagger-based SentenceRecognizer (#4713)
* Add sent_starts to GoldParse

* Add SentTagger pipeline component

Add `SentTagger` pipeline component as a subclass of `Tagger`.

* Model reduces default parameters from `Tagger` to be small and fast
* Hard-coded set of two labels:
  * S (1): token at beginning of sentence
  * I (0): all other sentence positions
* Sets `token.sent_start` values
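
The two-label scheme above can be illustrated with a minimal sketch. This is not the component's actual code: `LABELS`, `labels_to_sent_starts`, and `split_sentences` are hypothetical names, and the sketch only assumes that each token receives exactly one of the tags `S` or `I`.

```python
# Hypothetical illustration of the S/I scheme; not spaCy's implementation.
LABELS = {"I": 0, "S": 1}  # label IDs as listed in the commit message


def labels_to_sent_starts(tags):
    """Map per-token S/I tags to boolean sentence-start flags."""
    return [LABELS[tag] == 1 for tag in tags]


def split_sentences(tokens, tags):
    """Group tokens into sentences, opening a new one at each S tag."""
    sentences = []
    for token, is_start in zip(tokens, labels_to_sent_starts(tags)):
        # Also open a sentence if the very first token is not tagged S.
        if is_start or not sentences:
            sentences.append([])
        sentences[-1].append(token)
    return sentences
```

In this sketch, the flags returned by `labels_to_sent_starts` play the role that `token.sent_start` plays in the pipeline.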

* Add sentence segmentation to Scorer

Report `sent_p/r/f` for sentence boundaries, which may be provided by
various pipeline components.
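
As a rough sketch of what `sent_p/r/f` measure, precision, recall, and F-score can be computed over sets of sentence boundaries. The helper name `prf` and the choice to represent boundaries as sentence-start token indices are assumptions made for illustration; the Scorer's actual bookkeeping may differ.

```python
# Hypothetical helper; assumes boundaries are given as sentence-start token indices.
def prf(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold boundary sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, with predicted starts `{0, 3, 7}` and gold starts `{0, 4, 7}`, two of the three predictions are correct and two of the three gold boundaries are found, so all three scores are 2/3.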

* Add sentence segmentation to CLI evaluate

* Add senttagger metrics/scoring to train CLI

* Rename SentTagger to SentenceRecognizer

* Add SentenceRecognizer to spacy.pipes imports

* Add SentenceRecognizer serialization test

* Shorten component name to sentrec

* Remove duplicates from train CLI output metrics
2019-11-28 11:10:07 +01:00
Matthew Honnibal bcd08f20af Merge changes from master 2019-08-21 14:18:52 +02:00
Sofie a4a6bfa4e1 Merge branch 'master' into feature/el-framework 2019-03-26 11:00:02 +01:00
Ines Montani 06bf130890 💫 Add better and serializable sentencizer (#3471)
* Add better serializable sentencizer component

* Replace default factory

* Add tests

* Tidy up

* Pass test

* Update docs
2019-03-23 15:45:02 +01:00
svlandeg d849eb2455 adding kb_id as field to token, el as nlp pipeline component 2019-03-22 11:34:46 +01:00
Matthew Honnibal 3908911da4 Fix import 2019-03-08 17:04:14 +01:00
Matthew Honnibal 8a9181d95a Merge __init__ 2019-03-08 16:58:42 +01:00
Matthew Honnibal 4cf897e8e1 Update from develop 2019-03-08 16:56:54 +01:00
Ines Montani d260aa17fd Merge branch 'develop' into feature/lemmatizer 2019-03-08 13:25:00 +01:00
Ines Montani 296446a1c8 Tidy up and improve docs and docstrings (#3370)

## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Matthew Honnibal fc1cc4c529 Move morphologizer under spacy/pipes 2019-03-07 01:36:26 +01:00
Ines Montani a9f8d17632 💫 Break up large pipeline.pyx (#3246)
* Break up large pipeline.pyx

* Merge some components back together

* Fix typo
2019-02-10 12:14:51 +01:00