Commit Graph

11 Commits

Author SHA1 Message Date
Sofie Van Landeghem e48a09df4e Example class for training data (#4543)
* OrigAnnot class instead of gold.orig_annot list of zipped tuples

* from_orig to replace from_annot_tuples

* rename to RawAnnot

* some unit tests for GoldParse creation and internal format

* removing orig_annot and switching to lists instead of tuple

* rewriting tuples to use RawAnnot (+ debug statements, WIP)

* fix pop() changing the data

* small fixes

* pop-append fixes

* return RawAnnot for existing GoldParse to have uniform interface

* clean up imports

* fix merge_sents

* add unit test for 4402 with new structure (not working yet)

* introduce DocAnnot

* typo fixes

* add unit test for merge_sents

* rename from_orig to from_raw

* fixing unit tests

* fix nn parser

* read_annots to produce text, doc_annot pairs

* _make_golds fix

* rename golds_to_gold_annots

* small fixes

* fix encoding

* have golds_to_gold_annots use DocAnnot

* missed a spot

* merge_sents as function in DocAnnot

* allow specifying only part of the token-level annotations

* refactor with Example class + underlying dicts

* pipeline components to work with Example objects (wip)

* input checking

* fix yielding

* fix calls to update

* small fixes

* fix scorer unit test with new format

* fix kwargs order

* fixes for ud and conllu scripts

* fix reading data for conllu script

* add in proper errors (not fixed numbering yet to avoid merge conflicts)

* fixing few more small bugs

* fix EL script
2019-11-11 17:35:27 +01:00
Ines Montani a9c6104047 Component decorator and component analysis (#4517)
* Add work in progress

* Update analysis helpers and component decorator

* Fix porting of docstrings for Python 2

* Fix docstring stuff on Python 2

* Support meta factories when loading model

* Put auto pipeline analysis behind flag for now

* Analyse pipes on remove_pipe and replace_pipe

* Move analysis to root for now

Try to find a better place for it, but it needs to go for now to avoid circular imports

* Simplify decorator

Don't return a wrapped class and instead just write to the object

* Update existing components and factories

* Add condition in factory for classes vs. functions

* Add missing from_nlp classmethods

* Add "retokenizes" to printed overview

* Update assigns/requires declarations of builtins

* Only return data if no_print is enabled

* Use multiline table for overview

* Don't support Span

* Rewrite errors/warnings and move them to spacy.errors
2019-10-27 13:35:49 +01:00
Matthew Honnibal 0f12082465 Refactor morphologizer 2019-03-09 22:54:59 +00:00
Matthew Honnibal 41a3016019 Refactor morphologizer class map 2019-03-09 20:55:33 +01:00
Matthew Honnibal f742900f83 Set pos attribute in morphologizer 2019-03-09 11:51:11 +00:00
Matthew Honnibal 42bc3ad73b Fix class mapping for morphologizer 2019-03-09 00:20:29 +00:00
Matthew Honnibal 49cf002ac4 Add missing import 2019-03-08 18:59:25 +01:00
Matthew Honnibal d7ec1d62cb Fix Morphologizer 2019-03-08 18:54:25 +01:00
Matthew Honnibal fed0371db7 Remove enums from morphology 2019-03-07 17:14:57 +01:00
Matthew Honnibal 8805966460 Fix moved Morphologizer class 2019-03-07 10:46:27 +01:00
Matthew Honnibal bfa52d9d8a Move morphologizer within spacy/pipes 2019-03-07 01:34:32 +01:00