spaCy

adrianeboyd b5d999e510 Add textcat to train CLI (#4226)	2019-09-15 22:31:31 +02:00

* Add doc.cats to spacy.gold at the paragraph level
  Support `doc.cats` as `"cats": [{"label": string, "value": number}]` in the spacy JSON training format at the paragraph level.
* `spacy.gold.docs_to_json()` writes `docs.cats`
* `GoldCorpus` reads in cats in each `GoldParse`
* Update instances of gold_tuples to handle cats
  Update iteration over gold_tuples / gold_parses to handle addition of cats at the paragraph level.
* Add textcat to train CLI
* Add textcat options to train CLI
* Add textcat labels in `TextCategorizer.begin_training()`
* Add textcat evaluation to `Scorer`:
  * For binary exclusive classes with provided label: F1 for label
  * For 2+ exclusive classes: F1 macro average
  * For multilabel (not exclusive): ROC AUC macro average (currently relying on sklearn)
* Provide user info on textcat evaluation settings, potential incompatibilities
* Provide pipeline to Scorer in `Language.evaluate` for textcat config
* Customize train CLI output to include only metrics relevant to current pipeline
* Add textcat evaluation to evaluate CLI
* Fix handling of unset arguments and config params
  Fix handling of unset arguments and model config parameters in Scorer initialization.
* Temporarily add sklearn requirement
* Remove sklearn version number
* Improve Scorer handling of models without textcats
* Fixing Scorer handling of models without textcats
* Update Scorer output for python 2.7
* Modify inf in Scorer for python 2.7
* Auto-format
  Also make small adjustments to make auto-formatting with black easier and produce nicer results
* Move error message to Errors
* Update documentation
* Add cats to annotation JSON format [ci skip]
* Fix tpl flag and docs [ci skip]
* Switch to internal roc_auc_score
  Switch to internal `roc_auc_score()` adapted from scikit-learn.
* Add AUCROCScore tests and improve errors/warnings
* Add tests for AUCROCScore and roc_auc_score
* Add missing error for only positive/negative values
* Remove unnecessary warnings and errors
* Make reduced roc_auc_score functions private
  Because most of the checks and warnings have been stripped for the internal functions and access is only intended through `ROCAUCScore`, make the functions for roc_auc_score adapted from scikit-learn private.
* Check that data corresponds with multilabel flag
  Check that the training instances correspond with the multilabel flag, adding the multilabel flag if required.
* Add textcat score to early stopping check
* Add more checks to debug-data for textcat
* Add example training data for textcat
* Add more checks to textcat train CLI
* Check configuration when extending base model
* Fix typos
* Update textcat example data
* Provide licensing details and licenses for data
* Remove two labels with no positive instances from jigsaw-toxic-comment data.

Co-authored-by: Ines Montani <ines@ines.io>
..
__init__.pxd	…
__init__.py	…
_beam_utils.pxd	Export hash_state from beam_utils	2019-03-15 15:20:28 +01:00
_beam_utils.pyx	Use hash_state in beam	2019-03-15 15:22:58 +01:00
_parser_model.pxd	Fix handling of added labels. Resolves #3189	2019-02-24 16:41:41 +01:00
_parser_model.pyx	Normalize over all actions in parser, not just valid ones	2019-03-15 15:22:16 +01:00
_state.pxd	Fix NER when preset entities cross sentence boundaries (#3379)	2019-03-10 14:53:03 +01:00
_state.pyx	Tidy up syntax	2017-10-27 19:45:57 +02:00
arc_eager.pxd	WIP on stringstore change. 27 failures	2017-05-28 14:06:40 +02:00
arc_eager.pyx	Add textcat to train CLI (#4226)	2019-09-15 22:31:31 +02:00
ner.pxd	WIP on stringstore change. 27 failures	2017-05-28 14:06:40 +02:00
ner.pyx	Add textcat to train CLI (#4226)	2019-09-15 22:31:31 +02:00
nn_parser.pxd	💫 Better support for semi-supervised learning (#3035)	2018-12-10 16:25:33 +01:00
nn_parser.pyx	Add textcat to train CLI (#4226)	2019-09-15 22:31:31 +02:00
nonproj.pxd	…
nonproj.pyx	Merge master into develop. Big merge, many conflicts -- need to review	2018-04-29 14:49:26 +02:00
stateclass.pxd	Fix memory leak in beam parser	2017-11-14 02:11:40 +01:00
stateclass.pyx	Fix memory leak in beam parser	2017-11-14 02:11:40 +01:00
transition_system.pxd	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
transition_system.pyx	💫 Make serialization methods consistent (#3385)	2019-03-10 19:16:45 +01:00