404 Commits

Author SHA1 Message Date
Ines Montani 4c055f0aa7 Add init CLI and init config (#5854)
* Add init CLI and init config draft

* Improve config validation

* Auto-format

* Don't export anything in debug config

* Update docs
2020-08-02 15:18:30 +02:00
Ines Montani b40f44419b Simplify pipe analysis
- remove unused code
- don't print by default
- integrate attrs info into analysis output
2020-08-01 13:40:06 +02:00
Ines Montani 98c6a85c8b Update docs [ci skip] 2020-07-31 18:55:38 +02:00
Ines Montani e9e8fa2466 Update docs and types 2020-07-31 17:02:54 +02:00
Ines Montani 5a221f79c2 Revert "Remove keyword-only from Scorer API docs" [ci skip]
This reverts commit 7a6ac47dc1.
2020-07-31 14:00:21 +02:00
Adriane Boyd 9b509aa87f Move Language.evaluate scorer config to new arg
Move `Language.evaluate` scorer config from `component_cfg` to separate
argument `scorer_cfg`.
2020-07-31 11:05:16 +02:00
Adriane Boyd 9d79916792 Merge branch 'develop' into feature/scorer-adjustments 2020-07-31 10:48:14 +02:00
Ines Montani 9c80cb673d Update docs [ci skip] 2020-07-29 19:41:34 +02:00
Ines Montani 9f69afdd1e Update docs [ci skip] 2020-07-29 19:09:44 +02:00
Ines Montani 6a5c853edb Fix docs [ci skip] 2020-07-29 18:45:12 +02:00
Ines Montani 158d8c1e48 Update docs [ci skip] 2020-07-29 18:44:10 +02:00
Ines Montani b0f57a0cac Update docs and consistency 2020-07-29 15:14:07 +02:00
Ines Montani e0ffe36e79 Update docstrings, docs and types 2020-07-29 11:36:42 +02:00
Adriane Boyd 7a6ac47dc1 Remove keyword-only from Scorer API docs 2020-07-29 10:40:30 +02:00
Ines Montani ac24adec73 Small adjustments to Scorer and docs 2020-07-28 21:39:42 +02:00
Ines Montani 256b24b720 Update arch docs WIP [ci skip] 2020-07-28 20:33:52 +02:00
Ines Montani ae4d8a6ffd Update docstrings, docs and pipe consistency 2020-07-28 13:37:31 +02:00
Ines Montani 0094cb0d04 Remove scores list from config and document 2020-07-28 11:22:24 +02:00
Ines Montani 894e20c466 Merge branch 'develop' into feature/component-scores 2020-07-27 18:14:39 +02:00
Ines Montani d8b519c23c API docs, docstrings and argument consistency 2020-07-27 18:11:45 +02:00
Ines Montani 10b84e1e27 Add flag to toggle sdist creation on package [ci skip] 2020-07-27 16:52:23 +02:00
Adriane Boyd fdf09cb231 Update Scorer API docs for score_cats 2020-07-27 15:34:42 +02:00
Ines Montani 7adbaf9a5b Update docs [ci skip] 2020-07-27 00:29:45 +02:00
Ines Montani c288dba8e7 Update docs [ci skip] 2020-07-25 18:51:12 +02:00
Ines Montani eb9acae34d Merge pull request #5791 from adrianeboyd/docs/morphology 2020-07-25 15:10:21 +02:00
Adriane Boyd 2bcceb80c4 Refactor the Scorer to improve flexibility (#5731)
* Refactor the Scorer to improve flexibility

Refactor the `Scorer` to improve flexibility for arbitrary pipeline
components.

* Individual pipeline components provide their own `evaluate` methods
that score a list of `Example`s and return a dictionary of scores
* `Scorer` is initialized either:
  * with a provided pipeline containing components to be scored
  * with a default pipeline containing the built-in statistical
    components (senter, tagger, morphologizer, parser, ner)
* `Scorer.score` evaluates a list of `Example`s and returns a dictionary
of scores referring to the scores provided by the components in the
pipeline
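
The design above can be sketched roughly as follows. This is a toy illustration, not spaCy's implementation: `ToyTagger`, `ToyLemmatizer`, `ToyScorer` and the `(predicted, gold)` tuple standing in for an `Example` are all hypothetical names. The component method is called `score` here, following the rename from `evaluate` listed later in this commit.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an Example: (predicted tags, gold tags).
ToyExample = Tuple[List[str], List[str]]


class ToyTagger:
    """Toy component that provides its own scoring method."""

    def score(self, examples: List[ToyExample]) -> Dict[str, float]:
        correct = 0
        total = 0
        for pred, gold in examples:
            correct += sum(p == g for p, g in zip(pred, gold))
            total += len(gold)
        # Scores stay in the 0-1 range (not multiplied by 100).
        return {"tag_acc": correct / total if total else 0.0}


class ToyLemmatizer:
    """Toy component without a scoring method; the scorer skips it."""


class ToyScorer:
    def __init__(self, pipeline: List[Tuple[str, object]]) -> None:
        self.pipeline = pipeline

    def score(self, examples: List[ToyExample]) -> Dict[str, float]:
        # No state is kept between calls: each call scores only the
        # examples it is given.
        scores: Dict[str, float] = {}
        for _name, component in self.pipeline:
            if hasattr(component, "score"):
                scores.update(component.score(examples))
        return scores


examples = [
    (["DET", "NOUN"], ["DET", "NOUN"]),
    (["VERB", "NOUN"], ["VERB", "ADJ"]),
]
scorer = ToyScorer([("tagger", ToyTagger()), ("lemmatizer", ToyLemmatizer())])
scores = scorer.score(examples)
print(scores)
```

Calling `scorer.score(examples)` a second time returns the same dictionary, since nothing is accumulated across calls.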

Significant differences:

* `tags_acc` is renamed to `tag_acc` to be consistent with `token_acc`
and the new `morph_acc`, `pos_acc`, and `lemma_acc`
* Scoring is no longer cumulative: `Scorer.score` scores a list of
examples rather than a single example and does not retain any state
about previously scored examples
* PRF values in the returned scores are no longer multiplied by 100

* Add kwargs to Morphologizer.evaluate

* Create generalized scoring methods in Scorer

* Generalized static scoring methods are added to `Scorer`
  * Methods require an attribute (either on Token or Doc) that is
used to key the returned scores

Naming differences:

* `uas`, `las`, and `las_per_type` in the scores dict are renamed to
`dep_uas`, `dep_las`, and `dep_las_per_type`

Scoring differences:

* `Doc.sents` is now scored as spans rather than on sentence-initial
token positions so that `Doc.sents` and `Doc.ents` can be scored with
the same method (this lowers scores since a single incorrect sentence
start results in two incorrect spans)
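
To see why span-based scoring lowers the numbers, here is a small standalone sketch. It does not use spaCy; `prf` and `starts_to_spans` are hypothetical helpers, and the token indices are made-up data. One misplaced sentence start is counted once when scoring starts, but it breaks both the span that ends there and the span that begins there.

```python
def prf(pred, gold):
    """Precision, recall and F-score over two collections of items."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def starts_to_spans(starts, n_tokens):
    """Turn sentence-initial token indices into (start, end) spans."""
    starts = sorted(starts)
    ends = starts[1:] + [n_tokens]
    return list(zip(starts, ends))


n_tokens = 10
gold_starts = [0, 4, 7]
pred_starts = [0, 5, 7]  # the second sentence start is off by one token

start_scores = prf(pred_starts, gold_starts)
span_scores = prf(
    starts_to_spans(pred_starts, n_tokens),
    starts_to_spans(gold_starts, n_tokens),
)
print("scored on starts:", start_scores)
print("scored on spans: ", span_scores)
```

In this example two of three starts match, but only one of three spans does, because the wrong start at token 5 invalidates both `(0, 4)` and `(4, 7)`.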

* Simplify / extend hasattr check for eval method

* Add hasattr check to tokenizer scoring
* Simplify to hasattr check for component scoring

* Reset Example alignment if docs are set

Reset the Example alignment if either doc is set in case the
tokenization has changed.

* Add PRF tokenization scoring for tokens as spans

Add PRF scores for tokens as character spans. The scores are:

* token_acc: # correct tokens / # gold tokens
* token_p/r/f: PRF for (token.idx, token.idx + len(token))
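
A minimal sketch of these tokenization scores, written without spaCy. The helpers `token_spans` and `score_tokenization` are hypothetical, and the sketch assumes a token counts as "correct" only when its character span exactly matches a gold span, which may differ from how the library aligns tokens.

```python
def token_spans(tokens, text):
    """Character spans (start, start + len(token)) for tokens in text."""
    spans = []
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)
        spans.append((start, start + len(tok)))
        pos = start + len(tok)
    return spans


def score_tokenization(pred_spans, gold_spans):
    pred, gold = set(pred_spans), set(gold_spans)
    correct = len(pred & gold)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return {
        # Assumption: "correct" means an exact character-span match.
        "token_acc": correct / len(gold) if gold else 0.0,
        "token_p": p,
        "token_r": r,
        "token_f": f,
    }


text = "I can't wait"
gold = token_spans(["I", "ca", "n't", "wait"], text)
pred = token_spans(["I", "can't", "wait"], text)
tok_scores = score_tokenization(pred, gold)
print(tok_scores)
```

Under the exact-match assumption, `token_acc` coincides with `token_r`, since both divide the number of matching spans by the number of gold tokens.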

* Add docstring to Scorer.score_tokenization

* Rename component.evaluate() to component.score()

* Update Scorer API docs

* Update scoring for positive_label in textcat

* Fix TextCategorizer.score kwargs

* Update Language.evaluate docs

* Update score names in default config
2020-07-25 12:53:02 +02:00
Adriane Boyd 8f44584bef Update MorphAnalysis.get and related examples 2020-07-23 08:51:31 +02:00
Adriane Boyd 941b9e33f7 Add Token.morph_ 2020-07-22 17:59:45 +02:00
Adriane Boyd fcd3a4abe3 Add morph to Token API docs 2020-07-21 13:05:58 +02:00
Adriane Boyd 14df00ae98 Add Morphology and MorphAnalysis API docs
Add initial draft of `Morphology` and `MorphAnalysis` API docs.
2020-07-21 10:33:46 +02:00
Adriane Boyd 986f7e4d69 Initial draft of Morphologizer API docs 2020-07-20 12:53:02 +02:00
Ines Montani 872938ec76 Merge pull request #5747 from explosion/feature/refactor-config-args 2020-07-14 00:00:22 +02:00
Ines Montani c96535e338 Update command docstrings and docs 2020-07-12 13:53:49 +02:00
Ines Montani 11bbc82c24 Update cli.md [ci skip] 2020-07-10 23:37:52 +02:00
Ines Montani 9455b060d2 Update cli.md 2020-07-10 22:57:22 +02:00
Ines Montani ea01831f6a Update projects docs etc. 2020-07-09 19:43:25 +02:00
Ines Montani 9ee5b71412 Update cli.md 2020-07-09 11:44:00 +02:00
Ines Montani 9ae4040183 Update API docs 2020-07-08 13:34:35 +02:00
svlandeg c94279ac1b remove tensors, fix predict, get_loss and set_annotations 2020-07-08 13:11:54 +02:00
svlandeg 90b100c39f remove component.Model, update constructor, losses is return value of update 2020-07-08 12:14:30 +02:00
Ines Montani 2298e129e6 Update example and training docs 2020-07-07 20:30:12 +02:00
svlandeg 2b60e894cb fix component constructors, update, begin_training, reference to GoldParse 2020-07-07 19:17:19 +02:00
svlandeg 14a796e3f9 add Example API with examples of Example usage 2020-07-07 14:46:41 +02:00
Ines Montani bb3ee38cf9 Update WIP 2020-07-06 22:22:37 +02:00
Ines Montani 44da24ddd0 Update doc.md 2020-07-06 18:17:00 +02:00
Ines Montani 44790c1c32 Update docs and add keyword-only tag 2020-07-06 18:14:57 +02:00
Ines Montani 63247cbe87 Update v3 docs [ci skip] 2020-07-05 16:11:16 +02:00
Ines Montani dc8c9d912f Update docs [ci skip] 2020-07-04 16:47:24 +02:00
Ines Montani 4498dfe99d Update docs 2020-07-04 16:25:30 +02:00
Ines Montani 1e0d54edd1 Update docs 2020-07-04 14:23:10 +02:00