Commit Graph

74 Commits

Author SHA1 Message Date
Sofie Van Landeghem 1e974de837
config is not Optional 2021-08-27 11:44:31 +02:00
Paul O'Leary McCann 0c553ecd4e Fix docs (fix ) 2021-05-24 19:47:30 +09:00
Adriane Boyd 9fd41d6742 Remove Language.pipe cleanup arg 2021-03-18 13:31:42 +01:00
graue70 0fddc0447c
Fix copy & paste error in API docs 2021-03-02 14:00:14 +01:00
Ines Montani 95e958a229
Merge pull request from explosion/feature/replace-listeners 2021-01-30 00:58:08 +11:00
Ines Montani e766e8c56d
Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-29 21:41:17 +11:00
svlandeg d7d838281c adding new="3" mentions in the doc 2021-01-29 11:26:37 +01:00
Ines Montani 99af9e7125 Update documentation 2021-01-29 18:45:48 +11:00
Sofie Van Landeghem 837a4f53c2
Error handling in nlp.pipe
* add error handler for pipe methods

* add unit tests

* remove pipe method that are the same as their base class

* have Language keep track of a default error handler

* cleanup

* formatting

* small refactor

* add documentation
2021-01-29 08:51:21 +08:00
Adriane Boyd 80ac8af1bf Format 2020-12-09 12:44:01 +01:00
Adriane Boyd 795b5bd049
Update website/docs/api/language.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-12-09 12:23:32 +01:00
Adriane Boyd fa8fa474a3 Add nlp.batch_size setting
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
2020-12-09 09:13:26 +01:00
Ines Montani 363ac73c72 Update docs [ci skip] 2020-11-09 12:43:26 +08:00
svlandeg eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
Ines Montani df06f7a792 Update docs [ci skip] 2020-10-02 13:24:33 +02:00
Ines Montani f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Ines Montani d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani d7469283c5 Update docs [ci skip] 2020-09-29 16:59:21 +02:00
Ines Montani ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Ines Montani b92c8aae78 Merge branch 'develop' into pr/6135 2020-09-24 13:44:56 +02:00
walterhenry 3dd5f409ec Proofreading
Proofread some API docs
2020-09-24 13:15:28 +02:00
Ines Montani ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani f9af7d365c Update docs [ci skip] 2020-09-22 09:45:41 +02:00
Ines Montani 0edd695bf6 Update docs 2020-09-15 11:41:49 +02:00
Ines Montani 99549a5ace Fix consistency and update docs 2020-09-15 11:37:37 +02:00
Ines Montani 8b0dabe987 Update docs [ci skip] 2020-09-12 17:05:10 +02:00
svlandeg 9073d99fc9 fix link to shape inference section 2020-09-10 10:22:59 +02:00
svlandeg a8aa9a8068 document Pipe API details, crossreferences etc 2020-09-09 15:56:27 +02:00
Ines Montani 25a595dc10 Fix typos and wording [ci skip] 2020-09-03 16:37:45 +02:00
Ines Montani b5a0657fd6 "model" terminology consistency in docs 2020-09-03 13:13:03 +02:00
Ines Montani 66d76f5126 Update docs 2020-08-29 12:36:05 +02:00
Ines Montani 82f0e20318 Update docs and consistency [ci skip] 2020-08-18 14:39:40 +02:00
Ines Montani 1c3bcfb488 Update docs and util consistency 2020-08-18 01:22:59 +02:00
Ines Montani 3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
Ines Montani b7ec06e331 Update docs [ci skip] 2020-08-11 20:57:23 +02:00
Ines Montani c044460823 Update docs [ci skip] 2020-08-10 00:01:38 +02:00
Ines Montani cdec46493f Update docs 2020-08-05 15:00:54 +02:00
Ines Montani b40f44419b Simplify pipe analysis
- remove unused code
- don't print by default
- integrate attrs info into analysis output
2020-08-01 13:40:06 +02:00
Ines Montani 98c6a85c8b Update docs [ci skip] 2020-07-31 18:55:38 +02:00
Ines Montani e9e8fa2466 Update docs and types 2020-07-31 17:02:54 +02:00
Adriane Boyd 9b509aa87f Move Language.evaluate scorer config to new arg
Move `Language.evaluate` scorer config from `component_cfg` to separate
argument `scorer_cfg`.
2020-07-31 11:05:16 +02:00
Ines Montani b0f57a0cac Update docs and consistency 2020-07-29 15:14:07 +02:00
Ines Montani e0ffe36e79 Update docstrings, docs and types 2020-07-29 11:36:42 +02:00
Ines Montani ae4d8a6ffd Update docstrings, docs and pipe consistency 2020-07-28 13:37:31 +02:00
Ines Montani 0094cb0d04 Remove scores list from config and document 2020-07-28 11:22:24 +02:00
Ines Montani d8b519c23c API docs, docstrings and argument consistency 2020-07-27 18:11:45 +02:00
Ines Montani 7adbaf9a5b Update docs [ci skip] 2020-07-27 00:29:45 +02:00
Ines Montani c288dba8e7 Update docs [ci skip] 2020-07-25 18:51:12 +02:00
Adriane Boyd 2bcceb80c4
Refactor the Scorer to improve flexibility
* Refactor the Scorer to improve flexibility

Refactor the `Scorer` to improve flexibility for arbitrary pipeline
components.

* Individual pipeline components provide their own `evaluate` methods
that score a list of `Example`s and return a dictionary of scores
* `Scorer` is initialized either:
  * with a provided pipeline containing components to be scored
  * with a default pipeline containing the built-in statistical
    components (senter, tagger, morphologizer, parser, ner)
* `Scorer.score` evaluates a list of `Example`s and returns a dictionary
of scores referring to the scores provided by the components in the
pipeline

Significant differences:

* `tags_acc` is renamed to `tag_acc` to be consistent with `token_acc`
and the new `morph_acc`, `pos_acc`, and `lemma_acc`
* Scoring is no longer cumulative: `Scorer.score` scores a list of
examples rather than a single example and does not retain any state
about previously scored examples
* PRF values in the returned scores are no longer multiplied by 100

* Add kwargs to Morphologizer.evaluate

* Create generalized scoring methods in Scorer

* Generalized static scoring methods are added to `Scorer`
  * Methods require an attribute (either on Token or Doc) that is
used to key the returned scores

Naming differences:

* `uas`, `las`, and `las_per_type` in the scores dict are renamed to
`dep_uas`, `dep_las`, and `dep_las_per_type`

Scoring differences:

* `Doc.sents` is now scored as spans rather than on sentence-initial
token positions so that `Doc.sents` and `Doc.ents` can be scored with
the same method (this lowers scores since a single incorrect sentence
start results in two incorrect spans)

* Simplify / extend hasattr check for eval method

* Add hasattr check to tokenizer scoring
* Simplify to hasattr check for component scoring

* Reset Example alignment if docs are set

Reset the Example alignment if either doc is set in case the
tokenization has changed.

* Add PRF tokenization scoring for tokens as spans

Add PRF scores for tokens as character spans. The scores are:

* token_acc: # correct tokens / # gold tokens
* token_p/r/f: PRF for (token.idx, token.idx + len(token))

* Add docstring to Scorer.score_tokenization

* Rename component.evaluate() to component.score()

* Update Scorer API docs

* Update scoring for positive_label in textcat

* Fix TextCategorizer.score kwargs

* Update Language.evaluate docs

* Update score names in default config
2020-07-25 12:53:02 +02:00
svlandeg c94279ac1b remove tensors, fix predict, get_loss and set_annotations 2020-07-08 13:11:54 +02:00
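
Note on commit 2bcceb80c4: the message describes `token_p/r/f` as PRF over the character spans `(token.idx, token.idx + len(token))`. The following is a minimal standalone sketch of that idea, not the `Scorer` implementation itself. The helper names `token_spans` and `span_prf` are hypothetical, and offsets are recovered by searching the raw text rather than read from `Token.idx`. In line with the commit message, the values are fractions and are not multiplied by 100.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]


def token_spans(text: str, tokens: List[str]) -> Set[Span]:
    """Hypothetical helper: map each token to its (start, end) character span.

    Tokens are located left to right in the text, so whitespace between
    tokens is skipped. This stands in for (token.idx, token.idx + len(token)).
    """
    spans: Set[Span] = set()
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if the token is absent
        end = start + len(tok)
        spans.add((start, end))
        pos = end
    return spans


def span_prf(pred: Set[Span], gold: Set[Span]) -> Dict[str, float]:
    """Hypothetical helper: precision, recall and F-score over exact span matches."""
    true_pos = len(pred & gold)
    p = true_pos / len(pred) if pred else 0.0
    r = true_pos / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"token_p": p, "token_r": r, "token_f": f}


text = "I can't go"
gold = token_spans(text, ["I", "ca", "n't", "go"])  # reference tokenization
pred = token_spans(text, ["I", "can't", "go"])      # predicted tokenization
scores = span_prf(pred, gold)
print(scores)
```

Here two of the three predicted spans match a gold span exactly, so precision is 2/3, recall is 2/4, and the F-score is 4/7. Applying the same span comparison to sentence spans would show why one wrong sentence start costs two spans, as the commit message notes for `Doc.sents`.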