Commit Graph

677 Commits

Author SHA1 Message Date
Ines Montani 758ad6c3cd Make CPU the default for init config 2020-12-09 11:00:51 +11:00
Ines Montani 94a5a9814f Update argument handling and documentation 2020-12-08 20:41:18 +11:00
Sofie Van Landeghem 2c27093c5f
require_cpu functionality (#6336)
* add require_cpu from Thinc 8.0.0rc2

* add docs

* fix test if cupy is not installed
2020-12-08 14:42:40 +08:00
Ines Montani ee2ec52f48
Merge pull request #6409 from svlandeg/feature/trf-docs 2020-12-08 06:32:10 +01:00
Ines Montani 82e88f0e3b
Merge pull request #6379 from svlandeg/fix/labels-constructor 2020-12-08 06:29:56 +01:00
svlandeg 636be3c791 Merge remote-tracking branch 'upstream/develop' into feature/trf-docs 2020-11-19 14:15:35 +01:00
Sofie Van Landeghem 165993d8e5
fix typo in transformer docs (#6404) 2020-11-19 14:11:38 +01:00
svlandeg 73fc1ed963 remove labels from morphologizer constructor 2020-11-11 21:48:50 +01:00
svlandeg fcd79e0655 remove set_morphology from docs 2020-11-11 21:32:34 +01:00
svlandeg 789fb3d124 add docs for upstream argument of TransformerListener 2020-11-09 21:42:58 +01:00
Ines Montani 363ac73c72 Update docs [ci skip] 2020-11-09 12:43:26 +08:00
Sofie Van Landeghem 8ef056cf98
fix embed_size in Entity Linker architecture (#6343) 2020-11-04 22:20:13 +01:00
Adriane Boyd a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer

Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.

Attributes without unset states:

* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation

Additional changes:

* add optional `has_annotation` check to `score_scans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`

* Fix import

* Update return types
2020-11-03 15:47:18 +01:00
Sofie Van Landeghem 75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Ines Montani 4d99d2b94a Update docs [ci skip] 2020-10-13 11:38:52 +02:00
svlandeg 40276fd3be update NEL docs after latest refactor 2020-10-12 11:41:27 +02:00
Ines Montani e50dc2c1c9 Update docs [ci skip] 2020-10-09 12:04:52 +02:00
Ines Montani 329b61ee7b Update docs [ci skip] 2020-10-09 10:36:06 +02:00
Sofie Van Landeghem d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
Ines Montani 064575d79d
Merge pull request #6216 from svlandeg/feature/nel-initialize 2020-10-08 11:14:12 +02:00
Ines Montani 43e59bb22a Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
svlandeg eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
Ines Montani 2fd7122074 Update docs [ci skip] 2020-10-06 10:31:48 +02:00
Ines Montani 568e12215d
Merge pull request #6206 from svlandeg/fix/patterns-init 2020-10-06 10:27:23 +02:00
svlandeg 9b4cf7b0b6 update output of debug config command 2020-10-06 09:47:23 +02:00
svlandeg fd0f60e2bc updates to data format for training and pretraining 2020-10-06 09:28:53 +02:00
svlandeg ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
Ines Montani 1a554bdcb1 Update docs and docstring [ci skip] 2020-10-05 21:55:27 +02:00
Matthew Honnibal 919790cb47 Upd MultiHashEmbed docs 2020-10-05 20:28:21 +02:00
svlandeg 193e0d5a98 add docs for entity_ruler.initialize 2020-10-05 18:04:08 +02:00
svlandeg 65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
Ines Montani 0f64556c04
Merge pull request #6197 from svlandeg/feature/pipe-docs [ci skip] 2020-10-05 11:55:40 +02:00
svlandeg 52b660e9dc initialize and update explanation 2020-10-05 00:39:36 +02:00
Ines Montani 3c36a57e84
Update data augmenters (#6196)
* Draft lower-case augmenter

* Make warning a debug log

* Update lowercase augmenter, docs and tests

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-04 17:46:29 +02:00
Ines Montani 11347f34da Tidy up, tests and docs 2020-10-04 13:54:05 +02:00
Ines Montani 989c59918c Update docs [ci skip] 2020-10-03 18:53:39 +02:00
Ines Montani 7c4ab7e82c Fix Lemmatizer.get_lookups_config 2020-10-03 17:16:10 +02:00
Ines Montani dd542ec6a4
Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Ines Montani 35d695a031 Update docs 2020-10-03 16:08:24 +02:00
svlandeg 02247cccaf Merge remote-tracking branch 'upstream/develop' into feature/small-fixes 2020-10-02 20:48:11 +02:00
Sofie Van Landeghem 09dcb75076
small UX fix for DocBin (#6167)
* add informative warning when messing up store_user_data DocBin flags

* add informative warning when messing up store_user_data DocBin flags

* cleanup test

* rename to patterns_path
2020-10-02 15:43:32 +02:00
Ines Montani f0b30aedad
Make lemmatizers use initialize logic (#6182)
* Make lemmatizer use initialize logic and tidy up

* Fix typo

* Raise for uninitialized tables
2020-10-02 15:42:36 +02:00
Ines Montani df06f7a792 Update docs [ci skip] 2020-10-02 13:24:33 +02:00
Ines Montani d2aa662ab2
Merge pull request #6179 from adrianeboyd/feature/token-morph-refactor-2 [ci skip] 2020-10-02 12:10:27 +02:00
Ines Montani 32cdc1c4f4 Update docs [ci skip] 2020-10-02 11:38:03 +02:00
Adriane Boyd fd09e6b140 Update docs for Token.morph / Token.set_morph 2020-10-02 09:05:15 +02:00
Ines Montani 01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Ines Montani 6b94cee468 Fix docs [ci skip] 2020-10-02 01:11:19 +02:00
Ines Montani f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
svlandeg 1328c9fd14 consistently use --code instead of --code-path 2020-10-01 16:59:22 +02:00