Commit Graph

311 Commits

Author SHA1 Message Date
Sofie Van Landeghem 75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Matthew Honnibal db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani 44160cd52f Tidy up [ci skip] 2020-10-01 10:41:19 +02:00
Ines Montani a5debb356d Tidy up and adjust logging [ci skip] 2020-09-30 01:22:08 +02:00
Ines Montani 56a2f778c4 Add logging [ci skip] 2020-09-30 01:08:55 +02:00
Ines Montani 0a1ee109db Remove init form path 2020-09-29 22:53:18 +02:00
Ines Montani 0250bcf6a3 Show validation error during init 2020-09-29 22:29:09 +02:00
Matthew Honnibal e70a00fa76 Remove unnecessary warning from train 2020-09-29 16:47:54 +02:00
Matthew Honnibal 3f0d61232d Remove outdated arg from train 2020-09-29 16:47:44 +02:00
Ines Montani a139fe672b Fix typos and refactor CLI logging 2020-09-28 21:17:10 +02:00
Ines Montani 822ea4ef61 Refactor CLI 2020-09-28 15:09:59 +02:00
Ines Montani c22ecc66bb Don't support init path for now 2020-09-28 12:46:28 +02:00
Ines Montani a5f2cc0509 Tidy up and remove raw text (rehearsal) for now 2020-09-28 12:30:13 +02:00
Ines Montani e44a7519cd Update CLI and add [initialize] block 2020-09-28 11:56:14 +02:00
Ines Montani 2fdb7285a0 Update CLI 2020-09-28 11:06:07 +02:00
Ines Montani 553bfea641 Fix commands 2020-09-28 10:53:17 +02:00
Matthew Honnibal b886f53c31 init-pipeline runs (maybe doesnt work) 2020-09-28 03:42:47 +02:00
Matthew Honnibal ed2aff2db3 Remove unused train code 2020-09-28 03:12:31 +02:00
Matthew Honnibal 3a0a3b8db6 Dont hard-code for 'corpora' name 2020-09-28 03:06:33 +02:00
Matthew Honnibal a3e1791c9c Upd train 2020-09-28 01:08:30 +02:00
Matthew Honnibal b5556093e2 Start updating train script 2020-09-27 23:59:44 +02:00
Ines Montani 7e938ed63e Update config resolution to use new Thinc 2020-09-27 22:21:31 +02:00
Matthew Honnibal 39b178999c Tmp notes 2020-09-27 20:13:38 +02:00
Ines Montani b4486d747d Merge branch 'develop' into fix/train-config-interpolation 2020-09-26 15:32:14 +02:00
Ines Montani b2d07de786 Construct nlp from uninterpolated config before training 2020-09-26 15:16:59 +02:00
Ines Montani ca3c997062 Improve CLI config validation with latest Thinc 2020-09-26 13:13:57 +02:00
Sofie Van Landeghem 009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Ines Montani be56c0994b Add [training.before_to_disk] callback 2020-09-24 12:40:25 +02:00
Ines Montani f69fea8b25 Improve error handling around non-number scores 2020-09-24 11:29:07 +02:00
Ines Montani ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
svlandeg 6435458d51 simplify expression 2020-09-23 12:12:38 +02:00
svlandeg 20b0ec5dcf avoid logging performance of frozen components 2020-09-23 10:37:12 +02:00
Ines Montani e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2 2020-09-19 12:33:38 +02:00
Sofie Van Landeghem 39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
Adriane Boyd eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
svlandeg 427dbecdd6 cleanup and formatting 2020-09-17 11:48:04 +02:00
svlandeg 0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg 51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
Sofie Van Landeghem 3216a33149
positive_label config for textcat (#6062)
* hook up positive_label in textcat

* unit tests

* documentation

* formatting

* tests

* fix typo

* move verify_config to after begin_training

* revert accidential commit
2020-09-14 17:08:00 +02:00
Sofie Van Landeghem 8e7557656f
Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
Matthew Honnibal ba5f4c9b32 Add words and seconds to train info 2020-09-08 15:24:47 +02:00
Ines Montani ab1bb421ed Update docs links in codebase 2020-09-04 12:58:50 +02:00
Ines Montani b5a0657fd6 "model" terminology consistency in docs 2020-09-03 13:13:03 +02:00
Matthew Honnibal 122cb02001 Fix averages 2020-09-02 19:37:43 +02:00
Matthew Honnibal ec660e3131 Fix use_pytorch_for_gpu_memory 2020-09-01 00:41:38 +02:00
Matthw Honnibal c38298b8fa Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-31 19:55:55 +02:00
Matthw Honnibal fe298fa50a Shuffle on first epoch of train 2020-08-31 19:55:22 +02:00
svlandeg 13ee742fb4 example of custom logger 2020-08-31 14:24:41 +02:00
svlandeg 5230529de2 add loggers registry & logger docs sections 2020-08-28 21:44:04 +02:00
Ines Montani a5fff1df51 Remove outdated non-empty output dir warning [ci skip] 2020-08-26 15:45:51 +02:00