Commit Graph

310 Commits

Author SHA1 Message Date
Xiaoquan Kong f0c9652ed1 New Feature: display more detail when Error E067 (#2639)
* Fix off-by-one error

* Add verbose option

* Update verbose option

* Update documents for verbose option
2018-08-07 10:45:29 +02:00
Matthew Honnibal c83fccfe2a Fix output of best model 2018-06-25 23:05:56 +02:00
Matthew Honnibal c4698f5712 Don't collate model unless training succeeds 2018-06-25 16:36:42 +02:00
Matthew Honnibal 24dfbb8a28 Fix model collation 2018-06-25 14:35:24 +02:00
Matthew Honnibal 62237755a4 Import shutil 2018-06-25 13:40:17 +02:00
Matthew Honnibal a040fca99e Import json into cli.train 2018-06-25 11:50:37 +02:00
Matthew Honnibal 2c703d99c2 Fix collation of best models 2018-06-25 01:21:34 +02:00
Matthew Honnibal 2c80b7c013 Collate best model after training 2018-06-24 23:39:52 +02:00
ines 330c039106 Merge branch 'master' into develop 2018-05-26 18:30:52 +02:00
James Messinger 4515e96e90 Better formatting for `spacy train` CLI (#2357)
* Better formatting for `spacy train` CLI

Changed to use fixed-spaces rather than tabs to align table headers and data.

### Before:
```
Itn.    P.Loss  N.Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %
0       4618.857        2910.004        76.172  79.645  67.987  88.732  88.261  100.000 4436.9  6376.4
1       4671.972        3764.812        74.481  78.046  62.374  82.680  88.377  100.000 4672.2  6227.1
2       4742.756        3673.473        71.994  77.380  63.966  84.494  90.620  100.000 4298.0  5983.9
```

### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```

* Added contributor file
2018-05-25 13:08:45 +02:00
Matthew Honnibal 2c4a6d66fa Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
Ines Montani 3141e04822
💫 New system for error messages and warnings (#2163)
* Add spacy.errors module

* Update deprecation and user warnings

* Replace errors and asserts with new error message system

* Remove redundant asserts

* Fix whitespace

* Add messages for print/util.prints statements

* Fix typo

* Fix typos

* Move CLI messages to spacy.cli._messages

* Add decorator to display error code with message

An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.

* Remove unused link in spacy.about

* Update errors for invalid pipeline components

* Improve error for unknown factories

* Add displaCy warnings

* Update formatting consistency

* Move error message to spacy.errors

* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
Matthew Honnibal 17c3e7efa2 Add message noting vectors 2018-03-28 16:33:43 +02:00
Matthew Honnibal 1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal 86405e4ad1 Fix CLI for multitask objectives 2018-02-18 10:59:11 +01:00
Matthew Honnibal a34749b2bf Add multitask objectives options to train CLI 2018-02-17 22:03:54 +01:00
Matthew Honnibal 262d0a3148 Fix overwriting of lexical attributes when loading vectors during training 2018-02-17 18:11:11 +01:00
Johannes Dollinger bf94c13382 Don't fix random seeds on import 2018-02-13 12:42:23 +01:00
Søren Lind Kristiansen 7f0ab145e9 Don't pass CLI command name as dummy argument 2018-01-04 21:33:47 +01:00
Søren Lind Kristiansen a9ff6eadc9 Prefix dummy argument names with underscore 2018-01-03 20:48:12 +01:00
Isaac Sijaranamual 20ae0c459a Fixes "Error saving model" #1622 2017-12-10 23:07:13 +01:00
Isaac Sijaranamual e188b61960 Make cli/train.py not eat exception 2017-12-10 22:53:08 +01:00
Matthew Honnibal c2bbf076a4 Add document length cap for training 2017-11-03 01:54:54 +01:00
ines 37e62ab0e2 Update vector meta in meta.json 2017-11-01 01:25:09 +01:00
Matthew Honnibal 3659a807b0 Remove vector pruning arg from train CLI 2017-10-31 19:21:05 +01:00
Matthew Honnibal e98451b5f7 Add -prune-vectors argument to spacy.cly.train 2017-10-30 18:00:10 +01:00
ines d941fc3667 Tidy up CLI 2017-10-27 14:38:39 +02:00
ines 11e3f19764 Fix vectors data added after training (see #1457) 2017-10-25 16:08:26 +02:00
ines 273e638183 Add vector data to model meta after training (see #1457) 2017-10-25 16:03:05 +02:00
Matthew Honnibal a955843684 Increase default number of epochs 2017-10-12 13:13:01 +02:00
Matthew Honnibal acba2e1051 Fix metadata in training 2017-10-11 08:55:52 +02:00
Matthew Honnibal 74c2c6a58c Add default name and lang to meta 2017-10-11 08:49:12 +02:00
Matthew Honnibal 5156074df1 Make loading code more consistent in train command 2017-10-10 12:51:20 -05:00
Matthew Honnibal 97c9b5db8b Patch spacy.train for new pipeline management 2017-10-09 23:41:16 -05:00
Matthew Honnibal 808d8740d6 Remove print statement 2017-10-09 08:45:20 -05:00
Matthew Honnibal 0f41b25f60 Add speed benchmarks to metadata 2017-10-09 08:05:37 -05:00
Matthew Honnibal be4f0b6460 Update defaults 2017-10-08 02:08:12 -05:00
Matthew Honnibal 9d66a915da Update training defaults 2017-10-07 21:02:38 -05:00
Matthew Honnibal c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal 5743b06e36 Wrap model saving in try/except 2017-10-05 08:12:50 -05:00
Matthew Honnibal 8902df44de Fix component disabling during training 2017-10-02 21:07:23 +02:00
Matthew Honnibal c617d288d8 Update pipeline component names in spaCy train 2017-10-02 17:20:19 +02:00
Matthew Honnibal ac8481a7b0 Print NER loss 2017-09-28 08:05:31 -05:00
Matthew Honnibal 542ebfa498 Improve defaults 2017-09-27 18:54:37 -05:00
Matthew Honnibal dcb86bdc43 Default batch size to 32 2017-09-27 11:48:19 -05:00
ines 1ff62eaee7 Fix option shortcut to avoid conflict 2017-09-26 17:59:34 +02:00
ines 7fdfb78141 Add version option to cli.train 2017-09-26 17:34:52 +02:00
Matthew Honnibal 698fc0d016 Remove merge artefact 2017-09-26 08:31:37 -05:00
Matthew Honnibal defb68e94f Update feature/noshare with recent develop changes 2017-09-26 08:15:14 -05:00
ines edf7e4881d Add meta.json option to cli.train and add relevant properties
Add accuracy scores to meta.json instead of accuracy.json and replace
all relevant properties like lang, pipeline, spacy_version in existing
meta.json. If not present, also add name and version placeholders to
make it packagable.
2017-09-25 19:00:47 +02:00
Matthew Honnibal 204b58c864 Fix evaluation during training 2017-09-24 05:01:03 -05:00
Matthew Honnibal dc3a623d00 Remove unused update_shared argument 2017-09-24 05:00:37 -05:00
Matthew Honnibal 4348c479fc Merge pre-trained vectors and noshare patches 2017-09-22 20:07:28 -05:00
Matthew Honnibal e93d43a43a Fix training with preset vectors 2017-09-22 20:00:40 -05:00
Matthew Honnibal a2357cce3f Set random seed in train script 2017-09-23 02:57:31 +02:00
Matthew Honnibal 0a9016cade Fix serialization during training 2017-09-21 13:06:45 -05:00
Matthew Honnibal 20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal 1d73dec8b1 Refactor train script 2017-09-20 19:17:10 -05:00
Matthew Honnibal a0c4b33d03 Support resuming a model during spacy train 2017-09-18 18:04:47 -05:00
Matthew Honnibal 8496d76224 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-14 09:21:20 -05:00
Matthew Honnibal 24ff6b0ad9 Fix parsing and tok2vec models 2017-09-06 05:50:58 -05:00
Matthew Honnibal e920885676 Fix pickle during train 2017-09-02 12:46:01 -05:00
Matthew Honnibal 7a6edeea68 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 12:55:39 -05:00
Matthew Honnibal f2f9229964 Fix name of update_shared flag 2017-08-20 18:19:06 +02:00
Matthew Honnibal 84bb543e4d Add gold_preproc flag to cli/train 2017-08-20 11:07:00 -05:00
Matthew Honnibal 11c31d285c Restore changes from nn-beam-parser 2017-08-18 22:26:12 +02:00
Matthew Honnibal 52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal 8870d491f1 Remove redundant pickling during training 2017-08-12 08:55:53 -05:00
Matthew Honnibal 0a566dc320 Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:18:12 +02:00
Matthew Honnibal c52fde40f4 Improve train CLI 2017-06-04 20:18:37 -05:00
Matthew Honnibal 21eef90dbc Support specifying which GPU 2017-06-03 16:10:23 -05:00
Matthew Honnibal 43353b5413 Improve train CLI script 2017-06-03 13:28:20 -05:00
Matthew Honnibal 8a693c2605 Write binary file during training 2017-05-31 02:59:18 +02:00
Matthew Honnibal 49235017bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 16:34:28 -05:00
Matthew Honnibal 5e4312feed Evaluate loaded class, to ensure save/load works 2017-05-27 15:47:02 -05:00
ines 086a06e7d7 Fix CLI docstrings and add command as first argument
Workaround for Plac
2017-05-27 20:01:46 +02:00
Matthew Honnibal de13fe0305 Remove length cap on sentences 2017-05-27 08:20:32 -05:00
Matthew Honnibal d65f99a720 Improve model saving in train script 2017-05-26 05:52:09 -05:00
Matthew Honnibal df8015f05d Tweaks to train script 2017-05-25 17:15:24 -05:00
Matthew Honnibal 702fe74a4d Clean up spacy.cli.train 2017-05-25 16:16:30 -05:00
Matthew Honnibal 135a13790c Disable gold preprocessing 2017-05-24 20:10:20 -05:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal 6e8dce2c05 Fix train command line args 2017-05-22 10:41:39 -05:00
Matthew Honnibal ae8cf70dc1 Fix CLI train signature 2017-05-22 06:13:39 -05:00
ines fc3ec733ea Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
Matthew Honnibal bc2294d7f1 Add support for fiddly hyper-parameters to train func 2017-05-22 04:51:08 -05:00
Matthew Honnibal 4e0988605a Pass through non-projective=True 2017-05-22 04:51:08 -05:00
Matthew Honnibal e14533757b Use averaged params for evaluation 2017-05-22 04:51:08 -05:00
Matthew Honnibal 4c9202249d Refactor training, to fix memory leak 2017-05-21 09:07:06 -05:00
Matthew Honnibal 3376d4d6e8 Update the train script, fixing GPU memory leak 2017-05-19 18:15:50 -05:00
Matthew Honnibal ca70b08661 Fix GPU training and evaluation 2017-05-18 08:30:33 -05:00
Matthew Honnibal fc8d3a112c Add util.env_opt support: Can set hyper params through environment variables. 2017-05-18 04:36:53 -05:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal 8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
Matthew Honnibal 5211645af3 Get data flowing through pipeline. Needs redesign 2017-05-16 11:21:59 +02:00
Matthew Honnibal a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
ines 59c3b9d4dd Tidy up CLI and fix print functions 2017-05-07 23:25:29 +02:00
Matthew Honnibal 4f9657b42b Fix reporting if no dev data with train 2017-04-23 22:27:10 +02:00
ines 3a9710f356 Pass dev_scores to print_progress correctly (resolves #1008)
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
Matthew Honnibal 89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 9952d3b08a Fix whitespace 2017-04-07 13:02:05 +02:00
Matthew Honnibal 2efdbc08ff Make training work with directories 2017-03-26 08:46:44 -05:00
Matthew Honnibal 9dcb58aaaf Merge CLI changes 2017-03-26 07:30:45 -05:00
Matthew Honnibal 6b7f7a2060 Connect parser L1 option to train CLI 2017-03-26 07:24:07 -05:00
Matthew Honnibal dec5571bf3 Update train CLI 2017-03-26 07:16:52 -05:00
ines 53cf2f1c0e Make dev data optional 2017-03-26 11:48:17 +02:00
ines 0035fd9efe Add spacy train work in progress 2017-03-23 11:08:41 +01:00