274 Commits

Author SHA1 Message Date
ines 73ac0aa0b5 Update spacy evaluate and add displaCy option 2017-10-04 00:03:15 +02:00
Matthew Honnibal f24c2e3a8a Fix evaluate for non-GPU 2017-10-03 22:47:31 +02:00
Matthew Honnibal 1289187279 Fix circular import 2017-10-03 09:33:21 -05:00
Matthew Honnibal a44c4c3a5b Add timer to evaluate 2017-10-03 09:15:35 -05:00
Matthew Honnibal 8902df44de Fix component disabling during training 2017-10-02 21:07:23 +02:00
Matthew Honnibal c617d288d8 Update pipeline component names in spaCy train 2017-10-02 17:20:19 +02:00
Matthew Honnibal f942903429 Improve sentence merging in iob2json 2017-10-02 17:02:10 +02:00
Matthew Honnibal 31681d20e0 Fix concatenation in iob2json converter 2017-10-02 16:50:26 +02:00
Matthew Honnibal 4896ce3320 Remove misleading comment 2017-10-02 00:09:14 +02:00
Matthew Honnibal 94df115a81 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-01 14:06:23 -05:00
Matthew Honnibal 69c7c642c2 Add spacy evaluate 2017-10-01 14:05:04 -05:00
ines fd1a9225d8 Handle conversion of pipeline components correctly
Allow both comma and comma + whitespace as separators
2017-09-29 20:52:56 +02:00
Matthew Honnibal ac8481a7b0 Print NER loss 2017-09-28 08:05:31 -05:00
Matthew Honnibal 542ebfa498 Improve defaults 2017-09-27 18:54:37 -05:00
Matthew Honnibal dcb86bdc43 Default batch size to 32 2017-09-27 11:48:19 -05:00
ines 1ff62eaee7 Fix option shortcut to avoid conflict 2017-09-26 17:59:34 +02:00
ines 7fdfb78141 Add version option to cli.train 2017-09-26 17:34:52 +02:00
Matthew Honnibal 698fc0d016 Remove merge artefact 2017-09-26 08:31:37 -05:00
Matthew Honnibal defb68e94f Update feature/noshare with recent develop changes 2017-09-26 08:15:14 -05:00
ines edf7e4881d Add meta.json option to cli.train and add relevant properties
Add accuracy scores to meta.json instead of accuracy.json and replace
all relevant properties like lang, pipeline, spacy_version in existing
meta.json. If not present, also add name and version placeholders to
make it packagable.
2017-09-25 19:00:47 +02:00
Matthew Honnibal 204b58c864 Fix evaluation during training 2017-09-24 05:01:03 -05:00
Matthew Honnibal dc3a623d00 Remove unused update_shared argument 2017-09-24 05:00:37 -05:00
Matthew Honnibal 4348c479fc Merge pre-trained vectors and noshare patches 2017-09-22 20:07:28 -05:00
Matthew Honnibal e93d43a43a Fix training with preset vectors 2017-09-22 20:00:40 -05:00
Matthew Honnibal a2357cce3f Set random seed in train script 2017-09-23 02:57:31 +02:00
Matthew Honnibal 0a9016cade Fix serialization during training 2017-09-21 13:06:45 -05:00
Matthew Honnibal 20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal 1d73dec8b1 Refactor train script 2017-09-20 19:17:10 -05:00
Matthew Honnibal a0c4b33d03 Support resuming a model during spacy train 2017-09-18 18:04:47 -05:00
Matthew Honnibal 8496d76224 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-14 09:21:20 -05:00
Matthew Honnibal 24ff6b0ad9 Fix parsing and tok2vec models 2017-09-06 05:50:58 -05:00
Matthew Honnibal e920885676 Fix pickle during train 2017-09-02 12:46:01 -05:00
ines 7e04b7f89c Fix info text on pipeline in package cli 2017-08-26 18:30:59 +02:00
Matthew Honnibal 876f38c548 Merge pull request #1279 from oroszgy/model_cli_v2
Added vector loading to model cli
2017-08-26 15:57:50 +02:00
ines bb1abbeba5 Only link model if download was successful 2017-08-23 12:36:31 +02:00
Matthew Honnibal 7be5f30f17 Add profile function 2017-08-21 23:22:49 +02:00
Gyorgy Orosz b3576bfc86 Added vector loading to model cli 2017-08-20 23:16:12 +02:00
Matthew Honnibal 7a6edeea68 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 12:55:39 -05:00
Matthew Honnibal f2f9229964 Fix name of update_shared flag 2017-08-20 18:19:06 +02:00
Matthew Honnibal 80a5146ec2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 11:07:08 -05:00
Matthew Honnibal 84bb543e4d Add gold_preproc flag to cli/train 2017-08-20 11:07:00 -05:00
Gyorgy Orosz e5344b83a3 Ported model cli from v1 2017-08-19 21:45:23 +02:00
Matthew Honnibal 11c31d285c Restore changes from nn-beam-parser 2017-08-18 22:26:12 +02:00
Matthew Honnibal 52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal 4ae0d5e1e6 Set defaults for convert command 2017-08-13 09:03:38 +02:00
ines d4f2baf7dd Add create_meta option to package command
Re-create meta.json in model directory, even if it exists. Especially
useful when updating existing spaCy models or training with Prodigy.
Ensures user won't end up with multiple "en_core_web_sm" models, and
offers easy way to change the model's name and settings without having
to edit the meta.json file.
2017-08-12 21:44:18 +02:00
Matthew Honnibal 8870d491f1 Remove redundant pickling during training 2017-08-12 08:55:53 -05:00
ines 28e2fec23b Fix autolinking failure on fresh model install (resolves #1138)
On fresh install via subprocess, pip.get_installed_distributions()
won't show new model, so is_package check in link command fails.
Solution for now is to get model package path explicitly and pass it to
link command.
2017-08-09 11:52:38 +02:00
Matthew Honnibal 0a566dc320 Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:18:12 +02:00
György Orosz 62dbf9025c Fixed conllu converter 2017-06-09 22:53:56 +02:00
ines 03db56f48c Detect spaCy version and add package title
Package title allows customised package names (like spacy-nightly)
2017-06-05 20:11:02 +02:00
Matthew Honnibal c52fde40f4 Improve train CLI 2017-06-04 20:18:37 -05:00
ines 848e47669e Fix typo 2017-06-04 20:44:15 +02:00
ines 7b7d46b64e Fix typo and success message 2017-06-04 13:45:50 +02:00
Matthew Honnibal 21eef90dbc Support specifying which GPU 2017-06-03 16:10:23 -05:00
Matthew Honnibal 43353b5413 Improve train CLI script 2017-06-03 13:28:20 -05:00
ines e5ae6ccf4e Fix typo 2017-06-01 16:46:15 +02:00
Matthew Honnibal 8a693c2605 Write binary file during training 2017-05-31 02:59:18 +02:00
ines 9e83a17e95 Use new model templates 2017-05-29 15:27:24 +02:00
Matthew Honnibal 8a24c60c1e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 08:12:05 -05:00
Matthew Honnibal 5cf47b847b Handle iob with no tag in converter 2017-05-28 08:11:39 -05:00
ines c1983621fb Update util functions for model loading 2017-05-28 00:22:40 +02:00
Matthew Honnibal 49235017bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 16:34:28 -05:00
Matthew Honnibal 5e4312feed Evaluate loaded class, to ensure save/load works 2017-05-27 15:47:02 -05:00
Matthew Honnibal 7cc9c3e9a6 Fix convert CLI 2017-05-27 15:44:42 -05:00
ines 1203959625 Add pipeline setting to meta.json generator 2017-05-27 20:02:01 +02:00
ines 086a06e7d7 Fix CLI docstrings and add command as first argument
Workaround for Plac
2017-05-27 20:01:46 +02:00
Matthew Honnibal dc07d72d80 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 08:20:40 -05:00
Matthew Honnibal de13fe0305 Remove length cap on sentences 2017-05-27 08:20:32 -05:00
Matthew Honnibal d06f235fc9 Fix conflict on convert.py 2017-05-26 11:33:29 -05:00
Matthew Honnibal 2b3b937a04 Fix converter CLI 2017-05-26 11:32:41 -05:00
Matthew Honnibal 5a87bcf35f Fix converters 2017-05-26 11:32:34 -05:00
Matthew Honnibal d65f99a720 Improve model saving in train script 2017-05-26 05:52:09 -05:00
Matthew Honnibal 22d7b448a5 Fix convert command 2017-05-25 19:47:12 -05:00
Matthew Honnibal df8015f05d Tweaks to train script 2017-05-25 17:15:24 -05:00
Matthew Honnibal 702fe74a4d Clean up spacy.cli.train 2017-05-25 16:16:30 -05:00
Matthew Honnibal 135a13790c Disable gold preprocessing 2017-05-24 20:10:20 -05:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal 6e8dce2c05 Fix train command line args 2017-05-22 10:41:39 -05:00
Matthew Honnibal ae8cf70dc1 Fix CLI train signature 2017-05-22 06:13:39 -05:00
ines fc3ec733ea Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
Matthew Honnibal bc2294d7f1 Add support for fiddly hyper-parameters to train func 2017-05-22 04:51:08 -05:00
Matthew Honnibal 4e0988605a Pass through non-projective=True 2017-05-22 04:51:08 -05:00
Matthew Honnibal e14533757b Use averaged params for evaluation 2017-05-22 04:51:08 -05:00
Matthew Honnibal 5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
Matthew Honnibal baf3ef0ddc Remove import of removed train_config script 2017-05-21 09:07:34 -05:00
Matthew Honnibal 4c9202249d Refactor training, to fix memory leak 2017-05-21 09:07:06 -05:00
ines 0c6c65aa3c Improve messaging if model linking fails after download 2017-05-21 00:28:37 +02:00
ines e39ad78267 Resolve model name properly in cli.info
Use util.resolve_model_path() to also allow package names and paths.
2017-05-20 12:24:40 +02:00
Matthew Honnibal 3376d4d6e8 Update the train script, fixing GPU memory leak 2017-05-19 18:15:50 -05:00
Matthew Honnibal 08766240c3 Add incomplete iob converter 2017-05-19 13:27:51 -05:00
Matthew Honnibal 09a877886b WIP on iob converter 2017-05-19 13:24:39 -05:00
Matthew Honnibal ca70b08661 Fix GPU training and evaluation 2017-05-18 08:30:33 -05:00
Matthew Honnibal fc8d3a112c Add util.env_opt support: Can set hyper params through environment variables. 2017-05-18 04:36:53 -05:00
Matthew Honnibal 55dab77de8 Add conversion rule for .conll 2017-05-17 13:13:48 +02:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal 3bf4a28d8d Use tag in CoNLL converter, not POS 2017-05-17 12:04:33 +02:00
Matthew Honnibal 8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
Matthew Honnibal 5211645af3 Get data flowing through pipeline. Needs redesign 2017-05-16 11:21:59 +02:00
Matthew Honnibal a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
ines 9d85cda8e4 Fix models error message and use about.__docs_models__ (see #1051) 2017-05-13 13:05:47 +02:00
ines 4eefb288e3 Port over PR #1055 2017-05-13 03:25:32 +02:00
ines 95edd9e896 Let parse_package_meta take full path 2017-05-08 15:30:48 +02:00
ines 59c3b9d4dd Tidy up CLI and fix print functions 2017-05-07 23:25:29 +02:00
ines 527d51ac9a Fetch shortcuts from GitHub and improve error handling 2017-04-26 18:00:28 +02:00
Matthew Honnibal 4f9657b42b Fix reporting if no dev data with train 2017-04-23 22:27:10 +02:00
ines 3a9710f356 Pass dev_scores to print_progress correctly (resolves #1008)
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
ines 25c70b4cc5 Move fix_text to spacy.compat (see #1002) 2017-04-20 15:47:17 +02:00
Gyorgy Orosz 4a06a2572c Using ftfy for handling broken encoded strings. 2017-04-20 13:34:51 +02:00
ines 48da244058 Use spacy.compat.json_dumps for Python 2/3 compatibility (resolves #991) 2017-04-19 11:50:36 +02:00
ines 82f5f1f98f Replace str with compat.unicode_ 2017-04-17 01:29:54 +02:00
Matthew Honnibal 17c9fffb9e Fix naked except 2017-04-16 15:28:16 -05:00
ines 6145b7c153 Remove redundant Path 2017-04-16 20:53:25 +02:00
Matthew Honnibal 89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
ines 8191e33cf1 Update link error message with info on permissions 2017-04-16 13:32:31 +02:00
ines a3ddbc0444 Add note about --force flag to error message 2017-04-16 13:14:36 +02:00
ines e3de035814 Add meta validation to check for required settings
Complain if no "lang", "name" or "version" is found (those settings are
used in directory / package names). Package will still build without,
but it'll inevitably fail somewhere down the line.
2017-04-16 13:13:17 +02:00
ines a7574b7572 Add more options to read in meta data in package command
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
ines 13c8a42d2b Fix typos 2017-04-16 13:03:58 +02:00
ines 35fb4febe2 Fix whitespace 2017-04-15 12:13:45 +02:00
ines c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
ines 84341c2975 Only compile list of models if data_path exists 2017-04-14 16:48:02 +02:00
Gyorgy Orosz dd3244c08a Made json dump to produce unicode strings in py2 2017-04-13 23:30:47 +02:00
Gyorgy Orosz a9469c8173 Fixed typo 2017-04-13 15:24:14 +02:00
ines 41037f0f07 Remove unused imports 2017-04-13 13:52:11 +02:00
ines 1b92c8d5d5 Use unicode paths on Windows/Python 2 and catch other errors (resolves #970)
try/except here is quite dirty, but it'll at least make sure users see
an error message that explains what's going on
2017-04-10 17:49:51 +02:00
ines 7ea1673072 Fix whitespace 2017-04-07 13:28:48 +02:00
ines 255650dbc2 Add conllu2json converter from explosion/spacy-dev-resources/#11 2017-04-07 13:05:12 +02:00
ines 789ce8a45e Add convert command 2017-04-07 13:04:17 +02:00
ines 9952d3b08a Fix whitespace 2017-04-07 13:02:05 +02:00
ines dcf8ab0c47 Merge branch 'develop' 2017-04-07 12:00:09 +02:00
Joshua Reeter 564daf6dec Issue #934 symlink should not convert paths as_posix under windows. 2017-03-30 23:47:45 -05:00
ines 4759fd437d Merge branch 'master' into develop 2017-03-29 10:37:13 +02:00
Grégory Howard 9c2996b27f correction of package.py (encoding on open instead of write) 2017-03-29 09:11:02 +02:00
ines 7198cf1c8a Remove unused import 2017-03-26 20:56:05 +02:00
ines 7ceaa1614b Add experimental model init command 2017-03-26 20:51:40 +02:00
Matthew Honnibal 2efdbc08ff Make training work with directories 2017-03-26 08:46:44 -05:00
Matthew Honnibal 9dcb58aaaf Merge CLI changes 2017-03-26 07:30:45 -05:00
Matthew Honnibal 6b7f7a2060 Connect parser L1 option to train CLI 2017-03-26 07:24:07 -05:00
Matthew Honnibal dec5571bf3 Update train CLI 2017-03-26 07:16:52 -05:00
ines 53cf2f1c0e Make dev data optional 2017-03-26 11:48:17 +02:00
Matthew Honnibal 5eac089fbe Merge branch 'master' into develop 2017-03-26 04:45:43 -05:00
ines 97814f8da6 Update Windows Python 2 link workaround to use helper functions 2017-03-25 14:04:27 +01:00
Greg Baker b7f714b498 Possible solution to #909 2017-03-25 21:36:38 +11:00
Matthew Honnibal 9c9cd99144 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-23 11:11:24 +01:00
ines 0035fd9efe Add spacy train work in progress 2017-03-23 11:08:41 +01:00