236 Commits

Author SHA1 Message Date
Matthew Honnibal f2f9229964 Fix name of update_shared flag 2017-08-20 18:19:06 +02:00
Matthew Honnibal 80a5146ec2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 11:07:08 -05:00
Matthew Honnibal 84bb543e4d Add gold_preproc flag to cli/train 2017-08-20 11:07:00 -05:00
Gyorgy Orosz e5344b83a3 Ported model cli from v1 2017-08-19 21:45:23 +02:00
Matthew Honnibal 11c31d285c Restore changes from nn-beam-parser 2017-08-18 22:26:12 +02:00
Matthew Honnibal 52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal 4ae0d5e1e6 Set defaults for convert command 2017-08-13 09:03:38 +02:00
ines d4f2baf7dd Add create_meta option to package command
Re-create meta.json in model directory, even if it exists. Especially
useful when updating existing spaCy models or training with Prodigy.
Ensures user won't end up with multiple "en_core_web_sm" models, and
offers easy way to change the model's name and settings without having
to edit the meta.json file.
2017-08-12 21:44:18 +02:00
Matthew Honnibal 8870d491f1 Remove redundant pickling during training 2017-08-12 08:55:53 -05:00
ines 28e2fec23b Fix autolinking failure on fresh model install (resolves #1138)
On fresh install via subprocess, pip.get_installed_distributions()
won't show new model, so is_package check in link command fails.
Solution for now is to get model package path explicitly and pass it to
link command.
2017-08-09 11:52:38 +02:00
Matthew Honnibal 0a566dc320 Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:18:12 +02:00
György Orosz 62dbf9025c Fixed conllu converter 2017-06-09 22:53:56 +02:00
ines 03db56f48c Detect spaCy version and add package title
Package title allows customised package names (like spacy-nightly)
2017-06-05 20:11:02 +02:00
Matthew Honnibal c52fde40f4 Improve train CLI 2017-06-04 20:18:37 -05:00
ines 848e47669e Fix typo 2017-06-04 20:44:15 +02:00
ines 7b7d46b64e Fix typo and success message 2017-06-04 13:45:50 +02:00
Matthew Honnibal 21eef90dbc Support specifying which GPU 2017-06-03 16:10:23 -05:00
Matthew Honnibal 43353b5413 Improve train CLI script 2017-06-03 13:28:20 -05:00
ines e5ae6ccf4e Fix typo 2017-06-01 16:46:15 +02:00
Matthew Honnibal 8a693c2605 Write binary file during training 2017-05-31 02:59:18 +02:00
ines 9e83a17e95 Use new model templates 2017-05-29 15:27:24 +02:00
Matthew Honnibal 8a24c60c1e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 08:12:05 -05:00
Matthew Honnibal 5cf47b847b Handle iob with no tag in converter 2017-05-28 08:11:39 -05:00
ines c1983621fb Update util functions for model loading 2017-05-28 00:22:40 +02:00
Matthew Honnibal 49235017bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 16:34:28 -05:00
Matthew Honnibal 5e4312feed Evaluate loaded class, to ensure save/load works 2017-05-27 15:47:02 -05:00
Matthew Honnibal 7cc9c3e9a6 Fix convert CLI 2017-05-27 15:44:42 -05:00
ines 1203959625 Add pipeline setting to meta.json generator 2017-05-27 20:02:01 +02:00
ines 086a06e7d7 Fix CLI docstrings and add command as first argument
Workaround for Plac
2017-05-27 20:01:46 +02:00
Matthew Honnibal dc07d72d80 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 08:20:40 -05:00
Matthew Honnibal de13fe0305 Remove length cap on sentences 2017-05-27 08:20:32 -05:00
Matthew Honnibal d06f235fc9 Fix conflict on convert.py 2017-05-26 11:33:29 -05:00
Matthew Honnibal 2b3b937a04 Fix converter CLI 2017-05-26 11:32:41 -05:00
Matthew Honnibal 5a87bcf35f Fix converters 2017-05-26 11:32:34 -05:00
Matthew Honnibal d65f99a720 Improve model saving in train script 2017-05-26 05:52:09 -05:00
Matthew Honnibal 22d7b448a5 Fix convert command 2017-05-25 19:47:12 -05:00
Matthew Honnibal df8015f05d Tweaks to train script 2017-05-25 17:15:24 -05:00
Matthew Honnibal 702fe74a4d Clean up spacy.cli.train 2017-05-25 16:16:30 -05:00
Matthew Honnibal 135a13790c Disable gold preprocessing 2017-05-24 20:10:20 -05:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal 6e8dce2c05 Fix train command line args 2017-05-22 10:41:39 -05:00
Matthew Honnibal ae8cf70dc1 Fix CLI train signature 2017-05-22 06:13:39 -05:00
ines fc3ec733ea Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
Matthew Honnibal bc2294d7f1 Add support for fiddly hyper-parameters to train func 2017-05-22 04:51:08 -05:00
Matthew Honnibal 4e0988605a Pass through non-projective=True 2017-05-22 04:51:08 -05:00
Matthew Honnibal e14533757b Use averaged params for evaluation 2017-05-22 04:51:08 -05:00
Matthew Honnibal 5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
Matthew Honnibal baf3ef0ddc Remove import of removed train_config script 2017-05-21 09:07:34 -05:00
Matthew Honnibal 4c9202249d Refactor training, to fix memory leak 2017-05-21 09:07:06 -05:00
ines 0c6c65aa3c Improve messaging if model linking fails after download 2017-05-21 00:28:37 +02:00
ines e39ad78267 Resolve model name properly in cli.info
Use util.resolve_model_path() to also allow package names and paths.
2017-05-20 12:24:40 +02:00
Matthew Honnibal 3376d4d6e8 Update the train script, fixing GPU memory leak 2017-05-19 18:15:50 -05:00
Matthew Honnibal 08766240c3 Add incomplete iob converter 2017-05-19 13:27:51 -05:00
Matthew Honnibal 09a877886b WIP on iob converter 2017-05-19 13:24:39 -05:00
Matthew Honnibal ca70b08661 Fix GPU training and evaluation 2017-05-18 08:30:33 -05:00
Matthew Honnibal fc8d3a112c Add util.env_opt support: Can set hyper params through environment variables. 2017-05-18 04:36:53 -05:00
Matthew Honnibal 55dab77de8 Add conversion rule for .conll 2017-05-17 13:13:48 +02:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal 3bf4a28d8d Use tag in CoNLL converter, not POS 2017-05-17 12:04:33 +02:00
Matthew Honnibal 8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
Matthew Honnibal 5211645af3 Get data flowing through pipeline. Needs redesign 2017-05-16 11:21:59 +02:00
Matthew Honnibal a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
ines 9d85cda8e4 Fix models error message and use about.__docs_models__ (see #1051) 2017-05-13 13:05:47 +02:00
ines 4eefb288e3 Port over PR #1055 2017-05-13 03:25:32 +02:00
ines 95edd9e896 Let parse_package_meta take full path 2017-05-08 15:30:48 +02:00
ines 59c3b9d4dd Tidy up CLI and fix print functions 2017-05-07 23:25:29 +02:00
ines 527d51ac9a Fetch shortcuts from GitHub and improve error handling 2017-04-26 18:00:28 +02:00
Matthew Honnibal 4f9657b42b Fix reporting if no dev data with train 2017-04-23 22:27:10 +02:00
ines 3a9710f356 Pass dev_scores to print_progress correctly (resolves #1008)
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
ines 25c70b4cc5 Move fix_text to spacy.compat (see #1002) 2017-04-20 15:47:17 +02:00
Gyorgy Orosz 4a06a2572c Using ftfy for handling broken encoded strings. 2017-04-20 13:34:51 +02:00
ines 48da244058 Use spacy.compat.json_dumps for Python 2/3 compatibility (resolves #991) 2017-04-19 11:50:36 +02:00
ines 82f5f1f98f Replace str with compat.unicode_ 2017-04-17 01:29:54 +02:00
Matthew Honnibal 17c9fffb9e Fix naked except 2017-04-16 15:28:16 -05:00
ines 6145b7c153 Remove redundant Path 2017-04-16 20:53:25 +02:00
Matthew Honnibal 89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
ines 8191e33cf1 Update link error message with info on permissions 2017-04-16 13:32:31 +02:00
ines a3ddbc0444 Add note about --force flag to error message 2017-04-16 13:14:36 +02:00
ines e3de035814 Add meta validation to check for required settings
Complain if no "lang", "name" or "version" is found (those settings are
used in directory / package names). Package will still build without,
but it'll inevitably fail somewhere down the line.
2017-04-16 13:13:17 +02:00
ines a7574b7572 Add more options to read in meta data in package command
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
ines 13c8a42d2b Fix typos 2017-04-16 13:03:58 +02:00
ines 35fb4febe2 Fix whitespace 2017-04-15 12:13:45 +02:00
ines c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
ines 84341c2975 Only compile list of models if data_path exists 2017-04-14 16:48:02 +02:00
Gyorgy Orosz dd3244c08a Made json dump to produce unicode strings in py2 2017-04-13 23:30:47 +02:00
Gyorgy Orosz a9469c8173 Fixed typo 2017-04-13 15:24:14 +02:00
ines 41037f0f07 Remove unused imports 2017-04-13 13:52:11 +02:00
ines 1b92c8d5d5 Use unicode paths on Windows/Python 2 and catch other errors (resolves #970)
try/except here is quite dirty, but it'll at least make sure users see
an error message that explains what's going on
2017-04-10 17:49:51 +02:00
ines 7ea1673072 Fix whitespace 2017-04-07 13:28:48 +02:00
ines 255650dbc2 Add conllu2json converter from explosion/spacy-dev-resources/#11 2017-04-07 13:05:12 +02:00
ines 789ce8a45e Add convert command 2017-04-07 13:04:17 +02:00
ines 9952d3b08a Fix whitespace 2017-04-07 13:02:05 +02:00
ines dcf8ab0c47 Merge branch 'develop' 2017-04-07 12:00:09 +02:00
Joshua Reeter 564daf6dec Issue #934 symlink should not convert paths as_posix under windows. 2017-03-30 23:47:45 -05:00
ines 4759fd437d Merge branch 'master' into develop 2017-03-29 10:37:13 +02:00
Grégory Howard 9c2996b27f correction of package.py (encoding on open instead of write) 2017-03-29 09:11:02 +02:00
ines 7198cf1c8a Remove unused import 2017-03-26 20:56:05 +02:00
ines 7ceaa1614b Add experimental model init command 2017-03-26 20:51:40 +02:00
Matthew Honnibal 2efdbc08ff Make training work with directories 2017-03-26 08:46:44 -05:00
Matthew Honnibal 9dcb58aaaf Merge CLI changes 2017-03-26 07:30:45 -05:00
Matthew Honnibal 6b7f7a2060 Connect parser L1 option to train CLI 2017-03-26 07:24:07 -05:00
Matthew Honnibal dec5571bf3 Update train CLI 2017-03-26 07:16:52 -05:00
ines 53cf2f1c0e Make dev data optional 2017-03-26 11:48:17 +02:00
Matthew Honnibal 5eac089fbe Merge branch 'master' into develop 2017-03-26 04:45:43 -05:00
ines 97814f8da6 Update Windows Python 2 link workaround to use helper functions 2017-03-25 14:04:27 +01:00
Greg Baker b7f714b498 Possible solution to #909 2017-03-25 21:36:38 +11:00
Matthew Honnibal 9c9cd99144 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-23 11:11:24 +01:00
ines 0035fd9efe Add spacy train work in progress 2017-03-23 11:08:41 +01:00
ines c3a9f73896 Fix writing to file 2017-03-21 12:35:22 +01:00
ines d74aa428ad Fix path 2017-03-21 12:26:00 +01:00
ines 83a999ea83 Change default license from MIT to CC 2017-03-21 12:24:43 +01:00
ines ae46647560 Fix brackets 2017-03-21 12:21:42 +01:00
ines 3e134b5b2b Make sure paths in copytree and rmtree are strings 2017-03-21 12:15:33 +01:00
ines cf0094187e Fetch MANIFEST.in from GitHub as well 2017-03-21 11:32:38 +01:00
ines 3f4e3fda1d Update command and fetch file templates from GitHub
While feature is still experimental, this allows files to be modified
without having to ship a new version of spaCy.
2017-03-21 11:17:36 +01:00
ines 5230ed5b98 Move directory check and overwriting/creating dirs to own function 2017-03-21 02:06:53 +01:00
ines 46bc3c36b0 Fix typo 2017-03-21 02:06:37 +01:00
ines 64e38f304e Only import shutil 2017-03-21 02:06:29 +01:00
ines 448a916d0d Add --force option to override directory 2017-03-21 02:05:34 +01:00
ines bf240132d7 Add cli.package command to build model packages 2017-03-20 22:50:13 +01:00
Matthew Honnibal 692eb0603d Fix high memory usage in download command
Due to PyPi issue #2984, installing large packages via pip causes
a large spike in memory usage. The recommended fix is to disable
caching.
2017-03-20 18:24:44 +01:00
ines b8f8d5d8bf Make sure model_path is a Posix path
Otherwise, formatting the success message with model_path.as_posix()
fails when using a local path for linking (linking still works, but the
error message is confusing)
2017-03-19 11:57:13 +01:00
ines 8de5108af6 Exclude common cache directories from model list in cli.info
This means models called "cache" etc. won't show up in the list, but it
seems worth it.
2017-03-19 01:44:43 +01:00
Matthew Honnibal 797f286c38 Use import to find data package 2017-03-19 01:39:36 +01:00
Matthew Honnibal bc10d06bc2 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-18 19:32:54 +01:00
Matthew Honnibal 1754e0db9b Call pip via subprocess, to make it use virtualenv 2017-03-18 19:29:36 +01:00
ines 1277abcde2 Remove print statement 2017-03-18 19:14:58 +01:00
Matthew Honnibal dcec104643 Remove unused import 2017-03-18 18:57:45 +01:00
Matthew Honnibal 703eb7bdbd Fix link module 2017-03-18 18:57:31 +01:00
ines 7d33104180 Use distutils.sysconfig.get_python_lib
site.getsitepackages seems to not work as expected in Python 2
2017-03-18 18:20:40 +01:00
ines 0dd7710556 Make sure paths are paths 2017-03-18 16:48:52 +01:00
ines ec3e810662 Add directory cli and set up command line interface 2017-03-18 15:14:48 +01:00