Commit Graph

675 Commits

Author SHA1 Message Date
Ines Montani fa261d09e8 Add alternative CLI option 2020-07-06 15:57:38 +02:00
Adriane Boyd c67fc6aa5b
Make `docs_to_json` backwards-compatible with v2 (#5714)
* In `spacy convert -t json` output the JSON docs wrapped in a list

* Add back token-level `ner` alongside the doc-level `entities`
2020-07-06 14:15:00 +02:00
Ines Montani 412dbb1f38
Remove dead and/or deprecated code (#5710)
* Remove dead and/or deprecated code

* Remove n_threads

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-06 13:06:25 +02:00
Sofie Van Landeghem fcbf899b08
Feature/example only (#5707)
* remove _convert_examples

* fix test_gold, raise TypeError if tuples are used instead of Examples

* throwing proper errors when the wrong type of objects are passed

* fix deprecated format in tests

* fix deprecated format in parser tests

* fix tests for NEL, morph, senter, tagger, textcat

* update regression tests with new Example format

* use make_doc

* more fixes to nlp.update calls

* few more small fixes for rehearse and evaluate

* only import ml_datasets if really necessary
2020-07-06 13:02:36 +02:00
Ines Montani 37c3bb35e2 Auto-format 2020-07-04 16:25:34 +02:00
Ines Montani 99aff16d60 Make argument shortcut consistent 2020-07-04 14:23:32 +02:00
Matthew Honnibal 2bd1bf81f1
Refactor pretrain and support character-based objective for v3 (#5706)
* Start adding character-based stuff

* Start adding character-based objective

* Start adding character-based stuff

* Start adding character-based objective

* Remove outdated comment

* Update pretraining models

* Add/fix character-based multi-task models

* Refactor pretrain and support character-based objective

* Update pretrain config

* Remove unused

* Fix flake8 errors

* Clean up imports

* Format

* Format

* Update Thinc version

* Raise error if vectors objective but no vectors
2020-07-03 17:57:28 +02:00
Ines Montani 84fb3a3fb3 Auto-format and fix tuple 2020-07-03 15:20:10 +02:00
Matthew Honnibal a902b5f217
Record whether Doc objects are built from known spacing (#5697)
* Tell convert CLI to store user data for Doc

* Remove assert

* Add has_unknown_spaces flag on Doc

* Do not tokenize docs with unknown spaces in Corpus

* Handle conversion of unknown spaces in Example

* Fixes

* Fixes

* Draft has_known_spaces support in DocBin

* Add test for serialize has_unknown_spaces

* Fix DocBin serialization when has_unknown_spaces

* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd abad56db7d
Add conllu2docs converter (#5704)
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Sofie Van Landeghem 41b65fd0f8
fix to pretrain script (#5699)
* fix to pretrain script

* remove unnecessary import
2020-07-02 21:48:01 +02:00
Ines Montani ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
Fixing init_model
2020-07-02 14:10:28 +02:00
Ines Montani fe4cfd0632 Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
svlandeg a30bc77415 bugfixing prune_vectors and vectors_loc 2020-07-01 21:00:47 +02:00
Matthw Honnibal 6a0a27e5c2 Fix max_steps 2020-07-01 18:08:14 +02:00
Matthw Honnibal cb51bb637b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 15:17:27 +02:00
Matthw Honnibal 2fa56484b2 Fix eval batch size 2020-07-01 15:16:25 +02:00
Matthw Honnibal c5d12d1a22 Allow batch size to be set for evaluation in spacy train 2020-07-01 15:04:36 +02:00
Ines Montani bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd 2020-07-01 14:13:19 +02:00
Matthw Honnibal 35af5819e0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 01:03:39 +02:00
Matthw Honnibal 8c5a88e777 Fix per-epoch shuffling 2020-07-01 01:02:35 +02:00
svlandeg a7d547c65e small fix 2020-06-30 21:56:17 +02:00
svlandeg 8eca7e995e add try-except to git commands to get an informative warning 2020-06-30 21:53:40 +02:00
Ines Montani b032943c34 Fix funny printing again 2020-06-30 21:33:41 +02:00
Ines Montani d64644d9d1 Adjust auto-formatting 2020-06-30 20:36:30 +02:00
Ines Montani 6da3500728 Fix command substitution 2020-06-30 20:35:51 +02:00
svlandeg e7aff9c5fc bugfix exec usage in dvc.yaml 2020-06-30 18:51:20 +02:00
svlandeg 39953c7c60 fix print_run_help with new arg order 2020-06-30 17:28:09 +02:00
svlandeg cd632d8ec2 move folder for exec argument one up 2020-06-30 17:19:36 +02:00
svlandeg 1ae6fa2554 move subcommand one place up as project_dir has default 2020-06-30 16:04:53 +02:00
svlandeg a46b76f188 use current working dir as default throughout 2020-06-30 15:39:24 +02:00
svlandeg b228111925 fix funny printing 2020-06-30 14:54:45 +02:00
Ines Montani 8e20505970 Resolve within working_dir context manager 2020-06-30 13:29:45 +02:00
Ines Montani 72175b5c60 Update project command 2020-06-30 13:17:26 +02:00
svlandeg 140c4896a0 split_command util function 2020-06-30 12:54:15 +02:00
svlandeg d23be563eb remove redundant setting of no_args_is_help 2020-06-30 11:23:35 +02:00
svlandeg b311ce982f Merge remote-tracking branch 'upstream/develop' into fix/small-edits
# Conflicts:
#	spacy/cli/project.py
2020-06-30 11:17:31 +02:00
svlandeg 7e4cbda89a fix project_init for relative path 2020-06-30 11:09:53 +02:00
Ines Montani e8033df81e Also handle python3 and pip3 2020-06-29 20:30:42 +02:00
Ines Montani c874dde66c Show help on "spacy project" 2020-06-29 20:11:34 +02:00
Ines Montani 1d2c646e57 Fix init and remove .dvc/plots 2020-06-29 20:07:21 +02:00
svlandeg 1176783310 fix one more shlex.split 2020-06-29 18:37:42 +02:00
svlandeg 894b8e7ff6 throw warning (instead of crashing) when temp dir can't be cleaned 2020-06-29 18:16:39 +02:00
svlandeg efe7eb71f2 create subfolder in working dir 2020-06-29 17:46:08 +02:00
svlandeg 3487214ba1 fix shlex.split for non-posix 2020-06-29 17:45:47 +02:00
Ines Montani 126050f259 Improve asset fetching
Get all paths first and run dvc add once so it only shows one progress bar and one combined git command (if repo is git repo)
2020-06-29 16:55:24 +02:00
Ines Montani 7c08713baa Improve error messages 2020-06-29 16:54:47 +02:00
Ines Montani 24664efa23 Import project_run_all function 2020-06-29 16:54:19 +02:00
svlandeg f8dddeda27 print help msg when just calling 'project' without args 2020-06-29 16:38:15 +02:00
svlandeg bf43ebbf61 fix typos 2020-06-29 16:32:25 +02:00