Commit Graph

680 Commits

Author SHA1 Message Date
Matthew Honnibal 1b20ffac38 batch_by_words by default 2020-07-08 21:37:06 +02:00
Matthew Honnibal fb8a5967c1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-08 15:27:50 +02:00
Ines Montani 0a3d41bb1d
Deprecate model shortcuts and simplify download (#5722) 2020-07-08 14:00:07 +02:00
Matthew Honnibal 42e1109def Support option to not batch by number of words 2020-07-08 11:26:54 +02:00
Ines Montani 8cb7f9ccff
Improve assets and DVC handling (#5719)
* Improve assets and DVC handling

* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Ines Montani fa261d09e8 Add alternative CLI option 2020-07-06 15:57:38 +02:00
Adriane Boyd c67fc6aa5b
Make `docs_to_json` backwards-compatible with v2 (#5714)
* In `spacy convert -t json` output the JSON docs wrapped in a list

* Add back token-level `ner` alongside the doc-level `entities`
2020-07-06 14:15:00 +02:00
Ines Montani 412dbb1f38
Remove dead and/or deprecated code (#5710)
* Remove dead and/or deprecated code

* Remove n_threads

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-06 13:06:25 +02:00
Sofie Van Landeghem fcbf899b08
Feature/example only (#5707)
* remove _convert_examples

* fix test_gold, raise TypeError if tuples are used instead of Example's

* throwing proper errors when the wrong type of objects are passed

* fix deprecated format in tests

* fix deprecated format in parser tests

* fix tests for NEL, morph, senter, tagger, textcat

* update regression tests with new Example format

* use make_doc

* more fixes to nlp.update calls

* few more small fixes for rehearse and evaluate

* only import ml_datasets if really necessary
2020-07-06 13:02:36 +02:00
Ines Montani 37c3bb35e2 Auto-format 2020-07-04 16:25:34 +02:00
Ines Montani 99aff16d60 Make argument shortcut consistent 2020-07-04 14:23:32 +02:00
Matthew Honnibal 2bd1bf81f1
Refactor pretrain and support character-based objective for v3 (#5706)
* Start adding character-based stuff

* Start adding character-based objective

* Start adding character-based stuff

* Start adding character-based objective

* Remove outdated comment

* Update pretraining models

* Add/fix character-based multi-task models

* Refactor pretrain and support character-based objective

* Update pretrain config

* Remove unused

* Fix flake8 errors

* Clean up imports

* Format

* Format

* Update Thinc version

* Raise error if vectors objective but no vectors
2020-07-03 17:57:28 +02:00
Ines Montani 84fb3a3fb3 Auto-format and fix tuple 2020-07-03 15:20:10 +02:00
Matthew Honnibal a902b5f217
Record whether Doc objects are built from known spacing (#5697)
* Tell convert CLI to store user data for Doc

* Remove assert

* Add has_unknown_spaces flag on Doc

* Do not tokenize docs with unknown spaces in Corpus

* Handle conversion of unknown spaces in Example

* Fixes

* Fixes

* Draft has_known_spaces support in DocBin

* Add test for serialize has_unknown_spaces

* Fix DocBin serialization when has_unknown_spaces

* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd abad56db7d
Add conllu2docs converter (#5704)
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Sofie Van Landeghem 41b65fd0f8
fix to pretrain script (#5699)
* fix to pretrain script

* remove unnecessary import
2020-07-02 21:48:01 +02:00
Ines Montani ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
Fixing init_model
2020-07-02 14:10:28 +02:00
Ines Montani fe4cfd0632 Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
svlandeg a30bc77415 bugfixing prune_vectors and vectors_loc 2020-07-01 21:00:47 +02:00
Matthew Honnibal 6a0a27e5c2 Fix max_steps 2020-07-01 18:08:14 +02:00
Matthew Honnibal cb51bb637b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 15:17:27 +02:00
Matthew Honnibal 2fa56484b2 Fix eval batch size 2020-07-01 15:16:25 +02:00
Matthew Honnibal c5d12d1a22 Allow batch size to be set for evaluation in spacy train 2020-07-01 15:04:36 +02:00
Ines Montani bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd 2020-07-01 14:13:19 +02:00
Matthew Honnibal 35af5819e0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 01:03:39 +02:00
Matthew Honnibal 8c5a88e777 Fix per-epoch shuffling 2020-07-01 01:02:35 +02:00
svlandeg a7d547c65e small fix 2020-06-30 21:56:17 +02:00
svlandeg 8eca7e995e add try-except to git commands to get an informative warning 2020-06-30 21:53:40 +02:00
Ines Montani b032943c34 Fix funny printing again 2020-06-30 21:33:41 +02:00
Ines Montani d64644d9d1 Adjust auto-formatting 2020-06-30 20:36:30 +02:00
Ines Montani 6da3500728 Fix command substitution 2020-06-30 20:35:51 +02:00
svlandeg e7aff9c5fc bugfix exec usage in dvc.yaml 2020-06-30 18:51:20 +02:00
svlandeg 39953c7c60 fix print_run_help with new arg order 2020-06-30 17:28:09 +02:00
svlandeg cd632d8ec2 move folder for exec argument one up 2020-06-30 17:19:36 +02:00
svlandeg 1ae6fa2554 move subcommand one place up as project_dir has default 2020-06-30 16:04:53 +02:00
svlandeg a46b76f188 use current working dir as default throughout 2020-06-30 15:39:24 +02:00
svlandeg b228111925 fix funny printing 2020-06-30 14:54:45 +02:00
Ines Montani 8e20505970 Resolve within working_dir context manager 2020-06-30 13:29:45 +02:00
Ines Montani 72175b5c60 Update project command 2020-06-30 13:17:26 +02:00
svlandeg 140c4896a0 split_command util function 2020-06-30 12:54:15 +02:00
svlandeg d23be563eb remove redundant setting of no_args_is_help 2020-06-30 11:23:35 +02:00
svlandeg b311ce982f Merge remote-tracking branch 'upstream/develop' into fix/small-edits
# Conflicts:
#	spacy/cli/project.py
2020-06-30 11:17:31 +02:00
svlandeg 7e4cbda89a fix project_init for relative path 2020-06-30 11:09:53 +02:00
Ines Montani e8033df81e Also handle python3 and pip3 2020-06-29 20:30:42 +02:00
Ines Montani c874dde66c Show help on "spacy project" 2020-06-29 20:11:34 +02:00
Ines Montani 1d2c646e57 Fix init and remove .dvc/plots 2020-06-29 20:07:21 +02:00
svlandeg 1176783310 fix one more shlex.split 2020-06-29 18:37:42 +02:00
svlandeg 894b8e7ff6 throw warning (instead of crashing) when temp dir can't be cleaned 2020-06-29 18:16:39 +02:00
svlandeg efe7eb71f2 create subfolder in working dir 2020-06-29 17:46:08 +02:00
svlandeg 3487214ba1 fix shlex.split for non-posix 2020-06-29 17:45:47 +02:00