Commit Graph

1081 Commits

Author SHA1 Message Date
Ines Montani 24e7ac3f2b Fix download CLI [ci skip] 2020-09-24 14:43:56 +02:00
Ines Montani 88e54caa12 accuracy -> performance 2020-09-24 14:32:35 +02:00
Ines Montani be56c0994b Add [training.before_to_disk] callback 2020-09-24 12:40:25 +02:00
Ines Montani c6c67b606e Merge pull request #6133 from explosion/fix/score_weights 2020-09-24 12:00:57 +02:00
Ines Montani f69fea8b25 Improve error handling around non-number scores 2020-09-24 11:29:07 +02:00
Matthew Honnibal 17a6b0a173 Make project pull order insensitive (#6131) 2020-09-24 10:30:42 +02:00
Ines Montani ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
svlandeg 35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
svlandeg 6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani 7745d77a38 Fix whitespace in template [ci skip] 2020-09-23 13:21:42 +02:00
svlandeg 6435458d51 simplify expression 2020-09-23 12:12:38 +02:00
svlandeg 20b0ec5dcf avoid logging performance of frozen components 2020-09-23 10:37:12 +02:00
Ines Montani 6ca06cb62c Update docs and formatting [ci skip] 2020-09-23 10:14:27 +02:00
Ines Montani 888f936a73 Merge pull request #6106 from svlandeg/feature/textcat-quickstart 2020-09-23 10:11:45 +02:00
Ines Montani 60a317520a Merge pull request #6109 from svlandeg/feature/2rename 2020-09-23 09:47:12 +02:00
svlandeg 556f3e4652 add pooling to NEL's TransformerListener 2020-09-23 09:24:28 +02:00
Sofie Van Landeghem 86a08f819d tok2vec.update instead of predict (#6113) 2020-09-22 21:54:52 +02:00
Ines Montani 5e3b796b12 Validate section refs in debug config 2020-09-22 12:24:39 +02:00
svlandeg 085a1c8e2b add no_output_layer to TextCatBOW config 2020-09-22 12:06:40 +02:00
svlandeg b556a10808 rename converts in_to_out 2020-09-22 11:50:19 +02:00
svlandeg e931f4d757 add textcat score 2020-09-22 10:56:43 +02:00
svlandeg 396b33257f add entity_linker to jinja template 2020-09-22 10:40:05 +02:00
svlandeg 135de82a2d add textcat to quickstart 2020-09-22 10:22:06 +02:00
Ines Montani 6316d5f398 Improve messages in project CLI [ci skip] 2020-09-22 09:45:34 +02:00
Ines Montani 81606b29bd Merge pull request #6104 from svlandeg/fix/debug_model [ci skip] 2020-09-22 09:31:23 +02:00
svlandeg 45b29c4a5b cleanup 2020-09-21 23:17:23 +02:00
svlandeg fa5c416db6 initialize through nlp object and with train_corpus 2020-09-21 23:09:22 +02:00
svlandeg 447b3e5787 Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 16:58:40 +02:00
Ines Montani e8bcaa44f1 Don't auto-decompress archives with smart_open [ci skip] 2020-09-21 16:01:46 +02:00
svlandeg eb9b447960 Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 14:05:16 +02:00
Ines Montani 758ead8a47 Sync overrides with CLI overrides 2020-09-21 12:50:13 +02:00
Ines Montani 5497acf49a Support config overrides via environment variables 2020-09-21 11:25:10 +02:00
Ines Montani 1114219ae3 Tidy up and auto-format 2020-09-21 10:59:07 +02:00
Ines Montani b2302c0a1c Improve error for missing dependency 2020-09-20 17:44:51 +02:00
Matthew Honnibal 8fb59d958c Format 2020-09-20 16:31:48 +02:00
Matthew Honnibal dc22771f87 Fix sparse checkout 2020-09-20 16:30:05 +02:00
Matthew Honnibal a0fb5e50db Use simple git clone call if not sparse 2020-09-20 16:22:04 +02:00
Matthew Honnibal 2c24d633d0 Use updated run_command 2020-09-20 16:21:43 +02:00
Ines Montani 554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
svlandeg 6db1d5dc0d trying some stuff 2020-09-19 19:11:30 +02:00
Ines Montani e863b3dc14 Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2 2020-09-19 12:33:38 +02:00
Sofie Van Landeghem 39872de1f6 Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
svlandeg 73ff52b9ec hack for tok2vec listener 2020-09-18 16:43:15 +02:00
Adriane Boyd eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Ines Montani a127fa475e Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
svlandeg e4fc7e0222 fixing output sample to proper 2D array 2020-09-17 22:34:36 +02:00
Ines Montani 3865214343 Use consistent shortcut 2020-09-17 16:57:02 +02:00
svlandeg 35a3931064 fix typo 2020-09-17 16:36:27 +02:00
svlandeg ddfc1fc146 add pretraining option to init config 2020-09-17 16:05:40 +02:00
svlandeg 427dbecdd6 cleanup and formatting 2020-09-17 11:48:04 +02:00
svlandeg 0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg 51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
Ines Montani 9cc304c194 Merge pull request #6064 from explosion/fix/sparse-checkout-ux
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Sofie Van Landeghem 3216a33149 positive_label config for textcat (#6062)
* hook up positive_label in textcat

* unit tests

* documentation

* formatting

* tests

* fix typo

* move verify_config to after begin_training

* revert accidental commit
2020-09-14 17:08:00 +02:00
Ines Montani c052017025 Fix sparse checkout and error handling 2020-09-14 14:12:58 +02:00
Matthew Honnibal 54c40223a1 Improve v3 pretrain command (#6040)
* Starts to run

* Update pretrain script

* Update corpus

* Update pretrain schema

* Remove outdated test

* Make JsonlTexts produce Example objects.
2020-09-13 14:05:05 +02:00
Ines Montani febb99916d Tidy up and auto-format [ci skip] 2020-09-13 10:55:36 +02:00
Ines Montani a5633b205f Fix handling of errors around git [ci skip] 2020-09-13 10:52:28 +02:00
Ines Montani f8846c198d Update types and docstrings 2020-09-13 10:52:02 +02:00
Matthew Honnibal 37347830d4 Fix reading in GloVe vectors 2020-09-12 17:31:18 +02:00
Ines Montani b41be87213 Merge pull request #6051 from svlandeg/feature/cli-config 2020-09-12 17:12:35 +02:00
Ines Montani eedaaaec75 Fix handling of existing asset without checksum [ci skip] 2020-09-12 17:02:53 +02:00
svlandeg a75cfe0da6 Merge remote-tracking branch 'upstream/develop' into feature/cli-config 2020-09-12 14:44:40 +02:00
svlandeg 115147804a string_to_list to parse comma-separated string into a list 2020-09-12 14:43:22 +02:00
Ines Montani f886f5bbc8 Merge pull request #6048 from explosion/fix/clone-compat 2020-09-12 10:30:49 +02:00
Ines Montani 0b2e07215d Support overwriting name on spacy package 2020-09-11 11:38:28 +02:00
svlandeg 5b94aeece9 support pipeline as "list in string" 2020-09-11 11:08:46 +02:00
Ines Montani 1bce432b4a Adjust message [ci skip] 2020-09-11 10:00:49 +02:00
Ines Montani 5acd4fbcd8 Merge branch 'develop' into fix/clone-compat 2020-09-11 09:58:30 +02:00
Ines Montani 761bd60d43 Adjust info message 2020-09-11 09:57:00 +02:00
Ines Montani 6831161bfa Resolve path to be extra sure 2020-09-11 09:56:49 +02:00
svlandeg 1723fb73c4 remove brol 2020-09-10 17:44:59 +02:00
svlandeg 08a831ce83 process trailing slash if any 2020-09-10 17:39:52 +02:00
Ines Montani 3e83a509bb WIP: fix project clone compatibility 2020-09-10 15:49:13 +02:00
svlandeg f1bc09c1e9 restore partly 2020-09-10 14:53:02 +02:00
svlandeg 3889747119 asset fix & UX 2020-09-10 14:36:53 +02:00
svlandeg a36766d153 hookup branch 2020-09-10 12:00:34 +02:00
svlandeg 97d99f7efa Merge remote-tracking branch 'upstream/develop' into feature/doc-fixes 2020-09-10 11:51:34 +02:00
Ines Montani 908f3a4494 Update default projects repo [ci skip] 2020-09-10 11:42:14 +02:00
svlandeg 92f9d2f406 small UX fixes 2020-09-10 11:35:50 +02:00
svlandeg 1fc5486792 more fine-grained errors for git_sparse_checkout 2020-09-10 11:31:32 +02:00
Ines Montani 15bc3a37b4 Add --branch to project clone 2020-09-10 11:08:15 +02:00
Sofie Van Landeghem 8e7557656f Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
Sofie Van Landeghem 60f22e1800 Pipe API (#6034)
* ensure Language passes on valid examples for initialization

* fix tagger model initialization

* check for valid get_examples across components

* assume labels were added before begin_training

* fix senter initialization

* fix morphologizer initialization

* use methods to check arguments

* test textcat init, requires thinc>=8.0.0a31

* fix tok2vec init

* fix entity linker init

* use islice

* fix simple NER

* cleanup debug model

* fix assert statements

* fix tests

* throw error when adding a label if the output layer can't be resized anymore

* fix test

* add failing test for simple_ner

* UX improvements

* morphologizer UX

* assume begin_training gets a representative set and processes the labels

* remove assumptions for output of untrained NER model

* restore test for original purpose
2020-09-08 22:44:25 +02:00
Matthew Honnibal ba5f4c9b32 Add words and seconds to train info 2020-09-08 15:24:47 +02:00
Matthew Honnibal b470062153 Add CLI registry (#6037) 2020-09-08 15:23:34 +02:00
Matthew Honnibal 4b7abaafdb Fix learn rate for non-transformer 2020-09-04 21:22:50 +02:00
Matthew Honnibal 465785a672 Fix project pull and push 2020-09-04 21:15:55 +02:00
Ines Montani ab1bb421ed Update docs links in codebase 2020-09-04 12:58:50 +02:00
Ines Montani 2189046869 Merge pull request #6024 from explosion/chore/registry-renaming 2020-09-04 10:54:10 +02:00
Matthew Honnibal 1c07820681 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-03 18:54:21 +02:00
Matthew Honnibal 7be8a0516a Fix project pull 2020-09-03 18:54:03 +02:00
Ines Montani 23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani c063e55eb7 Add prefix to batchers 2020-09-03 17:30:41 +02:00
Ines Montani c53b1433b9 Adjust more arguments [ci skip] 2020-09-03 17:12:24 +02:00
Ines Montani b5a0657fd6 "model" terminology consistency in docs 2020-09-03 13:13:03 +02:00
Matthew Honnibal 122cb02001 Fix averages 2020-09-02 19:37:43 +02:00
Marek Grzenkowicz 92d7832a86 Fix off-by-one error for best iteration calculation (closes #6014) (#6016) 2020-09-02 15:15:45 +02:00
Sofie Van Landeghem 6bfb1b3a29 Fix sparse checkout for 'spacy project' (#6008)
* exit if cloning fails

* UX

* rewrite http link to git protocol, don't use stdin

* fixes to sparse checkout

* formatting
2020-09-01 19:49:01 +02:00
Ines Montani 70b226f69d Support ignore marker in project document [ci skip] 2020-09-01 12:49:04 +02:00
Ines Montani a4c51f0f18 Add v3 info to project docs [ci skip] 2020-09-01 12:36:21 +02:00
Ines Montani ef9005273b Update fill-config command and add silent mode [ci skip] 2020-09-01 12:07:04 +02:00
Matthew Honnibal ec660e3131 Fix use_pytorch_for_gpu_memory 2020-09-01 00:41:38 +02:00
Matthw Honnibal c38298b8fa Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-31 19:55:55 +02:00
Matthw Honnibal fe298fa50a Shuffle on first epoch of train 2020-08-31 19:55:22 +02:00
svlandeg 13ee742fb4 example of custom logger 2020-08-31 14:24:41 +02:00
svlandeg c18eb63483 Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs
# Conflicts:
#	website/docs/usage/embeddings-transformers.md
2020-08-31 13:21:36 +02:00
Sofie Van Landeghem ec14744ee4 Rename Transformer listener (#6001)
* rename to spacy-transformers.TransformerListener

* add some more tok2vec tests

* use select_pipes

* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Ines Montani 45f46a5c85 Merge pull request #5993 from explosion/feature/disabled-components 2020-08-29 15:58:41 +02:00
Ines Montani 34146750d4 Use frozen list with custom errors
We don't want to break backwards compatibility too much but we also want to provide the best possible UX
2020-08-29 15:20:11 +02:00
Ines Montani 2bc31e15c9 Tidy up and auto-format [ci skip] 2020-08-29 13:01:10 +02:00
svlandeg 5230529de2 add loggers registry & logger docs sections 2020-08-28 21:44:04 +02:00
Ines Montani 4ca2698f85 Merge branch 'develop' into feature/debug-config 2020-08-28 11:19:17 +02:00
Ines Montani d1780db6a4 Tidy up and use different error [ci skip] 2020-08-27 18:56:55 +02:00
Ines Montani ff4175e839 Add more info to debug config 2020-08-27 18:17:58 +02:00
Ines Montani 8692d176f6 Merge pull request #5978 from explosion/feature/update-wasabi
Update wasabi: new diff_strings and MarkdownRenderer
2020-08-26 19:02:52 +02:00
Matthew Honnibal 9b22714a4e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-26 15:48:45 +02:00
Matthew Honnibal 172af24f95 Fix upload and download 2020-08-26 15:48:23 +02:00
Ines Montani a5fff1df51 Remove outdated non-empty output dir warning [ci skip] 2020-08-26 15:45:51 +02:00
Ines Montani 3aec98ca38 Update wasabi: new diff_strings and MarkdownRenderer 2020-08-26 15:33:11 +02:00
Sofie Van Landeghem 79d460e3a2 Weights & Biases logger for train CLI (#5971)
* quick test as part of train script

* train_logger in config, default ConsoleLogger in loggers catalogue

* entitiy typo

* add wandb_logger

* cleanup

* Update spacy/cli/train_logger.py

Co-authored-by: Ines Montani <ines@ines.io>

* move loggers to gold.loggers

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-26 15:24:33 +02:00
Ines Montani 0997c30b9e Merge pull request #5974 from explosion/feature/project-document 2020-08-26 15:14:13 +02:00
Ines Montani 627617a079 Tidy up and add docs [ci skip] 2020-08-26 13:24:55 +02:00
Ines Montani aeebc6678d Small cleanup and adjustments 2020-08-26 10:26:57 +02:00
Ines Montani 31567d1e42 Link project.yml 2020-08-26 10:26:32 +02:00
Ines Montani 6c2a5ff53b Auto-link local sources 2020-08-26 10:26:06 +02:00
Matthew Honnibal 2771e4f2b3 Fix the git "sparse checkout" functionality (#5973)
* Fix the git sparse checkout functionality

* Format
2020-08-26 04:00:14 +02:00
Ines Montani 1c958a76c1 Add comment markers to only replace auto-generated docs 2020-08-26 00:03:06 +02:00
Ines Montani f10989e8c4 Add "project document" and more project.yml meta fields 2020-08-25 17:14:27 +02:00
Ines Montani fdcaf86c54 Adjust docstring
End sentence earlier so it's shown as a full sentence in --help
2020-08-25 17:13:50 +02:00
Ines Montani b89f6fa011 Fix meta defaults and error in package command 2020-08-25 17:13:33 +02:00
Ines Montani dd84577a98 Update CLI utils, project.yml schema and add test 2020-08-25 11:54:53 +02:00
Matthew Honnibal 8038b87f04 Various small tweaks to project CLI (#5965)
* Fix up/download of http and local paths

* Support git_sparse_checkout for assets

* Fix scorer

* Handle already-present directories for git assets

* Improve convert command

* Fix support for existent files in git assets

* Support branches in git sparse checkout

* Format

* Fix git assets

* Document git block in assets

* Fix test

* Fix test

* Revert "Fix test"

This reverts commit cf3097260f.

* Revert "Fix test"

This reverts commit 964d636e27.

* Dont multiply p/r/f by 100

* Display scores * 100 during training
2020-08-25 00:30:52 +02:00
Ines Montani e12b03358b Support removing extra values in fill-config (#5966)
* Support removing extra values in fill-config

* Fix test
2020-08-24 22:53:47 +02:00
Ines Montani 0e7f99da58 Fix handling of optional [pretraining] block (#5954)
* Fix handling of optional [pretraining] block

* Remove pretraining from default config

* Fix test

* Add schema option for empty pretrain block
2020-08-24 15:56:03 +02:00
Matthew Honnibal 64df37643f Update lockfile after project pull 2020-08-24 03:27:09 +02:00
Matthew Honnibal 588c28fe45 Fix project pull when deps missing 2020-08-24 01:23:36 +02:00
Matthew Honnibal 160a855246 Format 2020-08-23 21:15:12 +02:00
Matthew Honnibal 89f5b8abb3 Fix project push 2020-08-23 21:14:44 +02:00
Matthew Honnibal 3828bc3ed0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-23 18:32:24 +02:00
Matthew Honnibal e559867605 Allow spacy project to push and pull to/from remote storage (#5949)
* Add utils for working with remote storage

* WIP add remote_cache for project

* WIP add push and pull commands

* Use pathy in remote_cache

* Update util

* Update remote_cache

* Update util

* Update project assets

* Update pull script

* Update push script

* Fix type annotation in util

* Work on remote storage

* Remove site and env hash

* Fix imports

* Fix type annotation

* Require pathy

* Require pathy

* Fix import

* Add a util to handle project variable substitution

* Import push and pull commands

* Fix pull command

* Fix push command

* Fix tarfile in remote_storage

* Improve printing

* Fiddle with status messages

* Set version to v3.0.0a9

* Draft docs for spacy project remote storages

* Update docs [ci skip]

* Use Thinc config to simplify and unify template variables

* Auto-format

* Don't import Pathy globally for now

Causes slow and annoying Google Cloud warning

* Tidy up test

* Tidy up and update tests

* Update to latest Thinc

* Update docs

* variables -> vars

* Update docs [ci skip]

* Update docs [ci skip]

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Matthew Honnibal fe1cf7e124 Allow score_weights to list extra scores 2020-08-23 18:31:30 +02:00
Ines Montani 9bdc9e81f5 Fix error message [ci skip] 2020-08-23 12:14:02 +02:00
Ines Montani 3826cfb8fe Merge pull request #5930 from svlandeg/feature/init-config-fix
UX for init config
2020-08-21 12:06:33 +02:00
Ines Montani 79af7dcd6d Small wording adjustments [ci skip] 2020-08-21 12:06:19 +02:00
Matthew Honnibal c356e62908 Minor adjustments to quickstart template 2020-08-21 00:10:21 +02:00
Ines Montani 6ad59d59fe Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip] 2020-08-20 11:20:58 +02:00
svlandeg b96cd9fa5e fix typo 2020-08-19 18:46:08 +02:00
Ines Montani e2f2ef3a5a Update init config and recommendations
- As much as I dislike YAML, it seemed like a better format here because it allows us to add comments if we want to explain the different recommendations
- Don't include the generated JS in the repo by default and build it on the fly when running or deploying the site. This ensures it's always up to date.
- Simplify jinja_to_js script and use fewer dependencies
2020-08-19 13:33:15 +02:00
svlandeg a8acedd4ba example of custom reader and batcher 2020-08-18 19:15:16 +02:00
Sofie Van Landeghem 688e77562b Train CLI script fixes (#5931)
* fix dash replacement in overrides arguments

* perform interpolation on training config

* make sure only .spacy files are read
2020-08-18 16:06:37 +02:00
Ines Montani 82f0e20318 Update docs and consistency [ci skip] 2020-08-18 14:39:40 +02:00
svlandeg 10e67b400c output_file required, spacy-transformers preferred instead of required 2020-08-18 13:38:43 +02:00
Ines Montani 990c6b4c32 Update docs and CLI [ci skip] 2020-08-17 21:38:20 +02:00
Ines Montani 3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
Ines Montani 6ae83bde0c Fix CLI consistency [ci skip] 2020-08-16 15:46:29 +02:00
Ines Montani 45f13cbf64 Merge pull request #5916 from explosion/feature/new-thinc-config 2020-08-16 15:24:12 +02:00
Ines Montani 34bda91695 Show warnings if there's nothing to auto-fill 2020-08-16 14:19:43 +02:00
Ines Montani dd5804d499 Update type hints 2020-08-16 14:19:33 +02:00
Ines Montani a570c304df Update quickstart, template and docs 2020-08-15 14:50:29 +02:00
Ines Montani fdcde9b0bf Add init fill-config 2020-08-14 16:49:26 +02:00
Ines Montani 8128e5eb35 Replace lexeme_norm warning with logging 2020-08-14 15:00:52 +02:00
Ines Montani 37814b608d Remove env_opt and simplify default Optimizer 2020-08-14 14:59:54 +02:00
Ines Montani ab1d165bba Pass optimizer defined in config to resume/begin_training
Otherwise, this would create a default optimizer, which isn't what we want?
2020-08-14 14:59:22 +02:00
Ines Montani 67cc39af7f Update Thinc and include section order 2020-08-14 14:06:22 +02:00
Ines Montani 88b0a96801 Update for new Thinc and adjust config 2020-08-13 17:38:30 +02:00
Ines Montani 950832f087 Tidy up pipes (#5906)
* Tidy up pipes

* Fix init, defaults and raise custom errors

* Update docs

* Update docs [ci skip]

* Apply suggestions from code review

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* Tidy up error handling and validation, fix consistency

* Simplify get_examples check

* Remove unused import [ci skip]

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-11 23:29:31 +02:00
Ines Montani d5c78c7a34 Update docs and fix consistency 2020-08-09 22:31:52 +02:00
Ines Montani 1d01d89b79 Update CLI docs and evaluate command [ci skip] 2020-08-07 14:40:58 +02:00
Ines Montani 913d21f0a3 Merge pull request #5882 from explosion/feature/raise-from
Use "raise ... from" in custom errors for better tracebacks
2020-08-06 00:35:26 +02:00
Ines Montani 06e80d95cd Sync develop with nightly docs state (#5883)
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-08-06 00:28:14 +02:00
Ines Montani d92954ac1d Merge pull request #5881 from explosion/feature/better-error-model-shortcuts 2020-08-06 00:13:35 +02:00
Ines Montani 56c17973aa Use "raise ... from" in custom errors for better tracebacks 2020-08-05 23:53:21 +02:00
Ines Montani 5cc0d89fad Simplify config overrides in CLI and deserialization (#5880) 2020-08-05 23:35:09 +02:00
Ines Montani 2a1fa86a0d Add better error for failed model shortcut loading 2020-08-05 23:10:29 +02:00
Ines Montani 586d695775 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-05 16:01:11 +02:00
Ines Montani e68459296d Tidy up and auto-format 2020-08-05 16:00:59 +02:00
Matthew Honnibal 50c0e49741 Fix train CLI 2020-08-05 15:40:47 +02:00
Ines Montani b795f02fbd Allow adding pipeline components from source model (#5857)
* Allow adding pipeline components from source model

* Config: name -> component

* Improve error messages

* Fix error and test

* Add frozen components and exclude logic

* Remove exclude from Language.evaluate

* Init sourced components with current vocab

* Fix error codes
2020-08-04 23:39:19 +02:00
Matthew Honnibal ecb3c4e8f4 Create corpus iterator and batcher from registry during training (#5865)
* Move batchers into their own module (and registry)

* Update CLI

* Update Corpus and batcher

* Update tests

* Update one config

* Merge 'evaluation' block back under [training]

* Import batchers in gold __init__

* Fix batchers

* Update config

* Update schema

* Update util

* Don't assume train and dev are actually paths

* Update onto-joint config

* Fix missing import

* Format

* Format

* Update spacy/gold/corpus.py

Co-authored-by: Ines Montani <ines@ines.io>

* Fix name

* Update default config

* Fix get_length option in batchers

* Update test

* Add comment

* Pass path into Corpus

* Update docstring

* Update schema and configs

* Update config

* Fix test

* Fix paths

* Fix print

* Fix create_train_batches

* [training.read_train] -> [training.train_corpus]

* Update onto-joint config

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-04 15:09:37 +02:00
Ines Montani 934447a611 Merge pull request #5855 from svlandeg/fix/cli-debug 2020-08-03 13:09:20 +02:00
Ines Montani 4c055f0aa7 Add init CLI and init config (#5854)
* Add init CLI and init config draft

* Improve config validation

* Auto-format

* Don't export anything in debug config

* Update docs
2020-08-02 15:18:30 +02:00
svlandeg 6f4e46ee93 Merge remote-tracking branch 'upstream/develop' into fix/cli-debug
# Conflicts:
#	pyproject.toml
#	requirements.txt
#	setup.cfg
2020-08-01 18:38:59 +02:00
svlandeg 9b719dfb1a use divider in between steps 2020-07-31 18:06:48 +02:00
svlandeg 51ffc4a166 rename pipe_name to component 2020-07-31 17:58:55 +02:00
svlandeg 878327d38e printing final predictions by default to False 2020-07-31 17:36:32 +02:00
svlandeg cc2f58a1b0 use data_validation context manager 2020-07-31 16:49:42 +02:00
svlandeg 5fa3235d06 set DATA_VALIDATION to False for debug_model (upgrade thinc) 2020-07-31 15:21:01 +02:00
svlandeg 08d3c36c20 bugfix in train CLI 2020-07-31 15:03:43 +02:00
Sofie Van Landeghem ca491722ad The Parser is now a Pipe (2) (#5844)
* moving syntax folder to _parser_internals

* moving nn_parser and transition_system

* move nn_parser and transition_system out of internals folder

* moving nn_parser code into transition_system file

* rename transition_system to transition_parser

* moving parser_model and _state to ml

* move _state back to internals

* The Parser now inherits from Pipe!

* small code fixes

* removing unnecessary imports

* remove link_vectors_to_models

* transition_system to internals folder

* little bit more cleanup

* newlines
2020-07-30 23:30:54 +02:00
svlandeg 0b23594953 pipe_name instead of section in debug_model 2020-07-30 20:06:28 +02:00
Matthew Honnibal 2af741d7e3 Fix train arg 2020-07-29 14:56:01 +02:00
Matthew Honnibal 1784c95827 Clean up link_vectors_to_models unused stuff 2020-07-29 14:01:11 +02:00
Matthew Honnibal 2aff3c4b5a Load vectors in 'spacy train' 2020-07-29 14:00:13 +02:00
Adriane Boyd 191a12d75f Fix score_weights typo in train CLI (#5835) 2020-07-29 11:04:12 +02:00
Adriane Boyd 0cddb0dbe9 Move timing into Language.evaluate (#5836)
Move timing into `Language.evaluate` so that only the processing is
timed, not processing + scoring. `Language.evaluate` returns
`scores["speed"]` as words per second, which should be identical to how
the speed was added to the scores previously. Also add the speed to the
evaluate CLI output.
2020-07-29 11:02:31 +02:00
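A rough sketch of the measurement described in #5836, not spaCy's actual code: only the processing loop is timed, scoring happens outside the timed region, and the result is reported as words per second under `scores["speed"]`. The names `evaluate`, `process` and `score` are hypothetical stand-ins chosen for illustration.

```python
import time
from typing import Callable, Dict, List


def evaluate(
    texts: List[str],
    process: Callable[[str], List[str]],
    score: Callable[[List[List[str]]], Dict[str, float]],
) -> Dict[str, float]:
    # Hypothetical stand-in, not Language.evaluate: time the processing
    # loop only, so scoring cost does not affect the speed figure.
    start = time.perf_counter()
    docs = [process(text) for text in texts]
    elapsed = time.perf_counter() - start

    # Scoring runs outside the timed region.
    scores = score(docs)
    n_words = sum(len(doc) for doc in docs)
    # Guard against a zero interval on very small inputs.
    scores["speed"] = n_words / elapsed if elapsed > 0 else float("inf")
    return scores


# Toy usage: whitespace splitting stands in for the pipeline.
scores = evaluate(
    ["a b c", "d e"],
    process=str.split,
    score=lambda docs: {"n_docs": float(len(docs))},
)
```

The design point is simply where the timer stops: moving `score(docs)` inside the timed region would fold scoring cost into the speed figure, which is what the commit message says it avoids.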
Ines Montani b83ead5bf5
Merge pull request #5824 from svlandeg/fix/textcat-v3 2020-07-28 15:04:25 +02:00
Ines Montani 06a97a8766 Support --opt=value format in CLI config overrides 2020-07-28 13:43:15 +02:00
Ines Montani 0094cb0d04 Remove scores list from config and document 2020-07-28 11:22:24 +02:00