Commit Graph

973 Commits

Author SHA1 Message Date
Ines Montani 88e54caa12 accuracy -> performance 2020-09-24 14:32:35 +02:00
Ines Montani be56c0994b Add [training.before_to_disk] callback 2020-09-24 12:40:25 +02:00
Ines Montani c6c67b606e
Merge pull request #6133 from explosion/fix/score_weights 2020-09-24 12:00:57 +02:00
Ines Montani f69fea8b25 Improve error handling around non-number scores 2020-09-24 11:29:07 +02:00
Matthew Honnibal 17a6b0a173
Make project pull order insensitive (#6131) 2020-09-24 10:30:42 +02:00
Ines Montani ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
svlandeg 35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
svlandeg 6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani 7745d77a38 Fix whitespace in template [ci skip] 2020-09-23 13:21:42 +02:00
svlandeg 6435458d51 simplify expression 2020-09-23 12:12:38 +02:00
svlandeg 20b0ec5dcf avoid logging performance of frozen components 2020-09-23 10:37:12 +02:00
Ines Montani 6ca06cb62c Update docs and formatting [ci skip] 2020-09-23 10:14:27 +02:00
Ines Montani 888f936a73
Merge pull request #6106 from svlandeg/feature/textcat-quickstart 2020-09-23 10:11:45 +02:00
Ines Montani 60a317520a
Merge pull request #6109 from svlandeg/feature/2rename 2020-09-23 09:47:12 +02:00
svlandeg 556f3e4652 add pooling to NEL's TransformerListener 2020-09-23 09:24:28 +02:00
Sofie Van Landeghem 86a08f819d
tok2vec.update instead of predict (#6113) 2020-09-22 21:54:52 +02:00
Ines Montani 5e3b796b12 Validate section refs in debug config 2020-09-22 12:24:39 +02:00
svlandeg 085a1c8e2b add no_output_layer to TextCatBOW config 2020-09-22 12:06:40 +02:00
svlandeg b556a10808 rename converts in_to_out 2020-09-22 11:50:19 +02:00
svlandeg e931f4d757 add textcat score 2020-09-22 10:56:43 +02:00
svlandeg 396b33257f add entity_linker to jinja template 2020-09-22 10:40:05 +02:00
svlandeg 135de82a2d add textcat to quickstart 2020-09-22 10:22:06 +02:00
Ines Montani 6316d5f398 Improve messages in project CLI [ci skip] 2020-09-22 09:45:34 +02:00
Ines Montani 81606b29bd
Merge pull request #6104 from svlandeg/fix/debug_model [ci skip] 2020-09-22 09:31:23 +02:00
svlandeg 45b29c4a5b cleanup 2020-09-21 23:17:23 +02:00
svlandeg fa5c416db6 initialize through nlp object and with train_corpus 2020-09-21 23:09:22 +02:00
svlandeg 447b3e5787 Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 16:58:40 +02:00
Ines Montani e8bcaa44f1 Don't auto-decompress archives with smart_open [ci skip] 2020-09-21 16:01:46 +02:00
svlandeg eb9b447960 Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 14:05:16 +02:00
Ines Montani 758ead8a47 Sync overrides with CLI overrides 2020-09-21 12:50:13 +02:00
Ines Montani 5497acf49a Support config overrides via environment variables 2020-09-21 11:25:10 +02:00
Ines Montani 1114219ae3 Tidy up and auto-format 2020-09-21 10:59:07 +02:00
Ines Montani b2302c0a1c Improve error for missing dependency 2020-09-20 17:44:51 +02:00
Matthew Honnibal 8fb59d958c Format 2020-09-20 16:31:48 +02:00
Matthew Honnibal dc22771f87 Fix sparse checkout 2020-09-20 16:30:05 +02:00
Matthew Honnibal a0fb5e50db Use simple git clone call if not sparse 2020-09-20 16:22:04 +02:00
Matthew Honnibal 2c24d633d0 Use updated run_command 2020-09-20 16:21:43 +02:00
Ines Montani 554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
svlandeg 6db1d5dc0d trying some stuff 2020-09-19 19:11:30 +02:00
Ines Montani e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2 2020-09-19 12:33:38 +02:00
Sofie Van Landeghem 39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
svlandeg 73ff52b9ec hack for tok2vec listener 2020-09-18 16:43:15 +02:00
Adriane Boyd eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Ines Montani a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
svlandeg e4fc7e0222 fixing output sample to proper 2D array 2020-09-17 22:34:36 +02:00
Ines Montani 3865214343 Use consistent shortcut 2020-09-17 16:57:02 +02:00
svlandeg 35a3931064 fix typo 2020-09-17 16:36:27 +02:00
svlandeg ddfc1fc146 add pretraining option to init config 2020-09-17 16:05:40 +02:00
svlandeg 427dbecdd6 cleanup and formatting 2020-09-17 11:48:04 +02:00
svlandeg 0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg 51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
Ines Montani 9cc304c194
Merge pull request #6064 from explosion/fix/sparse-checkout-ux
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Sofie Van Landeghem 3216a33149
positive_label config for textcat (#6062)
* hook up positive_label in textcat

* unit tests

* documentation

* formatting

* tests

* fix typo

* move verify_config to after begin_training

* revert accidential commit
2020-09-14 17:08:00 +02:00
Ines Montani c052017025 Fix sparse checkout and error handling 2020-09-14 14:12:58 +02:00
Matthew Honnibal 54c40223a1
Improve v3 pretrain command (#6040)
* Starts to run

* Update pretrain script

* Update corpus

* Update pretrain schema

* Remove outdated test

* Make JsonlTexts produce Example objects.
2020-09-13 14:05:05 +02:00
Ines Montani febb99916d Tidy up and auto-format [ci skip] 2020-09-13 10:55:36 +02:00
Ines Montani a5633b205f Fix handling of errors around git [ci skip] 2020-09-13 10:52:28 +02:00
Ines Montani f8846c198d Update types and docstrings 2020-09-13 10:52:02 +02:00
Matthew Honnibal 37347830d4 Fix reading in GloVe vectors 2020-09-12 17:31:18 +02:00
Ines Montani b41be87213
Merge pull request #6051 from svlandeg/feature/cli-config 2020-09-12 17:12:35 +02:00
Ines Montani eedaaaec75 Fix handling of existing asset without checksum [ci skip] 2020-09-12 17:02:53 +02:00
svlandeg a75cfe0da6 Merge remote-tracking branch 'upstream/develop' into feature/cli-config 2020-09-12 14:44:40 +02:00
svlandeg 115147804a string_to_list to parse comma-separated string into a list 2020-09-12 14:43:22 +02:00
Ines Montani f886f5bbc8
Merge pull request #6048 from explosion/fix/clone-compat 2020-09-12 10:30:49 +02:00
Ines Montani 0b2e07215d Support overwriting name on spacy package 2020-09-11 11:38:28 +02:00
svlandeg 5b94aeece9 support pipeline as "list in string" 2020-09-11 11:08:46 +02:00
Ines Montani 1bce432b4a Adjust message [ci skip] 2020-09-11 10:00:49 +02:00
Ines Montani 5acd4fbcd8 Merge branch 'develop' into fix/clone-compat 2020-09-11 09:58:30 +02:00
Ines Montani 761bd60d43 Adjust info message 2020-09-11 09:57:00 +02:00
Ines Montani 6831161bfa Resolve path to be extra sure 2020-09-11 09:56:49 +02:00
svlandeg 1723fb73c4 remove brol 2020-09-10 17:44:59 +02:00
svlandeg 08a831ce83 process trailing slash if any 2020-09-10 17:39:52 +02:00
Ines Montani 3e83a509bb WIP: fix project clone compatibility 2020-09-10 15:49:13 +02:00
svlandeg f1bc09c1e9 restore partly 2020-09-10 14:53:02 +02:00
svlandeg 3889747119 asset fix & UX 2020-09-10 14:36:53 +02:00
svlandeg a36766d153 hookup branch 2020-09-10 12:00:34 +02:00
svlandeg 97d99f7efa Merge remote-tracking branch 'upstream/develop' into feature/doc-fixes 2020-09-10 11:51:34 +02:00
Ines Montani 908f3a4494 Update default projects repo [ci skip] 2020-09-10 11:42:14 +02:00
svlandeg 92f9d2f406 small UX fixes 2020-09-10 11:35:50 +02:00
svlandeg 1fc5486792 more fine-grained errors for git_sparse_checkout 2020-09-10 11:31:32 +02:00
Ines Montani 15bc3a37b4 Add --branch to project clone 2020-09-10 11:08:15 +02:00
Sofie Van Landeghem 8e7557656f
Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
Sofie Van Landeghem 60f22e1800
Pipe API (#6034)
* ensure Language passes on valid examples for initialization

* fix tagger model initialization

* check for valid get_examples across components

* assume labels were added before begin_training

* fix senter initialization

* fix morphologizer initialization

* use methods to check arguments

* test textcat init, requires thinc>=8.0.0a31

* fix tok2vec init

* fix entity linker init

* use islice

* fix simple NER

* cleanup debug model

* fix assert statements

* fix tests

* throw error when adding a label if the output layer can't be resized anymore

* fix test

* add failing test for simple_ner

* UX improvements

* morphologizer UX

* assume begin_training gets a representative set and processes the labels

* remove assumptions for output of untrained NER model

* restore test for original purpose
2020-09-08 22:44:25 +02:00
Matthew Honnibal ba5f4c9b32 Add words and seconds to train info 2020-09-08 15:24:47 +02:00
Matthew Honnibal b470062153
Add CLI registry (#6037) 2020-09-08 15:23:34 +02:00
Matthew Honnibal 4b7abaafdb Fix learn rate for non-transformer 2020-09-04 21:22:50 +02:00
Matthew Honnibal 465785a672 Fix project pull and push 2020-09-04 21:15:55 +02:00
Ines Montani ab1bb421ed Update docs links in codebase 2020-09-04 12:58:50 +02:00
Ines Montani 2189046869
Merge pull request #6024 from explosion/chore/registry-renaming 2020-09-04 10:54:10 +02:00
Matthew Honnibal 1c07820681 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-03 18:54:21 +02:00
Matthew Honnibal 7be8a0516a Fix project pull 2020-09-03 18:54:03 +02:00
Ines Montani 23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani c063e55eb7 Add prefix to batchers 2020-09-03 17:30:41 +02:00
Ines Montani c53b1433b9 Adjust more arguments [ci skip] 2020-09-03 17:12:24 +02:00
Ines Montani b5a0657fd6 "model" terminology consistency in docs 2020-09-03 13:13:03 +02:00
Matthew Honnibal 122cb02001 Fix averages 2020-09-02 19:37:43 +02:00
Sofie Van Landeghem 6bfb1b3a29
Fix sparse checkout for 'spacy project' (#6008)
* exit if cloning fails

* UX

* rewrite http link to git protocol, don't use stdin

* fixes to sparse checkout

* formatting
2020-09-01 19:49:01 +02:00
Ines Montani 70b226f69d Support ignore marker in project document [ci skip] 2020-09-01 12:49:04 +02:00
Ines Montani a4c51f0f18 Add v3 info to project docs [ci skip] 2020-09-01 12:36:21 +02:00