Commit Graph

13128 Commits

Author SHA1 Message Date
Ines Montani 5497acf49a Support config overrides via environment variables 2020-09-21 11:25:10 +02:00
Ines Montani 1114219ae3 Tidy up and auto-format 2020-09-21 10:59:07 +02:00
Ines Montani 9d32cac736 Update docs [ci skip] 2020-09-21 10:55:36 +02:00
Ines Montani b9d2b29684 Update docs [ci skip] 2020-09-20 17:49:09 +02:00
Ines Montani 012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani b2302c0a1c Improve error for missing dependency 2020-09-20 17:44:51 +02:00
Ines Montani 6898b35028 Merge pull request #6094 from explosion/bugfix/run_process 2020-09-20 16:49:30 +02:00
Ines Montani 744f259b9c Update landing [ci skip] 2020-09-20 16:37:23 +02:00
Matthew Honnibal 8fb59d958c Format 2020-09-20 16:31:48 +02:00
Matthew Honnibal dc22771f87 Fix sparse checkout 2020-09-20 16:30:05 +02:00
Matthew Honnibal a0fb5e50db Use simple git clone call if not sparse 2020-09-20 16:22:04 +02:00
Matthew Honnibal 2c24d633d0 Use updated run_command 2020-09-20 16:21:43 +02:00
Matthew Honnibal 889128e5c5 Improve error handling in run_command 2020-09-20 16:20:57 +02:00
Ines Montani 554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
Ines Montani e863b3dc14 Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2 2020-09-19 12:33:38 +02:00
Sofie Van Landeghem 39872de1f6 Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
Adriane Boyd 47080fba98 Minor renaming / refactoring
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00
Adriane Boyd eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
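The "strict" loading described in the commit message above can be illustrated with a minimal sketch. This is not spaCy's implementation: the helper `select_tables_strict`, the table names other than `lexeme_norm`, and the table contents are all hypothetical and only show the idea that exactly the requested tables are returned and a missing table is an error rather than being silently skipped.

```python
def select_tables_strict(available, requested):
    """Return exactly the requested tables; fail if any is missing.

    Hypothetical helper for illustration only, not part of spaCy.
    """
    missing = [name for name in requested if name not in available]
    if missing:
        raise ValueError(f"Missing lookups tables: {missing}")
    # Only the tables that were asked for are kept, so larger tables
    # are left out unless the config lists them explicitly.
    return {name: available[name] for name in requested}


# Made-up stand-in for the tables a lookups package might provide.
available = {
    "lexeme_norm": {"dont": "do not"},
    "lexeme_cluster": {"the": 4},   # illustrative name and content
    "lexeme_prob": {"the": -3.5},   # illustrative name and content
}

selected = select_tables_strict(available, ["lexeme_norm"])
print(sorted(selected))
```

Under this sketch, requesting a table that is not available raises a `ValueError` instead of returning a partial result, which mirrors the requirement that each language's config name the exact tables it needs.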
Ines Montani 0406200a1e Update docs [ci skip] 2020-09-18 15:13:13 +02:00
Ines Montani a127fa475e Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
Matthew Honnibal bbdb5f62b7 Temporary work-around for scoring a subset of components (#6090)
* Try hacking the scorer to work around sentence boundaries

* Upd scorer

* Set dev version

* Upd scorer hack

* Fix version

* Improve comment on hack
2020-09-18 14:26:42 +02:00
Ines Montani d32ce121be Fix docs [ci skip] 2020-09-18 13:41:12 +02:00
Adriane Boyd a88106e852 Remove W106: HEAD and SENT_START in doc.from_array (#6086)
* Remove W106: HEAD and SENT_START in doc.from_array

This warning was hacky and being triggered too often.

* Fix test
2020-09-18 03:01:29 +02:00
Ines Montani 9062585a13 Merge pull request #6087 from explosion/docs/pretrain-usage [ci skip] 2020-09-17 19:25:24 +02:00
Ines Montani a0b4389a38 Update docs [ci skip] 2020-09-17 19:24:48 +02:00
Matthew Honnibal 6efb7688a6 Draft pretrain usage 2020-09-17 18:17:03 +02:00
Sofie Van Landeghem ed0fb034cb ml_datasets v0.2.0a0 2020-09-17 18:11:10 +02:00
Ines Montani 1bb8b4f824 Merge branch 'master' into develop 2020-09-17 17:46:20 +02:00
Ines Montani 6bd0d25fb9 Merge pull request #6085 from explosion/docs/static-vectors-intro [ci skip] 2020-09-17 17:14:45 +02:00
Ines Montani a2c8cda26f Update docs [ci skip] 2020-09-17 17:12:51 +02:00
Ines Montani 2c80f41852 Merge pull request #6084 from svlandeg/feature/init-config-pretrain [ci skip] 2020-09-17 16:59:14 +02:00
Ines Montani 2e3ce9f42f Merge branch 'feature/init-config-pretrain' of https://github.com/svlandeg/spaCy into pr/6084 2020-09-17 16:58:49 +02:00
Ines Montani 3d8e010655 Change order 2020-09-17 16:58:46 +02:00
Ines Montani c4b414b282 Update website/docs/api/cli.md 2020-09-17 16:58:09 +02:00
Ines Montani 3865214343 Use consistent shortcut 2020-09-17 16:57:02 +02:00
Sofie Van Landeghem e5ceec5df0 Update website/docs/api/cli.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-17 16:56:20 +02:00
Sofie Van Landeghem 127ce0c574 Update website/docs/api/cli.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-17 16:55:53 +02:00
Matthew Honnibal ec751068f3 Draft text for static vectors intro 2020-09-17 16:42:53 +02:00
svlandeg 5fade4feb7 fix cli abbrev 2020-09-17 16:15:20 +02:00
svlandeg ddfc1fc146 add pretraining option to init config 2020-09-17 16:05:40 +02:00
svlandeg 3a3110ef60 remove empty files 2020-09-17 15:44:11 +02:00
svlandeg c8c84f1ccd Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 15:43:04 +02:00
svlandeg 130ffa5fbf fix typos in docs 2020-09-17 14:59:41 +02:00
Matthew Honnibal b57ce9a875 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-17 13:59:25 +02:00
Matthew Honnibal 30e85b2a42 Remove outdated configs 2020-09-17 13:59:12 +02:00
Ines Montani c8fa2247e3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-17 12:34:15 +02:00
Ines Montani 6761028c6f Update docs [ci skip] 2020-09-17 12:34:11 +02:00
svlandeg 427dbecdd6 cleanup and formatting 2020-09-17 11:48:04 +02:00
svlandeg 0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg 8cedb2f380 Merge branch 'fix/corpus' of https://github.com/svlandeg/spaCy into fix/corpus 2020-09-17 09:27:55 +02:00