Commit Graph

420 Commits

Author SHA1 Message Date
Ines Montani f758804401 Save one line of code 2020-10-03 11:41:28 +02:00
svlandeg 02247cccaf Merge remote-tracking branch 'upstream/develop' into feature/small-fixes 2020-10-02 20:48:11 +02:00
svlandeg acc391c2a8 remove redundant str() call 2020-10-02 11:05:59 +02:00
Ines Montani 01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Ines Montani 23c63eefaf Tidy up env vars [ci skip] 2020-09-30 15:15:11 +02:00
Ines Montani a5debb356d Tidy up and adjust logging [ci skip] 2020-09-30 01:22:08 +02:00
Ines Montani da30bae8a6 Use __pyx_vtable__ instead of __reduce_cython__ 2020-09-29 22:04:17 +02:00
Ines Montani fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani 2be80379ec Fix small issues, resolve_dot_names and debug model 2020-09-29 20:38:35 +02:00
Ines Montani dba26186ef Handle None default args in Cython methods 2020-09-29 18:08:02 +02:00
Ines Montani 9353a82076 Auto-format 2020-09-29 18:07:48 +02:00
Matthew Honnibal 4ad26f4a2f Move reader 2020-09-29 16:54:53 +02:00
Ines Montani 2e9c9e74af Fix config resolution and interpolation
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
2020-09-28 15:34:00 +02:00
Ines Montani 02838a1d47 Fix resolve_dot_names 2020-09-28 15:27:10 +02:00
Ines Montani 822ea4ef61 Refactor CLI 2020-09-28 15:09:59 +02:00
Ines Montani e44a7519cd Update CLI and add [initialize] block 2020-09-28 11:56:14 +02:00
Ines Montani d5155376fd Update vocab init 2020-09-28 11:30:18 +02:00
Matthew Honnibal 65448b2e34 Remove schema=None until Optional 2020-09-28 03:42:58 +02:00
Matthew Honnibal a023cf3ecc Add (untested) resolve_dot_names util 2020-09-28 03:06:12 +02:00
Matthew Honnibal a976da168c
Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus

* Note initial docs for data augmentation

* Add augmenter to quickstart

* Fix flake8

* Format

* Fix test

* Update spacy/tests/training/test_training.py

* Improve data augmentation arguments

* Update templates

* Move randomization out into caller

* Refactor

* Update spacy/training/augment.py

* Update spacy/tests/training/test_training.py

* Fix augment

* Fix test
2020-09-28 03:03:27 +02:00
Ines Montani 9016d23cc5 Fix exclude and add test 2020-09-27 23:34:03 +02:00
Ines Montani 7e938ed63e Update config resolution to use new Thinc 2020-09-27 22:21:31 +02:00
Ines Montani 26e28ed413 Fix combined scores if multiple components report it 2020-09-24 17:11:13 +02:00
Ines Montani d0ef4a4cf5 Prevent division by zero in score weights 2020-09-24 16:42:13 +02:00
Ines Montani 4bbe41f017 Fix combined scores and update test 2020-09-24 10:42:47 +02:00
Ines Montani ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani f25f05c503 Adjust sort order [ci skip] 2020-09-23 20:03:04 +02:00
Matthew Honnibal 8fb59d958c Format 2020-09-20 16:31:48 +02:00
Matthew Honnibal 889128e5c5 Improve error handling in run_command 2020-09-20 16:20:57 +02:00
Adriane Boyd 47080fba98 Minor renaming / refactoring
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00
Adriane Boyd eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Ines Montani c052017025 Fix sparse checkout and error handling 2020-09-14 14:12:58 +02:00
Ines Montani 416deb412f Prevent duplicate traceback on CalledProcessError [ci skip] 2020-09-13 19:28:54 +02:00
Ines Montani f8846c198d Update types and docstrings 2020-09-13 10:52:02 +02:00
Ines Montani 3e83a509bb WIP: fix project clone compatibility 2020-09-10 15:49:13 +02:00
Matthew Honnibal b470062153
Add CLI registry (#6037) 2020-09-08 15:23:34 +02:00
Ines Montani 5afe6447cd registry.assets -> registry.misc 2020-09-03 17:31:14 +02:00
Ines Montani 45f46a5c85
Merge pull request #5993 from explosion/feature/disabled-components 2020-08-29 15:58:41 +02:00
Ines Montani 34146750d4 Use frozen list with custom errors
We don't want to break backwards compatibility too much but we also want to provide the best possible UX
2020-08-29 15:20:11 +02:00
Ines Montani 5de3f8604d
Update spacy/util.py
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-29 13:17:06 +02:00
Ines Montani cad988da7f Allow component decorators to re-run with same function 2020-08-28 16:27:22 +02:00
Ines Montani 3ce5be4b76 Allow loaded but disabled components 2020-08-28 15:20:14 +02:00
Sofie Van Landeghem 79d460e3a2
Weights & Biases logger for train CLI (#5971)
* quick test as part of train script

* train_logger in config, default ConsoleLogger in loggers catalogue

* entitiy typo

* add wandb_logger

* cleanup

* Update spacy/cli/train_logger.py

Co-authored-by: Ines Montani <ines@ines.io>

* move loggers to gold.loggers

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-26 15:24:33 +02:00
Matthew Honnibal 77852d2428 Fix run_command for python 3.6 2020-08-26 05:02:43 +02:00
Matthew Honnibal 884cac5fb5 Make run_command backwards compatible 2020-08-26 04:33:42 +02:00
Matthew Honnibal 2771e4f2b3
Fix the git "sparse checkout" functionality (#5973)
* Fix the git sparse checkout functionality

* Format
2020-08-26 04:00:14 +02:00
Matthew Honnibal e559867605
Allow spacy project to push and pull to/from remote storage (#5949)
* Add utils for working with remote storage

* WIP add remote_cache for project

* WIP add push and pull commands

* Use pathy in remote_cache

* Updarte util

* Update remote_cache

* Update util

* Update project assets

* Update pull script

* Update push script

* Fix type annotation in util

* Work on remote storage

* Remove site and env hash

* Fix imports

* Fix type annotation

* Require pathy

* Require pathy

* Fix import

* Add a util to handle project variable substitution

* Import push and pull commands

* Fix pull command

* Fix push command

* Fix tarfile in remote_storage

* Improve printing

* Fiddle with status messages

* Set version to v3.0.0a9

* Draft docs for spacy project remote storages

* Update docs [ci skip]

* Use Thinc config to simplify and unify template variables

* Auto-format

* Don't import Pathy globally for now

Causes slow and annoying Google Cloud warning

* Tidy up test

* Tidy up and update tests

* Update to latest Thinc

* Update docs

* variables -> vars

* Update docs [ci skip]

* Update docs [ci skip]

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Ines Montani 1c3bcfb488 Update docs and util consistency 2020-08-18 01:22:59 +02:00
Ines Montani 3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00