Commit Graph

550 Commits

Author SHA1 Message Date
Adriane Boyd 6ee6e41234 Update docstring for Language.evaluate 2020-12-09 10:21:39 +01:00
Adriane Boyd fa8fa474a3 Add nlp.batch_size setting
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
2020-12-09 09:13:26 +01:00
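A minimal sketch of the pattern this commit describes: a pipeline-wide default batch size that `pipe()` and `evaluate()` fall back to when no per-call value is given. `MiniLanguage`, `_resolve` and the placeholder default of 1000 are hypothetical and illustrative only. This is not spaCy's `Language` implementation.

```python
from typing import Iterable, Iterator, List, Optional


class MiniLanguage:
    """Hypothetical stand-in for a pipeline object, not spaCy's Language."""

    def __init__(self, batch_size: int = 1000) -> None:
        # Pipeline-wide default, playing the role of `nlp.batch_size`.
        self.batch_size = batch_size

    def _resolve(self, batch_size: Optional[int]) -> int:
        # An explicit per-call value wins; None falls back to the default.
        return self.batch_size if batch_size is None else batch_size

    def pipe(
        self, texts: Iterable[str], batch_size: Optional[int] = None
    ) -> Iterator[List[str]]:
        size = self._resolve(batch_size)
        batch: List[str] = []
        for text in texts:
            batch.append(text)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:  # flush the final, possibly shorter batch
            yield batch

    def evaluate(
        self, examples: Iterable[str], batch_size: Optional[int] = None
    ) -> int:
        # Shares the same fallback by delegating to pipe(); returns batch count.
        return sum(1 for _ in self.pipe(examples, batch_size=batch_size))
```

Keeping the fallback in one helper means both entry points agree on the default without repeating the logic.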
Ines Montani 1980203229 Merge branch 'master' into pr/6444 2020-12-09 11:09:40 +11:00
Adriane Boyd e931d3f72b
Move max_length to nlp.make_doc() (#6512)
Move max_length check to `nlp.make_doc()` so that it's also checked
for `nlp.pipe()`.
2020-12-08 14:24:02 +08:00
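The reasoning behind this change can be sketched as follows: if every entry point builds its docs through `make_doc()`, a length check placed there covers both single-text calls and `pipe()`. The class below, its whitespace "tokenizer", and the error message are hypothetical and illustrative only. They are not spaCy's code.

```python
from typing import Iterable, Iterator, List


class MiniLanguage:
    """Hypothetical pipeline; every entry point builds docs via make_doc()."""

    def __init__(self, max_length: int = 1_000_000) -> None:
        self.max_length = max_length

    def make_doc(self, text: str) -> List[str]:
        # Single place for the length check, so __call__ and pipe() share it.
        if len(text) > self.max_length:
            raise ValueError(
                f"Text of length {len(text)} exceeds max_length={self.max_length}"
            )
        return text.split()  # toy whitespace "tokenizer"

    def __call__(self, text: str) -> List[str]:
        return self.make_doc(text)

    def pipe(self, texts: Iterable[str]) -> Iterator[List[str]]:
        for text in texts:
            yield self.make_doc(text)
```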
Ines Montani 539b0c10da Tidy up and auto-format 2020-10-10 19:14:48 +02:00
svlandeg 8316bc7d4a bugfix DisabledPipes 2020-10-09 12:06:20 +02:00
Sofie Van Landeghem d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
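A rough sketch of the split described above, assuming that rule-based components only need to be callable while trainable ones add the update machinery and report `is_trainable` as a property. All class names and the simplified `update(loss)` / `finish_update()` signatures are hypothetical. spaCy's real `TrainablePipe.update` takes examples and an optimizer.

```python
from typing import Any, Dict, List


class Pipe:
    """Hypothetical base: anything that can process a doc."""

    def __call__(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError

    @property
    def is_trainable(self) -> bool:
        return False


class TrainablePipe(Pipe):
    """Hypothetical subclass holding the training-related functionality."""

    def __init__(self) -> None:
        self.updates: List[float] = []
        self.applied = 0

    @property
    def is_trainable(self) -> bool:
        return True

    def update(self, loss: float) -> None:
        # Accumulate a gradient-like value; a real component would backprop.
        self.updates.append(loss)

    def finish_update(self) -> None:
        # Stand-in for an optimizer step: apply, then clear accumulated updates.
        self.applied += len(self.updates)
        self.updates.clear()


class RuleSplitter(Pipe):
    # Rule-based: only __call__ is needed, no training methods.
    def __call__(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        doc["sents"] = doc["text"].split(". ")
        return doc


class ToyTagger(TrainablePipe):
    def __call__(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        doc["tags"] = ["X"] * len(doc["text"].split())
        return doc
```

With this split, a training loop can simply skip any component whose `is_trainable` is `False`.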
svlandeg 3e2e1fd323 cleanup 2020-10-08 10:37:32 +02:00
svlandeg eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
svlandeg 6b8bdb2d39 add init_config to nlp.create_pipe 2020-10-07 14:58:16 +02:00
svlandeg ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
svlandeg 4e3ace4b8c is_trainable method 2020-10-05 17:43:42 +02:00
svlandeg 65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
Ines Montani 8f018e47f8 Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe 2020-10-04 14:43:45 +02:00
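A minimal sketch of keeping per-component initialization settings in sync when a component is renamed or removed. It assumes the settings live in a nested dict keyed by component name under `"components"`, mirroring an `[initialize.components]` config section. Both helper functions are hypothetical.

```python
from typing import Any, Dict


def rename_in_init_config(init_cfg: Dict[str, Any], old_name: str, new_name: str) -> None:
    # Move any per-component initialize settings to the new name.
    components = init_cfg.setdefault("components", {})
    if old_name in components:
        components[new_name] = components.pop(old_name)


def remove_from_init_config(init_cfg: Dict[str, Any], name: str) -> None:
    # Drop settings for a removed component so they don't linger in the config.
    init_cfg.get("components", {}).pop(name, None)
```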
Ines Montani ae15c9de79 Raise error from caught KeyError to preserve traceback 2020-10-03 11:43:56 +02:00
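The Python idiom behind this commit is exception chaining with `raise ... from`. The lookup function and its message below are hypothetical. The chaining behaviour is standard Python: the caught `KeyError` becomes the new error's `__cause__`.

```python
from typing import Dict


def get_component(components: Dict[str, object], name: str) -> object:
    """Hypothetical lookup that converts a KeyError into a friendlier error."""
    try:
        return components[name]
    except KeyError as e:
        # `from e` records the KeyError as __cause__, so the traceback shows
        # the original failure followed by this more descriptive error.
        raise ValueError(f"No component '{name}' found in pipeline") from e
```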
Stanislav Schmidt 3589a64d44
Change type of texts argument in pipe to iterable (#6186)
* Change type of texts argument in pipe to iterable

* Add contributor agreement
2020-10-02 21:00:11 +02:00
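The sketch below shows why annotating `texts` as `Iterable[str]` rather than a list fits how a `pipe()` method consumes its input. A one-shot generator can be batched lazily. `iter_batches` and this `pipe()` function are hypothetical local helpers, not spaCy's own utilities.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    # Accepts any iterable, including one-shot generators: islice pulls at
    # most `size` items per batch without materializing the whole input.
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def pipe(texts: Iterable[str], batch_size: int = 2) -> Iterator[int]:
    # Hypothetical pipe(): yields the whitespace token count of each text.
    for batch in iter_batches(texts, batch_size):
        for text in batch:
            yield len(text.split())
```

For example, `pipe(line for line in open("corpus.txt"))` would stream a file without reading it all into memory. The file name is illustrative.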
svlandeg 02247cccaf Merge remote-tracking branch 'upstream/develop' into feature/small-fixes 2020-10-02 20:48:11 +02:00
svlandeg 6787e56315 print debugging warning before raising error if model not properly initialized 2020-10-01 09:21:00 +02:00
Ines Montani fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani 798040bc1d Fix language detection 2020-09-29 21:08:13 +02:00
Matthew Honnibal 8ce9f44433 Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare 2020-09-29 16:57:38 +02:00
Matthew Honnibal ca72608059 Fix language 2020-09-29 16:48:33 +02:00
Ines Montani fd594cfb9b Tighten up format 2020-09-29 16:47:55 +02:00
Ines Montani 63d1598137 Simplify config use in Language.initialize 2020-09-29 16:05:48 +02:00
Ines Montani adca08a12f Pass nlp forward 2020-09-29 12:21:52 +02:00
Ines Montani 42f0e4c946 Clean up 2020-09-29 12:14:08 +02:00
Matthew Honnibal 9c8b2524fe Upd initialize args 2020-09-29 12:08:37 +02:00
Matthew Honnibal f2d1b7feb5 Clean up sgd 2020-09-29 12:00:08 +02:00
Ines Montani 78396d137f Integrate initialize settings 2020-09-29 11:57:08 +02:00
Ines Montani dec984a9c1 Update Language.initialize and support components/tokenizer settings 2020-09-29 11:52:45 +02:00
Matthew Honnibal 5276db6f3f Remove 'device' argument from Language, clean up 'sgd' arg 2020-09-29 11:42:19 +02:00
Ines Montani ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Ines Montani 658fad428a Fix base schema integration 2020-09-27 22:50:36 +02:00
Ines Montani 7e938ed63e Update config resolution to use new Thinc 2020-09-27 22:21:31 +02:00
Ines Montani 4bbe41f017 Fix combined scores and update test 2020-09-24 10:42:47 +02:00
Ines Montani ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani 1114219ae3 Tidy up and auto-format 2020-09-21 10:59:07 +02:00
Adriane Boyd 47080fba98 Minor renaming / refactoring
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00
Adriane Boyd eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
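A minimal sketch of the "strict" loading behaviour described above. Every requested table must exist, and only the listed tables are returned. The `available` dict stands in for the data shipped by `spacy-lookups-data`. `load_tables` is hypothetical and is not the implementation of `spacy.LoadLookupsData.v1`.

```python
from typing import Dict, List


def load_tables(
    available: Dict[str, Dict[str, str]], tables: List[str]
) -> Dict[str, Dict[str, str]]:
    """Hypothetical strict loader: every requested table must exist."""
    missing = [name for name in tables if name not in available]
    if missing:
        raise ValueError(f"Lookups tables not found: {missing}")
    # Only requested tables are returned, so large optional tables stay
    # out unless the config lists them explicitly.
    return {name: available[name] for name in tables}
```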
Matthew Honnibal c776594ab1 Fix 2020-09-16 18:15:14 +02:00
Matthew Honnibal 4a573d18b3 Add comment 2020-09-16 17:51:29 +02:00
Matthew Honnibal d31afc8334 Fix Language.link_components when model is None 2020-09-16 17:49:48 +02:00
Ines Montani aaf01689a1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-15 14:24:42 +02:00
Ines Montani 91a6637f74 Remove extra pipe config values before merging 2020-09-15 14:24:17 +02:00
Ines Montani d3d7f92f05 Fix lang check and error handling in Language.from_config 2020-09-15 14:24:06 +02:00
Ines Montani 253ba5ef14 Raise for bad Vocab values 2020-09-15 13:25:34 +02:00
Ines Montani 7dfc4bc062 Allow overriding meta from spacy.blank 2020-09-15 11:12:12 +02:00
Matthew Honnibal b693d2d224 Fix speed report in table 2020-09-13 17:39:31 +02:00
Ines Montani febb99916d Tidy up and auto-format [ci skip] 2020-09-13 10:55:36 +02:00
Sofie Van Landeghem e92e850c72
Raise if empty examples (#6052)
* raise error if no valid Example objects were found during initialization

* fix max_length parameter

* remove commit from other branch

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-12 21:01:53 +02:00
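A minimal sketch of failing early when initialization receives no usable examples. The `Example` dataclass and `collect_examples` are hypothetical stand-ins. This sketch also assumes that items which are not `Example` instances are simply skipped, which may differ from spaCy's actual validation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    """Hypothetical stand-in for a training example."""
    text: str


def collect_examples(get_examples: Callable[[], Iterable[object]]) -> List[Example]:
    # Keep only proper Example objects (assumption: others are skipped).
    examples = [eg for eg in get_examples() if isinstance(eg, Example)]
    if not examples:
        # Fail early instead of initializing components on no data.
        raise ValueError("No valid Example objects found during initialization")
    return examples
```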