576 Commits

Author SHA1 Message Date
Adriane Boyd d746ea6278
Add warning about GPU selection in Jupyter notebooks (#7075)
* Initial warning

* Update check

* Redo edit

* Move jupyter warning to helper method

* Add link with details to warnings
2021-03-09 15:35:21 +01:00
Sofie Van Landeghem 39de3602e0
return custom error in nlp.initialize (#7104)
* return custom error in nlp.initialize

* Rename error

Co-authored-by: Ines Montani <ines@ines.io>
2021-03-09 23:01:31 +11:00
Sofie Van Landeghem cd70c3cb79
Fixing pretrain (#7342)
* initialize NLP with train corpus

* add more pretraining tests

* more tests

* function to fetch tok2vec layer for pretraining

* clarify parameter name

* test different objectives

* formatting

* fix check for static vectors when using vectors objective

* clarify docs

* logger statement

* fix init_tok2vec and proc.initialize order

* test training after pretraining

* add init_config tests for pretraining

* pop pretraining block to avoid config validation errors

* custom errors
2021-03-09 14:01:13 +11:00
Adriane Boyd e43d43db32
Allow sourcing disabled components (#7215)
Check `component_names` instead of `pipe_names` to allow sourcing
disabled components.
2021-02-26 13:50:56 +01:00
Sofie Van Landeghem f638306598
remove link_components flag again (#6883) 2021-02-02 10:08:40 +08:00
Sofie Van Landeghem acabb284dd
Fix linking resumed components (#6859)
* link components across enabled, resumed and frozen

* revert renaming

* revert renaming, the sequel
2021-02-01 22:19:58 +11:00
Ines Montani d0c3775712 Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
Ines Montani e6accb3a9e Tidy up and auto-format 2021-01-30 12:52:33 +11:00
Ines Montani 7886d59c56 Add check for remove_listener method 2021-01-29 23:47:30 +11:00
Ines Montani 94232aea08 Improve E889 2021-01-29 23:39:23 +11:00
Ines Montani e766e8c56d
Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-29 21:41:17 +11:00
Ines Montani 325f47500d Move replacement logic to Language.from_config 2021-01-29 19:37:04 +11:00
Ines Montani 99842387cb Remove default value 2021-01-29 18:45:37 +11:00
Ines Montani 44b5542d14 Change method order 2021-01-29 18:42:41 +11:00
Ines Montani 8c15d1daec Update and validate config first and exit early if paths don't exist 2021-01-29 18:24:47 +11:00
Ines Montani bbb94b37c6 Update error handling and docstring 2021-01-29 16:27:49 +11:00
Ines Montani 01ecfbcc45 Merge branch 'develop' into feature/replace-listeners 2021-01-29 15:57:32 +11:00
Ines Montani 911dfcccfc Add option to replace listeners for sourced components 2021-01-29 15:57:04 +11:00
Sofie Van Landeghem 837a4f53c2
Error handling in nlp.pipe (#6817)
* add error handler for pipe methods

* add unit tests

* remove pipe methods that are the same as their base class

* have Language keep track of a default error handler

* cleanup

* formatting

* small refactor

* add documentation
2021-01-29 08:51:21 +08:00
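
The idea of a pipeline object that keeps track of a default error handler for its pipe methods can be sketched roughly as follows. This is a toy model under stated assumptions, not spaCy's implementation: `ToyLanguage`, `raise_error`, `log_and_skip`, the `default_error_handler` attribute, and the handler signature are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Component = Callable[[str], str]
# Hypothetical handler signature: (component name, component, docs, error).
ErrorHandler = Callable[[str, Component, List[str], Exception], None]


def raise_error(name: str, component: Component, docs: List[str], error: Exception) -> None:
    """Assumed default behaviour: re-raise the original error."""
    raise error


class ToyLanguage:
    """Hypothetical stand-in for a pipeline; documents are plain strings."""

    def __init__(self) -> None:
        self.default_error_handler: ErrorHandler = raise_error
        self.components: List[Tuple[str, Component]] = []

    def add_pipe(self, name: str, component: Component) -> None:
        self.components.append((name, component))

    def pipe(self, docs: Iterable[str]) -> Iterator[str]:
        for doc in docs:
            failed = False
            for name, component in self.components:
                try:
                    doc = component(doc)
                except Exception as error:
                    # Delegate to the handler; if it returns, skip this doc.
                    self.default_error_handler(name, component, [doc], error)
                    failed = True
                    break
            if not failed:
                yield doc


def shout(doc: str) -> str:
    """Example component that fails on empty input."""
    if not doc:
        raise ValueError("empty input")
    return doc.upper()


skipped: List[Tuple[str, List[str], str]] = []


def log_and_skip(name: str, component: Component, docs: List[str], error: Exception) -> None:
    """Custom handler: record the failure instead of raising."""
    skipped.append((name, docs, str(error)))


nlp = ToyLanguage()
nlp.add_pipe("shout", shout)
nlp.default_error_handler = log_and_skip
results = list(nlp.pipe(["a", "", "b"]))
```

With the custom handler installed, the failing document is recorded in `skipped` and the remaining documents are still processed; with the default handler, the first error would propagate to the caller.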
Ines Montani fabd3a3394 Tidy up code comments [ci skip] 2021-01-27 12:40:03 +11:00
Sofie Van Landeghem 57640aa838
warn when frozen components break listener pattern (#6766)
* warn when frozen components break listener pattern

* few notes in the documentation

* update arg name

* formatting

* cleanup

* specify listeners return type
2021-01-20 11:12:35 +11:00
Matthew Honnibal 88acbfc050
Copy the Example objects (and their predicted Doc) in nlp.evaluate() and nlp.update() (#6765)
* Make copy of examples in nlp.update and nlp.evaluate

* Avoid circular import

* Fix evaluate
2021-01-19 16:47:44 +01:00
Sofie Van Landeghem bfc212e68f
fix duplicate from merge [ci skip] 2021-01-19 12:14:35 +01:00
Adriane Boyd c8b4370865
Add all strings from source models (#6736)
Add all strings from the source model when adding a pipe from a source
model.

Minor:

* Skip `disable=["vocab", "tokenizer"]` when loading a source model from
the config, since this doesn't do anything and is misleading.
2021-01-16 12:26:15 +11:00
Adriane Boyd a45d89f09a Add initialize.before_init and after_init callbacks
Add `initialize.before_init` and `initialize.after_init` callbacks to
the config. The `initialize.before_init` callback is a place to
implement one-time tokenizer customizations that are then saved with the
model.
2021-01-12 13:07:44 +01:00
Adriane Boyd b57be94c78
Fix memory issues in Language.evaluate (#6386)
* Fix memory issues in Language.evaluate

Reset annotation in predicted docs before evaluating and store all data
in `examples`.

* Minor refactor to docs generator init

* Fix generator expression

* Fix final generator check

* Refactor pipeline loop

* Handle examples generator in Language.evaluate

* Add test with generator

* Use make_doc
2020-12-31 10:45:50 +11:00
Adriane Boyd 6ee6e41234 Update docstring for Language.evaluate 2020-12-09 10:21:39 +01:00
Adriane Boyd fa8fa474a3 Add nlp.batch_size setting
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
2020-12-09 09:13:26 +01:00
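
The fallback described in this commit can be pictured with a small stand-in class. This is a sketch only: `ToyLanguage`, its `pipe` method, and the default value of 1000 are hypothetical illustration choices, not code taken from spaCy.

```python
from typing import Iterable, Iterator, List, Optional


class ToyLanguage:
    """Hypothetical stand-in for a pipeline object; not spaCy's Language class."""

    def __init__(self, batch_size: int = 1000) -> None:
        # Pipeline-wide default, used when a call does not pass its own value.
        self.batch_size = batch_size

    def pipe(
        self, texts: Iterable[str], batch_size: Optional[int] = None
    ) -> Iterator[List[str]]:
        # An explicit argument wins; otherwise fall back to the stored default.
        if batch_size is None:
            batch_size = self.batch_size
        batch: List[str] = []
        for text in texts:
            batch.append(text)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            # Emit the final, possibly smaller, batch.
            yield batch


nlp = ToyLanguage(batch_size=2)
default_batches = list(nlp.pipe(["a", "b", "c"]))
explicit_batches = list(nlp.pipe(["a", "b", "c"], batch_size=3))
```

Using `None` as the sentinel keeps a per-call override possible while letting the object-level setting apply everywhere else.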
Ines Montani 1980203229 Merge branch 'master' into pr/6444 2020-12-09 11:09:40 +11:00
Adriane Boyd e931d3f72b
Move max_length to nlp.make_doc() (#6512)
Move max_length check to `nlp.make_doc()` so that it's also checked
for `nlp.pipe()`.
2020-12-08 14:24:02 +08:00
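
Why moving the check matters can be shown with a toy model in which every entry point builds documents through one shared method. The names below (`ToyLanguage`, `raises_value_error`) and the whitespace tokenization are assumptions made for illustration; the real check and error type in spaCy may differ.

```python
from typing import Callable, Iterable, Iterator, List


class ToyLanguage:
    """Hypothetical stand-in; a "doc" is a list of whitespace-split tokens."""

    def __init__(self, max_length: int = 1_000_000) -> None:
        self.max_length = max_length

    def make_doc(self, text: str) -> List[str]:
        # Single place for the length check, shared by every entry point.
        if len(text) > self.max_length:
            raise ValueError(
                f"Text of length {len(text)} exceeds maximum of {self.max_length}"
            )
        return text.split()

    def __call__(self, text: str) -> List[str]:
        return self.make_doc(text)

    def pipe(self, texts: Iterable[str]) -> Iterator[List[str]]:
        # Because pipe() also goes through make_doc(), it gets the same check.
        for text in texts:
            yield self.make_doc(text)


def raises_value_error(func: Callable[[], object]) -> bool:
    """Helper for the demo: report whether calling func raises ValueError."""
    try:
        func()
    except ValueError:
        return True
    return False


nlp = ToyLanguage(max_length=10)
short_doc = nlp("hi there")
too_long = "x" * 11
call_rejected = raises_value_error(lambda: nlp(too_long))
pipe_rejected = raises_value_error(lambda: list(nlp.pipe([too_long])))
```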
Ines Montani 539b0c10da Tidy up and auto-format 2020-10-10 19:14:48 +02:00
svlandeg 8316bc7d4a bugfix DisabledPipes 2020-10-09 12:06:20 +02:00
Sofie Van Landeghem d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
svlandeg 3e2e1fd323 cleanup 2020-10-08 10:37:32 +02:00
svlandeg eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
svlandeg 6b8bdb2d39 add init_config to nlp.create_pipe 2020-10-07 14:58:16 +02:00
svlandeg ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
svlandeg 4e3ace4b8c is_trainable method 2020-10-05 17:43:42 +02:00
svlandeg 65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
Ines Montani 8f018e47f8 Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe 2020-10-04 14:43:45 +02:00
Ines Montani ae15c9de79 Raise error from caught KeyError to preserve traceback 2020-10-03 11:43:56 +02:00
Stanislav Schmidt 3589a64d44
Change type of texts argument in pipe to iterable (#6186)
* Change type of texts argument in pipe to iterable

* Add contributor agreement
2020-10-02 21:00:11 +02:00
svlandeg 02247cccaf Merge remote-tracking branch 'upstream/develop' into feature/small-fixes 2020-10-02 20:48:11 +02:00
svlandeg 6787e56315 print debugging warning before raising error if model not properly initialized 2020-10-01 09:21:00 +02:00
Ines Montani fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani 798040bc1d Fix language detection 2020-09-29 21:08:13 +02:00
Matthew Honnibal 8ce9f44433 Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare 2020-09-29 16:57:38 +02:00
Matthew Honnibal ca72608059 Fix language 2020-09-29 16:48:33 +02:00
Ines Montani fd594cfb9b Tighten up format 2020-09-29 16:47:55 +02:00
Ines Montani 63d1598137 Simplify config use in Language.initialize 2020-09-29 16:05:48 +02:00