Commit Graph

275 Commits

Author SHA1 Message Date
Matthew Honnibal a350be0601 Fix vector-name loading fix 2018-04-04 01:31:25 +02:00
Matthew Honnibal 81f4005f3d Fix loading models with pretrained vectors 2018-04-03 23:11:48 +02:00
ines e5f47cd82d Update errors 2018-04-03 21:40:29 +02:00
Ines Montani 3141e04822
💫 New system for error messages and warnings (#2163)
* Add spacy.errors module

* Update deprecation and user warnings

* Replace errors and asserts with new error message system

* Remove redundant asserts

* Fix whitespace

* Add messages for print/util.prints statements

* Fix typo

* Fix typos

* Move CLI messages to spacy.cli._messages

* Add decorator to display error code with message

An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.

* Remove unused link in spacy.about

* Update errors for invalid pipeline components

* Improve error for unknown factories

* Add displaCy warnings

* Update formatting consistency

* Move error message to spacy.errors

* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
Matthew Honnibal f3b7c5e537 Fix syntax error 2018-03-29 21:50:32 +02:00
Matthew Honnibal 23afa6429f Add input length error, to address #1826 2018-03-29 21:45:26 +02:00
Matthew Honnibal a7c5ae2beb Avoid forcing a name on empty vectors, and remove print statement 2018-03-28 21:08:58 +02:00
Matthew Honnibal 95a9615221 Fix loading of multiple pre-trained vectors
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.

The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']_nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.

In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
2018-03-28 16:02:59 +02:00
ines f3f8bfc367 Add built-in factories for merge_entities and merge_noun_chunks
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 17:16:54 +01:00
Aaron Marquez 3765d84d57 Fix issue #1959 2018-02-15 12:51:49 -08:00
Claudiu-Vlad Ursache e28de12cbd
Ensure files opened in `from_disk` are closed
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
2018-02-13 20:49:43 +01:00
Motoki Wu f4a7d1a423 make to sure pass in **cfg to each component when training 2018-01-30 18:29:54 -08:00
ines 4046823699 Only check component in factories if string (see #1911) 2018-01-30 16:29:07 +01:00
ines ce10d320c4 Fix component check in self.factories (see #1911) 2018-01-30 16:09:37 +01:00
ines 8901814248 Improve error handling if pipeline component is not callable (resolves #1911)
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
2018-01-30 15:43:03 +01:00
ines a31506e060 Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654) 2017-11-28 20:37:55 +01:00
Ines Montani 6362024cf8
Merge pull request #1645 from GreenRiverRUS/fix_default_meta
Fixed spaCy version string in default meta
2017-11-27 11:58:02 +00:00
Vadim Mazaev 59f03ab1d7 Fixed spacy version string in default meta 2017-11-26 23:02:07 +03:00
Matthew Honnibal 8fec7268eb Move string cleanup under a setting flag 2017-11-23 12:19:18 +00:00
Matthew Honnibal 5949777b12 Fix misleading multi-threading docstring 2017-11-23 12:18:59 +00:00
Roman Domrachev 61d28d03e4 Try again to do selective remove cache 2017-11-15 19:11:12 +03:00
Roman Domrachev 505c6a2f2f Completely cleanup tokenizer cache
Tokenizer cache can have be different keys than string

That modification can slow down tokenizer and need to be measured
2017-11-15 17:55:48 +03:00
Roman Domrachev a33d5a068d Try to hold origin data instead of restore it 2017-11-14 22:40:03 +03:00
Roman Domrachev 91e2fa6561 Clean all caches 2017-11-14 21:15:04 +03:00
Roman Domrachev 86ca434c93 Merge github.com:explosion/spaCy 2017-11-14 17:46:22 +03:00
Roman Domrachev a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal dd1678eab3
Edit comment 2017-11-11 18:37:08 +01:00
Roman Domrachev ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev 4a6b094e09 Remove unused import 2017-11-11 03:13:05 +03:00
Roman Domrachev 3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
Matthew Honnibal 45e0617e61 Allow Language.update to take unicode text and dict objects 2017-11-06 22:07:38 +01:00
Matthew Honnibal 5c85bf3791 Fix missing import 2017-11-06 15:06:27 +01:00
Matthew Honnibal 465adfee94 Remove unused resume_training method, and pass optimizer through 2017-11-06 14:26:00 +01:00
Matthew Honnibal 38109a0e4a Register SentenceSegmenter in Language.factories 2017-11-05 18:45:57 +01:00
Matthew Honnibal d185927998 Undo harmful pickling hacks on Language class 2017-11-04 23:07:03 +01:00
Matthew Honnibal 2bf21cbe29 Update model after optimising it instead of waiting 2017-11-03 20:20:01 +01:00
ines 5f661a1b3a Remove tensorizer from pre-set pipe_names 2017-11-01 19:48:33 +01:00
ines bfe17b7df1 Fix begin_training if get_gold_tuples is None 2017-11-01 13:14:31 +01:00
ines 37e62ab0e2 Update vector meta in meta.json 2017-11-01 01:25:09 +01:00
ines 8e02294241 Add vectors to Language.meta 2017-10-30 18:39:48 +01:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines 91899d337b Tidy up language, lemmatizer and scorer 2017-10-27 14:40:14 +02:00
Ines Montani 4033e70c71 Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
ines 2d6ec99884 Set 'model' as default model name to prevent meta.json errors 2017-10-26 16:12:23 +02:00
Matthew Honnibal 90d1d9b230 Remove obsolete parser code 2017-10-26 13:22:45 +02:00
Matthew Honnibal b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
ines 1a722dac31 Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:18:18 +02:00
ines 6a00de4f77 Fix check of unexpected pipe names in restore() 2017-10-25 14:56:35 +02:00
ines 7f03932477 Return self on __enter__ 2017-10-25 14:56:16 +02:00
Matthew Honnibal e70f80f29e Add Language.disable_pipes() 2017-10-25 13:46:41 +02:00