Commit Graph

169 Commits

Author SHA1 Message Date
Matthew Honnibal 3836199a83 Fix loading of models when custom vectors are added 2018-04-10 22:19:20 +02:00
ines 5ecb274764 Fix indentation error and set Doc.is_tagged correctly 2018-04-10 16:14:52 +02:00
ines 987ee27af7 Return Doc in noun chunks merger component if Doc is not parsed 2018-04-09 14:51:02 +02:00
ines e5f47cd82d Update errors 2018-04-03 21:40:29 +02:00
Ines Montani 3141e04822
💫 New system for error messages and warnings (#2163)
* Add spacy.errors module

* Update deprecation and user warnings

* Replace errors and asserts with new error message system

* Remove redundant asserts

* Fix whitespace

* Add messages for print/util.prints statements

* Fix typo

* Fix typos

* Move CLI messages to spacy.cli._messages

* Add decorator to display error code with message

An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.

* Remove unused link in spacy.about

* Update errors for invalid pipeline components

* Improve error for unknown factories

* Add displaCy warnings

* Update formatting consistency

* Move error message to spacy.errors

* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
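The access-time decorator described in the commit above can be sketched roughly as follows (class and message names here are illustrative, not spaCy's actual definitions):

```python
class _Errors:
    # Raw message templates, stored without their codes
    E001 = "No component '{name}' found in pipeline."
    E002 = "Can't find factory for '{name}'."


class ErrorsWithCodes:
    """Prepend the error code only when a message is looked up, so the
    stored strings themselves are never modified (and tracebacks etc.
    need no special handling)."""

    def __getattribute__(self, code):
        msg = getattr(_Errors, code)
        return "[{}] {}".format(code, msg)


Errors = ErrorsWithCodes()
```

Looking up `Errors.E001` then yields the template with its code prefixed, ready for `.format(...)` at the raise site.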
Ines Montani a609a1ca29
Merge pull request #2152 from explosion/feature/tidy-up-dependencies
💫 Tidy up dependencies
2018-03-29 14:35:09 +02:00
Matthew Honnibal 8308bbc617 Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts 2018-03-29 00:14:55 +02:00
Matthew Honnibal bc4afa9881 Remove print statement 2018-03-28 17:48:37 +02:00
Matthew Honnibal 9bf6e93b3e Set pretrained_vectors in begin_training 2018-03-28 16:32:41 +02:00
Matthew Honnibal 95a9615221 Fix loading of multiple pre-trained vectors
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.

The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']}_{nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.

In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
2018-03-28 16:02:59 +02:00
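The naming scheme and backward-compatibility shim described above can be sketched like this (function names and cfg handling are illustrative, not the actual implementation):

```python
def default_vectors_name(meta):
    # Per-model vectors name derived from the model meta, so two loaded
    # models with pre-trained vectors no longer collide on one ID
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_cfg(cfg, meta):
    """Backward-compat shim: older models store pretrained_dims, so map
    it to the new pretrained_vectors key when loading from disk/bytes."""
    if "pretrained_dims" in cfg and "pretrained_vectors" not in cfg:
        cfg = dict(cfg)
        cfg["pretrained_vectors"] = default_vectors_name(meta)
    return cfg
```

With this, `en_core_web_sm` would key its vectors as `en_core_web_sm.vectors` rather than a shared default.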
ines f3f8bfc367 Add built-in factories for merge_entities and merge_noun_chunks
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 17:16:54 +01:00
Matthew Honnibal 8f06903e09 Fix multitask objectives 2018-02-17 18:41:36 +01:00
Matthew Honnibal d1246c95fb Fix model loading when using multitask objectives 2018-02-17 18:11:36 +01:00
Matthew Honnibal 3e541de440 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-15 21:02:55 +01:00
Claudiu-Vlad Ursache e28de12cbd
Ensure files opened in `from_disk` are closed
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
2018-02-13 20:49:43 +01:00
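The fix follows the standard pattern of reading through a context manager, which guarantees the handle is closed even if deserialization raises; a minimal illustration (helper name hypothetical):

```python
from pathlib import Path


def read_bytes(path):
    # 'with' closes the file on success and on error alike, instead of
    # leaving an unclosed handle behind
    with Path(path).open("rb") as f:
        return f.read()
```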
Matthew Honnibal d7c9b53120 Pass kwargs into pipeline components during begin_training 2018-02-12 10:18:39 +01:00
Matthew Honnibal f3753c2453 Further model deserialization fixes re #1727 2018-01-23 19:16:05 +01:00
Matthew Honnibal 85c942a6e3 Don't overwrite pretrained_dims setting from cfg. Fixes #1727 2018-01-23 19:10:49 +01:00
Matthew Honnibal 203d2ea830 Allow multitask objectives to be added to the parser and NER more easily 2018-01-21 19:37:02 +01:00
Matthew Honnibal 61a051f2c0 Fix MultitaskObjective 2018-01-21 19:21:34 +01:00
Matthew Honnibal c27c82d5f9 Fix serialization 2017-11-08 13:08:48 +01:00
Matthew Honnibal 072ff38a01 Try to fix python3.5 serialization 2017-11-08 12:10:49 +01:00
Matthew Honnibal dd90fe09f5 Remove extraneous label from textcat class 2017-11-06 22:09:02 +01:00
Matthew Honnibal 8fea512ac8 Don't set tensor in textcat 2017-11-06 19:20:14 +01:00
Matthew Honnibal 75e1618ec3 Fix lemma clobbering 2017-11-06 16:56:19 +01:00
Matthew Honnibal 25859dbb48 Return optimizer from begin_training, creating if necessary 2017-11-06 14:26:49 +01:00
Matthew Honnibal 31babe3c3f Fix non-clobbering lemmatization 2017-11-06 12:36:05 +01:00
Matthew Honnibal 2b35bb76ad Fix tensorizer on GPU 2017-11-05 15:34:40 +01:00
uwol a2162b8908 Fix tensorizer return parameter 2017-11-05 12:25:10 +01:00
Matthew Honnibal 17c63906f9 Update tensorizer component 2017-11-03 20:20:26 +01:00
Matthew Honnibal 6681058abd Fix tensor extending in tagger 2017-11-03 13:29:36 +01:00
Matthew Honnibal d6fc39c8a6 Set Doc.tensor from Tagger 2017-11-03 11:20:05 +01:00
Matthew Honnibal b30dd36179 Allow Tagger.add_label() before training 2017-11-01 21:49:24 +01:00
Matthew Honnibal b84d99b281 Revert tagger.add_label() changes, to fix model 2017-11-01 21:10:45 +01:00
Matthew Honnibal f5855e539b Fix tagger model loading 2017-11-01 20:42:36 +01:00
Matthew Honnibal 190522efd3 Fix tagger when some tags aren't in Morphology 2017-11-01 19:27:49 +01:00
Matthew Honnibal 7ae1aacdb8 Fix add_label methods 2017-11-01 17:06:43 +01:00
Matthew Honnibal e7a9174877 Add add_label methods to Tagger and TextCategorizer 2017-11-01 16:32:44 +01:00
ines ba5e646219 Tidy up pipeline 2017-10-27 20:29:08 +02:00
Ines Montani 4033e70c71 Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
ines 9e372913e0 Remove old 'SP' condition in tag map 2017-10-26 16:11:57 +02:00
Matthew Honnibal a8abc47811 Rename BaseThincComponent --> Pipe 2017-10-26 12:40:40 +02:00
Matthew Honnibal b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal ed8da9b11f Add missing return statement in SentenceSegmenter 2017-10-17 15:32:56 +02:00
Matthew Honnibal 09d61ada5e Merge pull request #1396 from explosion/feature/pipeline-management
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal 8978212ee5 Patch serialization bug raised in #1105 2017-10-10 03:58:12 +02:00
Matthew Honnibal 0384f08218 Trigger nonproj.deprojectivize as a postprocess 2017-10-07 02:00:47 +02:00
Matthew Honnibal 563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.

Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
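The zeroed-gradient behaviour described above can be sketched as follows (a simplified illustration, not the actual TextCategorizer code):

```python
def textcat_gradient(scores, cats, labels):
    """Per-label gradient of a squared-error-style loss: (score - truth)
    for labels annotated in the cats dict, 0.0 for missing labels."""
    grad = []
    for label, score in zip(labels, scores):
        if label in cats:
            grad.append(score - cats[label])
        else:
            grad.append(0.0)  # missing label: no learning signal
    return grad


# cats as a dict: 1.0 = present, 0.0 = absent; "SPAM" is simply missing,
# so its gradient is zeroed rather than treated as a negative example
cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}
grad = textcat_gradient([0.8, 0.3, 0.6], cats, ["POSITIVE", "NEGATIVE", "SPAM"])
```

Intermediate values between 0 and 1 work the same way, which is what enables the partial-membership bonus mentioned above.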
Matthew Honnibal 5454b20cd7 Update thinc imports for 6.9 2017-10-03 20:07:17 +02:00
Matthew Honnibal 4a59f6358c Fix thinc imports 2017-10-03 19:21:26 +02:00