58 Commits

Author SHA1 Message Date
ines a7b9074b4c Update textcat training example and docs 2017-10-27 00:48:45 +02:00
ines b61866a2e4 Update textcat example 2017-10-27 00:32:19 +02:00
ines f81cc0bd1c Fix usage of disable_pipes 2017-10-27 00:31:30 +02:00
ines f57043e6fe Update docstring 2017-10-26 16:29:08 +02:00
ines b90e958975 Update tagger and parser examples and add to docs 2017-10-26 16:27:42 +02:00
ines f1529463a8 Update tagger training example 2017-10-26 16:19:02 +02:00
ines e44bbb5361 Remove old example 2017-10-26 16:12:41 +02:00
ines 421c3837e8 Fix formatting 2017-10-26 16:11:25 +02:00
ines 4d896171ae Use plac annotations for arguments 2017-10-26 16:11:20 +02:00
ines c3b681e5fb Use plac annotations for arguments and add n_iter 2017-10-26 16:11:05 +02:00
ines bc2c92f22d Use plac annotations for arguments 2017-10-26 16:10:56 +02:00
ines b5c74dbb34 Update parser training example 2017-10-26 15:15:37 +02:00
ines 586b9047fd Use create_pipe instead of importing the entity recognizer 2017-10-26 15:15:26 +02:00
ines d425ede7e9 Fix example 2017-10-26 15:15:08 +02:00
ines 9d58673aaf Update train_ner example for spaCy v2.0 2017-10-26 14:24:12 +02:00
ines e904075f35 Remove stray print statements 2017-10-26 14:24:00 +02:00
ines c30258c3a2 Remove old example 2017-10-26 14:23:52 +02:00
ines 615c315d70 Update train_new_entity_type example to use disable_pipes 2017-10-25 14:56:53 +02:00
ines 2b8e7c45e0 Use better training data JSON example 2017-10-24 16:00:56 +02:00
ines 9bf5751064 Pretty-print JSON 2017-10-24 12:22:17 +02:00
ines 6675755005 Add training data JSON example 2017-10-24 12:05:10 +02:00
Jeroen Bobbeldijk 84c6c20d1c Fix #1444: fix pipeline logic and wrong parameter in update call 2017-10-22 15:18:36 +02:00
Jeffrey Gerard 5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Jeffrey Gerard 39d3cbfdba Bugfix example script train_ner_standalone.py, fails after training 2017-10-12 11:39:12 -07:00
Matthew Honnibal 563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map labels to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
missing from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set other
numeric values, if you're using a text classification model that uses an
appropriate loss function.

Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
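
The gradient masking described in the commit above can be illustrated with a minimal sketch. It is not spaCy's implementation: the function name `masked_textcat_gradient` is hypothetical, and the squared-error style gradient (`score - target`) is an assumption made only to keep the example small. The point it shows is that a label missing from the `cats` dict contributes a gradient of 0, while a label explicitly set to 0. still produces a training signal.

```python
def masked_textcat_gradient(scores, gold_cats, labels):
    """Return one gradient value per label.

    scores:    predicted scores, one per label (same order as `labels`)
    gold_cats: dict mapping label -> float target (1. present, 0. absent);
               labels missing from the dict are treated as unknown
    labels:    full label set of the (hypothetical) classifier
    """
    gradient = []
    for label, score in zip(labels, scores):
        if label in gold_cats:
            # Assumed squared-error style gradient: prediction minus target.
            gradient.append(score - float(gold_cats[label]))
        else:
            # Missing annotation: zero gradient, so no weight update.
            gradient.append(0.0)
    return gradient


labels = ["POSITIVE", "NEGATIVE", "SPAM"]
scores = [0.75, 0.25, 0.5]
# "SPAM" is not annotated for this document, so it is left out of the dict.
gold_cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}
print(masked_textcat_gradient(scores, gold_cats, labels))
# → [-0.25, 0.25, 0.0]
```

Keeping "absent" (0.) and "missing" (no key) distinct is what makes partially annotated multi-label data usable in this scheme.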
Matthew Honnibal f1b86dff8c Update textcat example 2017-10-04 15:12:28 +02:00
Matthew Honnibal 79a94bc166 Update textcat example 2017-10-04 14:55:30 +02:00
Matthew Honnibal cbb1fbef80 Update train_ner_standalone example 2017-10-03 18:49:38 +02:00
Matthew Honnibal 027a5d8b75 Update train_ner_standalone example 2017-09-15 10:36:46 +02:00
Matthew Honnibal 683d81bb49 Update example for adding entity type 2017-09-14 16:15:59 +02:00
Matthew Honnibal c16ef0a85c Clarify train textcat example 2017-07-29 21:59:27 +02:00
Matthew Honnibal 54a539a113 Finish text classifier example 2017-07-23 00:34:12 +02:00
Matthew Honnibal 2bc7d87c70 Add example for training text classifier 2017-07-22 20:15:32 +02:00
ines 992559bf9a Fix formatting and remove unused imports 2017-06-01 12:47:18 +02:00
Matthew Honnibal 5c30466c95 Update NER training example 2017-05-31 13:42:12 +02:00
Matthew Honnibal 2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
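
As a rough illustration of feature dropout of the kind the commit above describes, here is a minimal standalone sketch. It does not reproduce the parser's code: the function name `dropout_features` is hypothetical, and the rescaling factor is an assumption. The commit message gives the scale as 1. / 0.4, whereas this sketch uses the common "inverted dropout" convention of 1 / (1 - drop), which keeps the expected feature value unchanged.

```python
import random


def dropout_features(values, drop, rng=None):
    """Zero each feature value with probability `drop`; rescale the rest.

    Assumption: survivors are scaled by 1 / (1 - drop) (inverted dropout),
    which may differ from the scaling used in the commit itself.
    """
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in the range [0, 1)")
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - drop)
    return [0.0 if rng.random() < drop else value * keep_scale
            for value in values]


# With drop=0.4, roughly 40% of the features are zeroed on each call.
features = [1.0] * 10
print(dropout_features(features, drop=0.4, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when comparing training runs on small data sets.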
Matthew Honnibal 0605b95f2e Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-18 13:48:00 +02:00
Matthew Honnibal 2f84626417 Fix train_new_entity_type example 2017-04-18 13:47:36 +02:00
Ines Montani 734b0a4e4a Update train_new_entity_type.py 2017-04-16 23:42:16 +02:00
ines 264af6cd17 Add documentation 2017-04-16 20:37:46 +02:00
ines c7adca58a9 Tidy up example and only save/test if output_directory is not None 2017-04-16 16:55:01 +02:00
Matthew Honnibal 40e3024241 Move standalone NER training script into examples directory 2017-04-15 16:13:42 +02:00
Matthew Honnibal c729d72fc6 Add new example for training new entity types 2017-04-15 16:11:06 +02:00
Matthew Honnibal 97b83c74dc WIP on training example 2017-04-14 23:54:27 +02:00
Matthew Honnibal ab70f6e18d Update NER training example 2017-01-27 12:27:10 +01:00
Christos Savvopoulos c19b83f6ae use model_dir inside of load_model 2016-12-12 20:23:24 +00:00
Christos Savvopoulos 93cf4af701 actually commit load_ner.py 2016-12-12 20:13:33 +00:00
Christos Savvopoulos ad54a929f8 train_ner should save vocab; add load_ner example 2016-12-12 20:09:49 +00:00
kendricktan ba8841234a Fixed training examples
Changes:
1. train_ner won't crash if no data directory is found
2. Fixed train_tagger error: expected spacy.gold.GoldParse, got list
2016-10-24 16:09:23 +10:00
kendricktan 9877f3298f updated training examples to v1.1.2 2016-10-24 11:53:33 +10:00