82 Commits

Author SHA1 Message Date
mpuels ee4d6fdd40 Fix typo in comment 2017-12-09 13:14:57 +01:00
ines 726fb2d0b5 Use fewer iterations by default to avoid overfitting on blank model (resolves #1632) 2017-11-23 15:27:12 +01:00
ines ec08996000 Add note on tags matching tokenization (see #1613) 2017-11-20 15:12:47 +01:00
ines f36fab39b0 Don't rename component in intent parser example (resolves #1551)
Otherwise, the default saved model won't know that it's supposed to create spaCy's 'parser'.
2017-11-10 23:35:38 +01:00
Ines Montani 1a23a0f87e Remove broken link (resolves #1541) 2017-11-10 12:28:39 +01:00
ines 89bd40b821 Fix print statement in textcat training example (resolves #1515) 2017-11-08 17:17:40 +01:00
ines a09c096d3c Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
ines 173b1551af Update examples 2017-11-07 01:22:30 +01:00
ines 1b1c9105b4 Update example compatibility statements 2017-11-07 01:11:45 +01:00
ines 8fb48b9b91 Update and document new util functions 2017-11-07 00:22:43 +01:00
Matthew Honnibal d7016d4050 Update intent parser example 2017-11-06 23:31:11 +01:00
ines fe498b3d5e Update training examples to use "simple style" 2017-11-06 23:14:04 +01:00
ines 2dca9e71a1 Add notes on catastrophic forgetting (see #1496) 2017-11-06 13:17:02 +01:00
Matthew Honnibal e033162a1d Update tagger training example 2017-11-01 21:49:08 +01:00
ines 8f1d3fc3ee Update textcat example 2017-11-01 17:09:22 +01:00
Matthew Honnibal dad8f09fba Fix print statements in text classifier example 2017-11-01 16:34:31 +01:00
ines bfe17b7df1 Fix begin_training if get_gold_tuples is None 2017-11-01 13:14:31 +01:00
ines 4b196fdf7f Fix formatting 2017-11-01 00:43:22 +01:00
ines 33af6ac69a Use even smaller example size
100 was still too much, so try 20 instead
2017-10-30 19:46:45 +01:00
ines f02b0af821 Fix path and use smaller example size
500 was too large and caused laggy rendering
2017-10-30 19:44:35 +01:00
ines 18dde7869a Update training data docs and add vocab JSONL 2017-10-30 19:40:05 +01:00
ines b5643d8575 Update intent parser docs and add to usage docs 2017-10-27 04:49:05 +02:00
ines 9dfca0f2f8 Add example for custom intent parser 2017-10-27 03:55:11 +02:00
ines 4d272e25ee Fix examples 2017-10-27 03:55:04 +02:00
ines a7b9074b4c Update textcat training example and docs 2017-10-27 00:48:45 +02:00
ines b61866a2e4 Update textcat example 2017-10-27 00:32:19 +02:00
ines f81cc0bd1c Fix usage of disable_pipes 2017-10-27 00:31:30 +02:00
ines f57043e6fe Update docstring 2017-10-26 16:29:08 +02:00
ines b90e958975 Update tagger and parser examples and add to docs 2017-10-26 16:27:42 +02:00
ines f1529463a8 Update tagger training example 2017-10-26 16:19:02 +02:00
ines e44bbb5361 Remove old example 2017-10-26 16:12:41 +02:00
ines 421c3837e8 Fix formatting 2017-10-26 16:11:25 +02:00
ines 4d896171ae Use plac annotations for arguments 2017-10-26 16:11:20 +02:00
ines c3b681e5fb Use plac annotations for arguments and add n_iter 2017-10-26 16:11:05 +02:00
ines bc2c92f22d Use plac annotations for arguments 2017-10-26 16:10:56 +02:00
ines b5c74dbb34 Update parser training example 2017-10-26 15:15:37 +02:00
ines 586b9047fd Use create_pipe instead of importing the entity recognizer 2017-10-26 15:15:26 +02:00
ines d425ede7e9 Fix example 2017-10-26 15:15:08 +02:00
ines 9d58673aaf Update train_ner example for spaCy v2.0 2017-10-26 14:24:12 +02:00
ines e904075f35 Remove stray print statements 2017-10-26 14:24:00 +02:00
ines c30258c3a2 Remove old example 2017-10-26 14:23:52 +02:00
ines 615c315d70 Update train_new_entity_type example to use disable_pipes 2017-10-25 14:56:53 +02:00
ines 2b8e7c45e0 Use better training data JSON example 2017-10-24 16:00:56 +02:00
ines 9bf5751064 Pretty-print JSON 2017-10-24 12:22:17 +02:00
ines 6675755005 Add training data JSON example 2017-10-24 12:05:10 +02:00
Jeroen Bobbeldijk 84c6c20d1c Fix #1444: fix pipeline logic and wrong parameter in update call 2017-10-22 15:18:36 +02:00
Jeffrey Gerard 5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Jeffrey Gerard 39d3cbfdba Bugfix example script train_ner_standalone.py, fails after training 2017-10-12 11:39:12 -07:00
Matthew Honnibal 563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.

Unfortunately, this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
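
The behaviour described in the commit above can be illustrated with a minimal sketch. This is not spaCy's implementation: the helper `masked_gradient`, the label names, and the choice of a squared-error style gradient (`score - gold`) are all assumptions made for illustration. It only shows the idea that labels missing from a `cats`-style dict contribute a zero gradient, while labels that are present map to floats between 0 and 1.

```python
def masked_gradient(scores, gold_cats):
    """Per-label gradient that ignores labels missing from gold_cats.

    scores:    dict mapping label -> predicted score (hypothetical model output)
    gold_cats: dict mapping label -> float target (1.0 present, 0.0 absent);
               labels not in this dict are treated as "missing"
    """
    grad = {}
    for label, score in scores.items():
        if label in gold_cats:
            # Assumed squared-error style gradient: prediction minus target
            grad[label] = score - gold_cats[label]
        else:
            # Missing annotation: no learning signal for this label
            grad[label] = 0.0
    return grad


# Hypothetical labels and scores, for illustration only
scores = {"POSITIVE": 0.8, "NEGATIVE": 0.3, "SPAM": 0.6}
gold_cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPAM" is not annotated

grad = masked_gradient(scores, gold_cats)
```

Here `SPAM` receives a gradient of exactly zero because it is missing from `gold_cats`, whereas `NEGATIVE`, which is explicitly marked absent with `0.0`, still produces a non-zero gradient.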
Matthew Honnibal f1b86dff8c Update textcat example 2017-10-04 15:12:28 +02:00