Commit Graph

179 Commits

Author SHA1 Message Date
Matthew Honnibal dad8f09fba Fix print statements in text classifier example 2017-11-01 16:34:31 +01:00
ines bfe17b7df1 Fix begin_training if get_gold_tuples is None 2017-11-01 13:14:31 +01:00
ines 0ca152a015 Fix syntax error 2017-11-01 00:43:28 +01:00
ines 4b196fdf7f Fix formatting 2017-11-01 00:43:22 +01:00
ines 33af6ac69a Use even smaller example size
100 was still too much, so try 20 instead
2017-10-30 19:46:45 +01:00
ines f02b0af821 Fix path and use smaller example size
500 was too large and caused laggy rendering
2017-10-30 19:44:35 +01:00
ines 18dde7869a Update training data docs and add vocab JSONL 2017-10-30 19:40:05 +01:00
ines b5643d8575 Update intent parser docs and add to usage docs 2017-10-27 04:49:05 +02:00
ines 9dfca0f2f8 Add example for custom intent parser 2017-10-27 03:55:11 +02:00
ines 4d272e25ee Fix examples 2017-10-27 03:55:04 +02:00
ines 44f83b35bc Update pipeline component examples to use plac 2017-10-27 02:58:14 +02:00
ines af28ca1ba0 Move example to pipeline directory 2017-10-27 02:00:01 +02:00
ines 1d69a46cd4 Update multi-processing example and add to docs 2017-10-27 01:58:55 +02:00
ines 4eabaafd66 Update docstring and example 2017-10-27 01:50:44 +02:00
ines ed69bd69f4 Update parallel tagging example 2017-10-27 01:48:52 +02:00
ines 096a80170d Remove old example files 2017-10-27 01:48:39 +02:00
ines a7b9074b4c Update textcat training example and docs 2017-10-27 00:48:45 +02:00
ines b61866a2e4 Update textcat example 2017-10-27 00:32:19 +02:00
ines f81cc0bd1c Fix usage of disable_pipes 2017-10-27 00:31:30 +02:00
ines b7b285971f Update examples README 2017-10-26 18:47:11 +02:00
ines cc2917c9e8 Update fastText example and add to examples in docs 2017-10-26 18:47:02 +02:00
ines db843735d3 Remove outdated examples 2017-10-26 18:46:25 +02:00
ines daed7ff8fe Update information extraction examples 2017-10-26 18:46:11 +02:00
ines bca5372fb1 Clean up examples 2017-10-26 17:32:59 +02:00
ines f57043e6fe Update docstring 2017-10-26 16:29:08 +02:00
ines b90e958975 Update tagger and parser examples and add to docs 2017-10-26 16:27:42 +02:00
ines f1529463a8 Update tagger training example 2017-10-26 16:19:02 +02:00
ines e44bbb5361 Remove old example 2017-10-26 16:12:41 +02:00
ines 421c3837e8 Fix formatting 2017-10-26 16:11:25 +02:00
ines 4d896171ae Use plac annotations for arguments 2017-10-26 16:11:20 +02:00
ines c3b681e5fb Use plac annotations for arguments and add n_iter 2017-10-26 16:11:05 +02:00
ines bc2c92f22d Use plac annotations for arguments 2017-10-26 16:10:56 +02:00
ines b5c74dbb34 Update parser training example 2017-10-26 15:15:37 +02:00
ines 586b9047fd Use create_pipe instead of importing the entity recognizer 2017-10-26 15:15:26 +02:00
ines d425ede7e9 Fix example 2017-10-26 15:15:08 +02:00
ines 9d58673aaf Update train_ner example for spaCy v2.0 2017-10-26 14:24:12 +02:00
ines e904075f35 Remove stray print statements 2017-10-26 14:24:00 +02:00
ines c30258c3a2 Remove old example 2017-10-26 14:23:52 +02:00
ines 615c315d70 Update train_new_entity_type example to use disable_pipes 2017-10-25 14:56:53 +02:00
ines 2b8e7c45e0 Use better training data JSON example 2017-10-24 16:00:56 +02:00
ines 9bf5751064 Pretty-print JSON 2017-10-24 12:22:17 +02:00
ines 6675755005 Add training data JSON example 2017-10-24 12:05:10 +02:00
Jeroen Bobbeldijk 84c6c20d1c Fix #1444: fix pipeline logic and wrong parameter in update call 2017-10-22 15:18:36 +02:00
Jeffrey Gerard 5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Jeffrey Gerard 39d3cbfdba Fix example script train_ner_standalone.py, which fails after training 2017-10-12 11:39:12 -07:00
ines f4ae6763b9 Fix consistency of imports from spacy.tokens in examples 2017-10-11 02:30:40 +02:00
Matthew Honnibal e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
ines 6679117000 Add pipeline component examples 2017-10-10 04:26:06 +02:00
Matthew Honnibal e79fc41ff8 Merge pull request #1391 from explosion/feature/multilabel-textcat
💫 Fix multi-label support for text classification
2017-10-09 04:22:31 +02:00
Matthew Honnibal 563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map category labels to floats, with 1.
denoting 'present' and 0. denoting 'absent'. Gradients are zeroed for
categories missing from the gold.cats dict. A nice bonus is that you can
also set values between 0 and 1 for partial membership. You can also set
other numeric values, if you're using a text classification model that
uses an appropriate loss function.

Unfortunately, this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
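The masking behaviour described in the multi-label commit above can be illustrated with a minimal pure-Python sketch. It assumes a squared-error loss and uses a hypothetical helper, `masked_cat_gradient`; it is not spaCy's actual `TextCategorizer` code. It only shows why a label missing from the gold cats dict contributes zero gradient, while a label explicitly set to 0.0 is still trained toward 'absent', and a value between 0 and 1 expresses partial membership.

```python
from typing import Dict, List, Tuple


def masked_cat_gradient(
    labels: List[str],
    scores: List[float],
    gold_cats: Dict[str, float],
) -> Tuple[List[float], float]:
    """Hypothetical helper: squared-error gradient that skips missing labels.

    Assumption: gold_cats maps label -> target in [0, 1]; labels not in the
    dict are treated as unknown ("missing"), not as negative.
    """
    d_scores = []
    for label, score in zip(labels, scores):
        if label in gold_cats:
            # Known label: push the score toward its target (1.0, 0.0, or partial).
            d_scores.append(score - gold_cats[label])
        else:
            # Missing label: zero gradient, so the model is not penalised.
            d_scores.append(0.0)
    # Sum of squared errors over the labels that actually have targets.
    loss = sum(d * d for d in d_scores)
    return d_scores, loss


labels = ["POSITIVE", "NEGATIVE", "SPORTS"]
scores = [0.75, 0.25, 0.5]
gold = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPORTS" is missing, not absent
grad, loss = masked_cat_gradient(labels, scores, gold)
print(grad, loss)  # [-0.25, 0.25, 0.0] 0.125
```

Because "SPORTS" has no entry, its score of 0.5 is left alone; under the old list-based format it would have been pushed toward 0.0 as if the category were known to be absent.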