6883 Commits

Author SHA1 Message Date
ines b39409173e Add disable option and True/False/None values for pipeline 2017-10-07 00:29:08 +02:00
ines 2586b61b15 Fix formatting, tidy up and remove unused imports 2017-10-07 00:26:05 +02:00
ines 212c8f0711 Implement new Language methods and pipeline API 2017-10-07 00:25:54 +02:00
Matthew Honnibal 8be46d766e Remove print statement 2017-10-06 16:19:02 -05:00
ines 3468d535ad Update model benchmarks 2017-10-06 21:39:06 +02:00
Matthew Honnibal 8e731009fe Fix parser config serialization 2017-10-06 13:50:52 -05:00
Matthew Honnibal f4c9a98166 Fix spacy evaluate command on non-GPU 2017-10-06 13:17:47 -05:00
Matthew Honnibal 16ba6aa8a6 Fix parser config serialization 2017-10-06 13:17:31 -05:00
ines 96a4e79d13 Fix PhraseMatcher example 2017-10-06 18:22:10 +02:00
Matthew Honnibal c66399d8ae Fix depth definition with history features 2017-10-06 06:20:05 -05:00
Matthew Honnibal 5c750a9c2f Reserve 0 for 'missing' in history features 2017-10-06 06:10:13 -05:00
Matthew Honnibal fbba7c517e Pass dropout through to embed tables 2017-10-06 06:09:18 -05:00
Matthew Honnibal 21d11936fe Fix significant train/test skew error in history feats 2017-10-06 06:08:50 -05:00
Matthew Honnibal 555d8c8bff Fix beam history features 2017-10-05 22:21:50 -05:00
Matthew Honnibal 3db0a32fd6 Fix dropout for history features 2017-10-05 22:21:30 -05:00
Matthew Honnibal b0618def8d Add support for 2-token state option 2017-10-05 21:54:12 -05:00
Matthew Honnibal 363aa47b40 Clean up dead parsing code 2017-10-05 21:53:49 -05:00
Matthew Honnibal ca12764772 Enable history features for beam parser 2017-10-05 21:53:29 -05:00
Matthew Honnibal fc06b0a333 Fix training when hist_size==0 2017-10-05 21:52:28 -05:00
Matthew Honnibal e25ffcb11f Move history size under feature flags 2017-10-05 19:38:13 -05:00
Matthew Honnibal 563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map labels to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
that are missing from the gold.cats dict. A nice bonus is that you can
also set values between 0 and 1 for partial membership. You can also set
other numeric values, if you're using a text classification model with
an appropriate loss function.

Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
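
The masking behaviour described in the commit message above can be sketched in plain Python. This is only an illustration, not the code from the commit: the function name `masked_gradient`, the example labels, and the choice of a squared-error style gradient (score minus target) are all assumptions, since the message does not say which loss TextCategorizer uses.

```python
def masked_gradient(scores, cats, labels):
    """Return d(loss)/d(score) per label, with 0.0 where the gold value is missing.

    Assumes a squared-error style loss, so the gradient is (score - target).
    `cats` plays the role of the gold.cats dict: label -> float target.
    """
    grad = []
    for label, score in zip(labels, scores):
        if label in cats:
            # Label is annotated: 1.0 = present, 0.0 = absent,
            # values in between = partial membership.
            grad.append(score - float(cats[label]))
        else:
            # Label is missing from the dict: contribute no gradient.
            grad.append(0.0)
    return grad


# Hypothetical label set and model scores, for illustration only.
labels = ["POSITIVE", "NEGATIVE", "SPAM"]
scores = [0.8, 0.3, 0.6]
cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPAM" is not annotated

grad = masked_gradient(scores, cats, labels)
print(grad)  # the "SPAM" entry is 0.0 because its gold value is missing
```

With a list of labels, an unannotated category is indistinguishable from a negative one; with a dict, leaving the key out marks it as missing, and only then is its gradient zeroed.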
Matthew Honnibal c36d4596bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-05 18:27:56 +02:00
Matthew Honnibal 056b08c0df Delete obsolete nn_text_class example 2017-10-05 18:27:10 +02:00
Matthew Honnibal c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal 5743b06e36 Wrap model saving in try/except 2017-10-05 08:12:50 -05:00
Matthew Honnibal fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal dcdfa071aa Disable LayerNorm hack 2017-10-04 20:06:52 -05:00
Matthew Honnibal 943af4423a Make depth setting in parser work again 2017-10-04 20:06:05 -05:00
Matthew Honnibal bfabc333be Merge remote-tracking branch 'origin/develop' into feature/parser-history-model 2017-10-04 20:00:36 -05:00
Matthew Honnibal 92066b04d6 Fix Embed and HistoryFeatures 2017-10-04 19:55:34 -05:00
ines b621a2e964 Fix build emoji 2017-10-04 18:37:27 +02:00
Matthew Honnibal 5560c46a59 Update buildkite 2017-10-04 18:29:41 +02:00
Matthew Honnibal e3c93f87a4 Update sdist 2017-10-04 18:18:07 +02:00
Matthew Honnibal c4c7def9ce Fix yml 2017-10-04 18:14:33 +02:00
Matthew Honnibal 71825f9737 Fix yml 2017-10-04 18:12:16 +02:00
Matthew Honnibal 6304c5e146 Fix yml 2017-10-04 18:08:34 +02:00
Matthew Honnibal ff24b6d04a Fix yml 2017-10-04 18:05:45 +02:00
Matthew Honnibal cc29e8b497 Add buildkite.yml for making sdists 2017-10-04 18:00:37 +02:00
Matthew Honnibal d903986439 Increment version 2017-10-04 17:14:26 +02:00
Matthew Honnibal fb75eb52f1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-04 16:37:00 +02:00
Matthew Honnibal 40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
ines bb13aa4bf3 Fix typos in PhraseMatcher docs 2017-10-04 16:12:09 +02:00
Matthew Honnibal bd8e84998a Add nO attribute to TextCategorizer model 2017-10-04 16:07:30 +02:00
Matthew Honnibal f8a0614527 Improve textcat model slightly 2017-10-04 15:15:53 +02:00
Matthew Honnibal f1b86dff8c Update textcat example 2017-10-04 15:12:28 +02:00
Matthew Honnibal 39798b0172 Uncomment layernorm adjustment hack 2017-10-04 15:12:09 +02:00
Matthew Honnibal b3a7082bf8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-04 14:56:46 +02:00
Matthew Honnibal db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal 79a94bc166 Update textcat example 2017-10-04 14:55:30 +02:00
Matthew Honnibal 774f5732bd Fix dimensionality of textcat when no vectors available 2017-10-04 14:55:15 +02:00