454 Commits

Author SHA1 Message Date
Matthew Honnibal 2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.
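
The scaling rule described above can be sketched in plain Python. This is only an illustration of the rule as the message states it, not the parser's actual implementation: the function name `dropout`, the `rng` parameter, and the pass-through behaviour for `drop` values outside (0, 1) are assumptions made for the example.

```python
import random


def dropout(values, drop, rng=random):
    """Zero each feature value with probability `drop` and rescale the rest.

    Follows the rule as stated in the commit message: surviving values are
    multiplied by 1. / drop. (Textbook "inverted dropout" would instead
    scale by 1. / (1. - drop); that variant is not what the message says.)
    """
    # Assumption for this sketch: out-of-range rates leave the input unchanged.
    if drop <= 0.0 or drop >= 1.0:
        return list(values)
    out = []
    for value in values:
        if rng.random() < drop:
            out.append(0.0)           # feature dropped
        else:
            out.append(value / drop)  # feature kept and rescaled
    return out


if __name__ == "__main__":
    features = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(dropout(features, 0.4, random.Random(0)))
```

With `drop=0.4`, each kept feature of value 1.0 becomes 2.5, and roughly 40% of features are zeroed on average.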

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
Matthew Honnibal d2436dc17b Update fix for Issue #999 2017-04-23 18:14:37 +02:00
Matthew Honnibal 60703cede5 Ensure noun chunks can't be nested. Closes #955 2017-04-23 17:56:39 +02:00
Matthew Honnibal 4eef200bab Persist the actions within spacy.parser.cfg 2017-04-20 17:02:44 +02:00
Matthew Honnibal 137b210bcf Restore use of FTRL training 2017-04-16 18:02:42 +02:00
Matthew Honnibal 45464d065e Remove print statement 2017-04-15 16:11:43 +02:00
Matthew Honnibal c76cb8af35 Fix training for new labels 2017-04-15 16:11:26 +02:00
Matthew Honnibal 4884b2c113 Refix StepwiseState 2017-04-15 16:00:28 +02:00
Matthew Honnibal 1a98e48b8e Fix StepwiseState 2017-04-15 13:35:01 +02:00
ines 0739ae7b76 Tidy up and fix formatting and imports 2017-04-15 13:05:15 +02:00
Matthew Honnibal 354458484c WIP on add_label bug during NER training
Currently when a new label is introduced to NER during training,
it causes the labels to be read in an unexpected order. This
invalidates the model.
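
Why label order matters can be shown with a toy example. This is a generic sketch, not spaCy's code: it assumes class IDs are assigned in the order labels are first seen, and that the learned scores are stored per class ID. The names `build_label_index`, `best_label`, and the label and score values are all hypothetical.

```python
def build_label_index(labels):
    """Assign class IDs in the order in which labels are first seen."""
    index = {}
    for label in labels:
        if label not in index:
            index[label] = len(index)
    return index


def best_label(index, scores):
    """Return the label whose class ID has the highest score."""
    id_to_label = {class_id: label for label, class_id in index.items()}
    best_id = max(range(len(scores)), key=scores.__getitem__)
    return id_to_label[best_id]


# Toy "model": one score per class ID, learned under the training order,
# so position 2 belongs to the newly added ANIMAL label.
train_index = build_label_index(["PERSON", "ORG", "ANIMAL"])
scores_by_class_id = [0.1, 0.2, 0.9]

# Reading the same labels back in a different order remaps every class ID.
reload_index = build_label_index(["ANIMAL", "PERSON", "ORG"])

if __name__ == "__main__":
    print(best_label(train_index, scores_by_class_id))
    print(best_label(reload_index, scores_by_class_id))
```

The scores are unchanged, but after the reordering the highest-scoring class ID points at a different label, which is the sense in which a changed label order invalidates a trained model.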
2017-04-14 23:52:17 +02:00
Matthew Honnibal 49e2de900e Add costs property to StepwiseState, to show which moves are gold. 2017-04-10 11:37:04 +02:00
Matthew Honnibal cc36c308f4 Fix noun_chunk rules around coordination
Closes #693.
2017-04-07 17:06:40 +02:00
Matthew Honnibal 1bb7b4ca71 Add comment 2017-03-31 13:59:19 +02:00
Matthew Honnibal 47a3ef06a6 Unhack deprojectivization, moving it into pipeline
Previously the deprojectivize() call was attached to the transition
system, and only called for German. Instead it should be a separate
process, called after the parser. This makes it available for any
language. Closes #898.
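
The structural change can be pictured as follows. This is a schematic sketch under stated assumptions, not the library's code: the pipeline is modelled as a plain list of callables, documents as dicts, and `parse`, `deprojectivize`, `make_pipeline`, and `run` are hypothetical stand-ins.

```python
def parse(doc):
    # Stand-in for the parser; only records that it ran.
    doc["steps"].append("parse")
    return doc


def deprojectivize(doc):
    # Stand-in for the post-processing step; only records that it ran.
    doc["steps"].append("deprojectivize")
    return doc


def make_pipeline(lang):
    # The step is a separate pipeline component placed after the parser,
    # so it no longer depends on a language check such as lang == "de".
    return [parse, deprojectivize]


def run(pipeline, text):
    doc = {"text": text, "steps": []}
    for component in pipeline:
        doc = component(doc)
    return doc


if __name__ == "__main__":
    for lang in ("de", "en"):
        print(lang, run(make_pipeline(lang), "example")["steps"])
```

Keeping the step outside the transition system means any language's pipeline can include it without touching the parser.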
2017-03-31 12:31:50 +02:00
Matthew Honnibal a9b1f23c7d Enable regression loss for parser 2017-03-26 09:26:30 -05:00
Matthew Honnibal b487b8735a Decrease beam density, and fix Python 3 problem in beam 2017-03-20 12:56:05 +01:00
Matthew Honnibal c90dc7ac29 Clean up state initialisation in transition system 2017-03-16 11:59:11 -05:00
Matthew Honnibal a46933a8fe Clean up FTRL parsing stuff. 2017-03-16 11:58:20 -05:00
Matthew Honnibal 2611ac2a89 Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens 2017-03-16 09:38:28 -05:00
Matthew Honnibal 3d0833c3df Fix off-by-1 in parse features fill_context 2017-03-15 19:55:35 -05:00
Matthew Honnibal 4ef68c413f Approximate cost in Break transition, to speed things up a bit. 2017-03-15 16:40:27 -05:00
Matthew Honnibal 8543db8a5b Use ftrl optimizer in parser 2017-03-15 11:56:37 -05:00
Matthew Honnibal d719f8e77e Use nogil in parser, and set L1 to 0.0 by default 2017-03-15 09:31:01 -05:00
Matthew Honnibal c61c501406 Update beam-parser to allow parser to maintain nogil 2017-03-15 09:30:22 -05:00
Matthew Honnibal c79b3129e3 Fix setting of empty lexeme in initial parse state 2017-03-15 09:26:53 -05:00
Matthew Honnibal 6c4108c073 Add header for beam parser 2017-03-11 12:45:12 -06:00
Matthew Honnibal 931feb3360 Allow beam parsing for NER 2017-03-11 11:12:01 -06:00
Matthew Honnibal ca9c8c57c0 Add iteration argument to parser.update 2017-03-11 07:00:47 -06:00
Matthew Honnibal d59c6926c1 I think this fixes the segfault 2017-03-11 06:58:34 -06:00
Matthew Honnibal 318b9e32ff WIP on beam parser. Currently segfaults. 2017-03-11 06:19:52 -06:00
Matthew Honnibal b0d80dc9ae Update name of 'train' function in BeamParser 2017-03-10 14:35:43 -06:00
Matthew Honnibal d11f1a4ddf Record negative costs in non-monotonic arc eager oracle 2017-03-10 11:22:04 -06:00
Matthew Honnibal ecf91a2dbb Support beam parser 2017-03-10 11:21:21 -06:00
Matthew Honnibal c62da02344 Use ftrl training, to learn compressed model. 2017-03-09 18:43:21 -06:00
Matthew Honnibal 40703988bc Use FTRL training in parser 2017-03-08 01:38:51 +01:00
Roman Inflianskas 66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
Matthew Honnibal 97a1286129 Revert changes to tagger and parser for thinc 6 2017-01-09 10:08:34 -06:00
Matthew Honnibal af81ac8bb0 Use thinc 6.0 2016-12-29 11:58:42 +01:00
Matthew Honnibal bc0a202c9c Fix unicode problem in nonproj module 2016-11-25 17:29:17 -06:00
Matthew Honnibal 159e8c46e1 Merge old training fixes with newer state 2016-11-25 09:16:36 -06:00
Matthew Honnibal 39341598bb Fix NER label calculation 2016-11-25 09:02:22 -06:00
Matthew Honnibal ca773a1f53 Tweak arc_eager n_gold to deal with negative costs, and improve error message. 2016-11-25 09:01:52 -06:00
Matthew Honnibal 608d8f5421 Pass cfg through parser, and have is_valid default to 1, not 0 when resetting state 2016-11-25 09:00:21 -06:00
Matthew Honnibal b8c4f5ea76 Allow German noun chunks to work on Span
Update the German noun chunks iterator, so that it also works on Span objects.
2016-11-24 23:30:15 +11:00
Pokey Rule 3e3bda142d Add noun_chunks to Span 2016-11-24 10:47:20 +00:00
Matthew Honnibal b86f8af0c1 Fix doc strings 2016-11-01 12:25:36 +01:00
Matthew Honnibal 708ea22208 Infer types in transition_system.pyx 2016-10-27 18:08:13 +02:00
Matthew Honnibal 301f3cc898 Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found. 2016-10-27 18:01:55 +02:00
Matthew Honnibal 03a520ec4f Change signature of Parser.parseC, so that nr_class is read from the transition system. This allows the transition system to modify the number of actions in initialize_state. 2016-10-27 17:58:56 +02:00