Commit Graph

437 Commits

Author SHA1 Message Date
Matthew Honnibal c90dc7ac29 Clean up state initiatisation in transition system 2017-03-16 11:59:11 -05:00
Matthew Honnibal a46933a8fe Clean up FTRL parsing stuff. 2017-03-16 11:58:20 -05:00
Matthew Honnibal 2611ac2a89 Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens 2017-03-16 09:38:28 -05:00
Matthew Honnibal 3d0833c3df Fix off-by-1 in parse features fill_context 2017-03-15 19:55:35 -05:00
Matthew Honnibal 4ef68c413f Approximate cost in Break transition, to speed things up a bit. 2017-03-15 16:40:27 -05:00
Matthew Honnibal 8543db8a5b Use ftrl optimizer in parser 2017-03-15 11:56:37 -05:00
Matthew Honnibal d719f8e77e Use nogil in parser, and set L1 to 0.0 by default 2017-03-15 09:31:01 -05:00
Matthew Honnibal c61c501406 Update beam-parser to allow parser to maintain nogil 2017-03-15 09:30:22 -05:00
Matthew Honnibal c79b3129e3 Fix setting of empty lexeme in initial parse state 2017-03-15 09:26:53 -05:00
Matthew Honnibal 6c4108c073 Add header for beam parser 2017-03-11 12:45:12 -06:00
Matthew Honnibal 931feb3360 Allow beam parsing for NER 2017-03-11 11:12:01 -06:00
Matthew Honnibal ca9c8c57c0 Add iteration argument to parser.update 2017-03-11 07:00:47 -06:00
Matthew Honnibal d59c6926c1 I think this fixes the segfault 2017-03-11 06:58:34 -06:00
Matthew Honnibal 318b9e32ff WIP on beam parser. Currently segfaults. 2017-03-11 06:19:52 -06:00
Matthew Honnibal b0d80dc9ae Update name of 'train' function in BeamParser 2017-03-10 14:35:43 -06:00
Matthew Honnibal d11f1a4ddf Record negative costs in non-monotonic arc eager oracle 2017-03-10 11:22:04 -06:00
Matthew Honnibal ecf91a2dbb Support beam parser 2017-03-10 11:21:21 -06:00
Matthew Honnibal c62da02344 Use ftrl training, to learn compressed model. 2017-03-09 18:43:21 -06:00
Matthew Honnibal 40703988bc Use FTRL training in parser 2017-03-08 01:38:51 +01:00
Roman Inflianskas 66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
Matthew Honnibal 97a1286129 Revert changes to tagger and parser for thinc 6 2017-01-09 10:08:34 -06:00
Matthew Honnibal af81ac8bb0 Use thinc 6.0 2016-12-29 11:58:42 +01:00
Matthew Honnibal bc0a202c9c Fix unicode problem in nonproj module 2016-11-25 17:29:17 -06:00
Matthew Honnibal 159e8c46e1 Merge old training fixes with newer state 2016-11-25 09:16:36 -06:00
Matthew Honnibal 39341598bb Fix NER label calculation 2016-11-25 09:02:22 -06:00
Matthew Honnibal ca773a1f53 Tweak arc_eager n_gold to deal with negative costs, and improve error message. 2016-11-25 09:01:52 -06:00
Matthew Honnibal 608d8f5421 Pass cfg through parser, and have is_valid default to 1, not 0 when resetting state 2016-11-25 09:00:21 -06:00
Matthew Honnibal b8c4f5ea76 Allow German noun chunks to work on Span
Update the German noun chunks iterator, so that it also works on Span objects.
2016-11-24 23:30:15 +11:00
Pokey Rule 3e3bda142d Add noun_chunks to Span 2016-11-24 10:47:20 +00:00
Matthew Honnibal b86f8af0c1 Fix doc strings 2016-11-01 12:25:36 +01:00
Matthew Honnibal 708ea22208 Infer types in transition_system.pyx 2016-10-27 18:08:13 +02:00
Matthew Honnibal 301f3cc898 Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found. 2016-10-27 18:01:55 +02:00
Matthew Honnibal 03a520ec4f Change signature of Parser.parseC, so that nr_class is read from the transition system. This allows the transition system to modify the number of actions in initialize_state. 2016-10-27 17:58:56 +02:00
Matthew Honnibal a209b10579 Improve error message when oracle fails for non-projective trees, re Issue #571. 2016-10-24 20:31:30 +02:00
Matthew Honnibal 3e688e6d4b Fix issue #514 -- serializer fails when new entity type has been added. The fix here is quite ugly. It's best to add the entities ASAP after loading the NLP pipeline, to mitigate the brittleness. 2016-10-23 17:45:44 +02:00
Matthew Honnibal 59038f7efa Restore support for prior data format -- specifically, the labels field of the config. 2016-10-17 00:53:26 +02:00
Matthew Honnibal 7887ab3b36 Fix default use of feature_templates in parser 2016-10-16 21:41:56 +02:00
Matthew Honnibal f787cd29fe Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor. 2016-10-16 21:34:57 +02:00
Matthew Honnibal 274a4d4272 Fix queue Python property in StateClass 2016-10-16 17:04:41 +02:00
Matthew Honnibal e8c8aa08ce Make action_name optional in StepwiseState 2016-10-16 17:04:16 +02:00
Matthew Honnibal 4fc56d4a31 Rename 'labels' to 'actions' in parser options 2016-10-16 11:42:26 +02:00
Matthew Honnibal 3259a63779 Whitespace 2016-10-16 01:47:28 +02:00
Matthew Honnibal d9ae2d68af Load features by string-name for backwards compatibility. 2016-10-12 20:15:11 +02:00
Matthew Honnibal 3a03c668c3 Fix message in ParserStateError 2016-10-12 14:44:31 +02:00
Matthew Honnibal 6bf505e865 Fix error on ParserStateError 2016-10-12 14:35:55 +02:00
Matthew Honnibal ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Matthew Honnibal 1d70db58aa Revert "Changes to iterators.pyx for new StringStore scheme"
This reverts commit 4f794b215a.
2016-09-30 20:19:53 +02:00
Matthew Honnibal 9e09b39b9f Revert "Changes to transition systems for new StringStore scheme"
This reverts commit 0442e0ab1e.
2016-09-30 20:11:49 +02:00
Matthew Honnibal e3285f6f30 Revert "Fix report of ParserStateError"
This reverts commit 78f19baafa.
2016-09-30 20:11:33 +02:00
Matthew Honnibal 78f19baafa Fix report of ParserStateError 2016-09-30 19:59:22 +02:00