Commit Graph

490 Commits

Author SHA1 Message Date
Matthew Honnibal f21ab2d7fb * Fix bug in ugly ent_strings hack on English class 2015-03-26 16:44:45 +01:00
Matthew Honnibal 1c843934be * Fix oracle bug in NER. Now getting 77% F on ontonotes 2015-03-26 16:44:44 +01:00
Matthew Honnibal 903f196b3f * Fix verbose printing for scorer 2015-03-26 16:44:44 +01:00
Matthew Honnibal e181c051d5 * Improve features for NER 2015-03-26 16:44:44 +01:00
Matthew Honnibal 7ecb52c0ed * Add scorer script 2015-03-26 16:44:44 +01:00
Matthew Honnibal 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal ae235e07b9 * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. 2015-03-26 16:44:44 +01:00
Matthew Honnibal b3eda03c9c * Tmp 2015-03-26 16:44:44 +01:00
Matthew Honnibal 220ce8bfed * Prepare English class for NER 2015-03-26 16:44:44 +01:00
Matthew Honnibal f5830dc1c1 * Remove _transitions.pyx 2015-03-26 16:44:44 +01:00
Matthew Honnibal 6865c2fb4d * Fix assignment of dep strings in tokens.pyx 2015-03-26 16:44:43 +01:00
Matthew Honnibal 6b6bce9e7a * Fix label loading for transition system 2015-03-26 16:44:43 +01:00
Matthew Honnibal 5278c7504b * Hacks to conll.pyx. Should clean these up. 2015-03-26 16:44:43 +01:00
Matthew Honnibal f321b2b2eb * Remove TODO comment 2015-03-26 16:44:43 +01:00
Matthew Honnibal fdabd93bfb * Ensure high loss for invalid moves, and fix label reading for arc-eager 2015-03-26 16:44:43 +01:00
Matthew Honnibal 10ed738df2 * Tmp commit 2015-03-26 16:44:43 +01:00
Matthew Honnibal 4f83c9b3d5 * Make costs label-sensitive 2015-03-26 16:44:43 +01:00
Matthew Honnibal 179b7eb0a7 * Specify parser transition system in language 2015-03-26 16:44:43 +01:00
Matthew Honnibal 8c883cef58 * Refactored transition system code now compiling. Still need to hook up label oracle, and test 2015-03-26 16:44:43 +01:00
Matthew Honnibal f0159ab4b6 * Add file to hold GoldParse class 2015-03-26 16:44:42 +01:00
Matthew Honnibal 8eadb984cb * Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle 2015-03-26 16:44:42 +01:00
Matthew Honnibal b063001596 * Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid 2015-03-26 16:44:42 +01:00
Matthew Honnibal 01bc4d6815 * Add set_parse method, to assign parse to tokens in a less hacky way. 2015-03-26 16:44:42 +01:00
Matthew Honnibal dc986dbc0b * Work on refactored parser, where TransitionSystem can be easily subclassed 2015-03-26 16:44:42 +01:00
Matthew Honnibal 1cc6329b18 * Add base class to do transitions 2015-03-26 16:44:42 +01:00
Matthew Honnibal 135756ac3d * Tmp commit of NER refactoring 2015-03-26 16:44:42 +01:00
Matthew Honnibal 23c1f6fc04 * Merge changes from stash 2015-03-26 16:44:41 +01:00
Matthew Honnibal 0ff078876a * Commit some work on ner.pyx done on the plane 2015-03-26 16:44:41 +01:00
Matthew Honnibal d81b7be6a2 * Merge train.py 2015-03-26 16:44:41 +01:00
Matthew Honnibal 2e3dc3dfe2 * Merge changes in tokens.pyx 2015-03-26 16:44:41 +01:00
Matthew Honnibal 8cc3524dc9 * Ws 2015-03-26 16:44:41 +01:00
Matthew Honnibal 3d0570685c * Add NER transition system 2015-03-26 16:44:41 +01:00
Matthew Honnibal 043b758cf4 * Resurrect old NER code. This version won't be the one that runs; we want to re-use the parser code. But for now this is a useful reference. 2015-03-26 16:44:41 +01:00
Matthew Honnibal b139aa92ba * Start setting out how NER will be implemented in the data model 2015-03-26 16:44:41 +01:00
Matthew Honnibal 0962ffc095 * Fix issue #37: missing check_flag attribute from Token class 2015-03-26 15:06:26 +01:00
Matthew Honnibal 2e8d0e5d45 * Upd download script 2015-03-03 05:47:16 -05:00
Matthew Honnibal dbe26f5793 * Add children and subtree methods to Token, which are generators to assist parse-tree navigation. 2015-03-03 04:18:41 -05:00
Matthew Honnibal ea90d136e8 * Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy. 2015-02-27 03:56:10 -05:00
Matthew Honnibal caf046b220 * Hastily add method to apply tags from a list of strings, instead of predicting the tags. 2015-02-23 15:40:17 -05:00
Matthew Honnibal cae077b583 * Work on fixing orphaned Token objects bug 2015-02-16 15:20:31 -05:00
Matthew Honnibal 7572e31f5e * Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive. 2015-02-11 18:05:06 -05:00
Matthew Honnibal 64645a1c2f * Improve docstring on English 2015-02-11 15:13:20 -05:00
Matthew Honnibal 594e50bd45 * Add option to download speech-parsing data set. 2015-02-11 14:20:29 -05:00
Matthew Honnibal 0b7e769211 * Add POS tags to support SWBD tag set 2015-02-11 14:08:28 -05:00
Matthew Honnibal 312b3a45f3 * Fix issue #19: Allow parsing/pos tagging of empty strings 2015-02-10 10:15:58 -05:00
Matthew Honnibal 2a0615104b * Upd download script 2015-02-09 10:22:59 -05:00
Matthew Honnibal 5c3513583d * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. 2015-02-09 03:57:10 -05:00
Matthew Honnibal be5536d239 * Fix Issue #22: PRP and PRP$ were mapped to NOUN. Should be PRON. 2015-02-08 18:36:18 -05:00
Matthew Honnibal 0492cee8b4 * Fix Issue #24: Lemmas are empty when the L field is missing for special-cased tokens 2015-02-08 18:30:30 -05:00
Matthew Honnibal d229fbd228 * Give better error on out-of-bounds array access 2015-02-07 12:59:12 -05:00