Commit Graph

110 Commits

Author SHA1 Message Date
Matthew Honnibal 9568ebed08 * Fix off-by-one in head reading 2015-05-12 20:27:56 +02:00
Matthew Honnibal d2ac8d8007 * Add ctnt field to State, in preparation for constituency parsing 2015-05-12 20:27:56 +02:00
Matthew Honnibal ab67693393 * Add read_json_file to conll.pyx 2015-05-12 20:27:55 +02:00
Matthew Honnibal aff9359a8d * Update ner.pyx to expect brackets from gold_tuples 2015-05-12 20:27:55 +02:00
Matthew Honnibal 53cf77e1c8 * Bug fix: when non-monotonically correct a dependency, make sure to delete the old one from the child list 2015-05-12 20:26:41 +02:00
Matthew Honnibal a4e2af54f9 * Add support for l/r edge to add_dep, and move inlined methods into _state.pyx where possible 2015-05-12 20:26:41 +02:00
Matthew Honnibal fb8d50b3d5 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-04-30 12:45:15 +02:00
Matthew Honnibal ed8e8c3bd0 * Whitespace 2015-04-29 14:22:47 +02:00
Matthew Honnibal 763ef01575 * Fix two bugs in feature calculation 2015-04-28 23:25:09 +02:00
Matthew Honnibal b3fd48c97b * Fix missing root labels bug identified in Issue #57 2015-04-28 20:45:51 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 99dbf8a38c * Fix error type in lookup_transition 2015-04-16 01:36:22 +02:00
Matthew Honnibal 9f16848b60 * Add (N0w, N1w) unigram pair to NER features, prompted by failure to detect 'this weekend' 2015-04-15 06:01:18 +02:00
Matthew Honnibal 507048dc45 * Rename StandardError to Exception, for Python 3 compatibility 2015-04-12 07:28:34 +02:00
Matthew Honnibal 1d05e6da00 * Add ne_iob and ne_type features to NER 2015-04-10 19:07:08 +02:00
Matthew Honnibal 4df8a3d90f * Add ne_iob and ne_type attributes to context vector 2015-04-10 05:02:15 +02:00
Matthew Honnibal 8c354c432b * Add ValueError condition to ner_tag reading 2015-04-10 04:59:59 +02:00
Matthew Honnibal 435cccf098 * Add read_conll03_file function to conll.pyx 2015-04-10 04:59:11 +02:00
Matthew Honnibal 99c9ecfc18 * Fix bug in prefix, suffix and word shape features in parser and NER 2015-04-10 03:53:33 +02:00
Matthew Honnibal 5a075ea3fc * Ensure NER moves are available for single-word tokens 2015-04-05 22:30:58 +02:00
Matthew Honnibal a60a366b2c * Support 'punct' dep label in conll.pyx 2015-04-05 22:30:19 +02:00
Matthew Honnibal a3af6b7c3d * Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty. 2015-03-27 17:39:16 +01:00
Matthew Honnibal db5a43318c * Improve print_state debug printer 2015-03-27 17:29:58 +01:00
Matthew Honnibal 1705eccbbe * Remove whitespace 2015-03-27 15:22:39 +01:00
Matthew Honnibal 3feb52374c * Break apart a condition, for ease of debug printing 2015-03-27 15:21:38 +01:00
Matthew Honnibal b32f581acb * Fix bug in ArcEager.get_labels 2015-03-27 15:21:06 +01:00
Matthew Honnibal 1320bd19db * Move Span class to own file 2015-03-26 16:45:38 +01:00
Matthew Honnibal e854ba0a13 * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter 2015-03-26 16:44:47 +01:00
Matthew Honnibal 6a6085f8b9 * Clean up GreedyParser.train function a bit 2015-03-26 16:44:47 +01:00
Matthew Honnibal b3157927e6 * Clean up unused feature templates 2015-03-26 16:44:47 +01:00
Matthew Honnibal 411bf377d4 * Remove dependency on ner_util module 2015-03-26 16:44:47 +01:00
Matthew Honnibal 01c892f583 * Add comment to fill_context 2015-03-26 16:44:47 +01:00
Matthew Honnibal 2741179aff * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. 2015-03-26 16:44:47 +01:00
Matthew Honnibal 71648205d9 * Add support for debug feature set. Just use unigrams for this. 2015-03-26 16:44:47 +01:00
Matthew Honnibal 3b70b304b2 * Add words to gold_tuples from gold conll file 2015-03-26 16:44:47 +01:00
Matthew Honnibal 05d6065e2e * Add assertion 2015-03-26 16:44:46 +01:00
Matthew Honnibal 377e9b29b1 * Whitespace 2015-03-26 16:44:46 +01:00
Matthew Honnibal 9f4ad8fdfb * Assign root words the ROOT label via the Break transition. Something is still wrong here... 2015-03-26 16:44:46 +01:00
Matthew Honnibal f729164c01 * Fix bug in label assignment: ensure null-label transitions receive the label 0 2015-03-26 16:44:46 +01:00
Matthew Honnibal 31fad99518 * Use StringStore to encode label names, instead of label_ids 2015-03-26 16:44:45 +01:00
Matthew Honnibal b9b695fb1b * Remove debug word list 2015-03-26 16:44:45 +01:00
Matthew Honnibal 1c843934be * Fix oracle bug in NER. Now getting 77% F on ontonotes 2015-03-26 16:44:44 +01:00
Matthew Honnibal e181c051d5 * Improve features for NER 2015-03-26 16:44:44 +01:00
Matthew Honnibal 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal ae235e07b9 * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. 2015-03-26 16:44:44 +01:00
Matthew Honnibal b3eda03c9c * Tmp 2015-03-26 16:44:44 +01:00
Matthew Honnibal 6b6bce9e7a * Fix label loading for transition system 2015-03-26 16:44:43 +01:00
Matthew Honnibal 5278c7504b * Hacks to conll.pyx. Should clean these up. 2015-03-26 16:44:43 +01:00
Matthew Honnibal f321b2b2eb * Remove TODO comment 2015-03-26 16:44:43 +01:00
Matthew Honnibal fdabd93bfb * Ensure high loss for invalid moves, and fix label reading for arc-eager 2015-03-26 16:44:43 +01:00