Commit Graph

622 Commits

Author SHA1 Message Date
Matthew Honnibal e09a08bd00 * Add copy_state function 2015-06-01 23:06:30 +02:00
Matthew Honnibal c7876aa8b6 * Add get_valid method 2015-06-01 23:06:00 +02:00
Matthew Honnibal d82f9d958d * Remove regularization cruft from _ml, move score from .pxd file to .pyx 2015-05-31 18:48:05 +02:00
Matthew Honnibal 5e99ff94c8 * Edits to arc eager oracle. Couldn't figure out how the non-monotonic lines made sense. They seem covered by children_in_stack 2015-05-31 15:14:37 +02:00
Matthew Honnibal 6c5632b71c * Roll back proposed change to Break transition while investigating effect 2015-05-31 06:49:52 +02:00
Matthew Honnibal 6bba793df3 * Disable the Zipf-reweighting thing while investigating effect 2015-05-31 06:48:43 +02:00
Matthew Honnibal e77940565d * Add length cap to distance feature 2015-05-31 05:25:30 +02:00
Matthew Honnibal fd596351ba * Fix valency features 2015-05-31 05:24:33 +02:00
Matthew Honnibal 87d6551d19 * Allow gold parse to cut non-projective arcs 2015-05-31 01:11:56 +02:00
Matthew Honnibal c4f0914b4e * Fix POS tag evaluation in scorer.py: do evaluate punctuation tags 2015-05-30 18:24:32 +02:00
Matthew Honnibal 9e39a206da * Fix efficiency of JSON reading, by using ujson instead of stream 2015-05-30 17:54:52 +02:00
Matthew Honnibal 76300bbb1b * Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag. 2015-05-30 01:25:46 +02:00
Matthew Honnibal b76bbbd12c * Read json files recursively from a directory, instead of requiring a single .json file 2015-05-29 03:52:55 +02:00
Matthew Honnibal 8f31d3b864 * Relax constraint on Break transition for non-monotonic parsing. 2015-05-28 23:39:52 +02:00
Matthew Honnibal 6b2e5c4b8a * Avoid NER scoring for sentences with some missing NER values. 2015-05-28 22:39:08 +02:00
Matthew Honnibal d25d31442d * Hackishly support broken NER annotations. Should fix this. 2015-05-27 19:14:31 +02:00
Matthew Honnibal 7a2725bca4 * Read input json in a streaming way 2015-05-27 19:13:11 +02:00
Matthew Honnibal 6a1c91675e * Add file to read ENAMEX ner data 2015-05-27 17:36:23 +02:00
Matthew Honnibal 732fa7709a * Edits to align_raw script, for use in prepare_treebank 2015-05-27 04:23:31 +02:00
Matthew Honnibal 4010b9b6d9 * Pass parameter for regularization in parser.pyx 2015-05-27 03:18:50 +02:00
Matthew Honnibal 4c6058baa7 * Fix evaluation of NER in scorer.py 2015-05-27 03:18:16 +02:00
Matthew Honnibal 6016ee83a6 * Fix reading of NER in gold.pyx 2015-05-27 03:17:50 +02:00
Matthew Honnibal 04bda8648d * Pass parameter for regularization to model 2015-05-27 03:16:58 +02:00
Matthew Honnibal f69fe6a635 * Fix heads problem in read_conll 2015-05-27 01:14:54 +02:00
Matthew Honnibal 0eec1d12af * Add comment about zipf reweighting 2015-05-27 01:14:07 +02:00
Matthew Honnibal 4d37b66c55 * Make Zipf regularization a bit more efficient 2015-05-27 01:12:50 +02:00
Matthew Honnibal 7fc24821bc * Experiment with Zipfian corruptions when calculating prediction 2015-05-26 22:17:15 +02:00
Matthew Honnibal eba7b34f66 * Add flag to disable loading of word vectors 2015-05-25 01:02:42 +02:00
Matthew Honnibal 3593babd35 * Add functions for Levenshtein distance alignment 2015-05-24 21:50:48 +02:00
Matthew Honnibal 744f06abf5 * Add script to read OntoNotes source documents 2015-05-24 21:49:58 +02:00
Matthew Honnibal fc75210941 * Move spacy.syntax.conll to spacy.gold 2015-05-24 21:35:02 +02:00
Matthew Honnibal 765b61cac4 * Update spacy.scorer, to use P/R/F to support tokenization errors 2015-05-24 20:07:18 +02:00
Matthew Honnibal efe7a7d7d6 * Clean unused functions from spacy.syntax.conll 2015-05-24 20:06:46 +02:00
Matthew Honnibal 78487f3e66 * Update parser oracle for missing heads 2015-05-24 20:05:58 +02:00
Matthew Honnibal 1044a13413 * Begin refactoring scorer to use recall over gold dependencies 2015-05-24 17:40:15 +02:00
Matthew Honnibal acd1245ad4 * Remove cruft from conll.pyx --- unused stuff about evaluation, which now lives in spacy.scorer 2015-05-24 17:35:49 +02:00
Matthew Honnibal 20f1d868a3 * Tmp commit. Working on whole document parsing 2015-05-24 02:49:56 +02:00
Matthew Honnibal f2ee9c4feb * Comment out constituency parsing stuff, so that code compiles 2015-05-20 16:55:05 +02:00
Matthew Honnibal 8ee7c541f1 * Update Constituent definition 2015-05-20 16:03:26 +02:00
Matthew Honnibal 9dfc9c039c * Work on constituency parsing. 2015-05-20 16:02:51 +02:00
Matthew Honnibal ba07b925a7 * Fix compile error in conll.pyx 2015-05-12 22:33:47 +02:00
Matthew Honnibal f1e0272b18 * Disable c-parsing transitions 2015-05-12 22:33:25 +02:00
Matthew Honnibal 03a6626545 * Tmp commit 2015-05-12 20:27:56 +02:00
Matthew Honnibal 9568ebed08 * Fix off-by-one in head reading 2015-05-12 20:27:56 +02:00
Matthew Honnibal 69840d8cc3 * Tweak verbose output printing in scorer.py 2015-05-12 20:27:56 +02:00
Matthew Honnibal 0605af6838 * Fix head misalignment in read_conll, when periods are ignored 2015-05-12 20:27:56 +02:00
Matthew Honnibal d2ac8d8007 * Add ctnt field to State, in preparation for constituency parsing 2015-05-12 20:27:56 +02:00
Matthew Honnibal ab67693393 * Add read_json_file to conll.pyx 2015-05-12 20:27:55 +02:00
Matthew Honnibal aff9359a8d * Update ner.pyx to expect brackets from gold_tuples 2015-05-12 20:27:55 +02:00
Matthew Honnibal 0ad72a77ce * Write JSON files, with both dependency and PSG parses 2015-05-12 20:27:55 +02:00