Commit Graph

437 Commits

Author SHA1 Message Date
Matthew Honnibal 2a89d70429 * Add vocab.pyx to setup, and ensure we can import spacy.en.lang 2014-12-21 06:03:53 +11:00
Matthew Honnibal b34a1325d3 * Everything compiling after reorg. About to start testing. 2014-12-21 05:42:23 +11:00
Matthew Honnibal e1c1a4b868 * Tmp 2014-12-21 05:36:29 +11:00
Matthew Honnibal d11c1edf8c * Import slice_unicode from strings.pyx 2014-12-20 07:56:26 +11:00
Matthew Honnibal be1bdcbd85 * Move lang.pyx to tokenizer.pyx 2014-12-20 07:55:40 +11:00
Matthew Honnibal 89a1cc1a48 * Move murmurhash to .pxd in strings file 2014-12-20 07:41:08 +11:00
Matthew Honnibal d5a942c4a4 * Rename lang.pyx to tokenizer.pyx 2014-12-20 07:30:39 +11:00
Matthew Honnibal a60ae261ae * Move tokenizer to its own file, and refactor 2014-12-20 07:29:16 +11:00
Matthew Honnibal 867a4a000c * Export set_morph_from_dict function 2014-12-20 07:28:27 +11:00
Matthew Honnibal 4e30195c6d * Refactor morphology.pyx 2014-12-20 07:27:28 +11:00
Matthew Honnibal 4c6ce7ee84 * Update tokens.pyx as part of reorg 2014-12-20 07:03:26 +11:00
Matthew Honnibal 116f7f3bc1 * Rename Lexicon to Vocab, and move it to its own file 2014-12-20 06:54:03 +11:00
Matthew Honnibal 780cbd68b1 * Move all struct definitions to structs.pxd, to avoid circular dependencies 2014-12-20 06:51:33 +11:00
Matthew Honnibal f6556d8e5d * Refactor, move Lexeme struct to structs.pxd 2014-12-20 06:51:03 +11:00
Matthew Honnibal 7d48bba6c4 * Move StringStore class to its own file 2014-12-20 06:42:01 +11:00
Matthew Honnibal e15b9da7db * Pin preshed to a particular version 2014-12-20 04:01:32 +11:00
Matthew Honnibal ed2fff6128 * Add tests 2014-12-20 03:51:25 +11:00
Matthew Honnibal b066102d2d * Remove POS cache for now 2014-12-20 03:49:58 +11:00
Matthew Honnibal ff252dd535 * Clean up 'guess_cache' idea, which didnt work well enough 2014-12-20 03:49:11 +11:00
Matthew Honnibal 9d3ca13909 * Start work on parse-tree iteration classes 2014-12-20 03:48:10 +11:00
Matthew Honnibal bed680c632 * Remove commented-out features 2014-12-20 03:47:32 +11:00
Matthew Honnibal 3d178c03ae * Prune the features a bit 2014-12-20 02:46:14 +11:00
Matthew Honnibal a0408e1758 * Working DecisionMemory class 2014-12-20 01:43:26 +11:00
Matthew Honnibal 7920ea72b4 * Working parser with the decision memory idea. Disabling that for now, for simplicity 2014-12-20 01:43:15 +11:00
Matthew Honnibal a2f2a48da9 * Add some extra features 2014-12-20 01:42:24 +11:00
Matthew Honnibal 8fd9762d91 * Start laying out parse tree iteration methods 2014-12-20 01:42:09 +11:00
Matthew Honnibal 53b8bc1f3c * Work on implementing a trainable cache for the parser. So far, doesn't improve efficiency 2014-12-19 09:30:50 +11:00
Matthew Honnibal 033d6c9ac2 * Adapt POS tagger decision-memory for use in parser 2014-12-19 07:23:04 +11:00
Matthew Honnibal 809ddf7887 * Add index.pxd 2014-12-19 07:23:00 +11:00
Matthew Honnibal 1879abd16a * Set const-correctness for tagger 2014-12-18 20:41:52 +11:00
Matthew Honnibal f72243b156 * Set const-correctness for Feature* array 2014-12-18 20:41:32 +11:00
Matthew Honnibal 6ab7e40590 * Add non-monotonic parsing with cost-sensitive update. 92.26 on Y&M set 2014-12-18 11:33:25 +11:00
Matthew Honnibal 7e0c692daf * Automatically push when the stack is empty 2014-12-18 09:16:10 +11:00
Matthew Honnibal 61142a8eff * Tweak features 2014-12-18 09:15:03 +11:00
Matthew Honnibal e3b123e6e0 * Ignore cpp files from parser 2014-12-18 09:05:51 +11:00
Matthew Honnibal 8446ebfbbb * Work on parser. Up to 92 UAS on YM labels 2014-12-18 09:05:31 +11:00
Matthew Honnibal 55de747bfc * Remove .cpp files 2014-12-18 02:43:13 +11:00
Matthew Honnibal 4448a840f7 * Work on greedy parsing. Scoring about 91.2 2014-12-18 02:42:55 +11:00
Matthew Honnibal 87e9487d76 * Work on parser 2014-12-17 21:10:12 +11:00
Matthew Honnibal 9d7d97978d * Work on greedy parser 2014-12-17 21:09:29 +11:00
Matthew Honnibal d524dd306a * Work on greedy parser 2014-12-17 03:19:43 +11:00
Matthew Honnibal 95ccea03b2 * Work on greedy parser 2014-12-16 22:46:55 +11:00
Matthew Honnibal a432862fde * Add exception type to _arg_max_among in tagger 2014-12-16 09:44:19 +11:00
Matthew Honnibal 9e00798820 * Work on integrating a greedy dependency parser 2014-12-16 08:06:04 +11:00
Matthew Honnibal 24ffc32f2f * Another redraft of index.rst 2014-12-15 16:32:03 +11:00
Matthew Honnibal 77dd7a212a * More thoughts on intro 2014-12-15 09:19:29 +11:00
Matthew Honnibal 792802b2b9 * POS tag memoisation working, with good speed-up 2014-12-12 14:33:51 +11:00
Matthew Honnibal ca54d58638 * Merge setup.py 2014-12-10 15:21:27 +11:00
Matthew Honnibal 9959a64f7b * Working morphology and lemmatisation. POS tagging quite fast. 2014-12-10 08:09:32 +11:00
Matthew Honnibal 7831b06610 * Compile morphology.pyx file 2014-12-10 08:09:13 +11:00