Commit Graph

343 Commits

Author SHA1 Message Date
Matthew Honnibal a510d9f677 * Another assertion removed 2015-01-05 13:01:40 +11:00
Matthew Honnibal 2856946a66 * Remove assertion that doesn't work on Python 3 2015-01-05 12:51:16 +11:00
Matthew Honnibal 94034f1112 * Fix encoding in lemmatization 2015-01-05 11:54:29 +11:00
Matthew Honnibal b132b3caa6 * Fix unicode error in lemmatizer 2015-01-05 11:53:54 +11:00
Matthew Honnibal 477e7fbffe * Fix data reading for lemmatizer 2015-01-05 06:01:32 +11:00
Matthew Honnibal 58f75abaca * Fix unicode error in orth 2015-01-05 05:53:08 +11:00
Matthew Honnibal 4e085d5166 * Fix lemmatizer for Python3 2015-01-05 05:51:26 +11:00
Matthew Honnibal ae7c811fd1 * Use Exception instead of StandardError 2015-01-04 01:22:12 +11:00
Matthew Honnibal 0e4c2ba036 * Fix loading of special morph words 2015-01-03 23:13:00 +11:00
Matthew Honnibal f5d41028b5 * Move around data files for test release 2015-01-03 01:59:22 +11:00
Matthew Honnibal a24321b63a * Add downloader 2015-01-02 21:44:41 +11:00
Matthew Honnibal 5d9a096e2f * Some minor clean-up after HastyModel 2014-12-31 19:46:04 +11:00
Matthew Honnibal aafaf58cbe * Refactor _ml.Model, and finish implementing HastyModel so far not worthwhile. 2014-12-31 19:40:59 +11:00
Matthew Honnibal bcd038e7b6 * Implement HastyModel 2014-12-31 01:16:47 +11:00
Matthew Honnibal 1a075f77ff * Don't over-ride pre-loaded POS tags, if set by special-cases 2014-12-30 23:26:32 +11:00
Matthew Honnibal 785c7ba76a * Embed signature on attrs 2014-12-30 23:25:31 +11:00
Matthew Honnibal 30e5805656 * Lazy-load tagger and parser 2014-12-30 23:25:09 +11:00
Matthew Honnibal 9976aa976e * Messily fix morphology and POS tags on special tokens. 2014-12-30 23:24:37 +11:00
Matthew Honnibal c1ef3febee * Embedsignature in tokens.pyx 2014-12-30 21:22:00 +11:00
Matthew Honnibal aac5028b6e * Move tagger to _ml 2014-12-30 21:21:38 +11:00
Matthew Honnibal 1ffb0229ed * Import tokens in parser.pxd 2014-12-30 21:21:17 +11:00
Matthew Honnibal bb0b00f819 * Repurporse the Tagger class as a generic Model, wrapping thinc's interface 2014-12-30 21:20:15 +11:00
Matthew Honnibal fe2a5e0370 * Work on docstrings 2014-12-27 21:46:04 +11:00
Matthew Honnibal bb80937544 * Upd docstrings 2014-12-27 18:45:16 +11:00
Matthew Honnibal b8b65903fc * Tmp 2014-12-24 17:42:00 +11:00
Matthew Honnibal ab61673edd * Fix api of array method 2014-12-23 15:18:48 +11:00
Matthew Honnibal 7708d0e24a * Move lemmatizer to en dir 2014-12-23 15:16:57 +11:00
Matthew Honnibal 98eb4c0426 * Fix path to parser model 2014-12-23 15:09:09 +11:00
Matthew Honnibal b00bc01d8c * All tests now passing for reorg 2014-12-23 13:18:59 +11:00
Matthew Honnibal 73f200436f * Tests passing except for morphology/lemmatization stuff 2014-12-23 11:40:32 +11:00
Matthew Honnibal cf8d26c3d2 * POS tagger training working after reorg 2014-12-22 08:54:47 +11:00
Matthew Honnibal 4c4aa2c5c9 * Work on train 2014-12-22 07:25:43 +11:00
Matthew Honnibal 61df50b598 * Add English-subclass POS tagger 2014-12-21 20:59:07 +11:00
Matthew Honnibal 9f3f07cab6 * Add attrs file for English 2014-12-21 11:29:11 +11:00
Matthew Honnibal 2a89d70429 * Add vocab.pyx to setup, and ensure we can import spacy.en.lang 2014-12-21 06:03:53 +11:00
Matthew Honnibal b34a1325d3 * Everything compiling after reorg. About to start testing. 2014-12-21 05:42:23 +11:00
Matthew Honnibal e1c1a4b868 * Tmp 2014-12-21 05:36:29 +11:00
Matthew Honnibal d11c1edf8c * Import slice_unicode from strings.pyx 2014-12-20 07:56:26 +11:00
Matthew Honnibal be1bdcbd85 * Move lang.pyx to tokenizer.pyx 2014-12-20 07:55:40 +11:00
Matthew Honnibal 89a1cc1a48 * Move murmurhash to .pxd in strings file 2014-12-20 07:41:08 +11:00
Matthew Honnibal d5a942c4a4 * Rename lang.pyx to tokenizer.pyx 2014-12-20 07:30:39 +11:00
Matthew Honnibal a60ae261ae * Move tokenizer to its own file, and refactor 2014-12-20 07:29:16 +11:00
Matthew Honnibal 867a4a000c * Export set_morph_from_dict function 2014-12-20 07:28:27 +11:00
Matthew Honnibal 4e30195c6d * Refactor morphology.pyx 2014-12-20 07:27:28 +11:00
Matthew Honnibal 4c6ce7ee84 * Update tokens.pyx as part of reorg 2014-12-20 07:03:26 +11:00
Matthew Honnibal 116f7f3bc1 * Rename Lexicon to Vocab, and move it to its own file 2014-12-20 06:54:03 +11:00
Matthew Honnibal 780cbd68b1 * Move all struct definitions to structs.pxd, to avoid circular dependencies 2014-12-20 06:51:33 +11:00
Matthew Honnibal f6556d8e5d * Refactor, move Lexeme struct to structs.pxd 2014-12-20 06:51:03 +11:00
Matthew Honnibal 7d48bba6c4 * Move StringStore class to its own file 2014-12-20 06:42:01 +11:00
Matthew Honnibal b066102d2d * Remove POS cache for now 2014-12-20 03:49:58 +11:00