37 Commits

Author SHA1 Message Date
Matthew Honnibal 2a89d70429 * Add vocab.pyx to setup, and ensure we can import spacy.en.lang 2014-12-21 06:03:53 +11:00
Matthew Honnibal 87e9487d76 * Work on parser 2014-12-17 21:10:12 +11:00
Matthew Honnibal 7831b06610 * Compile morphology.pyx file 2014-12-10 08:09:13 +11:00
Matthew Honnibal ef4398b204 * Rearrange POS stuff, so that language-specific stuff can live in language-specific modules 2014-12-07 23:52:41 +11:00
Matthew Honnibal 91e8d9ea1c * Compile context.pyx and tagger.pyx modules 2014-12-07 15:29:54 +11:00
Matthew Honnibal a14f9eaf63 * Add index.pyx to setup 2014-12-04 22:14:11 +11:00
Matthew Honnibal d0d812c548 * Hack setup.py to exclude tagger stuff 2014-12-03 11:06:57 +11:00
Matthew Honnibal b934bf1c69 * Compile IOB 2014-11-12 23:21:40 +11:00
Matthew Honnibal d5e9dce039 * Compile ner NER code 2014-11-11 21:10:22 +11:00
Matthew Honnibal dbbb914480 * Upd setup 2014-11-05 20:45:44 +11:00
Matthew Honnibal 67c8c8019f * Update lexeme serialization, using a binary file format 2014-10-30 01:01:00 +11:00
Matthew Honnibal 5ebe14f353 * Add greedy pos tagger 2014-10-22 10:17:26 +11:00
Matthew Honnibal aba4a7c7ea * Remove ptb3 file from setup 2014-09-25 18:41:25 +02:00
Matthew Honnibal b15619e170 * Use PointerHash instead of locally provided _hashing module 2014-09-25 18:23:35 +02:00
Matthew Honnibal ac522e2553 * Switch from own memory class to cymem, in pip 2014-09-17 23:09:24 +02:00
Matthew Honnibal 5a20dfc03e * Add memory management code 2014-09-17 20:02:06 +02:00
Matthew Honnibal 0447279c57 * PointerHash working, efficiency is good. 6-7 mins 2014-09-13 16:43:59 +02:00
Matthew Honnibal b488224c09 * Restoring Lexeme-as-struct 2014-09-10 20:41:37 +02:00
Matthew Honnibal e80d3b9784 * Compile tokens in setup 2014-09-10 19:41:19 +02:00
Matthew Honnibal 7dac9b9ccb * Fix setup script 2014-09-01 23:41:59 +02:00
Matthew Honnibal 68bae2fec6 * More refactoring 2014-08-25 16:42:22 +02:00
Matthew Honnibal 3b793cf4f7 * Tests passing for new Word object version 2014-08-24 18:13:53 +02:00
Matthew Honnibal 89d6faa9c9 * Move en_ptb to ptb3 2014-08-22 04:24:05 +02:00
Matthew Honnibal d42cdbb446 * Compile orthography.latin.pyx 2014-08-20 17:03:19 +02:00
Matthew Honnibal 01469b0888 * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. 2014-08-18 19:14:00 +02:00
Matthew Honnibal 865cacfaf7 * Remove dependence on murmurhash 2014-08-16 17:37:09 +02:00
Matthew Honnibal 7fd9b2f1f8 * Add murmurhash to setup while we figure out cython includes 2014-08-15 23:56:57 +02:00
Matthew Honnibal 365a2af756 * Restore happax. commit uncommited work 2014-08-02 21:27:03 +01:00
Matthew Honnibal 18fb76b2c4 * Removed happax. Not sure if good idea. 2014-08-02 20:53:35 +01:00
Matthew Honnibal d4b8bc07ce * Use FixedTable to control index size 2014-08-01 07:27:48 +01:00
Matthew Honnibal a235804730 * Fix setup.py 2014-07-31 02:03:53 +01:00
Matthew Honnibal 5461399924 * Fix setup.py 2014-07-31 02:03:10 +01:00
Matthew Honnibal b9016c4633 * Switch to using sparsehash and murmurhash libraries out of pip 2014-07-25 15:47:27 +01:00
Matthew Honnibal 1c5ab3b49a * Add tokens module to setup 2014-07-07 12:51:23 +02:00
Matthew Honnibal 648d1fe3ed * Compile en_ptb 2014-07-07 05:10:28 +02:00
Matthew Honnibal 0c1be7effe * Compile string_tools module 2014-07-07 04:24:00 +02:00
Matthew Honnibal ca7045f3f2 * Add build/setup stuff 2014-07-05 20:49:34 +02:00