Commit Graph

16 Commits

Author SHA1 Message Date
Matthew Honnibal fd525f0675 * Pass OOV probability around 2015-07-25 23:29:51 +02:00
Matthew Honnibal a7c4d72e83 * Add serializer property to Vocab, and lazy-load it. Add get_by_orth method. 2015-07-23 01:18:19 +02:00
Matthew Honnibal 109106a949 * Replace UniStr, using unicode objects instead 2015-07-22 04:52:05 +02:00
Matthew Honnibal 317cbbc015 * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. 2015-07-19 15:18:17 +02:00
Matthew Honnibal 82d84b0f2b * Index lexemes by orth, instead of a lexemes vector. Breaks the mechanism for deciding not to own LexemeC structs during parsing. Need to reinstate this. 2015-07-18 22:42:15 +02:00
Matthew Honnibal 4dddc8a69b * Fix type declarations for attr_t. Remove unused id_t. 2015-07-18 22:39:57 +02:00
Matthew Honnibal db9dfd2e23 * Major refactor of serialization. Nearly complete now. 2015-07-17 01:27:54 +02:00
Matthew Honnibal af5cc926a4 * Add codec property to Vocab, to use the Huffman encoding 2015-07-13 13:55:14 +02:00
Matthew Honnibal abc43b852d * Add pos_tags attr to Vocab. 2015-07-08 12:36:38 +02:00
Matthew Honnibal c04e6ebca6 * Allow user to load different sized vectors. 2015-06-05 16:26:39 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 0930892fc1 * Tmp. Working on refactor. Compiles, must hook up lexical feats. 2015-01-14 00:03:48 +11:00
Matthew Honnibal ce2edd6312 * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
Matthew Honnibal b8b65903fc * Tmp 2014-12-24 17:42:00 +11:00
Matthew Honnibal d11c1edf8c * Import slice_unicode from strings.pyx 2014-12-20 07:56:26 +11:00
Matthew Honnibal 116f7f3bc1 * Rename Lexicon to Vocab, and move it to its own file 2014-12-20 06:54:03 +11:00