13 Commits

Author SHA1 Message Date
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal 141639ea3a * Fix bug in tokenizer that caused new tokens to be added for affixes 2016-02-21 23:17:47 +00:00
Chris DuBois dac8fe7bdb Add __reduce__ to Tokenizer so that English pickles. - Add tests to test_pickle and test_tokenizer that save to tempfiles. 2015-10-23 22:24:03 -07:00
Matthew Honnibal c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal 119c0f8c3f * Hack out morphology stuff from tokenizer, while morphology being reimplemented. 2015-08-26 19:20:11 +02:00
Matthew Honnibal 109106a949 * Replace UniStr, using unicode objects instead 2015-07-22 04:52:05 +02:00
Matthew Honnibal cfd842769e * Allow infix tokens to be variable length 2015-07-18 22:45:00 +02:00
Matthew Honnibal 67641f3b58 * Refactor tokenizer, to set the 'spacy' field on TokenC instead of passing a string 2015-07-13 21:46:02 +02:00
Matthew Honnibal 6eef0bf9ab * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
Matthew Honnibal bb522496dd * Rename Tokens to Doc 2015-07-08 18:53:00 +02:00
Matthew Honnibal 6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal ce2edd6312 * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
Matthew Honnibal a60ae261ae * Move tokenizer to its own file, and refactor 2014-12-20 07:29:16 +11:00