
14 Commits

Author | SHA1 | Message | Date
Matthew Honnibal | 302e09018b | * Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas | 2014-12-09 14:48:01 +11:00
Matthew Honnibal | 0de700b566 | * Comment out tests of hyphenation, while we decide what hyphenation policy should be. | 2014-11-05 02:03:22 +11:00
Matthew Honnibal | 63114820cf | * Upd tests for tighter interface | 2014-10-30 18:15:30 +11:00
Matthew Honnibal | 13909a2e24 | * Rewriting Lexeme serialization. | 2014-10-29 23:19:38 +11:00
Matthew Honnibal | 08ce602243 | * Large refactor, particularly to Python API | 2014-10-24 00:59:17 +11:00
Matthew Honnibal | 6fb42c4919 | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00
Matthew Honnibal | db191361ee | * Add new tests for fancier tokenization cases | 2014-09-15 06:31:58 +02:00
Matthew Honnibal | 5dcc1a426a | * Update tokenization tests for new tokenizer rules | 2014-09-15 01:32:51 +02:00
Matthew Honnibal | 985bc68327 | * Fix bug with trailing punct on contractions. Reduced efficiency, and slightly hacky implementation. | 2014-09-12 18:26:26 +02:00
Matthew Honnibal | b5b31c6b6e | * Avoid testing for object identity | 2014-09-10 20:58:30 +02:00
Matthew Honnibal | c282e6d5fb | * Redesign proceeding | 2014-08-28 19:45:09 +02:00
Matthew Honnibal | 9815c7649e | * Refactor around Word objects, adapting tests. Tests passing, except for string views. | 2014-08-23 19:55:06 +02:00
Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00
Matthew Honnibal | e4263a241a | * Tests passing for reorganized version | 2014-07-07 04:23:46 +02:00