Commit Graph

114 Commits

Author SHA1 Message Date
Matthew Honnibal cc8bf62208 * Fix Issue #360: Tokenizer failed when the infix regex matched the start of the string while trying to tokenize multi-infix tokens. 2016-05-09 13:23:47 +02:00
Matthew Honnibal b4bfc6ae55 * Add test for Issue #351: Indices off when leading whitespace 2016-05-04 15:53:17 +02:00
Matthew Honnibal 6f82065761 * Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases. 2016-04-14 11:36:03 +02:00
Matthew Honnibal 04d0209be9 * Recognise multiple infixes in a token. 2016-04-13 18:38:26 +10:00
Matthew Honnibal b1fe41b45d * Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment. 2016-03-29 14:31:05 +11:00
Matthew Honnibal 9c73983bdd * Add test for hyphenation problem in Issue #302 2016-03-29 14:27:13 +11:00
Henning Peters c12d3dd200 add __init__.py to empty package dirs 2016-03-14 11:28:03 +01:00
Henning Peters 9d8966a2c0 Update test_tokenizer.py 2016-02-10 19:24:37 +01:00
Matthew Honnibal 7f24229f10 * Don't try to pickle the tokenizer 2016-02-06 14:09:05 +01:00
Matthew Honnibal 515493c675 * Add xfail test for Issue #225: tokenization with non-whitespace delimiters 2016-01-19 13:20:14 +01:00
Matthew Honnibal 223d2b3484 * Add test for Issue #154: Additional whitespace introduced when string ends with a whitespace token. 2016-01-16 17:08:07 +01:00
Matthew Honnibal 3fbfba575a * xfail the contractions test 2015-12-31 13:16:28 +01:00
Matthew Honnibal 4b4eec8b47 * Fix Issue #201: Tokenization of there'll 2015-12-29 18:09:09 +01:00
Matthew Honnibal 4e16f9e435 * Move tests underneath spacy/ 2015-10-26 00:07:31 +11:00