Matthew Honnibal | eab2376547 | * Allow longer ellipses to be treated as a single token, e.g. Hello......there | 2016-05-09 13:22:53 +02:00
Matthew Honnibal | 6f82065761 | * Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases. | 2016-04-14 11:36:03 +02:00
Matthew Honnibal | 910a6c805f | * Add infix rule for double hyphens, re Issue #302 | 2016-03-29 13:03:44 +11:00
Matthew Honnibal | 454c1996d0 | * Add tokenizer rule to fix numeric range tokenization | 2015-10-17 15:49:51 +11:00
Matthew Honnibal | 3cfe3d8c1c | * Revert bad infix change | 2015-07-26 17:32:37 +02:00
Matthew Honnibal | bd608559bc | * Fix infix-period tokenization | 2015-07-26 17:14:52 +02:00
Matthew Honnibal | 94f314c271 | * Fix tokenization of email addresses. | 2015-07-26 16:38:08 +02:00
Matthew Honnibal | 14e9e6ec6c | * Fix ... tokenization, and correct orth inconsistencies in specials.json | 2015-07-20 12:10:56 +02:00
Matthew Honnibal | b5b869366b | * Adjust hyphenation rule in tokenizer | 2015-06-28 06:18:58 +02:00
Matthew Honnibal | 45ec92243a | * Add hyphenation rule to infix.txt for tokenizer | 2015-06-06 05:56:00 +02:00
Matthew Honnibal | 5e27bd0c4c | * Add en language data, for tokenizer etc | 2015-02-25 17:10:32 -05:00