145 Commits

Author SHA1 Message Date
Søren Lind Kristiansen 056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen 8dc265ac0c Add test for tokenization of 'i.' for Danish. 2017-11-24 11:29:37 +01:00
Vadim Mazaev 81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests 2017-11-21 22:23:59 +03:00
ines 17849dee4b Fix French test (see #1617) 2017-11-20 13:59:59 +01:00
Matthew Honnibal 63c6ae4191 Fix lemmatizer test 2017-11-06 11:57:06 +01:00
Matthew Honnibal 144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal d6e831bf89 Fix lemmatizer tests 2017-11-03 19:46:34 +01:00
Jim O'Regan 08b0bfd153 merge 2017-10-31 22:55:59 +00:00
Jim O'Regan 00ecfa5417 Ó, not O 2017-10-31 22:54:42 +00:00
Ines Montani 25b1d6cd91 Fix syntax error 2017-10-31 22:36:03 +01:00
Jim O'Regan fe4b10346a replace example sentence until I get around to adding a punctuation.py 2017-10-31 20:24:53 +00:00
Jim O'Regan d4a8160c36 change quotes 2017-10-31 15:15:44 +00:00
Jim O'Regan 41dd29e48e merge 2017-10-31 14:07:45 +00:00
Ines Montani facf77e541 Merge branch 'develop' into support-danish 2017-10-24 11:53:19 +02:00
ines cd6a29dce7 Port over changes from #1294 2017-10-14 13:28:46 +02:00
ines 38c756fd85 Port over changes from #1287 2017-10-14 13:16:21 +02:00
ines 612224c10d Port over changes from #1157 2017-10-14 13:11:39 +02:00
Matthew Honnibal cf6da9301a Update lemmatizer test 2017-10-12 22:50:52 +02:00
ines 453c47ca24 Add German lemmatizer tests 2017-10-11 13:27:26 +02:00
Matthew Honnibal c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Wannaphong Phatthiyaphaibun 5cba67146c add thai in spacy2 2017-09-26 21:36:27 +07:00
ines ece30c28a8 Don't split hyphenated words in German. This way, the tokenizer matches the tokenization in German treebanks. 2017-09-16 20:40:15 +02:00
Jim O'Regan 187be6d372 copy/paste error 2017-09-11 09:33:17 +01:00
Jim O'Regan c283e9edfe first stab at test 2017-09-11 08:57:48 +01:00
Matthew Honnibal d5fbf27335 Fix test 2017-09-04 16:45:11 +02:00
Matthew Honnibal 644d6c9e1a Improve lemmatization tests, re #1296 2017-09-04 15:17:44 +02:00
Jim Geovedi fbc62a09c7 added {pre,suf,in}fix tests 2017-08-20 13:43:00 +07:00
Jim Geovedi cc4772cac2 reworks 2017-08-03 13:08:38 +07:00
Jim Geovedi 783f7d8b86 added test set for Indonesian language 2017-07-29 18:21:07 +07:00
mollerhoj e840077601 Add some basic tests for Danish 2017-07-03 15:49:51 +02:00
ines cc9c5dc7a3 Fix noun chunks test 2017-06-05 16:39:04 +02:00
ines a0f4592f0a Update tests 2017-06-05 02:26:13 +02:00
ines 3e105bcd36 Update tests 2017-06-05 02:09:27 +02:00
Matthew Honnibal 58be0e1f6f Update tests 2017-06-04 16:35:06 -05:00
Ines Montani 112c5787eb Merge pull request #1101 from oroszgy/hu_tokenizer_fix: More robust Hungarian tokenizer. 2017-06-04 22:37:51 +02:00
ines e47eef5e03 Update German tokenizer exceptions and tests 2017-06-03 21:07:44 +02:00
ines d77c2cc8bb Add tests for English norm exceptions 2017-06-03 20:59:50 +02:00
Gyorgy Orosz f0c3b09242 More robust Hungarian tokenizer. 2017-05-31 22:28:40 +02:00
ines 20a7003c0d Update model fixtures and reorganise tests 2017-05-29 22:14:31 +02:00
ines d0c6d4f76d Fix formatting 2017-05-23 11:32:00 +02:00
ines 2c3bdd09b1 Add English test for like_num 2017-05-09 11:06:34 +02:00
ines 22375eafb0 Fix and merge attrs and lex_attrs tests 2017-05-09 11:06:25 +02:00
ines c714841cc8 Move language-specific tests to tests/lang 2017-05-09 00:02:37 +02:00
ines 3c0f85de8e Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00