Commit Graph

2392 Commits

Author SHA1 Message Date
Matthew Honnibal d108534dc2 Fix 2/3 problems for training 2017-03-08 01:37:52 +01:00
Matthew Honnibal d03d6a13f1 Merge branch 'rominf-ud20' into develop 2017-03-07 21:48:56 +01:00
Matthew Honnibal f7374d0b86 Merge branch 'ud20' of https://github.com/rominf/spaCy into rominf-ud20 2017-03-07 21:48:37 +01:00
Matthew Honnibal 16670d3251 Xfail the vocab pickling for now 2017-03-07 21:43:28 +01:00
Matthew Honnibal a89c3500f6 Fixes to hacky vocab pickling 2017-03-07 20:58:55 +01:00
Matthew Honnibal d814892805 Hackish pickle support for Vocab. 2017-03-07 20:25:12 +01:00
Matthew Honnibal 26614e028f Add hacky support for StringCFile, to make pickling easier. 2017-03-07 20:24:37 +01:00
Matthew Honnibal 3edb8ae207 Whitespace 2017-03-07 17:16:26 +01:00
Matthew Honnibal 5de7e712b7 Add support for pickling StringStore. 2017-03-07 17:15:18 +01:00
Matthew Honnibal 4e75e74247 Update regression test for variable-length pattern problem in the matcher. 2017-03-07 16:08:32 +01:00
Matthew Honnibal 6d67213b80 Add test for 850: Matcher fails on zero-or-more. 2017-03-07 15:55:28 +01:00
Aniruddha Adhikary 696215a3fb add tests for Bengali 2017-03-05 11:25:12 +06:00
Aniruddha Adhikary 8f3bfe9bfc [Bengali] basic tag map, morph, lemma rules and exceptions 2017-03-04 12:36:59 +06:00
Roman Inflianskas 66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
ines 8dff040032 Revert "Add regression test for #859"
This reverts commit c4f16c66d1.
2017-03-01 21:56:20 +01:00
ines c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
Aniruddha Adhikary d91be7aed4 add punctuations for Bengali 2017-02-28 21:07:14 +06:00
Aniruddha Adhikary 5a4fc09576 add basic Bengali support 2017-02-28 07:48:37 +06:00
Matthew Honnibal cc9b2b74e3 Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:44:39 +01:00
Matthew Honnibal bd4375a2e6 Remove comment 2017-02-27 11:44:26 +01:00
Matthew Honnibal e7e22d8be6 Move import within get_exceptions() function, to speed import 2017-02-27 11:34:48 +01:00
Matthew Honnibal 34bcc8706d Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:21:21 +01:00
Matthew Honnibal 0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
Matthew Honnibal 26446aa728 Avoid loading all French exceptions on import
Move exceptions loading behind a get_tokenizer_exceptions() function
for French, instead of loading into the top-level namespace. This
cuts import times from 0.6s to 0.2s, at the expense of making the
French data a little different from the others (there's no top-level
TOKENIZER_EXCEPTIONS variable.) The current solution feels somewhat
unsatisfying.
2017-02-25 11:55:00 +01:00
ines 376c5813a7 Remove print statements from test 2017-02-24 18:26:32 +01:00
ines 7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines 0e2e331b58 Convert exceptions to Python list 2017-02-24 18:22:40 +01:00
ines 51eb190ef4 Remove print statements from test 2017-02-24 17:41:12 +01:00
Matthew Honnibal db5ada3995 Merge branch 'master' of https://github.com/explosion/spaCy 2017-02-24 14:28:12 +01:00
Matthew Honnibal 8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
ines 67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines 30ce2a6793 Exclude "shed" and "Shed" from tokenizer exceptions (see #847) 2017-02-18 14:10:44 +01:00
Ines Montani de997c1a33 Merge pull request #842 from magnusburton/master
Added regular verb rules for Swedish
2017-02-17 11:18:20 +01:00
Magnus Burton 41fcfd06b8 Added regular verb rules for Swedish 2017-02-17 10:04:04 +01:00
ines aa92d4e9b5 Fix unicode regex for Python 2 (see #834) 2017-02-16 23:49:54 +01:00
ines 44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
ines 3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines 85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque 06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque 3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
Raphaël Bournhonesque e17dc2db75 Remove useless import 2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque 3fd2742649 load_vectors should accept arbitrary space characters as word tokens
Fix bug #834
2017-02-16 12:08:30 +01:00
ines f08e180a47 Make groups non-capturing
Prevents hitting the 100 named groups limit in Python
2017-02-10 13:35:02 +01:00
ines fa3b8512da Use consistent imports and exports
Bundle everything in language_data to keep it consistent with other
languages and make TOKENIZER_EXCEPTIONS importable from there.
2017-02-10 13:34:09 +01:00
ines 21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
2017-02-10 13:17:05 +01:00
ines f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque 309da78bf0 Merge branch 'master' into tokenizer_exceptions 2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque 4ce0bbc6b6 Update unit tests 2017-02-09 16:30:43 +01:00
Raphaël Bournhonesque 5d706ab95d Merge tokenizer exceptions from PR #802 2017-02-09 16:30:28 +01:00
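
Commit 26446aa728 above describes moving the French exceptions behind a get_tokenizer_exceptions() function so they are not built at import time. The following is a minimal sketch of that general pattern, not spaCy's actual code: only the function name get_tokenizer_exceptions comes from the commit message, while load_exceptions, exceptions_cache, load_calls and the sample entries are hypothetical names and data chosen for illustration.

```python
# Hypothetical sketch of deferred (lazy) loading; not spaCy's implementation.

exceptions_cache = {}          # holds the table once it has been built
load_calls = {"count": 0}      # illustration only: counts the expensive builds


def load_exceptions():
    """Stand-in for an expensive step, such as building a large table."""
    load_calls["count"] += 1
    # Illustrative entries only; the real French data is assumed to be far larger.
    return {
        "aujourd'hui": [{"ORTH": "aujourd'hui"}],
        "p.ex.": [{"ORTH": "p.ex."}],
    }


def get_tokenizer_exceptions():
    """Build the table on the first call, then reuse the cached result."""
    if "table" not in exceptions_cache:
        exceptions_cache["table"] = load_exceptions()
    return exceptions_cache["table"]


if __name__ == "__main__":
    # Nothing has been built yet: importing the module did no heavy work.
    print(load_calls["count"])
    first = get_tokenizer_exceptions()
    second = get_tokenizer_exceptions()
    # Both calls return the same object, and the build ran only once.
    print(first is second, load_calls["count"])
```

With this shape, importing the module stays cheap and the cost is paid on the first call. The trade-off is the one the commit message notes: there is no top-level TOKENIZER_EXCEPTIONS variable to import, so callers have to go through the function.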