Commit Graph

505 Commits

Author SHA1 Message Date
Matthew Honnibal ea2592879f Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-11 11:13:37 -06:00
Em 426d17167f Added string manipulation for spans 2017-03-10 16:50:02 -08:00
ines 10e29189ac Adjust URL testcases and xfail problems (instead of comment) 2017-03-10 14:22:50 +01:00
Matthew Honnibal ea53647362 Merge branch 'develop' 2017-03-10 02:49:39 -06:00
Dan Rapp 123d3f2d38 Fix error in test case parameterization 2017-03-09 12:18:21 -07:00
Dan Rapp b9307dfcd7 Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix 2017-03-09 11:42:14 -07:00
Dan Rapp 3b1df3808d Issue #840 - URL pattern too broad 2017-03-09 11:39:39 -07:00
Matthew Honnibal 5b0b968d13 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-03-08 15:03:10 +01:00
Matthew Honnibal 0ac3d27689 Fix handling of trailing whitespace. Fix off-by-one error that meant trailing spaces were being dropped. Closes #792 2017-03-08 15:01:40 +01:00
ines c2e3e651b8 Re-add regression test for #859 2017-03-08 14:36:09 +01:00
Matthew Honnibal 16670d3251 Xfail the vocab pickling for now 2017-03-07 21:43:28 +01:00
Matthew Honnibal a89c3500f6 Fixes to hacky vocab pickling 2017-03-07 20:58:55 +01:00
Matthew Honnibal 3edb8ae207 Whitespace 2017-03-07 17:16:26 +01:00
Matthew Honnibal 5de7e712b7 Add support for pickling StringStore. 2017-03-07 17:15:18 +01:00
Matthew Honnibal 4e75e74247 Update regression test for variable-length pattern problem in the matcher. 2017-03-07 16:08:32 +01:00
Matthew Honnibal 6d67213b80 Add test for 850: Matcher fails on zero-or-more. 2017-03-07 15:55:28 +01:00
Aniruddha Adhikary 696215a3fb add tests for Bengali 2017-03-05 11:25:12 +06:00
ines 8dff040032 Revert "Add regression test for #859". This reverts commit c4f16c66d1. 2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela a8cfde46d3 #781 Fix test — colocalizes is lemmatized to colocaliz and colicalize 2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela a471114eb2 #781 add regression test, failing previous bug fix 2017-03-01 21:30:51 +01:00
ines c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
Matthew Honnibal 34bcc8706d Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:21:21 +01:00
Matthew Honnibal 0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
ines 376c5813a7 Remove print statements from test 2017-02-24 18:26:32 +01:00
ines 7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines 51eb190ef4 Remove print statements from test 2017-02-24 17:41:12 +01:00
Matthew Honnibal db5ada3995 Merge branch 'master' of https://github.com/explosion/spaCy 2017-02-24 14:28:12 +01:00
Matthew Honnibal 8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
ines 67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines 44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
ines 3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines 85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"". This reverts commit ea05f78660. 2017-02-16 23:26:25 +01:00
ines ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)". This reverts commit 7d8c9eee7f, reversing changes made to f6b69babcc. 2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque 06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque 3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
ines 21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"". This reverts commit f02a2f9322. 2017-02-10 13:17:05 +01:00
ines f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions". This reverts commit b95afdf39c, reversing changes made to b0ccf32378. 2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque 309da78bf0 Merge branch 'master' into tokenizer_exceptions 2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque 4ce0bbc6b6 Update unit tests 2017-02-09 16:30:43 +01:00
ines 654fe447b1 Add Swedish tokenizer tests (see #807) 2017-02-05 11:47:07 +01:00
Michael Wallin 35100c8bdd [issue 805] Add regression test and the required fixture 2017-02-04 16:21:34 +02:00
Michael Wallin 1a1952afa5 [finnish] Add initial tests for tokenizer 2017-02-04 13:54:10 +02:00
Ines Montani afc6365388 Update regression test for #801 to match current expected behaviour 2017-02-02 16:23:05 +01:00
Ines Montani 13a4ab37e0 Add regression test for #801 2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque 85f951ca99 Add tokenizer exceptions for French 2017-02-02 08:36:16 +01:00
Ines Montani e4875834fe Fix formatting 2017-01-31 15:19:33 +01:00
Ines Montani c304834e45 Add missing import 2017-01-31 15:18:30 +01:00
Ines Montani e6465b9ca3 Parametrize test cases and mark as xfail 2017-01-31 15:14:42 +01:00
latkins e4c84321a5 Added regression test for Issue #792. 2017-01-31 13:47:42 +00:00
Ines Montani 19501f3340 Add regression test for #775 2017-01-25 13:16:52 +01:00
Raphaël Bournhonesque 1be9c0e724 Add fr tokenization unit tests 2017-01-24 10:57:37 +01:00
Ines Montani 0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani 5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Ines Montani d704cfa60d Fix typo 2017-01-16 21:30:33 +01:00
Matthew Honnibal 2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Ines Montani 50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani 116c675c3c Merge pull request #742 from oroszgy/hu_tokenizer_fix: Improved Hungarian tokenizer 2017-01-14 23:52:44 +01:00
Gyorgy Orosz 92345b6a41 Further numeric test. 2017-01-14 22:44:19 +01:00
Gyorgy Orosz b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00
Gyorgy Orosz b03a46792c Better error handling 2017-01-14 22:09:29 +01:00
Ines Montani 332ce2d758 Update README.md 2017-01-14 21:12:11 +01:00
Gyorgy Orosz 9505c6a72b Passing all old tests. 2017-01-14 20:39:21 +01:00
Gyorgy Orosz 63037e79af Fixed hyphen handling in the Hungarian tokenizer. 2017-01-14 16:30:11 +01:00
Gyorgy Orosz f77c0284d6 Maintaining compatibility with other spacy tokenizers. 2017-01-14 16:19:15 +01:00
Gyorgy Orosz 1be5da1ac6 Fixed Hungarian tokenizer for numbers 2017-01-14 15:51:59 +01:00
Ines Montani a89e269a5a Fix test formatting and consistency 2017-01-14 13:41:19 +01:00
Ines Montani 3424e3a7e5 Update README.md 2017-01-13 15:54:54 +01:00
Ines Montani 49186b34a1 Mark lemmatizer tests as models since they use installed data 2017-01-13 15:12:07 +01:00
Ines Montani 138deb80a1 Modernise vector tests, use add_vecs_to_vocab and don't depend on models 2017-01-13 15:12:07 +01:00
Ines Montani 96f0caa28a Fix test name for consistency 2017-01-13 15:12:07 +01:00
Ines Montani dc2bb1259f Add util function to add vectors to vocab 2017-01-13 15:12:07 +01:00
Ines Montani db9b25663d Reformat add_docs_equal and add docstring 2017-01-13 15:12:07 +01:00
Ines Montani 62ce0a0073 Add README.md to tests to explain organisation and conventions 2017-01-13 15:11:18 +01:00
Ines Montani 38d60f6b90 Modernise serializer I/O tests and don't depend on models where possible 2017-01-13 02:24:56 +01:00
Ines Montani 4bb5b89ee4 Add text_file_b fixture using BytesIO 2017-01-13 02:23:50 +01:00
Ines Montani 49febd8c62 Modernise noun chunks tests and don't depend on models 2017-01-13 02:01:00 +01:00
Ines Montani 3ee97b5686 Rename test_parser to test_noun_chunks 2017-01-13 01:36:33 +01:00
Ines Montani a308703f47 Remove old tests 2017-01-13 01:34:48 +01:00
Ines Montani 12eb8edf26 Move parser tests from unit to parser 2017-01-13 01:34:38 +01:00
Ines Montani 138c53ff2e Merge tokenizer tests 2017-01-13 01:34:14 +01:00
Ines Montani 01f36ca3ff Move attrs tests from unit to root and modernise 2017-01-13 01:33:50 +01:00
Ines Montani 3610d27967 Move alignment tests from munge to gold and modernise 2017-01-13 01:33:31 +01:00
Ines Montani 094ff7396a Reformat and rename Pragmatic Segmenter tests and mark xfails 2017-01-13 01:30:20 +01:00
Ines Montani affcf1b19d Modernise lemmatizer tests 2017-01-12 23:41:17 +01:00
Ines Montani 33d9cf87f9 Modernise tagger tests and fix xpassing test 2017-01-12 23:40:52 +01:00
Ines Montani 33e5f8dc2e Create basic and extended test set for URLs 2017-01-12 23:40:02 +01:00
Ines Montani 5e4f5ebfc8 Modernise BILUO tests 2017-01-12 23:39:18 +01:00
Ines Montani 09acfbca01 Add Lemmatizer fixture 2017-01-12 23:38:55 +01:00
Ines Montani 514bfa2597 Add path fixture for spaCy data path 2017-01-12 23:38:47 +01:00
Ines Montani e9e99a5670 Add regression test for #740 2017-01-12 22:57:38 +01:00
Ines Montani 6935d55409 Fix formatting 2017-01-12 22:56:20 +01:00
Ines Montani 5f0d196a31 Modernise and merge matcher tests 2017-01-12 22:23:11 +01:00
Ines Montani d5d774413a Update comments on EN and DE fixtures 2017-01-12 22:03:07 +01:00
Ines Montani 9b4bea1df9 Tidy up and rename regression tests and remove unnecessary imports 2017-01-12 22:00:37 +01:00
Ines Montani 5e1b6178e3 Fix formatting and consistency 2017-01-12 22:00:06 +01:00
Ines Montani a3fd32455e Remove redundant language loading integration tests 2017-01-12 21:59:48 +01:00
Ines Montani 61f1ca09c2 Modernise serializer codecs tests 2017-01-12 21:58:55 +01:00
Ines Montani 5dbc6e59f6 Modernise Huffman tests 2017-01-12 21:58:40 +01:00
Ines Montani edeeeccea5 Modernise packer tests and don't depend on models where possible 2017-01-12 21:58:07 +01:00