Commit Graph

591 Commits

Author SHA1 Message Date
Juan Miguel Cejuela a471114eb2 #781 add regression test, failing previous bug fix 2017-03-01 21:30:51 +01:00
ines c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
Matthew Honnibal 34bcc8706d Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:21:21 +01:00
Matthew Honnibal 0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
ines 376c5813a7 Remove print statements from test 2017-02-24 18:26:32 +01:00
ines 7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines 51eb190ef4 Remove print statements from test 2017-02-24 17:41:12 +01:00
Matthew Honnibal db5ada3995 Merge branch 'master' of https://github.com/explosion/spaCy 2017-02-24 14:28:12 +01:00
Matthew Honnibal 8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
ines 67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines 44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
ines 3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines 85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque 06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque 3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
ines 21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
2017-02-10 13:17:05 +01:00
ines f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque 309da78bf0 Merge branch 'master' into tokenizer_exceptions 2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque 4ce0bbc6b6 Update unit tests 2017-02-09 16:30:43 +01:00
ines 654fe447b1 Add Swedish tokenizer tests (see #807) 2017-02-05 11:47:07 +01:00
Michael Wallin 35100c8bdd [issue 805] Add regression test and the required fixture 2017-02-04 16:21:34 +02:00
Michael Wallin 1a1952afa5 [finnish] Add initial tests for tokenizer 2017-02-04 13:54:10 +02:00
Ines Montani afc6365388 Update regression test for #801 to match current expected behaviour 2017-02-02 16:23:05 +01:00
Ines Montani 13a4ab37e0 Add regression test for #801 2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque 85f951ca99 Add tokenizer exceptions for French 2017-02-02 08:36:16 +01:00
Ines Montani e4875834fe Fix formatting 2017-01-31 15:19:33 +01:00
Ines Montani c304834e45 Add missing import 2017-01-31 15:18:30 +01:00
Ines Montani e6465b9ca3 Parametrize test cases and mark as xfail 2017-01-31 15:14:42 +01:00
latkins e4c84321a5 Added regression test for Issue #792. 2017-01-31 13:47:42 +00:00
Ines Montani 19501f3340 Add regression test for #775 2017-01-25 13:16:52 +01:00
Raphaël Bournhonesque 1be9c0e724 Add fr tokenization unit tests 2017-01-24 10:57:37 +01:00
Ines Montani 0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani 5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Ines Montani d704cfa60d Fix typo 2017-01-16 21:30:33 +01:00
Matthew Honnibal 2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Ines Montani 50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani 116c675c3c Merge pull request #742 from oroszgy/hu_tokenizer_fix
Improved Hungarian tokenizer
2017-01-14 23:52:44 +01:00
Gyorgy Orosz 92345b6a41 Further numeric test. 2017-01-14 22:44:19 +01:00
Gyorgy Orosz b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00
Gyorgy Orosz b03a46792c Better error handling 2017-01-14 22:09:29 +01:00
Ines Montani 332ce2d758 Update README.md 2017-01-14 21:12:11 +01:00
Gyorgy Orosz 9505c6a72b Passing all old tests. 2017-01-14 20:39:21 +01:00
Gyorgy Orosz 63037e79af Fixed hyphen handling in the Hungarian tokenizer. 2017-01-14 16:30:11 +01:00
Gyorgy Orosz f77c0284d6 Maintaining compatibility with other spacy tokenizers. 2017-01-14 16:19:15 +01:00
Gyorgy Orosz 1be5da1ac6 Fixed Hungarian tokenizer for numbers 2017-01-14 15:51:59 +01:00
Ines Montani a89e269a5a Fix test formatting and consistency 2017-01-14 13:41:19 +01:00
Ines Montani 3424e3a7e5 Update README.md 2017-01-13 15:54:54 +01:00
Ines Montani 49186b34a1 Mark lemmatizer tests as models since they use installed data 2017-01-13 15:12:07 +01:00
Ines Montani 138deb80a1 Modernise vector tests, use add_vecs_to_vocab and don't depend on models 2017-01-13 15:12:07 +01:00
Ines Montani 96f0caa28a Fix test name for consistency 2017-01-13 15:12:07 +01:00
Ines Montani dc2bb1259f Add util function to add vectors to vocab 2017-01-13 15:12:07 +01:00
Ines Montani db9b25663d Reformat add_docs_equal and add docstring 2017-01-13 15:12:07 +01:00
Ines Montani 62ce0a0073 Add README.md to tests to explain organisation and conventions 2017-01-13 15:11:18 +01:00
Ines Montani 38d60f6b90 Modernise serializer I/O tests and don't depend on models where possible 2017-01-13 02:24:56 +01:00
Ines Montani 4bb5b89ee4 Add text_file_b fixture using BytesIO 2017-01-13 02:23:50 +01:00
Ines Montani 49febd8c62 Modernise noun chunks tests and don't depend on models 2017-01-13 02:01:00 +01:00
Ines Montani 3ee97b5686 Rename test_parser to test_noun_chunks 2017-01-13 01:36:33 +01:00
Ines Montani a308703f47 Remove old tests 2017-01-13 01:34:48 +01:00
Ines Montani 12eb8edf26 Move parser tests from unit to parser 2017-01-13 01:34:38 +01:00
Ines Montani 138c53ff2e Merge tokenizer tests 2017-01-13 01:34:14 +01:00
Ines Montani 01f36ca3ff Move attrs tests from unit to root and modernise 2017-01-13 01:33:50 +01:00
Ines Montani 3610d27967 Move alignment tests from munge to gold and modernise 2017-01-13 01:33:31 +01:00
Ines Montani 094ff7396a Reformat and rename Pragmatic Segmenter tests and mark xfails 2017-01-13 01:30:20 +01:00
Ines Montani affcf1b19d Modernise lemmatizer tests 2017-01-12 23:41:17 +01:00
Ines Montani 33d9cf87f9 Modernise tagger tests and fix xpassing test 2017-01-12 23:40:52 +01:00
Ines Montani 33e5f8dc2e Create basic and extended test set for URLs 2017-01-12 23:40:02 +01:00
Ines Montani 5e4f5ebfc8 Modernise BILUO tests 2017-01-12 23:39:18 +01:00
Ines Montani 09acfbca01 Add Lemmatizer fixture 2017-01-12 23:38:55 +01:00
Ines Montani 514bfa2597 Add path fixture for spaCy data path 2017-01-12 23:38:47 +01:00
Ines Montani e9e99a5670 Add regression test for #740 2017-01-12 22:57:38 +01:00
Ines Montani 6935d55409 Fix formatting 2017-01-12 22:56:20 +01:00
Ines Montani 5f0d196a31 Modernise and merge matcher tests 2017-01-12 22:23:11 +01:00
Ines Montani d5d774413a Update comments on EN and DE fixtures 2017-01-12 22:03:07 +01:00
Ines Montani 9b4bea1df9 Tidy up and rename regression tests and remove unnecessary imports 2017-01-12 22:00:37 +01:00
Ines Montani 5e1b6178e3 Fix formatting and consistency 2017-01-12 22:00:06 +01:00
Ines Montani a3fd32455e Remove redundant language loading integration tests 2017-01-12 21:59:48 +01:00
Ines Montani 61f1ca09c2 Modernise serializer codecs tests 2017-01-12 21:58:55 +01:00
Ines Montani 5dbc6e59f6 Modernise Huffman tests 2017-01-12 21:58:40 +01:00
Ines Montani edeeeccea5 Modernise packer tests and don't depend on models where possible 2017-01-12 21:58:07 +01:00
Ines Montani d084676cd0 Modernise and merge serialization tests 2017-01-12 21:57:19 +01:00
Ines Montani 442237787c Add assert_docs_equal util to compare two docs 2017-01-12 21:56:52 +01:00
Ines Montani eac3f700fb Add fixture for entity recognizer 2017-01-12 21:56:32 +01:00
Ines Montani b438cfddbc Modernise matcher tests and split into two files 2017-01-12 17:51:46 +01:00
Ines Montani 27482ebed8 Move matcher tests for #188 and #242 to regression tests
Modernise tests and remove unnecessary imports
2017-01-12 17:33:57 +01:00
Ines Montani 0a4dc632bd Update test to not create redundant Doc object 2017-01-12 17:33:18 +01:00
Ines Montani a2526e66d8 Fix formatting, naming and unicode declaration 2017-01-12 16:51:13 +01:00
Ines Montani 052cdff07d Modernise vector similarity tests 2017-01-12 16:51:13 +01:00
Ines Montani bd20ec0a6a Add get_cosine util function 2017-01-12 16:51:13 +01:00
Ines Montani 51ef75f629 Fix regression test for #615 and remove unnecessary imports 2017-01-12 16:51:12 +01:00
Ines Montani aeb747e10c Adjust formatting 2017-01-12 16:51:12 +01:00
Ines Montani 8e3e58a7e6 Modernise and merge lexeme vocab tests 2017-01-12 16:51:12 +01:00
Ines Montani c3d4516fc2 Move test for #361 to regression tests 2017-01-12 16:51:12 +01:00
Ines Montani 7cb3d74426 Modernise span tests and don't depend on models 2017-01-12 15:30:49 +01:00
Ines Montani 92e3d8b3ee Modernise vocab API tests and remove old xfailing tests 2017-01-12 15:27:46 +01:00
Ines Montani 7ea87684cd Rename test_vocab.py to test_vocab_api.py 2017-01-12 15:12:21 +01:00
Ines Montani 0da2ee5c68 Merge flag features tests into orth tests in tests root 2017-01-12 15:12:00 +01:00
Ines Montani 03c136cfd3 Remove StringStore tests from vocab tests 2017-01-12 15:11:15 +01:00
Ines Montani d7bd57abdf Modernise add vectors vocab test 2017-01-12 15:09:49 +01:00
Ines Montani 89525ef345 Use consistent test names 2017-01-12 15:09:21 +01:00
Ines Montani f8803808ce Remove old unused tests and conftest files 2017-01-12 15:09:05 +01:00
Ines Montani 4d0bfebcd9 Move Pragmatic Segmenter test cases (currently unused) to parser tests 2017-01-12 15:08:02 +01:00
Ines Montani 26d018d874 Add tests for StringStore 2017-01-12 15:07:31 +01:00
Ines Montani 9b6784bab5 Add fixture for StringStore 2017-01-12 15:05:40 +01:00
Ines Montani 99d66d613a Modernise tests for merging spans and don't depend on models 2017-01-12 12:26:26 +01:00
Ines Montani fa8f67596d Remove unused old test 2017-01-12 12:26:08 +01:00
Ines Montani 359f73a96b Move test for #54 to regression tests 2017-01-12 12:25:51 +01:00
Ines Montani 3f3a46722c Remove unused conftest 2017-01-12 12:25:24 +01:00
Ines Montani c2406e92bc Allow setting ents in get_doc 2017-01-12 12:25:10 +01:00
Ines Montani c5914c6fe5 Fix and pass regression test for #736 2017-01-12 11:48:56 +01:00
Ines Montani a6790b6694 Rename tags to pos in get_doc and allow adding tags to tokens 2017-01-12 11:18:36 +01:00
Ines Montani 1add8ace67 Merge lemmatizer tests 2017-01-12 11:16:53 +01:00
Ines Montani 3bc082abdf Modernise morph exceptions test and don't depend on models 2017-01-12 11:14:29 +01:00
Ines Montani ec7739b76e Add regression test for #736 2017-01-12 11:12:44 +01:00
Ines Montani 6c1c564891 Move language-specific tests out of redundant tokenizer directories 2017-01-12 02:17:18 +01:00
Ines Montani 8fecedac3a Tidy up 2017-01-12 02:16:37 +01:00
Ines Montani ae7edd30e7 Move text file back to tokenizer tests directory 2017-01-12 02:10:23 +01:00
Ines Montani ffcaba9017 Remove old and/or redundant tests 2017-01-12 02:10:18 +01:00
Ines Montani 19c4132097 Modernise space attachment parser tests and don't depend on models 2017-01-12 01:54:44 +01:00
Ines Montani 69778924c8 Modernise and merge parser tests and don't depend on models 2017-01-12 01:07:29 +01:00
Ines Montani 178c147612 Modernise nonprojectivity tests and don't depend on models 2017-01-12 01:06:36 +01:00
Ines Montani 1a3984742c Modernise sentence boundary detection tests and don't depend on models (where possible) 2017-01-11 23:53:08 +01:00
Ines Montani 0cdb6ea61d Remove old unused pickle test 2017-01-11 23:52:28 +01:00
Ines Montani c9671329dc Move test for #309 to regression tests 2017-01-11 23:52:13 +01:00
Ines Montani d0e37b5670 Modernise parser tests and don't depend on models 2017-01-11 21:30:27 +01:00
Ines Montani 342cb41782 Add apply_transition_sequence util function to utils 2017-01-11 21:30:14 +01:00
Ines Montani 09807addff Add en_parser fixture 2017-01-11 21:29:59 +01:00
Ines Montani 55d151aa61 Modernise Doc parse tree navigation tests and don't depend on models 2017-01-11 21:14:15 +01:00
Ines Montani 7262421bb2 Use consistent test names 2017-01-11 19:00:52 +01:00
Ines Montani 33800c9367 Rename "tokens" tests to "doc" 2017-01-11 18:59:01 +01:00
Ines Montani 3a9c6a9563 Remove old unused files 2017-01-11 18:58:38 +01:00
Ines Montani 8e962de39f Remove old word vector tests 2017-01-11 18:55:08 +01:00
Ines Montani e027936920 Modernise Doc noun chunks tests 2017-01-11 18:54:56 +01:00
Ines Montani 439f396acd Modernise Doc array tests and don't depend on models 2017-01-11 18:54:46 +01:00
Ines Montani 05447be884 Modernise test for adding entities 2017-01-11 18:54:24 +01:00
Ines Montani 6e883f4c00 Modernise Doc API tests and don't depend on models 2017-01-11 18:05:36 +01:00
Ines Montani 8bf3bb5c44 Make words optional for get_doc 2017-01-11 18:05:10 +01:00
Ines Montani 928db7e419 Fix StringIO import for Python 3 2017-01-11 14:07:48 +01:00
Ines Montani 69998f216b Rename test_tokens_api.py to test_doc_api.py 2017-01-11 13:58:56 +01:00
Ines Montani d94dea1b18 Merge token tests into token API tests 2017-01-11 13:57:02 +01:00
Ines Montani eb23424ab0 Modernise token API tests and don't depend on loading models 2017-01-11 13:56:54 +01:00
Ines Montani c682b8ca90 Merge conftests into one cohesive file 2017-01-11 13:56:32 +01:00
Ines Montani 909f24d7df Add test utils and get_doc helper function
Create Doc object from given vocab, words and annotations to allow
tests not to depend on loading the models.
2017-01-11 13:55:33 +01:00
Ines Montani 3e6e1f0251 Tidy up regression tests 2017-01-10 19:24:10 +01:00
Ines Montani 869963c3c4 Mark extensive prefix/suffix tests as slow 2017-01-10 15:57:35 +01:00
Ines Montani 487e020ebe Add simple test for surrounding brackets 2017-01-10 15:57:26 +01:00
Ines Montani 0ba5cf51d2 Assert length first 2017-01-10 15:57:00 +01:00
Ines Montani 2185d31907 Adjust names and formatting 2017-01-10 15:56:35 +01:00
Ines Montani e10d4ca964 Remove semi-redundant URLs and punctuation for faster testing 2017-01-10 15:54:25 +01:00
Ines Montani 3a3cb2c90c Add unicode declaration 2017-01-10 15:53:15 +01:00
Matthew Honnibal 64f747cb65 Token comparison test 2017-01-09 19:12:00 +01:00
Matthew Honnibal 18c3c2d05c Add tests for token comparison, re Issue #631 2017-01-09 19:09:59 +01:00
Matthew Honnibal 42cd598f57 Use correct fixtures in URL tokenizer 2017-01-09 14:10:40 +01:00
Ines Montani aa876884f0 Revert "Revert "Merge remote-tracking branch 'origin/master'""
This reverts commit fb9d3bb022.
2017-01-09 13:28:13 +01:00
Ines Montani d5c72c40eb Remove old tests for old website example code 2017-01-08 22:28:53 +01:00
Ines Montani 5d28664fc5 Don't test Hungarian for numbers and hyphens for now
Reinvestigate behaviour of case affixes given reorganised tokenizer
patterns.
2017-01-08 20:45:40 +01:00
Ines Montani abb09782f9 Move sun.txt to original location and fix path to not break parser tests 2017-01-08 20:32:54 +01:00
Ines Montani 8328925e1f Add newlines to long German text 2017-01-05 18:13:30 +01:00
Ines Montani 55b46d7cf6 Add tokenizer tests for German 2017-01-05 18:11:25 +01:00
Ines Montani 5bb4081f52 Remove redundant test_tokenizer.py for English 2017-01-05 18:11:11 +01:00
Ines Montani 8216ba599b Add tests for longer and mixed English texts 2017-01-05 18:11:04 +01:00
Ines Montani 65f937d5c6 Move basic contraction tests to test_contractions.py 2017-01-05 18:09:53 +01:00
Ines Montani bbe7cab3a1 Move non-English-specific tests back to general tokenizer tests 2017-01-05 18:09:29 +01:00
Ines Montani 038002d616 Reformat HU tokenizer tests and adapt to general style
Improve readability of test cases and add conftest.py with fixture
2017-01-05 18:06:44 +01:00
Ines Montani 637f785036 Add general sanity tests for all tokenizers 2017-01-05 16:25:38 +01:00
Ines Montani c5f2dc15de Move English tokenizer tests to directory /en 2017-01-05 16:25:04 +01:00
Ines Montani 8b45363b4d Modernize and merge general tokenizer tests 2017-01-05 13:17:05 +01:00
Ines Montani 02cfda48c9 Modernize and merge tokenizer tests for string loading 2017-01-05 13:16:55 +01:00
Ines Montani a11f684822 Modernize and merge tokenizer tests for whitespace 2017-01-05 13:16:33 +01:00
Ines Montani 8b284fc6f1 Modernize and merge tokenizer tests for text from file 2017-01-05 13:15:52 +01:00
Ines Montani 2c2e878653 Modernize and merge tokenizer tests for punctuation 2017-01-05 13:14:16 +01:00
Ines Montani 8a74129cdf Modernize and merge tokenizer tests for prefixes/suffixes/infixes 2017-01-05 13:13:12 +01:00
Ines Montani 0e65dca9a5 Modernize and merge tokenizer tests for exception and emoticons 2017-01-05 13:11:31 +01:00
Ines Montani 34c47bb20d Fix formatting 2017-01-05 13:10:51 +01:00
Ines Montani 2e72683baa Add missing docstrings 2017-01-05 13:10:21 +01:00
Ines Montani da10a049a6 Add unicode declarations 2017-01-05 13:09:48 +01:00
Ines Montani 58adae8774 Remove unused file 2017-01-05 13:09:22 +01:00
Ines Montani c6e5a5349d Move regression test for #360 into own file 2017-01-04 00:49:31 +01:00
Ines Montani 8279993a6f Modernize and merge tokenizer tests for punctuation 2017-01-04 00:49:20 +01:00
Ines Montani 550630df73 Update tokenizer tests for contractions 2017-01-04 00:48:42 +01:00
Ines Montani 109f202e8f Update conftest fixture 2017-01-04 00:48:21 +01:00
Ines Montani ee6b49b293 Modernize tokenizer tests for emoticons 2017-01-04 00:47:59 +01:00
Ines Montani f09b5a5dfd Modernize tokenizer tests for infixes 2017-01-04 00:47:42 +01:00
Ines Montani 59059fed27 Move regression test for #351 to own file 2017-01-04 00:47:11 +01:00
Ines Montani 667051375d Modernize tokenizer tests for whitespace 2017-01-04 00:46:35 +01:00
Ines Montani aafc894285 Modernize tokenizer tests for contractions
Use @pytest.mark.parametrize.
2017-01-03 23:02:21 +01:00
Ines Montani fb9d3bb022 Revert "Merge remote-tracking branch 'origin/master'"
This reverts commit d3b181cdf1, reversing
changes made to b19cfcc144.
2017-01-03 18:21:36 +01:00
Matthew Honnibal 3ba7c167a8 Fix URL tests 2016-12-30 17:10:08 -06:00
Matthew Honnibal 9936a1b9b5 Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns 2016-12-30 14:53:40 -06:00
kengz 73a38bd4d1 Merge remote-tracking branch 'upstream/master' 2016-12-30 12:19:59 -05:00
kengz da44183ae1 move parse_tree logic to a new tokens/printers.py file 2016-12-30 12:19:18 -05:00
Matthew Honnibal 3e8d9c772e Test interaction of token_match and punctuation
Check that the new token_match function applies after punctuation is split off.
2016-12-31 00:52:17 +11:00
Gyorgy Orosz 45e045a87b Unicode/UTF8 compatibility for Python2 2016-12-24 00:21:00 +01:00
Gyorgy Orosz 72b61b6d03 Typo fix. 2016-12-24 00:10:29 +01:00
Gyorgy Orosz 1748549aeb Added exception pattern mechanism to the tokenizer. 2016-12-21 23:16:19 +01:00
Gyorgy Orosz ab2f6ea46c Removed data files from tests.. 2016-12-21 20:22:09 +01:00
Gyorgy Orosz 3d5306acb9 Added further testcases. 2016-12-20 23:49:35 +01:00
Gyorgy Orosz 23956e72ff Improved partial support for tokenzing Hungarian numbers 2016-12-20 23:36:59 +01:00
Gyorgy Orosz 6add156075 Refactored language data structure 2016-12-20 22:28:20 +01:00