Commit Graph

643 Commits

Author SHA1 Message Date
Matthew Honnibal fe442cac53 Fix #717: Set correct lemma for contracted verbs 2017-03-18 16:16:10 +01:00
ines ad934a9abd Add regression test for #693 2017-03-18 16:12:30 +01:00
ines f57c616830 Add regression test for #704 and test new model (resolves #704)
(using new English model)
2017-03-18 16:04:14 +01:00
Matthew Honnibal 413138de79 Fix #719: Lemmatizer can no longer output empty string 2017-03-18 16:02:06 +01:00
ines ab1451f997 Don't mark compatibility test as slow 2017-03-18 15:17:39 +01:00
ines ec3e810662 Add directory cli and set up command line interface 2017-03-18 15:14:48 +01:00
Matthew Honnibal 6420f86f02 Merge changes to __init__.py 2017-03-17 19:51:45 +01:00
ines 0e533ad0cc Mark compatibility table test as slow (temporary)
Prevent Travis from running test until models repo is published
2017-03-17 13:11:36 +01:00
Matthew Honnibal a630726b13 Fix typo in tests 2017-03-16 20:50:36 -05:00
Matthew Honnibal f98b30583f Fix tests 2017-03-16 19:48:00 -05:00
Matthew Honnibal db51abf685 Fix tests 2017-03-16 18:53:47 -05:00
Matthew Honnibal fea9fe08af Merge pull request #866 from juanmirocks/master
Fix lemmatization of OOV words
2017-03-16 23:37:36 +01:00
Matthew Honnibal 28bb546939 Merge pull request #883 from ericzhao28/master
Add `lower_` and `upper_` properties to `Span` class
2017-03-16 23:35:47 +01:00
Matthew Honnibal 8843b84bd1 Merge remote-tracking branch 'origin/develop-downloads' 2017-03-16 12:00:42 -05:00
ines 4cfc8ffbd2 Reformat pickle tests 2017-03-15 17:39:54 +01:00
ines 2a0fcf1354 Add tests for new download module 2017-03-15 17:39:43 +01:00
Matthew Honnibal 4cab8ac136 Update morph exceptions test 2017-03-15 09:31:34 -05:00
ines 42ba740dde Revert "Merge branch 'debug'"
This reverts commit 89b79d1178, reversing
changes made to 02bdf490a1.
2017-03-13 20:11:52 +01:00
ines 4c5f51e49e Update regression test 2017-03-13 15:16:11 +01:00
ines 02bdf490a1 Remove regression test to see if it caused pytest Travis error 2017-03-13 13:00:22 +01:00
ines 17018750ac Add regression test for #717 2017-03-13 12:58:22 +01:00
ines 2883ebfca2 Remove print statement 2017-03-13 12:30:42 +01:00
ines 98c13d8aa9 Add regression test for #401 2017-03-13 12:28:41 +01:00
ines 444d665f9d Add regression test for #686 2017-03-13 12:23:35 +01:00
ines 46b17e5b51 Add regression test for #719 2017-03-13 12:17:35 +01:00
ines c8ae682ff9 Add regression test for #636 2017-03-13 12:08:31 +01:00
ines 337f9601f2 Add missing unicode declaration 2017-03-13 12:08:19 +01:00
ines d70386ec6e Update docstring in #886 regression test 2017-03-13 12:00:38 +01:00
ines 51ba3ef0a8 Add regression test for #886 2017-03-13 11:44:58 +01:00
ines 1da29a7146 Use new Lemmatizer data and remove file import
Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is unideal and still needs to be
fixed.
2017-03-12 13:58:22 +01:00
ines c89e30d1a3 Add test for English time exceptions ("1a.m." etc.) 2017-03-12 13:58:22 +01:00
ines 66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Em 9c809efc25 Removed mapStr 2017-03-11 16:23:26 -08:00
Matthew Honnibal ea2592879f Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-11 11:13:37 -06:00
Em 426d17167f Added string manipulation for spans 2017-03-10 16:50:02 -08:00
ines 10e29189ac Adjust URL testcases and xfail problems (instead of comment) 2017-03-10 14:22:50 +01:00
Matthew Honnibal ea53647362 Merge branch 'develop' 2017-03-10 02:49:39 -06:00
Dan Rapp 123d3f2d38 Fix error in test case parameterization 2017-03-09 12:18:21 -07:00
Dan Rapp b9307dfcd7 Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix 2017-03-09 11:42:14 -07:00
Dan Rapp 3b1df3808d Issue #840 - URL pattern too broad 2017-03-09 11:39:39 -07:00
Matthew Honnibal 5b0b968d13 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-03-08 15:03:10 +01:00
Matthew Honnibal 0ac3d27689 Fix handling of trailing whitespace
Fix off-by-one error that meant trailing spaces were being dropped.
Closes #792
2017-03-08 15:01:40 +01:00
ines c2e3e651b8 Re-add regression test for #859 2017-03-08 14:36:09 +01:00
Matthew Honnibal 16670d3251 Xfail the vocab pickling for now 2017-03-07 21:43:28 +01:00
Matthew Honnibal a89c3500f6 Fixes to hacky vocab pickling 2017-03-07 20:58:55 +01:00
Matthew Honnibal 3edb8ae207 Whitespace 2017-03-07 17:16:26 +01:00
Matthew Honnibal 5de7e712b7 Add support for pickling StringStore. 2017-03-07 17:15:18 +01:00
Matthew Honnibal 4e75e74247 Update regression test for variable-length pattern problem in the matcher. 2017-03-07 16:08:32 +01:00
Matthew Honnibal 6d67213b80 Add test for 850: Matcher fails on zero-or-more. 2017-03-07 15:55:28 +01:00
Aniruddha Adhikary 696215a3fb add tests for Bengali 2017-03-05 11:25:12 +06:00
ines 8dff040032 Revert "Add regression test for #859"
This reverts commit c4f16c66d1.
2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela a8cfde46d3 #781 Fix test: colocalizes is lemmatized to colocaliz and colocalize 2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela a471114eb2 #781 add regression test, failing previous bug fix 2017-03-01 21:30:51 +01:00
ines c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
Matthew Honnibal 34bcc8706d Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:21:21 +01:00
Matthew Honnibal 0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
ines 376c5813a7 Remove print statements from test 2017-02-24 18:26:32 +01:00
ines 7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines 51eb190ef4 Remove print statements from test 2017-02-24 17:41:12 +01:00
Matthew Honnibal db5ada3995 Merge branch 'master' of https://github.com/explosion/spaCy 2017-02-24 14:28:12 +01:00
Matthew Honnibal 8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
ines 67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines 44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
ines 3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines 85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque 06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque 3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
ines 21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
2017-02-10 13:17:05 +01:00
ines f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque 309da78bf0 Merge branch 'master' into tokenizer_exceptions 2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque 4ce0bbc6b6 Update unit tests 2017-02-09 16:30:43 +01:00
ines 654fe447b1 Add Swedish tokenizer tests (see #807) 2017-02-05 11:47:07 +01:00
Michael Wallin 35100c8bdd [issue 805] Add regression test and the required fixture 2017-02-04 16:21:34 +02:00
Michael Wallin 1a1952afa5 [finnish] Add initial tests for tokenizer 2017-02-04 13:54:10 +02:00
Ines Montani afc6365388 Update regression test for #801 to match current expected behaviour 2017-02-02 16:23:05 +01:00
Ines Montani 13a4ab37e0 Add regression test for #801 2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque 85f951ca99 Add tokenizer exceptions for French 2017-02-02 08:36:16 +01:00
Ines Montani e4875834fe Fix formatting 2017-01-31 15:19:33 +01:00
Ines Montani c304834e45 Add missing import 2017-01-31 15:18:30 +01:00
Ines Montani e6465b9ca3 Parametrize test cases and mark as xfail 2017-01-31 15:14:42 +01:00
latkins e4c84321a5 Added regression test for Issue #792. 2017-01-31 13:47:42 +00:00
Ines Montani 19501f3340 Add regression test for #775 2017-01-25 13:16:52 +01:00
Raphaël Bournhonesque 1be9c0e724 Add fr tokenization unit tests 2017-01-24 10:57:37 +01:00
Ines Montani 0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani 5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Ines Montani d704cfa60d Fix typo 2017-01-16 21:30:33 +01:00
Matthew Honnibal 2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Ines Montani 50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani 116c675c3c Merge pull request #742 from oroszgy/hu_tokenizer_fix
Improved Hungarian tokenizer
2017-01-14 23:52:44 +01:00
Gyorgy Orosz 92345b6a41 Further numeric test. 2017-01-14 22:44:19 +01:00
Gyorgy Orosz b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00
Gyorgy Orosz b03a46792c Better error handling 2017-01-14 22:09:29 +01:00
Ines Montani 332ce2d758 Update README.md 2017-01-14 21:12:11 +01:00
Gyorgy Orosz 9505c6a72b Passing all old tests. 2017-01-14 20:39:21 +01:00
Gyorgy Orosz 63037e79af Fixed hyphen handling in the Hungarian tokenizer. 2017-01-14 16:30:11 +01:00
Gyorgy Orosz f77c0284d6 Maintaining compatibility with other spacy tokenizers. 2017-01-14 16:19:15 +01:00
Gyorgy Orosz 1be5da1ac6 Fixed Hungarian tokenizer for numbers 2017-01-14 15:51:59 +01:00
Ines Montani a89e269a5a Fix test formatting and consistency 2017-01-14 13:41:19 +01:00
Ines Montani 3424e3a7e5 Update README.md 2017-01-13 15:54:54 +01:00
Ines Montani 49186b34a1 Mark lemmatizer tests as models since they use installed data 2017-01-13 15:12:07 +01:00
Ines Montani 138deb80a1 Modernise vector tests, use add_vecs_to_vocab and don't depend on models 2017-01-13 15:12:07 +01:00
Ines Montani 96f0caa28a Fix test name for consistency 2017-01-13 15:12:07 +01:00
Ines Montani dc2bb1259f Add util function to add vectors to vocab 2017-01-13 15:12:07 +01:00
Ines Montani db9b25663d Reformat add_docs_equal and add docstring 2017-01-13 15:12:07 +01:00
Ines Montani 62ce0a0073 Add README.md to tests to explain organisation and conventions 2017-01-13 15:11:18 +01:00
Ines Montani 38d60f6b90 Modernise serializer I/O tests and don't depend on models where possible 2017-01-13 02:24:56 +01:00
Ines Montani 4bb5b89ee4 Add text_file_b fixture using BytesIO 2017-01-13 02:23:50 +01:00
Ines Montani 49febd8c62 Modernise noun chunks tests and don't depend on models 2017-01-13 02:01:00 +01:00
Ines Montani 3ee97b5686 Rename test_parser to test_noun_chunks 2017-01-13 01:36:33 +01:00
Ines Montani a308703f47 Remove old tests 2017-01-13 01:34:48 +01:00
Ines Montani 12eb8edf26 Move parser tests from unit to parser 2017-01-13 01:34:38 +01:00
Ines Montani 138c53ff2e Merge tokenizer tests 2017-01-13 01:34:14 +01:00
Ines Montani 01f36ca3ff Move attrs tests from unit to root and modernise 2017-01-13 01:33:50 +01:00
Ines Montani 3610d27967 Move alignment tests from munge to gold and modernise 2017-01-13 01:33:31 +01:00
Ines Montani 094ff7396a Reformat and rename Pragmatic Segmenter tests and mark xfails 2017-01-13 01:30:20 +01:00
Ines Montani affcf1b19d Modernise lemmatizer tests 2017-01-12 23:41:17 +01:00
Ines Montani 33d9cf87f9 Modernise tagger tests and fix xpassing test 2017-01-12 23:40:52 +01:00
Ines Montani 33e5f8dc2e Create basic and extended test set for URLs 2017-01-12 23:40:02 +01:00
Ines Montani 5e4f5ebfc8 Modernise BILUO tests 2017-01-12 23:39:18 +01:00
Ines Montani 09acfbca01 Add Lemmatizer fixture 2017-01-12 23:38:55 +01:00
Ines Montani 514bfa2597 Add path fixture for spaCy data path 2017-01-12 23:38:47 +01:00
Ines Montani e9e99a5670 Add regression test for #740 2017-01-12 22:57:38 +01:00
Ines Montani 6935d55409 Fix formatting 2017-01-12 22:56:20 +01:00
Ines Montani 5f0d196a31 Modernise and merge matcher tests 2017-01-12 22:23:11 +01:00
Ines Montani d5d774413a Update comments on EN and DE fixtures 2017-01-12 22:03:07 +01:00
Ines Montani 9b4bea1df9 Tidy up and rename regression tests and remove unnecessary imports 2017-01-12 22:00:37 +01:00
Ines Montani 5e1b6178e3 Fix formatting and consistency 2017-01-12 22:00:06 +01:00
Ines Montani a3fd32455e Remove redundant language loading integration tests 2017-01-12 21:59:48 +01:00
Ines Montani 61f1ca09c2 Modernise serializer codecs tests 2017-01-12 21:58:55 +01:00
Ines Montani 5dbc6e59f6 Modernise Huffman tests 2017-01-12 21:58:40 +01:00
Ines Montani edeeeccea5 Modernise packer tests and don't depend on models where possible 2017-01-12 21:58:07 +01:00
Ines Montani d084676cd0 Modernise and merge serialization tests 2017-01-12 21:57:19 +01:00
Ines Montani 442237787c Add assert_docs_equal util to compare two docs 2017-01-12 21:56:52 +01:00
Ines Montani eac3f700fb Add fixture for entity recognizer 2017-01-12 21:56:32 +01:00
Ines Montani b438cfddbc Modernise matcher tests and split into two files 2017-01-12 17:51:46 +01:00
Ines Montani 27482ebed8 Move matcher tests for #188 and #242 to regression tests
Modernise tests and remove unnecessary imports
2017-01-12 17:33:57 +01:00
Ines Montani 0a4dc632bd Update test to not create redundant Doc object 2017-01-12 17:33:18 +01:00
Ines Montani a2526e66d8 Fix formatting, naming and unicode declaration 2017-01-12 16:51:13 +01:00
Ines Montani 052cdff07d Modernise vector similarity tests 2017-01-12 16:51:13 +01:00
Ines Montani bd20ec0a6a Add get_cosine util function 2017-01-12 16:51:13 +01:00
Ines Montani 51ef75f629 Fix regression test for #615 and remove unnecessary imports 2017-01-12 16:51:12 +01:00
Ines Montani aeb747e10c Adjust formatting 2017-01-12 16:51:12 +01:00
Ines Montani 8e3e58a7e6 Modernise and merge lexeme vocab tests 2017-01-12 16:51:12 +01:00
Ines Montani c3d4516fc2 Move test for #361 to regression tests 2017-01-12 16:51:12 +01:00
Ines Montani 7cb3d74426 Modernise span tests and don't depend on models 2017-01-12 15:30:49 +01:00
Ines Montani 92e3d8b3ee Modernise vocab API tests and remove old xfailing tests 2017-01-12 15:27:46 +01:00
Ines Montani 7ea87684cd Rename test_vocab.py to test_vocab_api.py 2017-01-12 15:12:21 +01:00
Ines Montani 0da2ee5c68 Merge flag features tests into orth tests in tests root 2017-01-12 15:12:00 +01:00
Ines Montani 03c136cfd3 Remove StringStore tests from vocab tests 2017-01-12 15:11:15 +01:00
Ines Montani d7bd57abdf Modernise add vectors vocab test 2017-01-12 15:09:49 +01:00
Ines Montani 89525ef345 Use consistent test names 2017-01-12 15:09:21 +01:00
Ines Montani f8803808ce Remove old unused tests and conftest files 2017-01-12 15:09:05 +01:00
Ines Montani 4d0bfebcd9 Move Pragmatic Segmenter test cases (currently unused) to parser tests 2017-01-12 15:08:02 +01:00
Ines Montani 26d018d874 Add tests for StringStore 2017-01-12 15:07:31 +01:00
Ines Montani 9b6784bab5 Add fixture for StringStore 2017-01-12 15:05:40 +01:00
Ines Montani 99d66d613a Modernise tests for merging spans and don't depend on models 2017-01-12 12:26:26 +01:00
Ines Montani fa8f67596d Remove unused old test 2017-01-12 12:26:08 +01:00
Ines Montani 359f73a96b Move test for #54 to regression tests 2017-01-12 12:25:51 +01:00
Ines Montani 3f3a46722c Remove unused conftest 2017-01-12 12:25:24 +01:00
Ines Montani c2406e92bc Allow setting ents in get_doc 2017-01-12 12:25:10 +01:00
Ines Montani c5914c6fe5 Fix and pass regression test for #736 2017-01-12 11:48:56 +01:00
Ines Montani a6790b6694 Rename tags to pos in get_doc and allow adding tags to tokens 2017-01-12 11:18:36 +01:00
Ines Montani 1add8ace67 Merge lemmatizer tests 2017-01-12 11:16:53 +01:00
Ines Montani 3bc082abdf Modernise morph exceptions test and don't depend on models 2017-01-12 11:14:29 +01:00
Ines Montani ec7739b76e Add regression test for #736 2017-01-12 11:12:44 +01:00
Ines Montani 6c1c564891 Move language-specific tests out of redundant tokenizer directories 2017-01-12 02:17:18 +01:00
Ines Montani 8fecedac3a Tidy up 2017-01-12 02:16:37 +01:00
Ines Montani ae7edd30e7 Move text file back to tokenizer tests directory 2017-01-12 02:10:23 +01:00
Ines Montani ffcaba9017 Remove old and/or redundant tests 2017-01-12 02:10:18 +01:00
Ines Montani 19c4132097 Modernise space attachment parser tests and don't depend on models 2017-01-12 01:54:44 +01:00
Ines Montani 69778924c8 Modernise and merge parser tests and don't depend on models 2017-01-12 01:07:29 +01:00
Ines Montani 178c147612 Modernise nonprojectivity tests and don't depend on models 2017-01-12 01:06:36 +01:00
Ines Montani 1a3984742c Modernise sentence boundary detection tests and don't depend on models (where possible) 2017-01-11 23:53:08 +01:00
Ines Montani 0cdb6ea61d Remove old unused pickle test 2017-01-11 23:52:28 +01:00
Ines Montani c9671329dc Move test for #309 to regression tests 2017-01-11 23:52:13 +01:00
Ines Montani d0e37b5670 Modernise parser tests and don't depend on models 2017-01-11 21:30:27 +01:00
Ines Montani 342cb41782 Add apply_transition_sequence util function to utils 2017-01-11 21:30:14 +01:00
Ines Montani 09807addff Add en_parser fixture 2017-01-11 21:29:59 +01:00
Ines Montani 55d151aa61 Modernise Doc parse tree navigation tests and don't depend on models 2017-01-11 21:14:15 +01:00
Ines Montani 7262421bb2 Use consistent test names 2017-01-11 19:00:52 +01:00
Ines Montani 33800c9367 Rename "tokens" tests to "doc" 2017-01-11 18:59:01 +01:00
Ines Montani 3a9c6a9563 Remove old unused files 2017-01-11 18:58:38 +01:00
Ines Montani 8e962de39f Remove old word vector tests 2017-01-11 18:55:08 +01:00
Ines Montani e027936920 Modernise Doc noun chunks tests 2017-01-11 18:54:56 +01:00
Ines Montani 439f396acd Modernise Doc array tests and don't depend on models 2017-01-11 18:54:46 +01:00
Ines Montani 05447be884 Modernise test for adding entities 2017-01-11 18:54:24 +01:00
Ines Montani 6e883f4c00 Modernise Doc API tests and don't depend on models 2017-01-11 18:05:36 +01:00
Ines Montani 8bf3bb5c44 Make words optional for get_doc 2017-01-11 18:05:10 +01:00
Ines Montani 928db7e419 Fix StringIO import for Python 3 2017-01-11 14:07:48 +01:00
Ines Montani 69998f216b Rename test_tokens_api.py to test_doc_api.py 2017-01-11 13:58:56 +01:00
Ines Montani d94dea1b18 Merge token tests into token API tests 2017-01-11 13:57:02 +01:00
Ines Montani eb23424ab0 Modernise token API tests and don't depend on loading models 2017-01-11 13:56:54 +01:00
Ines Montani c682b8ca90 Merge conftests into one cohesive file 2017-01-11 13:56:32 +01:00
Ines Montani 909f24d7df Add test utils and get_doc helper function
Create Doc object from given vocab, words and annotations to allow
tests not to depend on loading the models.
2017-01-11 13:55:33 +01:00
Ines Montani 3e6e1f0251 Tidy up regression tests 2017-01-10 19:24:10 +01:00
Ines Montani 869963c3c4 Mark extensive prefix/suffix tests as slow 2017-01-10 15:57:35 +01:00
Ines Montani 487e020ebe Add simple test for surrounding brackets 2017-01-10 15:57:26 +01:00
Ines Montani 0ba5cf51d2 Assert length first 2017-01-10 15:57:00 +01:00