Commit Graph

351 Commits

Author SHA1 Message Date
Ines Montani ea20b72c08 💫 Make like_num work for prefixed numbers (#2808)
* Only split + prefix if not numbers

* Make like_num work for prefixed numbers

* Add test for like_num
2018-10-01 10:49:14 +02:00
Matthew Honnibal d5a6c63b62 Add regression test for #2482 2018-09-28 15:18:30 +02:00
Ines Montani 5d56eb70d7 Tidy up tests 2018-09-27 16:41:57 +02:00
Matthew Honnibal 2ac69facc6 Fix Python 2 test failure 2018-09-27 15:34:16 +02:00
Matthew Honnibal 96fe314d8d Fix bug when too many entity types. Fixes #2800 2018-09-27 13:54:34 +02:00
Matthew Honnibal 1759abf1e5 Fix bug in sentence starts for non-projective parses
The set_children_from_heads function assumed parse trees were
projective. However, non-projective parses may be passed in during
deserialization, or after deprojectivising. This caused incorrect
sentence boundaries to be set for non-projective parses. Close #2772.
2018-09-19 14:50:06 +02:00
Matthew Honnibal 48fd36bf05 Fix test for issue 27772 2018-09-19 14:47:27 +02:00
Matthew Honnibal 6cd920e088 Add xfail test for deprojectivization SBD bug 2018-09-19 14:00:31 +02:00
Matthew Honnibal e968016417 Note link between issues #2671 and #2675 2018-08-15 17:18:28 +02:00
Matthew Honnibal ce512e1d47 Fix #2671: Incorrect match ID on some patterns 2018-08-15 16:19:08 +02:00
Matthew Honnibal f12b9190f6 Xfail test for issue #2671 2018-08-15 15:55:31 +02:00
Matthew Honnibal 7cfa665ce6 Add failing test for issue 2671: Incorrect rule ID returned from matcher 2018-08-15 15:54:33 +02:00
Matthew Honnibal 4336397ecb Update develop from master 2018-08-14 03:04:28 +02:00
Matthew Honnibal 860f5bd91f Add test for issue 2626 2018-08-05 13:46:57 +02:00
Ines Montani 75f3234404
💫 Refactor test suite (#2568)
## Description

Related issues: #2379 (should be fixed by separating model tests)

* **total execution time down from > 300 seconds to under 60 seconds** 🎉
* removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure
* changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version)
* merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways)
* tidied up and rewrote existing tests wherever possible

### Todo

- [ ] move tests to `/tests` and adjust CI commands accordingly
- [x] move model test suite from internal repo to `spacy-models`
- [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~
- [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted
- [ ] update documentation on how to run tests


### Types of change
enhancement, tests

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-24 23:38:44 +02:00
Matthew Honnibal 6303ce3d0e Try to fix memory error by moving fr_tokenizer to module scope 2018-07-24 20:09:06 +02:00
ines 3c30d1763c Merge branch 'master' into develop 2018-07-21 15:34:18 +02:00
Matthew Honnibal 899f1cf442 Add regression test for issue 2179 2018-07-20 17:15:44 +02:00
ines 4a62486340 Merge branch 'master' into develop 2018-05-30 13:01:01 +02:00
Maciej c7d53348d7 Fix bug in CLI iob and ner converter (#2392) (fixes #2385)
* issue_2385 add tests for iob_to_biluo converter function

* issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter

* issue_2385 add test to fix b char bug

* add contributor agreement

* fill contributor agreement
2018-05-30 12:28:44 +02:00
ines 3c3a175018 Merge branch 'master' into develop 2018-05-28 18:37:09 +02:00
ansgar-t 9732988951 escape html in displacy.render (#2378) (closes #2361)
## Description
Fix for issue #2361 :
replace &, <, >, " with &amp;amp; , &amp;lt; , &amp;gt; , &amp;quot; in before rendering svg

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
(As discussed in the comments to #2361)
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-05-28 18:36:41 +02:00
Matthew Honnibal 546dd99cdf Merge master into develop -- mostly Arabic and website 2018-05-15 18:14:28 +02:00
Douglas Knox 9b49a40f4e Test and fix for Issue #2219 (#2272)
Test and fix for Issue #2219: Token.similarity() failed if single letter
2018-05-03 18:40:46 +02:00
Matthew Honnibal 2c4a6d66fa Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
Ines Montani 98e9cda677
Merge pull request #2158 from explosion/feature/fix-multiple-vectors (resolves #1660)
💫 Fix loading of multiple vector models
2018-03-28 23:08:24 +02:00
ines 3eb67bbe4b Allow entity types with dashes (resolves #1967) 2018-03-28 20:51:26 +02:00
Matthew Honnibal fd9e259414 Add test for #1660 2018-03-28 18:22:51 +02:00
Matthew Honnibal 1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal f9f46e5a07 Revert matcher fixes from GregDubbin 2018-02-18 10:59:28 +01:00
Aaron Marquez f0d3672e17 Changed loading EN model 2018-02-15 14:28:38 -08:00
Aaron Marquez 7ba4111554 Add test for issue-1959 2018-02-15 12:46:22 -08:00
Matthew Honnibal fd9fd275c5 Make test for #1945 more precise 2018-02-07 02:06:11 +01:00
Matthew Honnibal c087a14380 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-07 01:29:39 +01:00
Matthew Honnibal 76d89b2180 Add test for #1945: PhraseMatcher regression 2018-02-07 01:29:23 +01:00
Matthew Honnibal 2e7391e627
Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp
Bug/fix not passing in model cfg in nlp
2018-02-05 01:19:40 +01:00
Matthew Honnibal f74a802d09 Test and fix #1919: Error resuming training 2018-02-02 02:32:40 +01:00
Motoki Wu 54062b7326 added tests for issue #1915 2018-01-30 18:30:19 -08:00
Thomas Opsomer 45d62561f7 add test for the issue 2018-01-28 19:49:56 +01:00
Matthew Honnibal 6a8cb905aa
Merge pull request #1876 from GregDubbin/master
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal edb71a280e Add test for #1883: Unpickling Matcher 2018-01-24 15:42:33 +01:00
Matthew Honnibal 42a18ef903 Add test for #1868: Vocab.__contains__ with ints 2018-01-23 23:27:05 +01:00
greg 85ab99e692 Correct test examples 2018-01-23 15:00:14 -05:00
Matthew Honnibal 91e916cb67 Add comment to new test 2018-01-23 19:11:53 +01:00
Matthew Honnibal fd187d71ad Add test for #1727 2018-01-23 19:11:01 +01:00
Matthew Honnibal 7e6dc283db Fix unicode import in test 2018-01-22 23:55:44 +01:00
greg 686735b94e Fix matcher import 2018-01-22 16:53:05 -05:00
Matthew Honnibal 4ce7d24fd5 Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses. 2018-01-22 20:18:38 +01:00
greg 7072b395c9 Add greedy matcher tests 2018-01-16 15:46:13 -05:00
Matthew Honnibal 82135d85b7 Fix test 2018-01-15 15:55:15 +01:00
Matthew Honnibal 4b09616b58 Add test for #1757: Comparison against None 2018-01-15 15:55:01 +01:00
Matthew Honnibal 9e413449f6 Fix unicode error in new test 2018-01-15 15:39:00 +01:00
Matthew Honnibal 6b215d2dd3 Add test for Issue #1537 2018-01-15 15:20:56 +01:00
Matthew Honnibal 1a1cca6052 Fix vectors.resize() on Py3. Closes #1539 2018-01-14 14:48:51 +01:00
Matthew Honnibal 0153220304 Make set_vector add word to vocab. Fixes #1807 2018-01-14 13:57:57 +01:00
Ines Montani 55754f0cee
Merge pull request #1836 from fucking-signup/master
Add tests for issue #1769
2018-01-13 00:23:35 +00:00
Kit 4ee97f20a0
Mark like_num tests as slow 2018-01-13 00:44:15 +01:00
Kit 855531537e
Rewrite tests for issue #1769 2018-01-12 23:49:51 +01:00
Kit 5b541cb5ec
Simplify tests for issue #1769 2018-01-12 23:34:27 +01:00
Kit 7a2adc4633
Remove some tests to see build status changes 2018-01-12 22:49:16 +01:00
Kit 0e62809a43
Rewrite tests for issue #1769 2018-01-12 22:26:06 +01:00
Ines Montani 36f426fe0a
Merge pull request #1808 from fucking-signup/master
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit 76f4eeca44
Remove tests to see build changes on Windows (Python 2.7) 2018-01-12 20:30:51 +01:00
Kit 7ec0956e8d
Add regression test (issue #1769) 2018-01-08 03:42:04 +01:00
Søren Lind Kristiansen 62de5da1ff Remove unsused dummy variable 2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen 10dab8eef8 Remove dummy variable from function calls 2018-01-05 09:37:05 +01:00
Kevin Humphreys 597df5bf83 add test 2018-01-03 13:00:05 -08:00
ines 26f313dabc Fix missing import 2017-12-22 16:21:44 +01:00
ines b10ba848b8 xfail test that causes MemoryError on Python 2 on Windows
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Ines Montani 9c1ee65268
Add regression test for #1698 2017-12-12 10:36:11 +01:00
Isaac Sijaranamual 38021fbb00 Switch from python 3 only TemporaryDirectory to pytest's tmpdir 2017-12-11 00:16:04 +01:00
Isaac Sijaranamual 568130ce7c Adds regression test_issue1622 2017-12-10 23:00:48 +01:00
Matthew Honnibal 6bc0f4d29f
Merge pull request #1611 from fsonntag/master
Solving #1494
2017-11-29 23:11:23 +01:00
ines a31506e060 Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654) 2017-11-28 20:37:55 +01:00
ines b62739fbfe Add regression test for #1654 2017-11-28 20:27:54 +01:00
ines 2e50dbb9d7 Simplify test 2017-11-28 20:27:27 +01:00
Felix Sonntag 724ae7dc55 Fixed issue of infix capturing prefixes 2017-11-28 17:17:12 +01:00
Matthew Honnibal 30ba81f881
Merge pull request #1576 from ligser/master
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
Burton DeWilde 635792997c Add regression test for #1612 2017-11-20 12:05:35 -06:00
ines d70a64d78b Fix syntax error and formatting in test (see #1617) 2017-11-20 14:01:25 +01:00
Felix Sonntag 8be3392302 Added regression text for 1494 2017-11-19 16:30:35 +01:00
Motoki Wu b818afaa0e Added failing test for Issue #1207.
The noun chunk iterator should work for `Doc` but not for `Span`.
2017-11-17 17:04:27 -08:00
Roman Domrachev 505c6a2f2f Completely cleanup tokenizer cache
Tokenizer cache can have be different keys than string

That modification can slow down tokenizer and need to be measured
2017-11-15 17:55:48 +03:00
Roman Domrachev 3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev 4e378dc4a4 Remove all obsolete code and test only initial problem 2017-11-14 20:45:04 +03:00
Roman 47ce2347b0
Create test that fails when actual cleanup caused 2017-11-14 20:28:13 +03:00
Roman Domrachev 3d247d2bb8 Get back previous testcase 2017-11-14 18:01:37 +03:00
Roman Domrachev a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Roman Domrachev ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev 3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines ee97fd3cb4 Add regression test for #1547 2017-11-11 00:14:03 +01:00
ines 2df27db671 Add unicode declaration 2017-11-11 00:13:56 +01:00
Matthew Honnibal a5ea0fdf5a Fix #1518: vocab.vectors.resize() didn't work 2017-11-08 22:18:37 +01:00
Matthew Honnibal 2f7e9f390d Make test less flakey 2017-11-06 17:34:50 +01:00
Matthew Honnibal 407b08017e Make test less flakey 2017-11-06 17:31:40 +01:00
Matthew Honnibal 102f797933 Fix lemma ordering in test 2017-11-06 17:02:17 +01:00
ines 5e7d98f72a Remove test for #1491 2017-11-03 22:10:57 +01:00
ines 718f1c50fb Add regression test for #1491 2017-11-03 21:11:20 +01:00
ines eef930c73e Assert instead of print 2017-11-03 18:50:57 +01:00
ines f0986df94b Add test for #1488 (passes on v2.0.0a18?) 2017-11-03 14:44:36 +01:00
Matthew Honnibal 094512fd47 Fix model-mark on regression test. 2017-10-25 14:44:00 +02:00
Matthew Honnibal 908809d488 Update tests 2017-10-24 17:05:15 +02:00
Matthew Honnibal 63f0bde749 Add test for #1250: Tokenizer cache clobbered special-case attrs 2017-10-24 16:07:18 +02:00
Matthew Honnibal 4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal 391d5ef0d1 Normalize imports in regression test 2017-10-24 14:25:49 +02:00
Matthew Honnibal b66b8f028b Fix #1375 -- out-of-bounds on token.nbor() 2017-10-24 12:10:39 +02:00
Matthew Honnibal a68d89a4f3 Add failing test for bug #1375 -- no out-of-bounds error for token.nbor() 2017-10-24 12:05:25 +02:00
Matthew Honnibal 490ad3eaf0 Check that empty strings are handled. Closes #1242 2017-10-21 00:52:14 +02:00
Matthew Honnibal d8391b1c4d Fix #1434: Matcher failed on ending ? if no token 2017-10-20 16:49:36 +02:00
Matthew Honnibal f111b228e0 Fix re-parsing of previously parsed text
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:

1) The parse was only being partially erased
2) The RightArc action was able to create a 1-cycle.

This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.

Closes #1253.
2017-10-20 16:27:36 +02:00
ines 3516aa0cea Port over changes from #1389 2017-10-14 13:32:55 +02:00
ines 15fe0fd82d Fix tests 2017-10-11 13:27:18 +02:00
Matthew Honnibal c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal 40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
Matthew Honnibal db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal 456bb8a74c Unxfail and close #1305 2017-09-06 19:14:17 +02:00
Matthew Honnibal 99e44fbdbb Update regression test 2017-09-06 19:13:51 +02:00
Matthew Honnibal 497a9308a8 Xfail new lemmatizer test 2017-09-06 18:41:22 +02:00
Matthew Honnibal 5384fff5ce Add test for 1305: Incorrect lemmatization of VBZ for English 2017-09-06 18:40:18 +02:00
Matthew Honnibal d55d6e1cfa Fix comparison of Token from different docs. Closes #1257 2017-08-19 16:39:32 +02:00
ines 51d7414e94 Make sure sents are a list 2017-06-05 12:30:13 +02:00
ines a0f4592f0a Update tests 2017-06-05 02:26:13 +02:00
ines 3e105bcd36 Update tests 2017-06-05 02:09:27 +02:00
Matthew Honnibal bb98d45a63 Fix tests 2017-06-04 16:00:44 -05:00
Matthew Honnibal 55d0621532 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 15:53:25 -05:00
Matthew Honnibal 5b9f116aca Update tests 2017-06-04 15:53:17 -05:00
ines 8a29308d0b Remove unused imports 2017-06-04 22:39:29 +02:00
ines 96867a24ae Fix typo 2017-06-04 22:36:40 +02:00
ines 20a7003c0d Update model fixtures and reorganise tests 2017-05-29 22:14:31 +02:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
ines fb0ff0272f xfail neural parser tests for now and remove test for deprecated method 2017-05-23 12:40:37 +02:00
Matthew Honnibal 5418bcf5d7 Resolve conflict on test 2017-05-23 04:37:16 -05:00
ines e6acd3bbf2 Fix matcher tests and matcher docs 2017-05-23 11:36:02 +02:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
ines b3c7ee0148 Fix tests and use the new Matcher API 2017-05-22 13:54:20 +02:00
Matthew Honnibal 8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
ines 3c0f85de8e Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
ines be5541bd16 Fix import and tokenizer exceptions 2017-05-08 16:20:14 +02:00
Matthew Honnibal 24c4c51f13 Try to make test999 less flakey 2017-04-26 18:42:06 +02:00
Matthew Honnibal c4be9c36fe Fix unicode header in tests 2017-04-24 10:09:01 +02:00
Matthew Honnibal 65f10b53e5 Fix test 2017-04-24 00:25:55 +02:00
Matthew Honnibal 70a43858e1 Fix flakey test 2017-04-24 00:06:30 +02:00
Matthew Honnibal 3973af2d15 Make training test less flakey 2017-04-23 22:59:34 +02:00
Matthew Honnibal 874a3cbb07 Add test for Issue #955 2017-04-23 17:57:01 +02:00
Matthew Honnibal 5d8af40445 Add test for Issue #999 2017-04-23 17:06:30 +02:00
Matthew Honnibal 040751ad17 Remove xfail on Test #910 2017-04-23 16:28:55 +02:00
Matthew Honnibal 1dca7eeb03 Add unicode declaration on new regression test 2017-04-07 18:09:23 +02:00