Commit Graph

59 Commits

Author SHA1 Message Date
Ines Montani 7ba3a5d95c 💫 Make serialization methods consistent (#3385)
* Make serialization methods consistent

exclude keyword argument instead of random named keyword arguments and deprecation handling

* Update docs and add section on serialization fields
2019-03-10 19:16:45 +01:00
Matthew Honnibal 27dd820753
Fix vocab deserialization when loading already present lexemes (#3383)
* Fix vocab deserialization bug. Closes #2153

* Un-xfail test for #2153
2019-03-10 17:21:19 +01:00
Matthew Honnibal 61e5ce02a4 Add xfailing test for #2153 2019-03-10 16:36:29 +01:00
Ines Montani 323fc26880 Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
Ines Montani b6e991440c 💫 Tidy up and auto-format tests (#2967)
* Auto-format tests with black

* Add flake8 config

* Tidy up and remove unused imports

* Fix redefinitions of test functions

* Replace orths_and_spaces with words and spaces

* Fix compatibility with pytest 4.0

* xfail test for now

Test was previously overwritten by following test due to naming conflict, so failure wasn't reported

* Unfail passing test

* Only use fixture via arguments

Fixes pytest 4.0 compatibility
2018-11-27 01:09:36 +01:00
Ines Montani 75f3234404
💫 Refactor test suite (#2568)
## Description

Related issues: #2379 (should be fixed by separating model tests)

* **total execution time down from > 300 seconds to under 60 seconds** 🎉
* removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure
* changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version)
* merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways)
* tidied up and rewrote existing tests wherever possible

### Todo

- [ ] move tests to `/tests` and adjust CI commands accordingly
- [x] move model test suite from internal repo to `spacy-models`
- [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~
- [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted
- [ ] update documentation on how to run tests


### Types of change
enhancement, tests

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-24 23:38:44 +02:00
ines 38e07ade4c Add test for custom tokenizer serialization (resolves #2494) 2018-07-06 12:40:51 +02:00
ines c2581f9172 Tidy up tokenizer test 2018-07-06 12:40:28 +02:00
ines 526be40823 Add test for 46d8a66 2018-06-29 14:33:12 +02:00
Matthew Honnibal cf5fcf0546 Update serialization test 2018-03-28 20:12:53 +02:00
Claudiu-Vlad Ursache e28de12cbd
Ensure files opened in `from_disk` are closed
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
2018-02-13 20:49:43 +01:00
ines 1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves ##1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal 4194bc5744 Xfail flakey serialization test 2017-11-08 13:55:13 +01:00
Matthew Honnibal b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
ines bf415fd778 Add test for serializing extension attrs (see #1085) 2017-10-19 00:53:08 +02:00
Matthew Honnibal f0f2739ae3 Add test for serialization issue raised in #1105 2017-10-10 03:57:58 +02:00
Matthew Honnibal 74f08e1ad5 Update test 2017-09-26 06:45:56 -05:00
Matthew Honnibal 42d47c1e5c Fix tagger serialization 2017-08-19 04:16:32 +02:00
Matthew Honnibal a7309a217d Update tagger serialization 2017-08-18 23:12:05 +02:00
ines a66cf24ee8 xfail tokenizer serialization tests for now
Tests pass locally, but not on Travis – needs more investigation
2017-06-04 13:58:20 +02:00
ines 3152ee5ca2 Update serialization tests for tokenizer 2017-06-03 17:05:28 +02:00
ines de974f7bef Add serializer tests for tokenizer 2017-06-03 13:26:34 +02:00
ines d21459f87d Update serializer tests 2017-06-02 21:42:26 +02:00
ines d86e7cde93 Add entity recognizer to parser serialization tests 2017-06-02 18:40:06 +02:00
ines 0051c05964 Add tests for serializing parser 2017-06-02 18:37:19 +02:00
ines cef547a9f0 Add serialization tests for tensorizer 2017-06-02 18:18:30 +02:00
ines f74a45c1fe Remove unnecessary argument 2017-06-02 18:17:46 +02:00
ines 43b4d63f85 Add serialization tests for tagger 2017-06-02 17:29:34 +02:00
ines acd65c00f6 Add serialization tests for StringStore and Vocab 2017-06-02 10:57:42 +02:00
ines 7b1ddcc04d Add test for vocab serialization 2017-05-29 01:09:52 +02:00
Matthew Honnibal 7253b4e649 Remove old serialization tests 2017-05-09 18:12:58 +02:00
Matthew Honnibal f9327343ce Start updating serializer test 2017-05-09 18:12:03 +02:00
Ines Montani 38d60f6b90 Modernise serializer I/O tests and don't depend on models where possible 2017-01-13 02:24:56 +01:00
Ines Montani 61f1ca09c2 Modernise serializer codecs tests 2017-01-12 21:58:55 +01:00
Ines Montani 5dbc6e59f6 Modernise Huffman tests 2017-01-12 21:58:40 +01:00
Ines Montani edeeeccea5 Modernise packer tests and don't depend on models where possible 2017-01-12 21:58:07 +01:00
Ines Montani d084676cd0 Modernise and merge serialization tests 2017-01-12 21:57:19 +01:00
Matthew Honnibal 79aa03fe98 Test Issue #514: Serializer fails when new entity type has been added. 2016-10-23 17:41:44 +02:00
Matthew Honnibal 4de30a8e38 Test Issue #514: Serialization fails after adding a new entity label. 2016-10-23 16:40:27 +02:00
Matthew Honnibal e99b3f5322 Test Issue #459: Fail to deserialize empty doc 2016-10-23 16:30:22 +02:00
Matthew Honnibal 99ff8b902f Test that huffman codec works with empty freqs dict 2016-10-23 16:27:45 +02:00
Matthew Honnibal c3a8a1cf51 Update serializer test. 2016-10-18 16:18:46 +02:00
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal c1c11a8ae0 * Fix formatting on serializer tests 2016-05-02 16:07:21 +02:00
Wolfgang Seeker fa961ea694 add tests for serialization bug 2016-05-02 11:01:56 +02:00
Henning Peters a473d6e937 fix tests (use english model) 2016-04-12 16:41:57 +02:00
Henning Peters c12d3dd200 add __init__.py to empty package dirs 2016-03-14 11:28:03 +01:00
Henning Peters 3b5f1e753b py26 compatibility 2016-02-10 14:32:54 +01:00
Henning Peters 235f094534 untangle data_path/via 2016-01-16 12:23:45 +01:00
Matthew Honnibal aec130af56 Use util.Package class for io
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().

Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.

Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00