spaCy/.gitignore

# spaCy
spacy/data/
corpora/
/models/
keys/
*.json.gz

# Tests
spacy/tests/package/setup.cfg
spacy/tests/package/pyproject.toml
spacy/tests/package/requirements.txt

# Cython / C extensions
cythonize.json
spacy/*.html
*.cpp
*.c
*.so

# Vim / VSCode / editors
*.swp
*.sw*
Profile.prof
.vscode
.sass-cache

# Python
.Python
.python-version
__pycache__/
.pytest_cache
*.py[cod]
.env/
.env*
.~env/
.venv
env3.6/
venv/
env3.*/
.dev
.denv
.pypyenv
.pytest_cache/
.mypy_cache/
.hypothesis/

# Distribution / packaging
env/
build/
develop-eggs/
dist/
eggs/
lib/
lib64/
parts/
sdist/
var/
wheelhouse/
*.egg-info/
pip-wheel-metadata/
Pipfile.lock
.installed.cfg
*.egg
.eggs
MANIFEST
spacy/git_info.py

# Temporary files
*.~*
tmp/

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml

# Translations
*.mo

# Mr Developer
.mr.developer.cfg
.project
.pydevproject

# Rope
.ropeproject

# Django stuff:
*.log
*.pot

# Windows
*.bat
Thumbs.db
Desktop.ini

# Mac OS X
*.DS_Store

# Komodo project files
*.komodoproject

# Other
*.tgz

# Pycharm project files
*.idea

# IPython
.ipynb_checkpoints/