Commit Graph

1095 Commits

Author SHA1 Message Date
Sofie Van Landeghem 02a6a5fea0
Fix 'debug model' for transformers + generalize (#7973)
* add overrides to docs

* fix debug model with transformer

* assume training data is set in config
2021-05-06 18:43:32 +10:00
Paul O'Leary McCann 8007d5c814
Check if the resume path points to a directory (#7919)
This came up in #7878, but if --resume-path is a directory then loading
the weights will fail. On Linux this will give a straightforward error
message, but on Windows it gives "Permission Denied", which is
confusing.
2021-04-28 09:17:15 +02:00
Paul O'Leary McCann de6b5ed14d
Fix percent unk display in debug data (#7886)
* Fix percent unk display

This was showing (ratio %), so 10% would show as 0.10%. Fix by
multiplying ration by 100.

Might want to add a warning if this is over a threshold.

* Only show whole-integer percents
2021-04-27 09:16:35 +02:00
Adriane Boyd d2bdaa7823
Replace negative rows with 0 in StaticVectors (#7674)
* Replace negative rows with 0 in StaticVectors

Replace negative row indices with 0-vectors in `StaticVectors`.

* Increase versions related to StaticVectors

* Increase versions of all architctures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations

Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5

* Update config defaults to new versions

* Update docs
2021-04-22 18:04:15 +10:00
Sofie Van Landeghem c786e98e56
assemble CLI command (#7783)
* assemble CLI command

* ensure assemble runs even without training section

* cleanup
2021-04-19 18:39:11 +10:00
Bram Vanroy ed561cf428
Terminology: deprecated vs obsolete (#7621)
* Terminology: deprecated vs obsolete

Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works.

In light of this, perhaps all other error codes should be checked as well.

* clarify that the link command is removed and not just deprecated

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-12 14:37:00 +02:00
Adriane Boyd 73a8c0f992
Update debug data further for v3 (#7602)
* Update debug data further for v3

* Remove new/existing label distinction (new labels are not immediately
distinguishable because the pipeline is already initialized)
* Warn on missing labels in training data for all components except parser
* Separate textcat and textcat_multilabel sections
* Add section for morphologizer

* Reword missing label warnings
2021-04-09 11:53:42 +02:00
Adriane Boyd 03e9e7b567 Add --code option to init fill-config 2021-03-12 10:03:57 +01:00
Adriane Boyd ce6317231f Add --code to spacy debug CLI 2021-03-12 09:51:26 +01:00
Sofie Van Landeghem 932887b950
textcat scoring fix and multi_label docs (#6974)
* add multi-label textcat to menu

* add infobox on textcat API

* add info to v3 migration guide

* small edits

* further fixes in doc strings

* add infobox to textcat architectures

* add textcat_multilabel to overview of built-in components

* spelling

* fix unrelated warn msg

* Add textcat_multilabel to quickstart [ci skip]

* remove separate documentation page for multilabel_textcategorizer

* small edits

* positive label clarification

* avoid duplicating information in self.cfg and fix textcat.score

* fix multilabel textcat too

* revert threshold to storage in cfg

* revert threshold stuff for multi-textcat

Co-authored-by: Ines Montani <ines@ines.io>
2021-03-09 23:04:22 +11:00
Ines Montani ea555b03e0
Merge pull request #7255 from adrianeboyd/bugfix/extraneous-tok2vec
Omit unused tok2vec/transformer components
2021-03-03 23:15:06 +11:00
Adriane Boyd 8a4200d4e9 Omit unused tok2vec/transformer components
Omit unused tok2vec/transformer components in quickstart template.
2021-03-02 15:53:30 +01:00
Adriane Boyd fb98862337
Add hint for --gpu-id to CLI device info (#7234)
* Add hint for --gpu-id to CLI device info

If the user has `cupy` and an available GPU, add a hint about using
`--gpu-id 0` to the CLI output.

* Undo change to original CPU message
2021-03-03 01:11:18 +11:00
Adriane Boyd ee7bb0b393 Fix formatting in bg/bn quickstart recs 2021-02-26 17:08:37 +01:00
Adriane Boyd 30e1a89aeb
Fix displacy output in evaluate CLI (#7122)
Now that `nlp.evaluate()` does not modify the examples, rerun the
pipeline on the (limited) texts in order to provide the predicted
annotation in the displacy output option.
2021-02-19 23:01:20 +11:00
Adriane Boyd 4188beda87
Fix conll converter option (#7071)
Map `conll` to the NER converter, not the `CoNLL-U` converter.
2021-02-18 10:22:41 +01:00
Ines Montani 1e3a326e53 Change Dutch transformer recommendation [ci skip]
https://github.com/explosion/spaCy/discussions/6529#discussioncomment-366620
2021-02-14 15:30:16 +11:00
Ines Montani f4f46b617f
Preserve sourced components in fill-config (fixes #7055) (#7058) 2021-02-14 14:02:14 +11:00
Adriane Boyd 0ee2ae86bf Update trf quickstart recommendations
Add/update trf recommendations for Bengali, Hindi, Sinhala, and Tamil
based on #7044.
2021-02-12 15:55:17 +01:00
Ines Montani 26bf642afd
Fix issue #7019: Handle None scores in evaluate printer (#7026) 2021-02-11 16:45:23 +11:00
Ines Montani c08b3f294c Support env vars and CLI overrides for project.yml 2021-02-10 13:45:27 +11:00
svlandeg f852af2acf add capture arg 2021-02-02 19:47:12 +01:00
Sofie Van Landeghem f319d2765f
Add capture argument to project_run (#6878)
* add capture argument to project_run and run_commands

* git bump to 3.0.1

* Set version to 3.0.1.dev0

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-02-02 10:11:15 +08:00
Ines Montani a59f3fcf5d Make wheel the default format and update docs [ci skip] 2021-02-01 23:18:43 +11:00
Ines Montani b9573e9e22 Fix pip args 2021-02-01 23:15:00 +11:00
Ines Montani b46073234a Fix default clone branch and error handling [ci skip] 2021-02-01 22:29:04 +11:00
Adriane Boyd 35a863cd27 Remove nlp.tokenizer from quickstart template
Remove `nlp.tokenizer` from quickstart template so that the default
language-specific tokenizer settings are filled instead.
2021-02-01 11:20:12 +01:00
Ines Montani f058cbd751 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2021-01-30 21:03:25 +11:00
Ines Montani 3435b894df Remove nightly reference from auto docs [ci skip] 2021-01-30 20:12:08 +11:00
Ines Montani d0c3775712 Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
Ines Montani b26a3daa9a
Merge pull request #6860 from explosion/feature/package-wheel 2021-01-30 14:17:01 +11:00
Ines Montani 2332c4280b Update and use unified --build option 2021-01-30 13:11:36 +11:00
Ines Montani e6accb3a9e Tidy up and auto-format 2021-01-30 12:52:33 +11:00
Ines Montani 30765674d0 Merge branch 'master' into develop 2021-01-30 12:20:28 +11:00
Ines Montani 2609ba4e89 Support building wheel in spacy package 2021-01-30 11:54:02 +11:00
Pamphile ROY 41ee75ac6d
Remove --no-cache-dir when downloading models
When `--no-cache-dir` is present, it prevents caching to properly function.

If the user still wants to do this, there is the possibility to pass options with `user_pip_args`.
But you should not enforce options like these. In my case this is preventing some docker build (using buildkit caching) to have proper caching of models.
2021-01-29 15:37:44 +01:00
Ines Montani 78d6ff4dd4 Update quickstart recommendations 2021-01-28 11:14:49 +11:00
Ines Montani ec5f55aa5b
Update config generation defaults and transformers (#6832) 2021-01-27 23:56:33 +11:00
Ines Montani c0926c9088
WIP: Various small training changes (#6818)
* Allow output_path to be None during training

* Fix cat scoring (?)

* Improve error message for weighted None score

* Improve messages

So we can call this in other places etc.

* FIx output path check

* Use latest wasabi

* Revert "Improve error message for weighted None score"

This reverts commit 7059926763.

* Exclude None scores from final score by default

It's otherwise very difficult to keep track of the score weights if we modify a config programmatically, source components etc.

* Update warnings and use logger.warning
2021-01-26 14:51:52 +11:00
Adriane Boyd 0f2de39efb
Fix types for exclude args in info CLI (#6808) 2021-01-25 20:00:22 +08:00
KeshavG-lb 0a86d833d7
Spacy Cli info method causing backward compatibility issues (#6793)
* Spacy Cli info method causing backward compatibility issues #6791

fix backward compatibility by setting default value to exclude in info
method.

* setting empty list as default argument is dangerous.
so setting default to None and then setting it to emptylist, if None.

Reference : https://nikos7am.com/posts/mutable-default-arguments/
2021-01-23 11:21:43 +01:00
Ines Montani b0b743597c Tidy up and auto-format 2021-01-15 11:57:36 +11:00
Ines Montani e8a97a2bd6
Merge pull request #6720 from adrianeboyd/feature/improved-init-training-config-validation 2021-01-15 11:45:24 +11:00
Adriane Boyd 5fb8b7037a Expand initialize/training config validation
Validate both `[initialize]` and `[training]` in `debug data` and
`nlp.initialize()` with separate config validation error blocks that
indicate which block of the config is being validated.
2021-01-12 17:17:00 +01:00
svlandeg 1abeca90a6 refer to _parser_internals.nonproj.DELIMITER 2021-01-07 18:58:13 +01:00
Sofie Van Landeghem 75d9019343
Fix types of Tok2Vec encoding architectures (#6442)
* fix TorchBiLSTMEncoder documentation

* ensure the types of the encoding Tok2vec layers are correct

* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem afc5714d32
multi-label textcat component (#6474)
* multi-label textcat component

* formatting

* fix comment

* cleanup

* fix from #6481

* random edit to push the tests

* add explicit error when textcat is called with multi-label gold data

* fix error nr

* small fix
2021-01-06 13:07:14 +11:00
Ines Montani 6f83abb971
Merge pull request #6647 from svlandeg/feature/init_config_overwrite 2021-01-05 14:59:04 +11:00
Ines Montani 81f018fb67
Merge pull request #6671 from explosion/chore/tidy-autoformat
Tidy up and auto-format
2021-01-05 14:45:31 +11:00
Ines Montani a9e845426f Use --force for consistency and add docs 2021-01-05 13:49:59 +11:00