Commit Graph

65 Commits

Author SHA1 Message Date
Sofie Van Landeghem e796aab4b3
Resizable textcat (#7862)
* implement textcat resizing for TextCatCNN

* resizing textcat in-place

* simplify code

* ensure predictions for old textcat labels remain the same after resizing (WIP)

* fix for softmax

* store softmax as attr

* fix ensemble weight copy and cleanup

* restructure slightly

* adjust documentation, update tests and quickstart templates to use latest versions

* extend unit test slightly

* revert unnecessary edits

* fix typo

* ensemble architecture won't be resizable for now

* use resizable layer (WIP)

* revert using resizable layer

* resizable container while avoiding shape inference trouble

* cleanup

* ensure model continues training after resizing

* use fill_b parameter

* use fill_defaults

* resize_layer callback

* format

* bump thinc to 8.0.4

* bump spacy-legacy to 3.0.6
2021-06-16 11:45:00 +02:00
Adriane Boyd d2bdaa7823
Replace negative rows with 0 in StaticVectors (#7674)
* Replace negative rows with 0 in StaticVectors

Replace negative row indices with 0-vectors in `StaticVectors`.

* Increase versions related to StaticVectors

* Increase versions of all architectures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations

Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5

* Update config defaults to new versions

* Update docs
2021-04-22 18:04:15 +10:00
Sofie Van Landeghem 59c2069eb1
Legacy docs (#7601)
* document legacy Tok2Vec architectures

* add TextCatEnsemble.v1 legacy documentation

* Separate legacy section in side bar
2021-03-30 12:43:14 +02:00
Sofie Van Landeghem 932887b950
textcat scoring fix and multi_label docs (#6974)
* add multi-label textcat to menu

* add infobox on textcat API

* add info to v3 migration guide

* small edits

* further fixes in doc strings

* add infobox to textcat architectures

* add textcat_multilabel to overview of built-in components

* spelling

* fix unrelated warn msg

* Add textcat_multilabel to quickstart [ci skip]

* remove separate documentation page for multilabel_textcategorizer

* small edits

* positive label clarification

* avoid duplicating information in self.cfg and fix textcat.score

* fix multilabel textcat too

* revert threshold to storage in cfg

* revert threshold stuff for multi-textcat

Co-authored-by: Ines Montani <ines@ines.io>
2021-03-09 23:04:22 +11:00
Sofie Van Landeghem cd70c3cb79
Fixing pretrain (#7342)
* initialize NLP with train corpus

* add more pretraining tests

* more tests

* function to fetch tok2vec layer for pretraining

* clarify parameter name

* test different objectives

* formatting

* fix check for static vectors when using vectors objective

* clarify docs

* logger statement

* fix init_tok2vec and proc.initialize order

* test training after pretraining

* add init_config tests for pretraining

* pop pretraining block to avoid config validation errors

* custom errors
2021-03-09 14:01:13 +11:00
svlandeg 682a6232e3 fix typo 2021-03-02 17:59:13 +01:00
Sofie Van Landeghem 75d9019343
Fix types of Tok2Vec encoding architectures (#6442)
* fix TorchBiLSTMEncoder documentation

* ensure the types of the encoding Tok2vec layers are correct

* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem 82ae95267a
Docs for pretrain architectures (#6605)
* document pretraining architectures

* formatting

* bit more info

* small fixes
2021-01-06 16:12:30 +11:00
Sofie Van Landeghem 282a3b49ea
Fix parser resizing when there is no upper layer (#6460)
* allow resizing of the parser model even when upper=False

* update from spacy.TransitionBasedParser.v1 to v2

* bugfix
2020-12-18 18:56:57 +08:00
svlandeg 789fb3d124 add docs for upstream argument of TransformerListener 2020-11-09 21:42:58 +01:00
Sofie Van Landeghem 8ef056cf98
fix embed_size in Entity Linker architecture (#6343) 2020-11-04 22:20:13 +01:00
Sofie Van Landeghem 75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
svlandeg 40276fd3be update NEL docs after latest refactor 2020-10-12 11:41:27 +02:00
Ines Montani 1a554bdcb1 Update docs and docstring [ci skip] 2020-10-05 21:55:27 +02:00
Matthew Honnibal 919790cb47 Upd MultiHashEmbed docs 2020-10-05 20:28:21 +02:00
Sofie Van Landeghem a22215f427
Add FeatureExtractor from Thinc (#6170)
* move featureextractor from Thinc

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
Ines Montani d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Ines Montani b92c8aae78 Merge branch 'develop' into pr/6135 2020-09-24 13:44:56 +02:00
walterhenry 3dd5f409ec Proofreading
Proofread some API docs
2020-09-24 13:15:28 +02:00
svlandeg dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
svlandeg 6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani 1955aaaa20
Merge pull request #6045 from svlandeg/feature/more-layers-docs [ci skip] 2020-09-09 21:46:40 +02:00
Sofie Van Landeghem cb66ea7400
Remove simple_ner code (#6041)
* remove simple_ner code

* remove unused _biluo and _iob files
2020-09-09 16:11:27 +02:00
svlandeg bd8f9b188b small fixes 2020-09-08 17:24:36 +02:00
Ines Montani 23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani 5afe6447cd registry.assets -> registry.misc 2020-09-03 17:31:14 +02:00
svlandeg bbaea530f6 sublayers paragraph 2020-09-02 17:36:22 +02:00
svlandeg aa9e0c9c39 small fix 2020-08-27 19:56:52 +02:00
svlandeg ec069627fe rename to TransformerListener 2020-08-26 13:31:01 +02:00
svlandeg feb86d5206 clarify default 2020-08-26 11:21:30 +02:00
Ines Montani c7c9b0451f Update docs [ci skip] 2020-08-22 13:52:52 +02:00
Ines Montani 74cb6d39d0 Update docs [ci skip] 2020-08-21 16:11:38 +02:00
Matthew Honnibal f5bcc10268 Update architectures 2020-08-21 15:34:54 +02:00
Matthew Honnibal 7ed8f4504b Update API docs for architectures 2020-08-21 15:22:19 +02:00
Ines Montani 04e4d59235 Update docs [ci skip] 2020-08-20 16:17:25 +02:00
Ines Montani 6ad59d59fe Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip] 2020-08-20 11:20:58 +02:00
svlandeg 648499157a rename "custom models" to "custom functions" 2020-08-19 16:53:51 +02:00
svlandeg 2dfd919585 add kb_loader and get_candidates back to EL API 2020-08-19 14:52:49 +02:00
svlandeg 0d55b6ebb4 formatting 2020-08-18 18:55:56 +02:00
svlandeg abba639565 Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs 2020-08-18 18:55:12 +02:00
Ines Montani 82f0e20318 Update docs and consistency [ci skip] 2020-08-18 14:39:40 +02:00
svlandeg f7b76d2d83 Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs 2020-08-18 11:57:52 +02:00
Ines Montani 728fec0194 Update docs [ci skip] 2020-08-18 00:49:19 +02:00
svlandeg da80c18660 merge develop into branch 2020-08-17 16:57:18 +02:00
Ines Montani 3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
svlandeg 319692aa53 fix typos 2020-08-17 14:05:48 +02:00
Ines Montani b7ec06e331 Update docs [ci skip] 2020-08-11 20:57:23 +02:00
Ines Montani 12052bd8f6 Update docs [ci skip] 2020-08-10 01:20:10 +02:00
Ines Montani 0832cdd443 Fix formatting [ci skip] 2020-08-10 00:46:32 +02:00