Commit Graph

1620 Commits

Author SHA1 Message Date
Ines Montani 04a9ade40f
Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip] 2021-07-06 22:20:24 +10:00
Sofie Van Landeghem b9f59118bf
Fix silent evaluation (#8581)
* fix silentness

* sneak in docs typo fix

* pass silent boolean instead
2021-07-06 14:16:19 +02:00
Adriane Boyd 29906884c5
Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels

Raise an error if initializing a `textcat` component without at least
two labels.

* Add similar note to docs

* Update positive_label description in API docs
2021-07-06 12:35:22 +02:00
Ines Montani 5bb7fe4b41 Update with HF hub integration [ci skip] 2021-07-06 19:30:59 +10:00
Cass 7d13fc799b
Fix a command typo in models.md
"dowmload" -> "download"
2021-07-05 18:44:18 -07:00
Ines Montani 8423864b50
Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597) 2021-07-05 13:49:20 +02:00
Ines Montani af9d984407
Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip] 2021-06-30 20:52:59 +10:00
Adriane Boyd 41292a1b84 Add note about updating with fill-config 2021-06-29 10:45:36 +02:00
Adriane Boyd 4d1ef8f695 Tidy up docs 2021-06-28 12:08:15 +02:00
Ines Montani 4544412442
Update wording [ci skip]
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-25 13:52:48 +10:00
Ines Montani 0d2e2b59bc Update intro [ci skip] 2021-06-24 22:53:20 +10:00
Matthew Honnibal f9946154d9
Add SpanCategorizer component (#6747)
* Draft spancat model

* Add spancat model

* Add test for extract_spans

* Add extract_spans layer

* Upd extract_spans

* Add spancat model

* Add test for spancat model

* Upd spancat model

* Update spancat component

* Upd spancat

* Update spancat model

* Add quick spancat test

* Import SpanCategorizer

* Fix SpanCategorizer component

* Import SpanGroup

* Fix span extraction

* Fix import

* Fix import

* Upd model

* Update spancat models

* Add scoring, update defaults

* Update and add docs

* Fix type

* Update spacy/ml/extract_spans.py

* Auto-format and fix import

* Fix comment

* Fix type

* Fix type

* Update website/docs/api/spancategorizer.md

* Fix comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Better defense

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix labels list

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/extract_spans.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Set annotations during update

* Set annotations in spancat

* fix imports in test

* Update spacy/pipeline/spancat.py

* replace MaxoutLogistic with LinearLogistic

* fix config

* various small fixes

* remove set_annotations parameter in update

* use our beloved tupley format with recent support for doc.spans

* bugfix to allow renaming the default span_key (scores weren't showing up)

* use different key in docs example

* change defaults to better-working parameters from project (WIP)

* register spacy.extract_spans.v1 for legacy purposes

* Upd dev version so can build wheel

* layers instead of architectures for smaller building blocks

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Include additional scores from overrides in combined score weights

* Parameterize spans key in scoring

Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so
that it's possible to evaluate multiple `spancat` components in the same
pipeline.

* Use the (intentionally very short) default spans key `sc` in the
  `SpanCategorizer`
* Adjust the default score weights to include the default key
* Adjust the scorer to use `spans_{spans_key}` as the prefix for the
  returned score
* Revert addition of `attr_name` argument to `score_spans` and adjust
  the key in the `getter` instead.

Note that for `spancat` components with a custom `span_key`, the score
weights currently need to be modified manually in
`[training.score_weights]` for them to be available during training. To
suppress the default score weights `spans_sc_p/r/f` during training, set
them to `null` in `[training.score_weights]`.

* Update website/docs/api/scorer.md

* Fix scorer for spans key containing underscore

* Increment version

* Add Spans to Evaluate CLI (#8439)

* Add Spans to Evaluate CLI

* Change to spans_key

* Add spans per_type output

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix spancat GPU issues (#8455)

* Fix GPU issues

* Require thinc >=8.0.6

* Switch to glorot_uniform_init

* Fix and test ngram suggester

* Include final ngram in doc for all sizes
* Fix ngrams for docs of the same length as ngram size
* Handle batches of docs that result in no ngrams
* Add tests

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Nirant <NirantK@users.noreply.github.com>
2021-06-24 12:35:27 +02:00
Ines Montani 68721af628 Formatting and preliminary intro [ci skip] 2021-06-24 20:32:23 +10:00
Adriane Boyd 92dc6b409e Notes on source with vectors 2021-06-24 10:34:07 +02:00
Adriane Boyd 35425d7e26 Add details for Catalan and Danish 2021-06-24 10:10:33 +02:00
Ines Montani 5daf450f51 Update upgrading notes [ci skip] 2021-06-24 18:06:28 +10:00
Ines Montani 528746129d Merge branch 'master' into docs/new-in-v3-1 2021-06-24 13:11:37 +10:00
Ines Montani a8e8d02ba7
Merge pull request #8465 from explosion/feature/spacy-package-readme 2021-06-24 13:11:08 +10:00
Ines Montani 3e058dee62 Update features [ci skip] 2021-06-24 12:36:04 +10:00
Ines Montani 40f13c3f0c Add docs [ci skip] 2021-06-24 11:57:15 +10:00
Ines Montani a1e4aca267 Fix sentence [ci skip] 2021-06-24 11:40:36 +10:00
Ines Montani ca0d904faa Update details [ci skip] 2021-06-23 13:05:56 +10:00
themrmax d96c422cfc
Fix broken link
change /api/registry to /api/top-level#registry
2021-06-22 15:34:06 -07:00
Ines Montani e9b68d4f4c Update details and add example [ci skip] 2021-06-22 17:51:03 +10:00
Nick Sorros 31504f5982
Switch model and data path in prodigy project.yml recipe (#8467) 2021-06-22 09:41:45 +02:00
Ines Montani bc93c34f54 Add "New in v3.1" guide 2021-06-22 15:23:18 +10:00
Adriane Boyd e39d1bd4ab
Various docs updates for v3.1 (#8406)
* Update for Catalan/Italian lemmatizer changes

* Add warning about relevance of section
2021-06-21 09:33:50 +02:00
Ines Montani 02d2fdb123 Add link anchor [ci skip] 2021-06-20 11:29:19 +10:00
Matthew Honnibal 6f5e308d17
Support negative examples in partial NER annotations (#8106)
* Support a cfg field in transition system

* Make NER 'has gold' check use right alignment for span

* Pass 'negative_samples_key' property into NER transition system

* Add field for negative samples to NER transition system

* Check neg_key in NER has_gold

* Support negative examples in NER oracle

* Test for negative examples in NER

* Fix name of config variable in NER

* Remove vestiges of old-style partial annotation

* Remove obsolete tests

* Add comment noting lack of support for negative samples in parser

* Additions to "neg examples" PR (#8201)

* add custom error and test for deprecated format

* add test for unlearning an entity

* add break also for Begin's cost

* add negative_samples_key property on Parser

* rename

* extend docs & fix some older docs issues

* add subclass constructors, clean up tests, fix docs

* add flaky test with ValueError if gold parse was not found

* remove ValueError if n_gold == 0

* fix docstring

* Hack in environment variables to try out training

* Remove hack

* Remove NER hack, and support 'negative O' samples

* Fix O oracle

* Fix transition parser

* Remove 'not O' from oracle

* Fix NER oracle

* check for spans in both gold.ents and gold.spans and raise if so, to prevent memory access violation

* use set instead of list in consistency check

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-17 17:33:00 +10:00
svlandeg bb9d2f1546 extend example to ensure the text is preserved 2021-06-16 23:56:35 +02:00
Sofie Van Landeghem e796aab4b3
Resizable textcat (#7862)
* implement textcat resizing for TextCatCNN

* resizing textcat in-place

* simplify code

* ensure predictions for old textcat labels remain the same after resizing (WIP)

* fix for softmax

* store softmax as attr

* fix ensemble weight copy and cleanup

* restructure slightly

* adjust documentation, update tests and quickstart templates to use latest versions

* extend unit test slightly

* revert unnecessary edits

* fix typo

* ensemble architecture won't be resizable for now

* use resizable layer (WIP)

* revert using resizable layer

* resizable container while avoid shape inference trouble

* cleanup

* ensure model continues training after resizing

* use fill_b parameter

* use fill_defaults

* resize_layer callback

* format

* bump thinc to 8.0.4

* bump spacy-legacy to 3.0.6
2021-06-16 11:45:00 +02:00
svlandeg 29d83dec0c adjust whitespace tokenizer to avoid sep in split() 2021-06-16 10:58:45 +02:00
Adriane Boyd 5646fcbe46 Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1 2021-06-15 15:05:17 +02:00
Sofie Van Landeghem 0fd0d949c4
fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
Adriane Boyd 507422149f
Various docs updates for v3.0 (#8353)
* Update cats score names in Scorer API docs

* Refer to performance in meta

* Update package naming/versions, lemmatizer details

* Minor formatting fixes

* Provide more explanation for cats_score_desc

* Provide language-specific lemmatizer defaults in API docs

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-06-14 12:19:36 +02:00
Sofie Van Landeghem 3c58c0323f
fix docs (#8200) 2021-05-27 10:48:59 +02:00
Paul O'Leary McCann 0c553ecd4e Fix docs (fix #8189) 2021-05-24 19:47:30 +09:00
Sofie Van Landeghem 202943bc8c
KB & NEL to/from bytes (#8113)
* unit test for pickling KB

* add pickling test for NEL

* KB to_bytes and from_bytes

* NEL to_bytes and from_bytes

* xfail pickle tests for now

* fix docs

* cleanup
2021-05-20 18:11:30 +10:00
Adriane Boyd 6baab565eb
Minor updates to quickstart settings/instructions (#7965)
* Minor updates to quickstart settings/instructions

* set default value of textcat exclusive to `false` until the default
checkbox behavior is updated
* add the `morphologizer` to the list of components
* add a note that v3.0.6+ is required

* Switch to warning above quickstart

* Undo changes to textcat default in quickstart

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-05-17 16:55:22 +02:00
Adriane Boyd 71c2a3ab47
Fix new version for match_alignments (#8021) 2021-05-07 09:55:20 +02:00
Sofie Van Landeghem 02a6a5fea0
Fix 'debug model' for transformers + generalize (#7973)
* add overrides to docs

* fix debug model with transformer

* assume training data is set in config
2021-05-06 18:43:32 +10:00
Paul O'Leary McCann 66bfabd839
Fix pretraining objectives fragment (#8005)
* Fix pretraining objectives fragment

The fragment here is reused from a heading higher up, so you couldn't
link to this section.

* Fix section link to new fragment
2021-05-06 08:27:36 +02:00
Adriane Boyd 2320791f6d
Fix Transformer.initialize example (#7963) 2021-04-30 12:21:31 +02:00
Adriane Boyd 95c0833656
Add training option to set annotations on update (#7767)
* Add training option to set annotations on update

Add a `[training]` option called `set_annotations_on_update` to specify
a list of components for which the predicted annotations should be set
on `example.predicted` immediately after that component has been
updated. The predicted annotations can be accessed by later components
in the pipeline during the processing of the batch in the same `update`
call.

* Rename to annotates / annotating_components

* Add test for `annotating_components` when training from config

* Add documentation
2021-04-26 16:53:53 +02:00
Adriane Boyd bdb485cc80
Add callback to copy vocab/tokenizer from model (#7750)
* Add callback to copy vocab/tokenizer from model

Add callback `spacy.copy_from_base_model.v1` to copy the tokenizer
settings and/or vocab (including vectors) from a base model.

* Move spacy.copy_from_base_model.v1 to spacy.training.callbacks

* Add documentation

* Modify to specify model as tokenizer and vocab params
2021-04-22 12:36:50 +02:00
Adriane Boyd f68fc29130
Update sent_starts in Example.from_dict (#7847)
* Update sent_starts in Example.from_dict

Update `sent_starts` for `Example.from_dict` so that `Optional[bool]`
values have the same meaning as for `Token.is_sent_start`.

Use `Optional[bool]` as the type for sent start values in the docs.

* Use helper function for conversion to ternary ints
2021-04-22 11:32:45 +02:00
Adriane Boyd d2bdaa7823
Replace negative rows with 0 in StaticVectors (#7674)
* Replace negative rows with 0 in StaticVectors

Replace negative row indices with 0-vectors in `StaticVectors`.

* Increase versions related to StaticVectors

* Increase versions of all architctures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations

Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5

* Update config defaults to new versions

* Update docs
2021-04-22 18:04:15 +10:00
Sofie Van Landeghem 6f565cf39d
fix typo in entity_linker docs 2021-04-22 09:59:24 +02:00
Sofie Van Landeghem 2e746dbf32
update EL training data format in docs (#7839)
* update EL training data format

* fix typo

* all -1 because reasons
2021-04-22 08:50:09 +02:00
Shantam Raj 6017fcf693
Default code for Setting Entity annotations on the website errors (#7738)
* the default example for "Setting entity annotations" errors on Binder

* updating contributer info

* using a new variable to store original entities
2021-04-21 09:16:32 +02:00
langdonholmes df541c6b5e
Update processing-pipelines.md to mention method for doc metadata (#7480)
* Update processing-pipelines.md

Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True)

Link to a new example on the attributes page detailing the following:

> ```
> data = [
>   ("Some text to process", {"meta": "foo"}),
>   ("And more text...", {"meta": "bar"})
> ]
> 
> for doc, context in nlp.pipe(data, as_tuples=True):
>     # Let's assume you have a "meta" extension registered on the Doc
>     doc._.meta = context["meta"]
> ```

from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as

* Updating the attributes section

Update the attributes section with example of how extensions can be used to store metadata.

* Update processing-pipelines.md

* Update processing-pipelines.md

Made as_tuples example executable and relocated to the end of the "Processing Text" section.

* Update processing-pipelines.md

* Update processing-pipelines.md

Removed extra line

* Reformat and rephrase

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-19 11:58:12 +02:00
Adriane Boyd 0e7f94b247
Update Tokenizer.explain with special matches (#7749)
* Update Tokenizer.explain with special matches

Update `Tokenizer.explain` and the pseudo-code in the docs to include
the processing of special cases that contain affixes or whitespace.

* Handle optional settings in explain

* Add test for special matches in explain

Add test for `Tokenizer.explain` for special cases containing affixes.
2021-04-19 19:08:20 +10:00
Sofie Van Landeghem c786e98e56
assemble CLI command (#7783)
* assemble CLI command

* ensure assemble runs even without training section

* cleanup
2021-04-19 18:39:11 +10:00
Bram Vanroy ed561cf428
Terminology: deprecated vs obsolete (#7621)
* Terminology: deprecated vs obsolete

Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works.

In light of this, perhaps all other error codes should be checked as well.

* clarify that the link command is removed and not just deprecated

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-12 14:37:00 +02:00
Adriane Boyd 673e2bc4c0
Add usage docs for streamed train corpora (#7693) 2021-04-09 16:15:38 +02:00
Sofie Van Landeghem 204c2f116b
Extend score_spans for overlapping & non-labeled spans (#7209)
* extend span scorer with consider_label and allow_overlap

* unit test for spans y2x overlap

* add score_spans unit test

* docs for new fields in scorer.score_spans

* rename to include_label

* spell out if-else for clarity

* rename to 'labeled'

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-08 12:19:17 +02:00
broaddeep ee159b8543
Support match alignments (#7321)
* Support match alignments

* change naming from match_alignments to with_alignments, add conditional flow if with_alignments is given, validate with_alignments, add related test case

* remove added errors, utilize bint type, cleanup whitespace

* fix no new line in end of file

* Minor formatting

* Skip alignments processing if as_spans is set

* Add with_alignments to Matcher API docs

* Update website/docs/api/matcher.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-04-08 18:10:14 +10:00
Ines Montani de4f4c9b8a Add more link anchors [ci skip] 2021-04-06 14:15:21 +10:00
Ines Montani 5bbdd7dc4c Update pipeline design docs [ci skip] 2021-04-06 14:13:22 +10:00
Ines Montani 1d1cfadbca Fix formatting [ci skip] 2021-04-06 14:13:13 +10:00
Ayush Chaurasia 3c2ce41dd8
W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429)
* Add optional artifacts logging

* Update docs

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Bump WandbLogger Version

* Add documentation of v1 to legacy docs

* bump spacy-legacy to 3.0.2 (to be released)

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-01 19:36:23 +02:00
Sofie Van Landeghem 59c2069eb1
Legacy docs (#7601)
* document legacy Tok2Vec architectures

* add TextCatEnsemble.v1 legacy documentation

* Separate legacy section in side bar
2021-03-30 12:43:14 +02:00
Santiago Castro af07fc3bc1
Add support for CUDA 11.2 (#7583)
* Add support for CUDA 11.2

* Update the docs

* Format

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-03-30 09:47:33 +02:00
Álvaro Abella Bascarán 5b4dde38a3
fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606) 2021-03-30 09:45:49 +02:00
Ines Montani be55f43163
Merge pull request #7473 from adrianeboyd/docs/v3-pipeline-deps-order 2021-03-22 12:43:07 +01:00
Ines Montani 3ee2fcfba0
Merge pull request #7483 from adrianeboyd/docs/various-v3-4 [ci skip] 2021-03-22 12:37:06 +01:00
Ines Montani 88e5a0dc16
Merge pull request #7504 from polm/fix/lexeme-docs [ci skip]
Fix mismatched backtick in Lexeme docs
2021-03-22 12:36:44 +01:00
Adriane Boyd 0d2b723e8d Update entity setting section 2021-03-20 11:38:55 +01:00
Paul O'Leary McCann e39c0dcf33 Fix mismatched backtick in Lexeme docs 2021-03-20 18:40:00 +09:00
Adriane Boyd c771ec22f0 Update matcher errors and docs
* Mention `tagger+attribute_ruler` in `POS`/`MORPH` error messages for
`Matcher` and `PhraseMatcher`
* Document `Matcher.__call__(allow_missing=)`
2021-03-19 10:11:18 +01:00
Adriane Boyd 6a9a467766
Update website/docs/usage/processing-pipelines.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-03-19 08:12:49 +01:00
Adriane Boyd 6354b642c5
Fix typo 2021-03-18 19:01:10 +01:00
Adriane Boyd 40e5d3a980 Update saving/loading example 2021-03-18 16:56:10 +01:00
Adriane Boyd 0fb1881f36 Reformat processing pipelines 2021-03-18 13:31:42 +01:00
Adriane Boyd acc58719da Update custom similarity hooks example 2021-03-18 13:31:42 +01:00
Adriane Boyd c9e1a9ac17 Add multiprocessing section 2021-03-18 13:31:42 +01:00
Adriane Boyd 9a254d3995 Include all en_core_web_sm components in examples 2021-03-18 13:31:42 +01:00
Adriane Boyd 83c1b919a7 Fix positional/option in CLI types 2021-03-18 13:31:42 +01:00
Adriane Boyd 9fd41d6742 Remove Language.pipe cleanup arg 2021-03-18 13:31:42 +01:00
Adriane Boyd 5da323fd86
Minor edits 2021-03-17 12:59:05 +01:00
Adriane Boyd a5ffe8dfed Add details about pretrained pipeline design 2021-03-17 11:31:26 +01:00
bsweileh 61472e7cb3
Update _training.md - Fix broken link on backpropagation (#7431)
* Update _training.md

Fix broken link on backpropagation

* Add agreement

add spacy contributor agreement
2021-03-15 09:21:35 +01:00
Ines Montani c67d5a6eb0
Merge pull request #7394 from adrianeboyd/docs/ner-example-data-readme 2021-03-13 04:26:18 +01:00
Ines Montani 068b97a617
Merge pull request #7408 from adrianeboyd/bugfix/load-keyword-only 2021-03-13 04:25:50 +01:00
Adriane Boyd 3168103605 Fix type of spacy train --output in docs 2021-03-12 10:04:57 +01:00
Adriane Boyd 03e9e7b567 Add --code option to init fill-config 2021-03-12 10:03:57 +01:00
Adriane Boyd 124304b146 Add vocab kwarg back to spacy.load
* Additional minor formatting and docs cleanup
2021-03-11 10:58:59 +01:00
Adriane Boyd 84470d9b9e Incorporate BILUO note from #7407 2021-03-11 10:11:21 +01:00
Adriane Boyd 4294bcf4ab Align keyword-only in docs for init/util 2021-03-11 09:52:40 +01:00
Adriane Boyd 28726c25a1 Update docs for convert CLI and NER examples 2021-03-10 11:42:02 +01:00
Adriane Boyd d746ea6278
Add warning about GPU selection in Jupyter notebooks (#7075)
* Initial warning

* Update check

* Redo edit

* Move jupyter warning to helper method

* Add link with details to warnings
2021-03-09 15:35:21 +01:00
Sofie Van Landeghem 932887b950
textcat scoring fix and multi_label docs (#6974)
* add multi-label textcat to menu

* add infobox on textcat API

* add info to v3 migration guide

* small edits

* further fixes in doc strings

* add infobox to textcat architectures

* add textcat_multilabel to overview of built-in components

* spelling

* fix unrelated warn msg

* Add textcat_multilabel to quickstart [ci skip]

* remove separate documentation page for multilabel_textcategorizer

* small edits

* positive label clarification

* avoid duplicating information in self.cfg and fix textcat.score

* fix multilabel textcat too

* revert threshold to storage in cfg

* revert threshold stuff for multi-textcat

Co-authored-by: Ines Montani <ines@ines.io>
2021-03-09 23:04:22 +11:00
Sofie Van Landeghem cd70c3cb79
Fixing pretrain (#7342)
* initialize NLP with train corpus

* add more pretraining tests

* more tests

* function to fetch tok2vec layer for pretraining

* clarify parameter name

* test different objectives

* formatting

* fix check for static vectors when using vectors objective

* clarify docs

* logger statement

* fix init_tok2vec and proc.initialize order

* test training after pretraining

* add init_config tests for pretraining

* pop pretraining block to avoid config validation errors

* custom errors
2021-03-09 14:01:13 +11:00
Ines Montani dfb23a419e Merge branch 'spacy.io' [ci skip] 2021-03-06 17:38:54 +11:00
graue70 7d085d5b1c
Fix typo in docs 2021-03-05 18:30:09 +01:00
svlandeg 682a6232e3 fix typo 2021-03-02 17:59:13 +01:00
svlandeg d900c55061 consistently use registry as callable 2021-03-02 17:56:28 +01:00
graue70 0fddc0447c
Fix copy & paste error in API docs 2021-03-02 14:00:14 +01:00
Ines Montani 8f7c7b2658
Merge pull request #7211 from svlandeg/docs/el_update [ci skip]
kb.get_candidates renamed to get_alias_candidates
2021-02-27 11:51:22 +11:00
Ines Montani 408b94887a
Merge pull request #7207 from adrianeboyd/docs/get-noun-chunks [ci skip]
Extend docs related to Vocab.get_noun_chunks
2021-02-27 11:51:08 +11:00