mirror of https://github.com/explosion/spaCy.git
remove non-existing link
parent 543073bf9d
commit 43cc6aea93
@@ -807,15 +807,16 @@ $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id]
 ## pretrain {#pretrain new="2.1" tag="command,experimental"}
 
 Pretrain the "token to vector" ([`Tok2vec`](/api/tok2vec)) layer of pipeline
-components on [raw text](/api/data-formats#pretrain), using an approximate
-language-modeling objective. Specifically, we load pretrained vectors, and train
-a component like a CNN, BiLSTM, etc to predict vectors which match the
-pretrained ones. The weights are saved to a directory after each epoch. You can
-then include a **path to one of these pretrained weights files** in your
+components on raw text, using an approximate language-modeling objective.
+Specifically, we load pretrained vectors, and train a component like a CNN,
+BiLSTM, etc to predict vectors which match the pretrained ones. The weights are
+saved to a directory after each epoch. You can then include a **path to one of
+these pretrained weights files** in your
 [training config](/usage/training#config) as the `init_tok2vec` setting when you
 train your pipeline. This technique may be especially helpful if you have little
 labelled data. See the usage docs on
-[pretraining](/usage/embeddings-transformers#pretraining) for more info.
+[pretraining](/usage/embeddings-transformers#pretraining) for more info. To read
+the raw text, a [`JsonlCorpus`](/api/top-level#JsonlCorpus) is typically used.
 
 <Infobox title="Changed in v3.0" variant="warning">
 
@@ -835,7 +836,6 @@ auto-generated by setting `--pretraining` on
 > $ python -m spacy pretrain config.cfg output_pretrain --paths.raw_text="data.jsonl"
 > ```
 
-
 ```cli
 $ python -m spacy pretrain [config_path] [output_dir] [--code] [--resume-path] [--epoch-resume] [--gpu-id] [overrides]
 ```
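
The changed paragraph says the raw text is typically read with a `JsonlCorpus`, and the example command passes `--paths.raw_text="data.jsonl"`. As a rough illustration of what such an input file might look like, here is a minimal sketch that writes one JSON object per line using only the standard library. It assumes each record carries its text under a `"text"` key; the helper names `write_raw_text_jsonl` and `read_raw_text_jsonl` are hypothetical and not part of spaCy.

```python
import json
from pathlib import Path


def write_raw_text_jsonl(texts, path):
    """Write one JSON object per line (hypothetical helper, not a spaCy API).

    Assumption: each record stores the raw text under a "text" key.
    """
    path = Path(path)
    with path.open("w", encoding="utf8") as f:
        for text in texts:
            # ensure_ascii=False keeps non-ASCII characters readable in the file
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    return path


def read_raw_text_jsonl(path):
    """Read the records back as a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open("r", encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    out = write_raw_text_jsonl(
        ["This is some raw text.", "Another unlabelled sentence."],
        "data.jsonl",
    )
    print(f"Wrote {len(read_raw_text_jsonl(out))} records to {out}")
```

A file produced this way could then be supplied through the `--paths.raw_text` override shown in the example command above, provided the config's pretraining corpus reads from that path.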