mirror of https://github.com/explosion/spaCy.git
Update docs [ci skip]
parent a127fa475e
commit 0406200a1e
@@ -123,20 +123,11 @@ $ python -m spacy train config.cfg --paths.train ./corpus/train.spacy
 
 ### corpora {#config-corpora tag="section"}
 
-This section defines a dictionary mapping of string keys to `Callable`
-functions. Each callable takes an `nlp` object and yields
-[`Example`](/api/example) objects. By default, the two keys `train` and `dev`
-are specified and each refer to a [`Corpus`](/api/top-level#Corpus). When
-pretraining, an additional pretrain section is added that defaults to a
-[`JsonlReader`](/api/top-level#JsonlReader).
-
-These subsections can be expanded with additional subsections, each referring to
-a callback of type `Callable[[Language], Iterator[Example]]`:
-
 > #### Example
 >
 > ```ini
 > [corpora]
+>
 > [corpora.train]
 > @readers = "spacy.Corpus.v1"
 > path = ${paths:train}
@@ -148,28 +139,44 @@ a callback of type `Callable[[Language], Iterator[Example]]`:
 > [corpora.pretrain]
 > @readers = "spacy.JsonlReader.v1"
 > path = ${paths.raw}
-> min_length = 5
-> max_length = 500
 >
-> [corpora.mydata]
-> @readers = "my_reader.v1"
-> shuffle = true
+> [corpora.my_custom_data]
+> @readers = "my_custom_reader.v1"
 > ```
 
-Alternatively, the `corpora` block could refer to one function with return type
-`Dict[str, Callable[[Language], Iterator[Example]]]`:
+This section defines a **dictionary** mapping of string keys to functions. Each
+function takes an `nlp` object and yields [`Example`](/api/example) objects. By
+default, the two keys `train` and `dev` are specified and each refer to a
+[`Corpus`](/api/top-level#Corpus). When pretraining, an additional `pretrain`
+section is added that defaults to a [`JsonlReader`](/api/top-level#JsonlReader).
+You can also register custom functions that return a callable.
+
+| Name | Description |
+| ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `train` | Training data corpus, typically used in `[training]` block. ~~Callable[[Language], Iterator[Example]]~~ |
+| `dev` | Development data corpus, typically used in `[training]` block. ~~Callable[[Language], Iterator[Example]]~~ |
+| `pretrain` | Raw text for [pretraining](/usage/embeddings-transformers#pretraining), typically used in `[pretraining]` block (if available). ~~Callable[[Language], Iterator[Example]]~~ |
+| ... | Any custom or alternative corpora. ~~Callable[[Language], Iterator[Example]]~~ |
+
+Alternatively, the `[corpora]` block can refer to **one function** that returns
+a dictionary keyed by the corpus names. This can be useful if you want to load a
+single corpus once and then divide it up into `train` and `dev` partitions.
 
 > #### Example
 >
 > ```ini
 > [corpora]
-> @readers = "my_dict_reader.v1"
+> @readers = "my_custom_reader.v1"
 > train_path = ${paths:train}
 > dev_path = ${paths:dev}
 > shuffle = true
 >
 > ```
 
+| Name | Description |
+| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `corpora` | A dictionary keyed by string names, mapped to corpus functions that receive the current `nlp` object and return an iterator of [`Example`](/api/example) objects. ~~Dict[str, Callable[[Language], Iterator[Example]]]~~ |
+
 ### training {#config-training tag="section"}
 
 This section defines settings and controls for the training and evaluation
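The new docs text in this commit describes a `[corpora]` block that refers to one function returning a dictionary keyed by corpus name. A minimal sketch of that shape follows, under stated assumptions: `Language` and `Example` are plain stand-ins so the snippet runs without spaCy, and the `records`, `dev_fraction`, and `seed` parameters are hypothetical (the config in the diff passes `train_path`, `dev_path`, and `shuffle` instead). In a real project the function would presumably be registered with `@spacy.registry.readers("my_custom_reader.v1")`.

```python
import random
from typing import Callable, Dict, Iterator, List

# Stand-ins (assumptions) so the sketch runs without spaCy installed.
# With spaCy these would be spacy.language.Language and spacy.training.Example.
Language = object
Example = str

CorpusFn = Callable[[Language], Iterator[Example]]


# Hypothetical reader: with spaCy, decorate it with
# @spacy.registry.readers("my_custom_reader.v1") so the config can refer to it.
def my_custom_reader(
    records: List[Example],
    dev_fraction: float = 0.2,
    shuffle: bool = True,
    seed: int = 0,
) -> Dict[str, CorpusFn]:
    """Load one corpus once and expose it as `train` and `dev` partitions."""
    items = list(records)
    if shuffle:
        # A seeded shuffle keeps the split reproducible between runs.
        random.Random(seed).shuffle(items)
    n_dev = int(len(items) * dev_fraction)
    dev_items, train_items = items[:n_dev], items[n_dev:]

    def make_corpus(part: List[Example]) -> CorpusFn:
        # Each corpus is a callable that receives `nlp` and yields examples.
        def corpus(nlp: Language) -> Iterator[Example]:
            yield from part

        return corpus

    return {"train": make_corpus(train_items), "dev": make_corpus(dev_items)}
```

Calling `my_custom_reader(records)["train"](nlp)` then yields the training partition. Returning closures rather than lists keeps each value in the `Callable[[Language], Iterator[Example]]` form that the table in the diff documents.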