mirror of https://github.com/explosion/spaCy.git
Update docs [ci skip]

parent e863b3dc14
commit 554c9a2497

@@ -8,7 +8,11 @@ train = ""
 dev = ""

 [system]
-gpu_allocator = {{ "pytorch" if use_transformer else "" }}
+{% if use_transformer -%}
+gpu_allocator = "pytorch"
+{% else -%}
+gpu_allocator = null
+{% endif %}

 [nlp]
 lang = "{{ lang }}"
@@ -60,7 +60,6 @@ your config and check that it's valid, you can run the
 > [nlp]
 > lang = "en"
 > pipeline = ["tagger", "parser", "ner"]
-> load_vocab_data = true
 > before_creation = null
 > after_creation = null
 > after_pipeline_creation = null
@@ -77,7 +76,6 @@ Defines the `nlp` object, its tokenizer and
 | `lang` | Pipeline language [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). Defaults to `null`. ~~str~~ |
 | `pipeline` | Names of pipeline components in order. Should correspond to sections in the `[components]` block, e.g. `[components.ner]`. See docs on [defining components](/usage/training#config-components). Defaults to `[]`. ~~List[str]~~ |
 | `disabled` | Names of pipeline components that are loaded but disabled by default and not run as part of the pipeline. Should correspond to components listed in `pipeline`. After a pipeline is loaded, disabled components can be enabled using [`Language.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
-| `load_vocab_data` | Whether to load additional lexeme and vocab data from [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) if available. Defaults to `true`. ~~bool~~ |
 | `before_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `Language` subclass before it's initialized. Defaults to `null`. ~~Optional[Callable[[Type[Language]], Type[Language]]]~~ |
 | `after_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `nlp` object right after it's initialized. Defaults to `null`. ~~Optional[Callable[[Language], Language]]~~ |
 | `after_pipeline_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `nlp` object after the pipeline components have been added. Defaults to `null`. ~~Optional[Callable[[Language], Language]]~~ |
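
For context on the `disabled` row above: components listed there are loaded with the pipeline but skipped at runtime until re-enabled. A hedged sketch, assuming the `en_core_web_sm` pipeline is installed:

```python
import spacy

# Load the pipeline with one component disabled: it is present in the
# pipeline but not run when you call nlp().
nlp = spacy.load("en_core_web_sm", disable=["parser"])
print(nlp.disabled)        # ['parser']

nlp.enable_pipe("parser")  # re-enable the loaded-but-disabled component
print(nlp.disabled)        # []
```
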
@@ -189,9 +187,10 @@ process that are used when you run [`spacy train`](/api/cli#train).
 | `dev_corpus` | Dot notation of the config location defining the dev corpus. Defaults to `corpora.dev`. ~~str~~ |
 | `dropout` | The dropout rate. Defaults to `0.1`. ~~float~~ |
 | `eval_frequency` | How often to evaluate during training (steps). Defaults to `200`. ~~int~~ |
-| `gpu_allocator` | Library for cupy to route GPU memory allocation to. Can be "pytorch" or "tensorflow". Defaults to variable `${system.gpu_allocator}`. ~~str~~ |
+| `frozen_components` | Pipeline component names that are "frozen" and shouldn't be updated during training. See [here](/usage/training#config-components) for details. Defaults to `[]`. ~~List[str]~~ |
+| `gpu_allocator` | Library for cupy to route GPU memory allocation to. Can be `"pytorch"` or `"tensorflow"`. Defaults to variable `${system.gpu_allocator}`. ~~str~~ |
 | `init_tok2vec` | Optional path to pretrained tok2vec weights created with [`spacy pretrain`](/api/cli#pretrain). Defaults to variable `${paths.init_tok2vec}`. ~~Optional[str]~~ |
 | `lookups` | Additional lexeme and vocab data from [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data). Defaults to `null`. ~~Optional[Lookups]~~ |
 | `max_epochs` | Maximum number of epochs to train for. Defaults to `0`. ~~int~~ |
 | `max_steps` | Maximum number of update steps to train for. Defaults to `20000`. ~~int~~ |
 | `optimizer` | The optimizer. The learning rate schedule and other settings can be configured as part of the optimizer. Defaults to [`Adam`](https://thinc.ai/docs/api-optimizers#adam). ~~Optimizer~~ |
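
A note on the `${system.gpu_allocator}` default in the re-added row: it is a config variable that gets interpolated when the config is parsed. A minimal sketch (not from the commit), using the `Config` class from Thinc that spaCy's config system builds on:

```python
from thinc.api import Config

cfg_str = """
[system]
gpu_allocator = "pytorch"

[training]
dropout = 0.1
eval_frequency = 200
gpu_allocator = ${system.gpu_allocator}
"""

# Variables like ${system.gpu_allocator} are interpolated on parse.
config = Config().from_str(cfg_str)
print(config["training"]["gpu_allocator"])  # "pytorch"
```
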
@@ -476,7 +475,7 @@ lexical data.
 Here's an example of the 20 most frequent lexemes in the English training data:

 ```json
-%%GITHUB_SPACY / extra / example_data / vocab - data.jsonl
+%%GITHUB_SPACY/extra/example_data/vocab-data.jsonl
 ```
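
If you want to inspect entries like the ones in the file referenced above, each line of the JSONL file is one lexeme record. A small sketch, assuming `srsly` (spaCy's serialization helper) and a local copy of the file; the path is illustrative:

```python
import srsly

# Print the first 20 entries, matching the "20 most frequent lexemes"
# example referenced above. Each line is one JSON object.
for i, lexeme in enumerate(srsly.read_jsonl("vocab-data.jsonl")):
    print(lexeme)
    if i >= 19:
        break
```
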

 ## Pipeline meta {#meta}
@@ -458,6 +458,16 @@ remain in the config file stored on your local system.
 | `project_name` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. ~~str~~ |
 | `remove_config_values` | A list of values to exclude from the config before it is uploaded to W&B (default: empty). ~~List[str]~~ |

+<Project id="integrations/wandb">
+
+Get started with tracking your spaCy training runs in Weights & Biases using our
+project template. It trains on the IMDB Movie Review Dataset and includes a
+simple config with the built-in `WandbLogger`, as well as a custom example of
+creating variants of the config for a simple hyperparameter grid search and
+logging the results.
+
+</Project>
+
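
To show how the two settings documented in the table rows above fit together, here is a hedged sketch of a `[training.logger]` block, parsed with Thinc's `Config`; it assumes the logger's registered name is `spacy.WandbLogger.v1`, and the values are illustrative:

```python
from thinc.api import Config

logger_cfg = """
[training]

[training.logger]
@loggers = "spacy.WandbLogger.v1"
project_name = "my_spacy_project"
remove_config_values = ["paths.train", "paths.dev"]
"""

# Parsing does not resolve the registry, so this runs without wandb installed.
print(Config().from_str(logger_cfg)["training"]["logger"])
```
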
 ## Readers {#readers source="spacy/training/corpus.py" new="3"}

 Corpus readers are registered functions that load data and return a function
@@ -655,6 +655,16 @@ and pass in optional config overrides, like the path to the raw text file:
 $ python -m spacy pretrain config_pretrain.cfg ./output --paths.raw text.jsonl
 ```

+The following defaults are used for the `[pretraining]` block and merged into
+your existing config when you run [`init config`](/api/cli#init-config) or
+[`init fill-config`](/api/cli#init-fill-config) with `--pretraining`. If needed,
+you can [configure](#pretraining-configure) the settings and hyperparameters or
+change the [objective](#pretraining-details).
+
+```ini
+%%GITHUB_SPACY/spacy/default_config_pretraining.cfg
+```
+
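
One way to see the merged `[pretraining]` defaults is to run the `init fill-config` command described above and inspect the result. A sketch (not from the docs); the file names are illustrative, and `base_config.cfg` is assumed to exist:

```python
import subprocess

from thinc.api import Config

# Fill in all defaults, including the [pretraining] block, via the CLI.
subprocess.run(
    ["python", "-m", "spacy", "init", "fill-config",
     "base_config.cfg", "config.cfg", "--pretraining"],
    check=True,
)

filled = Config().from_disk("config.cfg")
print(filled["pretraining"])  # the merged [pretraining] defaults
```
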
 ### How pretraining works {#pretraining-details}

 The impact of [`spacy pretrain`](/api/cli#pretrain) varies, but it will usually
@@ -976,14 +976,12 @@ your results.

 ![Screenshot: Parameter importance using config values](../images/wandb2.jpg 'Parameter importance using config values')

-<!-- TODO:
-
 <Project id="integrations/wandb">

 Get started with tracking your spaCy training runs in Weights & Biases using our
-project template. It includes a simple config using the `WandbLogger`, as well
-as a custom logger implementation you can adjust for your specific use case.
+project template. It trains on the IMDB Movie Review Dataset and includes a
+simple config with the built-in `WandbLogger`, as well as a custom example of
+creating variants of the config for a simple hyperparameter grid search and
+logging the results.

 </Project>
-
--->
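
The "variants of the config for a simple hyperparameter grid search" mentioned in the replaced text can be as simple as writing one config file per parameter combination. A hedged sketch; paths, keys, and values are illustrative and assume a `config.cfg` with the usual `[training]` block:

```python
from itertools import product
from pathlib import Path

from thinc.api import Config

base = Config().from_disk("config.cfg")
for dropout, learn_rate in product([0.1, 0.2], [0.001, 0.0001]):
    variant = base.copy()
    variant["training"]["dropout"] = dropout
    variant["training"]["optimizer"]["learn_rate"] = learn_rate
    path = Path(f"config_dropout{dropout}_lr{learn_rate}.cfg")
    variant.to_disk(path)  # then e.g.: python -m spacy train <path>
```
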