mirror of https://github.com/explosion/spaCy.git
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
commit 1780a6ea49
@@ -8,6 +8,7 @@ menu:
 - ['Config System', 'config']
 - ['Custom Training', 'config-custom']
 - ['Custom Functions', 'custom-functions']
+- ['Initialization', 'initialization']
 - ['Data Utilities', 'data']
 - ['Parallel Training', 'parallel-training']
 - ['Internal API', 'api']
@@ -824,12 +825,15 @@ def MyModel(output_width: int) -> Model[List[Doc], List[Floats2d]]:
     return create_model(output_width)
 ```
 
-### Customizing the initialization {#initialization}
+## Customizing the initialization {#initialization}
 
 When you start training a new model from scratch,
 [`spacy train`](/api/cli#train) will call
-[`nlp.initialize`](/api/language#initialize) to initialize the pipeline for
-training. This process typically includes the following:
+[`nlp.initialize`](/api/language#initialize) to initialize the pipeline and load
+the required data. All settings for this are defined in the
+[`[initialize]`](/api/data-formats#config-initialize) block of the config, so
+you can keep track of how the initial `nlp` object was created. The
+initialization process typically includes the following:
 
 > #### config.cfg (excerpt)
 >
@@ -859,10 +863,22 @@ The initialization step allows the config to define **all settings** required
 for the pipeline, while keeping a separation between settings and functions that
 should only be used **before training** to set up the initial pipeline, and
 logic and configuration that needs to be available **at runtime**. Without that
-separation, TODO:
+separation, it would be very difficult to use the same, reproducible config file
+because the component settings required for training (load data from an external
+file) wouldn't match the component settings required at runtime (load what's
+included with the saved `nlp` object and don't depend on an external file).
 
 ![Illustration of pipeline lifecycle](../images/lifecycle.svg)
 
+<Infobox title="How components save and load data" emoji="📖">
+
+For details and examples of how pipeline components can **save and load data
+assets** like model weights or lookup tables, and how the component
+initialization is implemented under the hood, see the usage guide on
+[serializing and initializing component data](/usage/processing-pipelines#component-data-initialization).
+
+</Infobox>
+
 #### Initializing labels {#initialization-labels}
 
 Built-in pipeline components like the
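
The paragraph added in the last hunk separates settings used only before training (load data from an external file) from settings needed at runtime (load what was saved with the pipeline). That idea can be sketched in plain Python, without importing spaCy. Everything here is an assumption made for illustration: the `LabelComponent` class, the file names, and the `demo` helper are hypothetical and are not spaCy's implementation. Only the method names `initialize`, `to_disk` and `from_disk` echo the conventions that spaCy components follow.

```python
import json
import tempfile
from pathlib import Path


class LabelComponent:
    """Hypothetical component used for illustration; not a spaCy class."""

    def __init__(self):
        self.labels = []

    def initialize(self, labels_path):
        # Before training only: read the labels from an external file.
        text = Path(labels_path).read_text(encoding="utf8")
        self.labels = json.loads(text)

    def to_disk(self, path):
        # Save everything the component needs alongside the pipeline.
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "labels.json").write_text(json.dumps(self.labels), encoding="utf8")

    def from_disk(self, path):
        # At runtime: load only what was saved, never the external file.
        text = (Path(path) / "labels.json").read_text(encoding="utf8")
        self.labels = json.loads(text)
        return self


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        external = tmp / "external_labels.json"  # hypothetical training asset
        external.write_text(json.dumps(["PERSON", "ORG"]), encoding="utf8")

        # Initialization step: needs the external file.
        component = LabelComponent()
        component.initialize(external)
        component.to_disk(tmp / "saved_component")

        # Simulate a runtime environment where the external file is gone.
        external.unlink()

        # Runtime step: works from the saved data alone.
        loaded = LabelComponent().from_disk(tmp / "saved_component")
        return loaded.labels


if __name__ == "__main__":
    print(demo())
```

Because `from_disk` never takes the external path, the saved component stays loadable after the training-time file is deleted, which is the mismatch the added paragraph says the `[initialize]` block is meant to avoid.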