mirror of https://github.com/explosion/spaCy.git
document Pipe API details, crossreferences etc
parent 9a7c6cc61a
commit a8aa9a8068

@@ -205,9 +205,16 @@ examples can either be the full training data or a representative sample. They
are used to **initialize the models** of trainable pipeline components and are
passed to each component's [`begin_training`](/api/pipe#begin_training) method,
if available. Initialization includes validating the network,
[inferring missing shapes](/usage/layers-architectures#shape-inference) and
setting up the label scheme based on the data.

If no `get_examples` function is provided when calling `nlp.begin_training`, the
pipeline components will be initialized with generic data. In this case, it is
crucial that the output dimension of each component has already been defined,
either in the [config](/usage/training#config) or by calling
[`pipe.add_label`](/api/pipe#add_label) for each possible output label (e.g. for
the tagger or textcat).
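
The two initialization paths can be illustrated with a framework-free sketch. `ToyTextClassifier`, its `begin_training` method and the `(text, label)` tuples are hypothetical stand-ins, not spaCy's `Pipe` or `Example` API. As a simplification, the sketch raises an error where spaCy would fall back to generic data, to highlight that the output dimension must then be known in advance.

```python
from typing import Callable, Iterable, List, Optional, Tuple

ToyExample = Tuple[str, str]  # hypothetical (text, gold label) pair


class ToyTextClassifier:
    """Illustrative component (not spaCy's API); nO is the number of labels."""

    def __init__(self) -> None:
        self.labels: List[str] = []
        self.nO: Optional[int] = None  # unset until begin_training

    def add_label(self, label: str) -> int:
        if label in self.labels:
            return 0
        self.labels.append(label)
        return 1

    def begin_training(
        self, get_examples: Optional[Callable[[], Iterable[ToyExample]]] = None
    ) -> None:
        if get_examples is not None:
            # Set up the label scheme from the representative sample.
            for _text, label in get_examples():
                self.add_label(label)
        if not self.labels:
            # No sample and no labels: the output dimension can't be determined.
            raise ValueError("Provide get_examples or call add_label first")
        self.nO = len(self.labels)


def get_examples() -> List[ToyExample]:
    return [("great film", "POSITIVE"), ("dull plot", "NEGATIVE"), ("loved it", "POSITIVE")]


from_sample = ToyTextClassifier()
from_sample.begin_training(get_examples)
print(from_sample.labels, from_sample.nO)  # ['POSITIVE', 'NEGATIVE'] 2

from_labels = ToyTextClassifier()
for label in ("POSITIVE", "NEGATIVE", "NEUTRAL"):
    from_labels.add_label(label)
from_labels.begin_training()  # no sample: relies on the labels added up front
print(from_labels.nO)  # 3
```

Inferring the labels from a representative sample avoids hard-coding a label list, while adding labels explicitly lets the component be initialized without any data.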

<Infobox variant="warning" title="Changed in v3.0">

The `Language.update` method now takes a **function** that is called with no

@@ -286,9 +286,6 @@ context, the original parameters are restored.

## Pipe.add_label {#add_label tag="method"}

> #### Example
>
> ```python
@@ -296,10 +293,85 @@ labels, but care should be taken to avoid the "catastrophic forgetting" problem.
> pipe.add_label("MY_LABEL")
> ```

<Infobox variant="danger">

This method needs to be overridden with your own custom `add_label` method.

</Infobox>

Add a new label to the pipe, to be predicted by the model. The actual
implementation depends on the specific component, but in general `add_label`
shouldn't be called if the output dimension is already set, or if the model has
already been fully [initialized](#begin_training). If these conditions are
violated, the function will raise an error. The exception to this rule is when
the component is [resizable](#is_resizable), in which case
[`set_output`](#set_output) should be called to ensure that the model is
properly resized.

| Name        | Description                                                 |
| ----------- | ----------------------------------------------------------- |
| `label`     | The label to add. ~~str~~                                   |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
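
A minimal sketch of the `add_label` contract for a hypothetical `ToyPipe` class (not spaCy's implementation). The `ValueError` type, the `resizable` flag and the simplified `set_output` are assumptions made for illustration: adding a known label returns `0`, adding a new one returns `1`, and once the output dimension is set, only a resizable component accepts new labels.

```python
from typing import List, Optional


class ToyPipe:
    """Hypothetical component illustrating the add_label contract (not spaCy's code)."""

    def __init__(self, resizable: bool = False) -> None:
        self.labels: List[str] = []
        self.nO: Optional[int] = None  # output dimension, unset until initialization
        self.resizable = resizable

    def is_resizable(self) -> bool:
        return self.resizable

    def set_output(self, nO: int) -> None:
        if not self.is_resizable():
            raise NotImplementedError("output dimension can't be changed")
        self.nO = nO  # a real component would also resize its model here

    def add_label(self, label: str) -> int:
        if label in self.labels:
            return 0  # already present: nothing to do
        if self.nO is not None and not self.is_resizable():
            # The output layer is fixed, so there is no room for a new label.
            raise ValueError(f"Can't add {label!r}: output dimension is already set")
        self.labels.append(label)
        if self.nO is not None:
            # Resizable and already initialized: grow the output to fit the new label.
            self.set_output(len(self.labels))
        return 1


pipe = ToyPipe()
print(pipe.add_label("MY_LABEL"))  # 1
print(pipe.add_label("MY_LABEL"))  # 0
pipe.nO = len(pipe.labels)  # simulate initialization
try:
    pipe.add_label("OTHER")
except ValueError as err:
    print(err)  # Can't add 'OTHER': output dimension is already set

grow = ToyPipe(resizable=True)
grow.add_label("A")
grow.nO = len(grow.labels)  # simulate initialization
grow.add_label("B")
print(grow.nO)  # 2
```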

Note that in general, you don't have to call `pipe.add_label` if you provide a
representative data sample to the [`begin_training`](#begin_training) method. In
this case, all labels found in the sample will be automatically added to the
model, and the output dimension will be
[inferred](/usage/layers-architectures#shape-inference) automatically.

## Pipe.is_resizable {#is_resizable tag="method"}

> #### Example
>
> ```python
> can_resize = pipe.is_resizable()
> ```

Check whether or not the output dimension of the component's model can be
resized. If this method returns `True`, [`set_output`](#set_output) can be
called to change the model's output dimension.

| Name        | Description                                                                                    |
| ----------- | ---------------------------------------------------------------------------------------------- |
| **RETURNS** | Whether or not the output dimension of the model can be changed after initialization. ~~bool~~ |

> #### Example
>
> ```python
> def custom_resize(model, new_nO):
>     # adjust the model's output dimension to new_nO here
>     return model
>
> custom_model.attrs["resize_output"] = custom_resize
> ```

For built-in components that are not resizable, you have to create and train a
new model from scratch with the appropriate architecture and output dimension.

For custom components, you can implement a `resize_output` function and store it
in the `attrs` of the component's model, under the key `"resize_output"`.
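
To make the "adjust the model" step concrete, here is a framework-free sketch of a `resize_output` function. `ToyModel` is a hypothetical stand-in for a Thinc model whose output layer is a weight matrix `W` of shape `(nO, nI)` plus a bias vector `b`; real Thinc models store their parameters differently. The function grows the layer in place and copies over the weights already learned for the existing outputs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyModel:
    """Hypothetical output layer: weights W (nO rows x nI columns), bias b, attrs dict."""

    W: List[List[float]]
    b: List[float]
    attrs: Dict[str, Any] = field(default_factory=dict)

    @property
    def nO(self) -> int:
        return len(self.W)


def custom_resize(model: ToyModel, new_nO: int) -> ToyModel:
    """Grow the output layer in place, keeping the weights of the existing outputs."""
    old_nO = model.nO
    if new_nO < old_nO:
        raise ValueError("Shrinking would discard learned outputs")
    nI = len(model.W[0])
    # Existing rows stay as they are; rows for the new outputs start at zero.
    model.W.extend([0.0] * nI for _ in range(new_nO - old_nO))
    model.b.extend([0.0] * (new_nO - old_nO))
    return model


custom_model = ToyModel(W=[[0.5, -0.2], [0.1, 0.3]], b=[0.0, 0.1])
custom_model.attrs["resize_output"] = custom_resize
custom_model.attrs["resize_output"](custom_model, 4)
print(custom_model.nO)  # 4
```

Keeping the existing rows means the model's predictions for the original labels are unchanged right after resizing. This alone does not prevent catastrophic forgetting once training continues.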

## Pipe.set_output {#set_output tag="method"}

Change the output dimension of the component's model. If the component is not
[resizable](#is_resizable), this method will raise a `NotImplementedError`.

If a component is resizable, the `resize_output` function stored in the model's
`attrs` will be called. This function takes the original model and the new
output dimension `nO`, and changes the model in place.

When resizing an already trained model, care should be taken to avoid the
"catastrophic forgetting" problem, where further training on the new labels
degrades the model's predictions for the labels it had already learned.

> #### Example
>
> ```python
> if pipe.is_resizable():
>     pipe.set_output(512)
> ```

| Name | Description                       |
| ---- | --------------------------------- |
| `nO` | The new output dimension. ~~int~~ |
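
The following sketch shows how `is_resizable` and `set_output` could fit together. It assumes that a component counts as resizable exactly when its model's `attrs` contain a `resize_output` function; that link is an assumption of this sketch, not spaCy's source. `ToyModel`, `ToyPipe` and the trivial `resize_output` are hypothetical.

```python
from typing import Any, Callable, Dict


class ToyModel:
    """Hypothetical model that only records its output dimension and an attrs dict."""

    def __init__(self, nO: int) -> None:
        self.nO = nO
        self.attrs: Dict[str, Any] = {}


class ToyPipe:
    """Illustrative component mirroring the documented behavior; not spaCy's code."""

    def __init__(self, model: ToyModel) -> None:
        self.model = model

    def is_resizable(self) -> bool:
        # Assumption: resizable exactly when the model provides resize_output.
        return callable(self.model.attrs.get("resize_output"))

    def set_output(self, nO: int) -> None:
        if not self.is_resizable():
            raise NotImplementedError("this component's model can't be resized")
        resize: Callable[[ToyModel, int], ToyModel] = self.model.attrs["resize_output"]
        resize(self.model, nO)  # changes the model in place


def resize_output(model: ToyModel, new_nO: int) -> ToyModel:
    model.nO = new_nO  # a real implementation would also resize the weights
    return model


fixed = ToyPipe(ToyModel(nO=3))
print(fixed.is_resizable())  # False

model = ToyModel(nO=3)
model.attrs["resize_output"] = resize_output
pipe = ToyPipe(model)
if pipe.is_resizable():
    pipe.set_output(512)
print(pipe.model.nO)  # 512
```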

## Pipe.to_disk {#to_disk tag="method"}

@@ -382,9 +382,11 @@ contrast to how the PyTorch layers are defined, where `in_features` precedes
### Shape inference in Thinc {#shape-inference}

It is not strictly necessary to define all the input and output dimensions for
each layer, as Thinc can perform
[shape inference](https://thinc.ai/docs/usage-models#validation) between
sequential layers by matching up the output dimensionality of one layer to the
input dimensionality of the next. This means that we can simplify the `layers`
definition:

```python
with Model.define_operators({">>": chain}):
@@ -399,8 +401,8 @@ with Model.define_operators({">>": chain}):

Thinc can go one step further and deduce the correct input dimension of the
first layer, and output dimension of the last. To enable this functionality, you
have to call [`model.initialize`](https://thinc.ai/docs/api-model#initialize)
with an input sample `X` and an output sample `Y` that have the correct
dimensions.
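
As a rough, framework-free illustration of this inference: `ToyDense` and `initialize` are hypothetical and only track shapes, whereas Thinc's `model.initialize` works on real layers and arrays. In the sketch, the first layer's input width is read from `X`, the last layer's output width from `Y`, and each remaining input dimension is copied from the previous layer's output.

```python
from typing import List, Optional, Sequence


class ToyDense:
    """Hypothetical dense layer that only records its shape; not a Thinc layer."""

    def __init__(self, nO: Optional[int] = None, nI: Optional[int] = None) -> None:
        self.nO = nO
        self.nI = nI


def initialize(
    layers: List[ToyDense], X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]]
) -> None:
    """Fill in missing dimensions from the samples X and Y, then from neighboring layers."""
    # First layer's input dimension comes from the width of the input sample.
    if layers[0].nI is None:
        layers[0].nI = len(X[0])
    # Last layer's output dimension comes from the width of the output sample.
    if layers[-1].nO is None:
        layers[-1].nO = len(Y[0])
    # Between sequential layers, each input matches the previous layer's output.
    for prev, nxt in zip(layers, layers[1:]):
        if nxt.nI is None:
            nxt.nI = prev.nO
        elif prev.nO is not None and nxt.nI != prev.nO:
            raise ValueError(f"Shape mismatch: {prev.nO} -> {nxt.nI}")


hidden_width = 64
layers = [ToyDense(nO=hidden_width), ToyDense(nO=hidden_width), ToyDense()]
X = [[0.0] * 300]  # one input row with 300 features
Y = [[0.0] * 10]  # one output row with 10 classes
initialize(layers, X, Y)
print([(layer.nI, layer.nO) for layer in layers])  # [(300, 64), (64, 64), (64, 10)]
```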

```python
with Model.define_operators({">>": chain}):