| title | teaser | tag | source | new | api_base_class | api_string_name |
| --- | --- | --- | --- | --- | --- | --- |
| Transformer | Pipeline component for multi-task learning with transformer models | class | github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py | 3 | /api/pipe | `transformer` |
## Installation

```bash
$ pip install spacy-transformers
```
This component is available via the extension package `spacy-transformers`. It exposes the component via entry points, so if you have the package installed, using `factory = "transformer"` in your training config or `nlp.add_pipe("transformer")` will work out-of-the-box.
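For example, here is a minimal sketch of adding the component to a blank pipeline. It assumes `spacy-transformers` is installed so the `"transformer"` factory is available by name:

```python
import spacy

# Assumes spacy-transformers is installed; its entry points register the
# "transformer" factory so nlp.add_pipe can look it up by string name.
nlp = spacy.blank("en")
trf = nlp.add_pipe("transformer")  # uses the component's default config
print(nlp.pipe_names)              # ['transformer']
```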
This pipeline component lets you use transformer models in your pipeline. The component assigns the output of the transformer to the `Doc`'s extension attributes. We also calculate an alignment between the word-piece tokens and the spaCy tokenization, so that we can use the last hidden states to set the `Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy token, the spaCy token receives the sum of their values. To access the values, you can use the custom `Doc._.trf_data` attribute. For more details, see the usage documentation.
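As an illustration, the sketch below runs a pretrained transformer-based pipeline and inspects the stored data. The `en_core_web_trf` package name is an assumption about a typical setup and has to be downloaded separately:

```python
import spacy

# Assumes the pretrained transformer pipeline is installed:
#   python -m spacy download en_core_web_trf
nlp = spacy.load("en_core_web_trf")
doc = nlp("Apple is looking at buying a U.K. startup.")

trf_data = doc._.trf_data       # TransformerData for this Doc
print(type(trf_data).__name__)
# .tensors holds the raw transformer outputs at the word-piece level; the
# exact shapes depend on the model and how the text was split into spans.
for tensor in trf_data.tensors:
    print(tensor.shape)
```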
## Config and implementation

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the `config` argument on `nlp.add_pipe` or in your `config.cfg` for training. See the model architectures documentation for details on the architectures and their arguments and hyperparameters.
### Example

```python
from spacy_transformers import Transformer, DEFAULT_CONFIG

nlp.add_pipe("transformer", config=DEFAULT_CONFIG)
```
| Setting | Type | Description | Default |
| --- | --- | --- | --- |
| `max_batch_items` | `int` | Maximum size of a padded batch. | `4096` |
| `annotation_setter` | `Callable` | Function that takes a batch of `Doc` objects and transformer outputs and can set additional annotations on the docs. | `null_annotation_setter` |
| `model` | `Model` | The model to use. | `TransformerModel` |
Source: https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py
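Any of the settings above can be overridden individually while the rest keep their defaults. A minimal sketch, where the value `2048` is only an example:

```python
import spacy

nlp = spacy.blank("en")
# Override a single setting from the default config; the remaining settings
# (model, annotation_setter) keep their factory defaults.
trf = nlp.add_pipe("transformer", config={"max_batch_items": 2048})
```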
## Transformer.__init__

### Example

```python
# Construction via add_pipe with default model
trf = nlp.add_pipe("transformer")

# Construction via add_pipe with custom model
config = {"model": {"@architectures": "my_transformer"}}
trf = nlp.add_pipe("transformer", config=config)

# Construction from class
from spacy_transformers import Transformer
trf = Transformer(nlp.vocab, model)
```
Create a new pipeline instance. In your application, you would normally use a shortcut for this and instantiate the component using its string name and `nlp.add_pipe`.
| Name | Type | Description |
| --- | --- | --- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `Model` | The Thinc `Model` powering the pipeline component. |
| `annotation_setter` | `Callable` | Function that takes a batch of `Doc` objects and transformer outputs and can set additional annotations on the docs. |
| _keyword-only_ | | |
| `name` | `str` | String name of the component instance. Used to add entries to the `losses` during training. |
| `max_batch_items` | `int` | Maximum size of a padded batch. Defaults to `128*32`. |
## TransformerData

Dataclass holding the transformer tokens, outputs and alignment for a single `Doc`.

## FullTransformerBatch

Dataclass holding the transformer tokens, outputs and alignment for a whole batch of documents.
## Custom attributes

The component sets the following custom extension attributes:

| Name | Type | Description |
| --- | --- | --- |
| `Doc._.trf_data` | `TransformerData` | Transformer tokens and outputs for the `Doc` object. |