spaCy/transformer.md at 97d36515747640b11e8447a6177ab867353b0915


title	teaser	tag	source	new	api_base_class	api_string_name
Transformer	Pipeline component for multi-task learning with transformer models	class	github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py	3	/api/pipe	transformer

Installation

$ pip install spacy-transformers

This component is available via the extension package spacy-transformers. It exposes the component via entry points, so if you have the package installed, using factory = "transformer" in your training config or nlp.add_pipe("transformer") will work out-of-the-box.

This pipeline component lets you use transformer models in your pipeline. The component assigns the output of the transformer to the Doc's extension attributes. We also calculate an alignment between the word-piece tokens and the spaCy tokenization, so that we can use the last hidden states to set the Doc.tensor attribute. When multiple word-piece tokens align to the same spaCy token, the spaCy token receives the sum of their values. To access the values, you can use the custom Doc._.trf_data attribute. For more details, see the usage documentation.
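The summing rule for aligned word pieces can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the function name `sum_wordpieces_per_token`, the toy vectors and the alignment lists are all hypothetical.

```python
from typing import List


def sum_wordpieces_per_token(
    wordpiece_vectors: List[List[float]],
    alignment: List[List[int]],
) -> List[List[float]]:
    """Return one vector per token: the element-wise sum of the
    vectors of all word pieces aligned to that token."""
    width = len(wordpiece_vectors[0]) if wordpiece_vectors else 0
    token_vectors = []
    for piece_indices in alignment:
        total = [0.0] * width
        for i in piece_indices:
            for d, value in enumerate(wordpiece_vectors[i]):
                total[d] += value
        token_vectors.append(total)
    return token_vectors


# Hypothetical data: three word pieces with 2-dimensional hidden states.
wordpieces = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
# Token 0 aligns to pieces 0 and 1, token 1 aligns to piece 2.
alignment = [[0, 1], [2]]
token_rows = sum_wordpieces_per_token(wordpieces, alignment)
print(token_rows)  # [[1.5, 2.5], [3.0, -1.0]]
```

Each row of `token_rows` plays the role of one token's row in `Doc.tensor`; a token with a single aligned word piece simply keeps that piece's vector.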

Config and implementation

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg for training. See the model architectures documentation for details on the architectures and their arguments and hyperparameters.

Example

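import spacy
from spacy_transformers import Transformer, DEFAULT_CONFIG

# Any existing nlp object works here; a blank English pipeline is used for illustration
nlp = spacy.blank("en")
nlp.add_pipe("transformer", config=DEFAULT_CONFIG)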

Setting	Type	Description	Default
`max_batch_items`	int	Maximum size of a padded batch.	`4096`
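`annotation_setter`	`Callable`	Callback that can set additional annotations on a batch of `Doc` objects from the transformer output.	`null_annotation_setter`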
`model`	`Model`	The model to use.	TransformerModel
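As a minimal sketch of overriding a single setting from the table above, the helper below passes a partial `config` to `nlp.add_pipe`. The helper name `add_transformer` is hypothetical, and the sketch assumes that `nlp` is an existing spaCy `Language` object with spacy-transformers installed, and that settings not listed in the override keep their defaults.

```python
from typing import Any, Dict


def add_transformer(nlp: Any, max_batch_items: int = 4096) -> Any:
    """Add the transformer component, overriding only max_batch_items.

    `nlp` is assumed to be a spaCy Language object; the remaining
    settings are left to the factory's default config.
    """
    config: Dict[str, Any] = {"max_batch_items": max_batch_items}
    return nlp.add_pipe("transformer", config=config)
```

Calling `add_transformer(nlp, max_batch_items=2048)` would then be equivalent to writing `nlp.add_pipe("transformer", config={"max_batch_items": 2048})` directly.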

https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py

Transformer.__init__

Example

# Construction via add_pipe with default model
trf = nlp.add_pipe("transformer")

# Construction via add_pipe with custom model
config = {"model": {"@architectures": "my_transformer"}}
trf = nlp.add_pipe("transformer", config=config)

# Construction from class
from spacy_transformers import Transformer
trf = Transformer(nlp.vocab, model)

Create a new pipeline instance. In your application, you would normally use a shortcut for this and instantiate the component using its string name and nlp.add_pipe.

Name	Type	Description
`vocab`	`Vocab`	The shared vocabulary.
`model`	`Model`	The Thinc `Model` powering the pipeline component.
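`annotation_setter`	`Callable`	Callback that can set additional annotations on a batch of `Doc` objects from the transformer output.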
keyword-only
`name`	str	String name of the component instance. Used to add entries to the `losses` during training.
`max_batch_items`	int	Maximum size of a padded batch. Defaults to `128*32` (4096).
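To make "size of a padded batch" concrete, the sketch below assumes it means the number of sequences multiplied by the length of the longest sequence in the batch, and groups sequences so that this product stays within `max_batch_items`. The function names and the example lengths are hypothetical, and this is an illustration of the idea rather than the component's actual batching code.

```python
from typing import List


def padded_size(lengths: List[int]) -> int:
    """Size of a batch once every sequence is padded to the longest one."""
    return len(lengths) * max(lengths) if lengths else 0


def batch_by_padded_size(lengths: List[int], max_batch_items: int) -> List[List[int]]:
    """Group sequence indices so each batch's padded size stays within the limit.

    A single sequence longer than the limit still gets a batch of its own.
    """
    # Sort by length so sequences of similar length share a batch (less padding).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for i in order:
        candidate = current + [i]
        if current and padded_size([lengths[j] for j in candidate]) > max_batch_items:
            batches.append(current)
            current = [i]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


limit = 128 * 32  # the documented default, i.e. 4096
example_batches = batch_by_padded_size([10, 500, 12, 2000, 490], max_batch_items=limit)
print(example_batches)  # [[0, 2, 4, 1], [3]]
```

In this example the four shorter sequences pad to 4 × 500 = 2000 items and fit in one batch, while adding the 2000-item sequence would pad the batch to 5 × 2000 = 10000 items, so it is placed in a batch of its own.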

TransformerData

FullTransformerBatch

Custom attributes

The component sets the following custom extension attributes:

Name	Type	Description
`Doc._.trf_data`	`TransformerData`	The transformer output assigned to the `Doc` by the component.
