spaCy/architectures.md at 74cb6d39d0d1f00af10a9b521aec36206baf457f


Layers and Model Architectures

Power spaCy components with custom neural networks


A model architecture is a function that wires up a Thinc Model instance, which you can then use in a component or as a layer of a larger network. You can use Thinc as a thin wrapper around frameworks such as PyTorch, TensorFlow or MXNet, or you can implement your logic in Thinc directly.

spaCy's built-in components never construct their Model instances themselves, so you don't have to subclass a component to change its model architecture. You can just update the config so that it refers to a different registered function. Once the component has been created, its model instance has already been assigned, so you cannot change its model architecture. The architecture is like a recipe for the network, and you can't change the recipe once the dish has already been prepared. You have to make a new one.
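The pattern of referring to a registered function by name can be sketched in plain Python. This is a toy illustration only, not spaCy's or Thinc's implementation: the `ARCHITECTURES` dict, the `register` decorator, `ToyComponent`, `make_component` and the `toy.*` names are all hypothetical, and the "models" are ordinary callables rather than Thinc Model instances.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a function registry (not spaCy's registry).
ARCHITECTURES: Dict[str, Callable[..., Callable]] = {}


def register(name: str):
    """Store an architecture function under a string name."""
    def decorator(func):
        ARCHITECTURES[name] = func
        return func
    return decorator


@register("toy.Scale.v1")
def build_scale(factor: float):
    # The "model" is just a callable here; a real one would be a Thinc Model.
    return lambda xs: [x * factor for x in xs]


@register("toy.Shift.v1")
def build_shift(offset: float):
    return lambda xs: [x + offset for x in xs]


class ToyComponent:
    """Receives a ready-made model instead of constructing one itself."""

    def __init__(self, model):
        self.model = model

    def __call__(self, xs):
        return self.model(xs)


def make_component(config: dict) -> ToyComponent:
    # Look up the registered function by name, then pass the remaining
    # settings to it as keyword arguments.
    settings = dict(config)
    build = ARCHITECTURES[settings.pop("@architectures")]
    return ToyComponent(build(**settings))


# Changing the architecture means changing the config and building a
# new component, not subclassing ToyComponent.
scaled = make_component({"@architectures": "toy.Scale.v1", "factor": 2.0})
shifted = make_component({"@architectures": "toy.Shift.v1", "offset": 1.0})
```

Because the component only ever sees the finished model, swapping `toy.Scale.v1` for `toy.Shift.v1` requires no change to the component class itself.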

Type signatures

The Thinc Model class is a generic type that can specify its input and output types. Python uses a square-bracket notation for this, so the type ~~Model[List, Dict]~~ says that each batch of inputs to the model will be a list, and the outputs will be a dictionary. Both typing.List and typing.Dict are also generics, allowing you to be more specific about the data. For instance, you can write ~~Model[List[Doc], Dict[str, float]]~~ to specify that the model expects a list of Doc objects as input, and returns a dictionary mapping strings to floats. Some of the most common types you'll see are:

| Type | Description |
| --- | --- |
| ~~List[Doc]~~ | A batch of `Doc` objects. Most components expect their models to take this as input. |
| ~~Floats2d~~ | A two-dimensional `numpy` or `cupy` array of floats. Usually 32-bit. |
| ~~Ints2d~~ | A two-dimensional `numpy` or `cupy` array of integers. Common dtypes include uint64, int32 and int8. |
| ~~List[Floats2d]~~ | A list of two-dimensional arrays, generally with one array per `Doc` and one row per token. |
| ~~Ragged~~ | A container to handle variable-length sequence data in an unpadded contiguous array. |
| ~~Padded~~ | A container to handle variable-length sequence data in a padded contiguous array. |
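To make the square-bracket notation concrete, here is a minimal sketch using only the standard `typing` module. The `Model` and `Doc` classes below are hypothetical placeholders defined for this example, not the real Thinc and spaCy classes. They only show how a two-parameter generic type is written and inspected.

```python
from typing import Dict, Generic, List, TypeVar, get_args

InT = TypeVar("InT")
OutT = TypeVar("OutT")


class Model(Generic[InT, OutT]):
    """Hypothetical stand-in for Thinc's Model, used only for the notation."""


class Doc:
    """Hypothetical placeholder for spaCy's Doc class."""


# A loose signature: some list in, some dictionary out.
Loose = Model[List, Dict]

# A more specific signature: a list of Doc objects in,
# a dictionary mapping strings to floats out.
Specific = Model[List[Doc], Dict[str, float]]

# The type parameters can be read back from the annotation.
in_type, out_type = get_args(Specific)
inner_in = get_args(in_type)    # the element type of the input list
inner_out = get_args(out_type)  # the key and value types of the output dict
```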

The model type signatures help you figure out which model architectures and components can fit together. For instance, the TextCategorizer class expects a model typed ~~Model[List[Doc], Floats2d]~~, because the model will predict one row of category probabilities per Doc. In contrast, the Tagger class expects a model typed ~~Model[List[Doc], List[Floats2d]]~~, because it needs to predict one row of probabilities per token.

There's no guarantee that two models with the same type signature can be used interchangeably. There are many other ways they could be incompatible. However, if the types don't match, they almost surely won't be compatible. This little bit of validation goes a long way, especially if you configure your editor or other tools to highlight these errors early. Thinc will also verify that your types match correctly when your config file is processed at the beginning of training.
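As a rough illustration of why a signature mismatch is a useful early warning, the sketch below compares the two signatures mentioned above. It is a toy check written for this example and does not reflect how Thinc or a static type checker actually validates types. `Model`, `Doc`, `Floats2d` and `same_signature` are hypothetical stand-ins.

```python
from typing import Generic, List, TypeVar, get_args

InT = TypeVar("InT")
OutT = TypeVar("OutT")


class Model(Generic[InT, OutT]):
    """Hypothetical stand-in for Thinc's Model."""


class Doc:
    """Hypothetical placeholder for spaCy's Doc."""


class Floats2d:
    """Hypothetical placeholder for a 2d float array type."""


# One row of scores per Doc versus one row of scores per token.
TextcatModel = Model[List[Doc], Floats2d]
TaggerModel = Model[List[Doc], List[Floats2d]]


def same_signature(expected, actual) -> bool:
    """Return True if both annotations have identical input/output types."""
    return get_args(expected) == get_args(actual)


tagger_ok = same_signature(TaggerModel, Model[List[Doc], List[Floats2d]])
tagger_given_textcat = same_signature(TaggerModel, TextcatModel)
```

A matching signature is necessary but not sufficient: two models that pass this kind of check can still be incompatible in other ways, such as output width.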

Defining sublayers

Model architecture functions often accept sublayers as arguments, so that you can try substituting a different layer into the network. Depending on how the architecture function is structured, you might be able to define your network structure entirely through the config system, using layers that have already been defined. The transformers documentation section shows a common example of swapping in a different sublayer.

In most NLP neural network models, the most important parts of the network are what we refer to as the embed and encode steps. These steps together compute dense, context-sensitive representations of the tokens. Most of spaCy's default architectures accept a tok2vec layer as an argument, so you can control this important part of the network separately. This makes it easy to switch between transformer, CNN, BiLSTM or other feature extraction approaches. And if you want to define your own solution, all you need to do is register a ~~Model[List[Doc], List[Floats2d]]~~ architecture function, and you'll be able to try it out in any of spaCy's components.
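The idea of passing a sublayer into an architecture function can be sketched without any framework. In this toy example, lists of strings stand in for `Doc` objects, nested lists of floats stand in for ~~List[Floats2d]~~, and every function name is hypothetical. A real implementation would build Thinc Model instances instead of plain callables.

```python
from typing import Callable, List

Tokens = List[str]        # stand-in for a Doc
Rows = List[List[float]]  # stand-in for Floats2d: one row per token
Tok2Vec = Callable[[List[Tokens]], List[Rows]]


def build_length_tok2vec() -> Tok2Vec:
    """Toy feature extractor: one feature per token, its length."""
    def forward(docs: List[Tokens]) -> List[Rows]:
        return [[[float(len(tok))] for tok in doc] for doc in docs]
    return forward


def build_case_tok2vec() -> Tok2Vec:
    """Toy feature extractor: 1.0 if the token starts with an uppercase letter."""
    def forward(docs: List[Tokens]) -> List[Rows]:
        return [[[1.0 if tok[:1].isupper() else 0.0] for tok in doc] for doc in docs]
    return forward


def build_token_scorer(tok2vec: Tok2Vec):
    """Architecture function that takes its feature extractor as an argument."""
    def forward(docs: List[Tokens]) -> List[List[float]]:
        # Reduce each token's feature row to a single score.
        return [[sum(row) for row in doc] for doc in tok2vec(docs)]
    return forward


docs = [["Hello", "world"], ["spaCy"]]

# Same outer architecture, two different sublayers.
by_length = build_token_scorer(build_length_tok2vec())(docs)
by_case = build_token_scorer(build_case_tok2vec())(docs)
```

Because both extractors share the same input and output shape, the outer function does not need to know which one it was given.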

Registering new architectures

Recap concept, link to config docs.

Wrapping PyTorch, TensorFlow and other frameworks

Explain concept
Link off to notebook

Models for trainable components

Interaction with predict, get_loss and set_annotations
Initialization life-cycle with begin_training.
Link to relation extraction notebook.
