| `width` | int | The width of the input and output. These are required to be the same, so that residual connections can be used. Recommended values are `96`, `128` or `300`. |
| `depth` | int | The number of convolutional layers to use. Recommended values are between `2` and `8`. |
| `embed_size` | int | The number of rows in the hash embedding tables. This can be surprisingly small, due to the use of the hash embeddings. Recommended values are between `2000` and `10000`. |
| `window_size` | int | The number of tokens on either side to concatenate during the convolutions. The receptive field of the CNN will be `depth * (window_size * 2 + 1)`, so a 4-layer network with a window size of `2` will be sensitive to 17 words at a time. Recommended value is `1`. |
| `maxout_pieces` | int | The number of pieces to use in the maxout non-linearity. If `1`, the [`Mish`](https://thinc.ai/docs/api-layers#mish) non-linearity is used instead. Recommended values are `1`-`3`. |
| `subword_features` | bool | Whether to also embed subword features, specifically the prefix, suffix and word shape. This is recommended for alphabetic languages like English, but not if single-character tokens are used for a language such as Chinese. |
| `pretrained_vectors` | bool | Whether to also use static vectors. |
| `embed` | [`Model`](https://thinc.ai/docs/api-model) | **Input:**`List[Doc]`. **Output:**`List[Floats2d]`. Embed tokens into context-independent word vector representations. |
| `encode` | [`Model`](https://thinc.ai/docs/api-model) | **Input:**`List[Floats2d]`. **Output:**`List[Floats2d]`. Encode context into the embeddings, using an architecture such as a CNN, BiLSTM or transformer. |
### spacy.MultiHashEmbed.v1 {#MultiHashEmbed}
<!-- TODO: check example config -->
> #### Example config
>
> ```ini
> [model]
> @architectures = "spacy.MultiHashEmbed.v1"
> width = 64
> rows = 2000
> also_embed_subwords = false
> also_use_static_vectors = false
> ```
Construct an embedding layer that separately embeds a number of lexical
attributes using hash embedding, concatenates the results, and passes it through
a feed-forward subnetwork to build a mixed representations. The features used
are the `NORM`, `PREFIX`, `SUFFIX` and `SHAPE`, which can have varying
definitions depending on the `Vocab` of the `Doc` object passed in. Vectors from
pretrained static vectors can also be incorporated into the concatenated
| `width` | int | The output width. Also used as the width of the embedding tables. Recommended values are between `64` and `300`. |
| `rows` | int | The number of rows for the embedding tables. Can be low, due to the hashing trick. Embeddings for prefix, suffix and word shape use half as many rows. Recommended values are between `2000` and `10000`. |
| `also_embed_subwords` | bool | Whether to use the `PREFIX`, `SUFFIX` and `SHAPE` features in the embeddings. If not using these, you may need more rows in your hash embeddings, as there will be increased chance of collisions. |
| `also_use_static_vectors` | bool | Whether to also use static word vectors. Requires a vectors table to be loaded in the [Doc](/api/doc) objects' vocab. |
### spacy.CharacterEmbed.v1 {#CharacterEmbed}
<!-- TODO: check example config -->
> #### Example config
>
> ```ini
> [model]
> @architectures = "spacy.CharacterEmbed.v1"
> width = 64
> rows = 2000
> nM = 16
> nC = 4
> ```
Construct an embedded representations based on character embeddings, using a
feed-forward network. A fixed number of UTF-8 byte characters are used for each
word, taken from the beginning and end of the word equally. Padding is used in
the center for words that are too short.
For instance, let's say `nC=4`, and the word is "jumping". The characters used
will be `"jung"` (two from the start, two from the end). If we had `nC=8`, the
characters would be `"jumpping"`: 4 from the start, 4 from the end. This ensures
that the final character is always in the last position, instead of being in an
arbitrary position depending on the word length.
The characters are embedded in a embedding table with 256 rows, and the vectors
concatenated. A hash-embedded vector of the `NORM` of the word is also
concatenated on, and the result is then passed through a feed-forward network to
construct a single vector to represent the information.
| `width` | int | The width of the output vector and the `NORM` hash embedding. |
| `rows` | int | The number of rows in the `NORM` hash embedding table. |
| `nM` | int | The dimensionality of the character embeddings. Recommended values are between `16` and `64`. |
| `nC` | int | The number of UTF-8 bytes to embed per word. Recommended values are between `3` and `8`, although it may depend on the length of words in the language. |
| `width` | int | The input and output width. These are required to be the same, to allow residual connections. This value will be determined by the width of the inputs. Recommended values are between `64` and `300`. |
| `window_size` | int | The number of words to concatenate around each token to construct the convolution. Recommended value is `1`. |
| `maxout_pieces` | int | The number of maxout pieces to use. Recommended values are `2` or `3`. |
| `depth` | int | The number of convolutional layers. Recommended value is `4`. |
| `width` | int | The input and output width. These are required to be the same, to allow residual connections. This value will be determined by the width of the inputs. Recommended values are between `64` and `300`. |
| `window_size` | int | The number of words to concatenate around each token to construct the convolution. Recommended value is `1`. |
| `depth` | int | The number of convolutional layers. Recommended value is `4`. |
| `width` | int | The input and output width. These are required to be the same, to allow residual connections. This value will be determined by the width of the inputs. Recommended values are between `64` and `300`. |
| `window_size` | int | The number of words to concatenate around each token to construct the convolution. Recommended value is `1`. |
| `depth` | int | The number of convolutional layers. Recommended value is `4`. |
| `name` | str | Any model name that can be loaded by [`transformers.AutoModel`](https://huggingface.co/transformers/model_doc/auto.html#transformers.AutoModel). |
| `get_spans` | `Callable` | Function that takes a batch of [`Doc`](/api/doc) object and returns lists of [`Span`](/api) objects to process by the transformer. [See here](/api/transformer#span_getters) for built-in options and examples. |
| `tok2vec` | [`Model`](https://thinc.ai/docs/api-model) | **Input:**`List[Doc]`. **Output:**`List[Floats2d]`. Subnetwork to map tokens into vector representations. |
| `nr_feature_tokens` | int | The number of tokens in the context to use to construct the state vector. Valid choices are `1`, `2`, `3`, `6`, `8` and `13`. The `2`, `8` and `13` feature sets are designed for the parser, while the `3` and `6` feature sets are designed for the entity recognizer. The recommended feature sets are `3` for NER, and `8` for the dependency parser. |
| `hidden_width` | int | The width of the hidden layer. |
| `maxout_pieces` | int | How many pieces to use in the state prediction layer. Recommended values are `1`, `2` or `3`. If `1`, the maxout non-linearity is replaced with a [`Relu`](https://thinc.ai/docs/api-layers#relu) non-linearity if `use_upper` is `True`, and no non-linearity if `False`. |
| `use_upper` | bool | Whether to use an additional hidden layer after the state vector in order to predict the action scores. It is recommended to set this to `False` for large pretrained models such as transformers, and `True` for smaller networks. The upper layer is computed on CPU, which becomes a bottleneck on larger GPU-based models, where it's also less necessary. |
| `nO` | int | The number of actions the model will predict between. Usually inferred from data at the beginning of training, or loaded from disk. |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `pretrained_vectors` | bool | Whether or not pretrained vectors will be used in addition to the feature vectors. |
| `width` | int | Output dimension of the feature encoding step. |
| `embed_size` | int | Input dimension of the feature encoding step. |
| `conv_depth` | int | Depth of the Tok2Vec layer. |
| `window_size` | int | The number of contextual vectors to [concatenate](https://thinc.ai/docs/api-layers#expand_window) from the left and from the right. |
| `ngram_size` | int | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3`would give unigram, trigram and bigram features. |
| `dropout` | float | The dropout rate. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the the [`TextCategorizer`](/api/textcategorizer) component will set it when |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `tok2vec` | [`Model`](https://thinc.ai/docs/api-model) | The [`tok2vec`](#tok2vec) layer of the model. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the the [`TextCategorizer`](/api/textcategorizer) component will set it when |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `ngram_size` | int | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3`would give unigram, trigram and bigram features. |
| `no_output_layer` | float | Whether or not to add an output layer to the model (`Softmax` activation if `exclusive_classes=True`, else `Logistic`. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the the [`TextCategorizer`](/api/textcategorizer) component will set it when |