Proofreading

Finished with the API docs and started on the Usage docs, but only Embedding & Transformers so far
walterhenry 2020-09-29 12:39:10 +02:00
parent c1c841940c
commit 1d80b3dc1b
1 changed file with 4 additions and 4 deletions

@@ -41,8 +41,8 @@ transformers is that word vectors model **lexical types**, rather than _tokens_.
If you have a list of terms with no context around them, a transformer model
like BERT can't really help you. BERT is designed to understand language **in
context**, which isn't what you have. A word vectors table will be a much better
fit for your task. However, if you do have words in context, whole sentences or
paragraphs of running text, word vectors will only provide a very rough
approximation of what the text is about.

Word vectors are also very computationally efficient, as they map a word to a
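
For reference, a minimal sketch of the word-vector behavior this hunk describes, assuming the `en_core_web_md` package (which ships a static vectors table) is installed:

```python
import spacy

# A pipeline with a static word vectors table: one vector per lexical type
nlp = spacy.load("en_core_web_md")

doc = nlp("The queen greeted the king.")
queen, king = doc[1], doc[4]

# The lookup is a cheap table access, independent of surrounding context
print(queen.vector.shape)        # e.g. (300,)
print(queen.similarity(king))    # cosine similarity of the two type vectors
```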
@@ -256,7 +256,7 @@ for doc in nlp.pipe(["some text", "some other text"]):
```

You can also customize how the [`Transformer`](/api/transformer) component sets
-annotations onto the [`Doc`](/api/doc), by specifying a custom
+annotations onto the [`Doc`](/api/doc) by specifying a custom
`set_extra_annotations` function. This callback will be called with the raw
input and output data for the whole batch, along with the batch of `Doc`
objects, allowing you to implement whatever you need. The annotation setter is
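
A minimal sketch of such a callback, assuming the `en_core_web_trf` pipeline; the `trf_last_hidden` extension name is invented for illustration:

```python
import spacy
from spacy.tokens import Doc

# Hypothetical extension attribute to hold the per-doc transformer output
Doc.set_extension("trf_last_hidden", default=None)

def custom_annotation_setter(docs, trf_data):
    # trf_data covers the whole batch; doc_data splits it per Doc
    for doc, data in zip(docs, trf_data.doc_data):
        doc._.trf_last_hidden = data

nlp = spacy.load("en_core_web_trf")
nlp.get_pipe("transformer").set_extra_annotations = custom_annotation_setter
doc = nlp("This is a text.")
print(doc._.trf_last_hidden)
```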
@@ -675,7 +675,7 @@ given you a 10% error reduction, pretraining with spaCy might give you another
The [`spacy pretrain`](/api/cli#pretrain) command will take a **specific
subnetwork** within one of your components, and add additional layers to build a
-network for a temporary task, that forces the model to learn something about
+network for a temporary task that forces the model to learn something about
sentence structure and word cooccurrence statistics. Pretraining produces a
**binary weights file** that can be loaded back in at the start of training. The
weights file specifies an initial set of weights. Training then proceeds as
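
A rough sketch of that round trip on the command line, assuming a config with a `[pretraining]` block and the usual `paths` variables; all file names here are placeholders:

```
# Pretrain a component's tok2vec subnetwork on raw text
python -m spacy pretrain config.cfg ./pretrain_output --paths.raw_text raw_text.jsonl

# Load the resulting binary weights back in when training starts
python -m spacy train config.cfg --output ./training --paths.init_tok2vec ./pretrain_output/model999.bin
```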