mirror of https://github.com/explosion/spaCy.git
Proofreading
Finished with the API docs and started on the Usage, but Embedding & Transformers
This commit is contained in:
parent c1c841940c
commit 1d80b3dc1b
@@ -41,8 +41,8 @@ transformers is that word vectors model **lexical types**, rather than _tokens_.
 If you have a list of terms with no context around them, a transformer model
 like BERT can't really help you. BERT is designed to understand language **in
 context**, which isn't what you have. A word vectors table will be a much better
-fit for your task. However, if you do have words in context — whole sentences or
-paragraphs of running text — word vectors will only provide a very rough
+fit for your task. However, if you do have words in context – whole sentences or
+paragraphs of running text – word vectors will only provide a very rough
 approximation of what the text is about.
 
 Word vectors are also very computationally efficient, as they map a word to a
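The distinction in the hunk above, that word vectors model lexical types rather than tokens in context, can be sketched with a hypothetical lookup table (the names and vector values here are illustrative only, not spaCy's API):

```python
# Minimal sketch of a word-vectors table (hypothetical values, not spaCy's API).
# A static table maps each lexical type to exactly one vector, so a word like
# "bank" gets the same vector in "river bank" and "bank account". A contextual
# model such as BERT would instead produce a different representation per token.
vectors = {
    "bank": [0.2, -0.1, 0.7],
    "river": [0.5, 0.3, -0.2],
}

def get_vector(word):
    # Lookup is by lexical type: normalize the surface form, ignore context.
    return vectors.get(word.lower())

# Same vector regardless of capitalization or surrounding words:
assert get_vector("bank") == get_vector("Bank")
```

This is why a plain vectors table suits lists of isolated terms, while running text benefits from a contextual model.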
@@ -256,7 +256,7 @@ for doc in nlp.pipe(["some text", "some other text"]):
 ```
 
 You can also customize how the [`Transformer`](/api/transformer) component sets
-annotations onto the [`Doc`](/api/doc), by specifying a custom
+annotations onto the [`Doc`](/api/doc) by specifying a custom
 `set_extra_annotations` function. This callback will be called with the raw
 input and output data for the whole batch, along with the batch of `Doc`
 objects, allowing you to implement whatever you need. The annotation setter is
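The `set_extra_annotations` callback discussed in this hunk receives the batch of `Doc` objects together with the transformer's raw batch output. A minimal sketch, assuming the spacy-transformers convention that the batch output exposes per-document slices as `trf_data.doc_data`, and that a custom `Doc` extension has been registered beforehand (the extension name here is hypothetical):

```python
def custom_annotation_setter(docs, trf_data):
    # Assumption: trf_data.doc_data yields one per-document slice of the raw
    # transformer output, aligned with `docs` (spacy-transformers convention).
    for doc, data in zip(docs, trf_data.doc_data):
        # `custom_attr` is a hypothetical extension; it would need to be
        # registered first, e.g. Doc.set_extension("custom_attr", default=None).
        doc._.custom_attr = data
```

The setter mutates the docs in place and returns nothing; it runs once per batch, so any per-document bookkeeping belongs inside the loop.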
@@ -675,7 +675,7 @@ given you a 10% error reduction, pretraining with spaCy might give you another
 
 The [`spacy pretrain`](/api/cli#pretrain) command will take a **specific
 subnetwork** within one of your components, and add additional layers to build a
-network for a temporary task, that forces the model to learn something about
+network for a temporary task that forces the model to learn something about
 sentence structure and word cooccurrence statistics. Pretraining produces a
 **binary weights file** that can be loaded back in at the start of training. The
 weights file specifies an initial set of weights. Training then proceeds as
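The binary weights file mentioned in this hunk is loaded back at the start of training via the `[initialize]` block of the training config. A minimal sketch (the output filename below is an assumption; `spacy pretrain` writes epoch-numbered weights files into its output directory):

```ini
[paths]
# Path to a weights file produced by `spacy pretrain` (filename illustrative).
init_tok2vec = "pretrain_output/model999.bin"

[initialize]
# Load the pretrained tok2vec weights before regular training begins.
init_tok2vec = ${paths.init_tok2vec}
```

With this in place, training proceeds as normal, starting from the pretrained weights instead of a random initialization.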