several small updates

svlandeg 2020-08-21 18:25:26 +02:00
parent ad2332d4b7
commit da48c6a2a2
1 changed files with 18 additions and 17 deletions

@ -222,8 +222,8 @@ passed to the component factory as arguments. This lets you configure the model
settings and hyperparameters. If a component block defines a `source`, the settings and hyperparameters. If a component block defines a `source`, the
component will be copied over from an existing pretrained model, with its component will be copied over from an existing pretrained model, with its
existing weights. This lets you include an already trained component in your existing weights. This lets you include an already trained component in your
model pipeline, or update a pretrained component with more data specific to model pipeline, or update a pretrained component with more data specific to your
your use case. use case.
```ini ```ini
### config.cfg (excerpt) ### config.cfg (excerpt)
@ -290,11 +290,11 @@ batch_size = 128
``` ```
To refer to a function instead, you can make `[training.batch_size]` its own To refer to a function instead, you can make `[training.batch_size]` its own
section and use the `@` syntax to specify the function and its arguments – in this section and use the `@` syntax to specify the function and its arguments – in
case [`compounding.v1`](https://thinc.ai/docs/api-schedules#compounding) defined this case [`compounding.v1`](https://thinc.ai/docs/api-schedules#compounding)
in the [function registry](/api/top-level#registry). All other values defined in defined in the [function registry](/api/top-level#registry). All other values
the block are passed to the function as keyword arguments when it's initialized. defined in the block are passed to the function as keyword arguments when it's
You can also use this mechanism to register initialized. You can also use this mechanism to register
[custom implementations and architectures](#custom-functions) and reference them [custom implementations and architectures](#custom-functions) and reference them
from your configs. from your configs.
@ -722,9 +722,9 @@ a stream of items into a stream of batches. spaCy has several useful built-in
[batching strategies](/api/top-level#batchers) with customizable sizes, but it's [batching strategies](/api/top-level#batchers) with customizable sizes, but it's
also easy to implement your own. For instance, the following function takes the also easy to implement your own. For instance, the following function takes the
stream of generated [`Example`](/api/example) objects, and removes those which stream of generated [`Example`](/api/example) objects, and removes those which
have the exact same underlying raw text, to avoid duplicates within each batch. have the same underlying raw text, to avoid duplicates within each batch. Note
Note that in a more realistic implementation, you'd also want to check whether that in a more realistic implementation, you'd also want to check whether the
the annotations are exactly the same. annotations are the same.
> #### config.cfg > #### config.cfg
> >
@ -839,8 +839,8 @@ called the **gold standard**. It's initialized with a [`Doc`](/api/doc) object
that will hold the predictions, and another `Doc` object that holds the that will hold the predictions, and another `Doc` object that holds the
gold-standard annotations. It also includes the **alignment** between those two gold-standard annotations. It also includes the **alignment** between those two
documents if they differ in tokenization. The `Example` class ensures that spaCy documents if they differ in tokenization. The `Example` class ensures that spaCy
can rely on one **standardized format** that's passed through the pipeline. can rely on one **standardized format** that's passed through the pipeline. For
Here's an example of a simple `Example` for part-of-speech tags: instance, let's say we want to define gold-standard part-of-speech tags:
```python ```python
words = ["I", "like", "stuff"] words = ["I", "like", "stuff"]
@ -852,9 +852,10 @@ reference = Doc(vocab, words=words).from_array("TAG", numpy.array(tag_ids, dtype
example = Example(predicted, reference) example = Example(predicted, reference)
``` ```
Alternatively, the `reference` `Doc` with the gold-standard annotations can be As this is quite verbose, there's an alternative way to create the reference
created from a dictionary with keyword arguments specifying the annotations, `Doc` with the gold-standard annotations. The function `Example.from_dict` takes
like `tags` or `entities`. Using the `Example` object and its gold-standard a dictionary with keyword arguments specifying the annotations, like `tags` or
`entities`. Using the resulting `Example` object and its gold-standard
annotations, the model can be updated to learn a sentence of three words with annotations, the model can be updated to learn a sentence of three words with
their assigned part-of-speech tags. their assigned part-of-speech tags.
@ -879,7 +880,7 @@ example = Example.from_dict(predicted, {"tags": tags})
Here's another example that shows how to define gold-standard named entities. Here's another example that shows how to define gold-standard named entities.
The letters added before the labels refer to the tags of the The letters added before the labels refer to the tags of the
[BILUO scheme](/usage/linguistic-features#updating-biluo) – `O` is a token [BILUO scheme](/usage/linguistic-features#updating-biluo) – `O` is a token
outside an entity, `U` an single entity unit, `B` the beginning of an entity, outside an entity, `U` a single entity unit, `B` the beginning of an entity,
`I` a token inside an entity and `L` the last token of an entity. `I` a token inside an entity and `L` the last token of an entity.
```python ```python
@ -954,7 +955,7 @@ dictionary of annotations:
```diff ```diff
text = "Facebook released React in 2014" text = "Facebook released React in 2014"
annotations = {"entities": ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]} annotations = {"entities": ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]}
+ example = Example.from_dict(nlp.make_doc(text), {"entities": entities}) + example = Example.from_dict(nlp.make_doc(text), annotations)
- nlp.update([text], [annotations]) - nlp.update([text], [annotations])
+ nlp.update([example]) + nlp.update([example])
``` ```
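
Note on the BILUO tags used in the hunks above: the prose describes what each prefix means, and a small decoder may make that concrete. The following is a minimal sketch in plain Python, not spaCy's own implementation. The helper name `biluo_to_spans` is hypothetical, and it assumes well-formed tag sequences (it does not check that `B`, `I` and `L` tags of one entity share the same label).

```python
from typing import List, Tuple


def biluo_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BILUO tags into (start, end, label) token spans (end exclusive).

    Hypothetical helper for illustration only; assumes well-formed input.
    """
    spans = []
    start = None  # token index of the currently open "B-" tag, if any
    for i, tag in enumerate(tags):
        if tag == "O":
            # Token outside any entity: nothing is open anymore
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":
            # Single-token entity
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":
            # Beginning of a multi-token entity
            start = i
        elif prefix == "L" and start is not None:
            # Last token closes the entity opened by the matching "B-"
            spans.append((start, i + 1, label))
            start = None
        # "I" tokens simply continue the open entity
    return spans


# Same tags as in the "Facebook released React in 2014" hunk
tags = ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]
print(biluo_to_spans(tags))
```

Each tuple gives token indices, so `(0, 1, "ORG")` covers only the first token, "Facebook".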