mirror of https://github.com/explosion/spaCy.git
several small updates
parent ad2332d4b7
commit da48c6a2a2
@@ -222,8 +222,8 @@ passed to the component factory as arguments. This lets you configure the model
 settings and hyperparameters. If a component block defines a `source`, the
 component will be copied over from an existing pretrained model, with its
 existing weights. This lets you include an already trained component in your
-model pipeline, or update a pretrained component with more data specific to
-your use case.
+model pipeline, or update a pretrained component with more data specific to your
+use case.

 ```ini
 ### config.cfg (excerpt)
@@ -290,11 +290,11 @@ batch_size = 128
 ```

 To refer to a function instead, you can make `[training.batch_size]` its own
-section and use the `@` syntax to specify the function and its arguments – in this
-case [`compounding.v1`](https://thinc.ai/docs/api-schedules#compounding) defined
-in the [function registry](/api/top-level#registry). All other values defined in
-the block are passed to the function as keyword arguments when it's initialized.
-You can also use this mechanism to register
+section and use the `@` syntax to specify the function and its arguments – in
+this case [`compounding.v1`](https://thinc.ai/docs/api-schedules#compounding)
+defined in the [function registry](/api/top-level#registry). All other values
+defined in the block are passed to the function as keyword arguments when it's
+initialized. You can also use this mechanism to register
 [custom implementations and architectures](#custom-functions) and reference them
 from your configs.

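To make the `@` mechanism in the hunk above concrete, here is a minimal, stdlib-only sketch. It assumes a hypothetical dict-based registry (`SCHEDULES`, `register` and `resolve` are illustrative names, not spaCy or Thinc API) and reimplements a schedule modeled on Thinc's `compounding.v1`: the value starts at `start`, is multiplied by `compound` at each step, and is capped at `stop`. The config block is represented as a plain dict whose `@schedules` key names the function and whose remaining keys become keyword arguments.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterator

# Hypothetical stand-in for a function registry: maps a registered name such as
# "compounding.v1" to the function that creates the value.
SCHEDULES: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that stores `func` in SCHEDULES under `name`."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        SCHEDULES[name] = func
        return func
    return decorator


@register("compounding.v1")
def compounding(start: float, stop: float, compound: float) -> Iterator[float]:
    """Yield start, start * compound, start * compound**2, ... capped at stop
    (or floored at stop when the schedule decreases)."""
    current = float(start)
    cap = float(stop)
    while True:
        yield max(current, cap) if start > stop else min(current, cap)
        current *= compound


def resolve(block: Dict[str, Any]) -> Any:
    """Call the function named by the block's "@schedules" key, passing every
    other key in the block as a keyword argument."""
    kwargs = dict(block)
    name = kwargs.pop("@schedules")
    return SCHEDULES[name](**kwargs)


# Dict equivalent of a [training.batch_size] section that uses the @ syntax
batch_size_block = {"@schedules": "compounding.v1", "start": 1, "stop": 8, "compound": 2}
batch_sizes = resolve(batch_size_block)
print(list(islice(batch_sizes, 5)))  # [1.0, 2.0, 4.0, 8.0, 8.0]
```

In spaCy itself the lookup is handled by the config system and its registry; the sketch only mimics the two ideas the paragraph describes: resolving a function by its registered name and passing the block's other values as keyword arguments.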
@@ -722,9 +722,9 @@ a stream of items into a stream of batches. spaCy has several useful built-in
 [batching strategies](/api/top-level#batchers) with customizable sizes, but it's
 also easy to implement your own. For instance, the following function takes the
 stream of generated [`Example`](/api/example) objects, and removes those which
-have the exact same underlying raw text, to avoid duplicates within each batch.
-Note that in a more realistic implementation, you'd also want to check whether
-the annotations are exactly the same.
+have the same underlying raw text, to avoid duplicates within each batch. Note
+that in a more realistic implementation, you'd also want to check whether the
+annotations are the same.

 > #### config.cfg
 >
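A minimal sketch of the duplicate-filtering batcher described in the hunk above. To stay runnable without spaCy, it uses a hypothetical `FakeExample` dataclass in place of `spacy.training.Example` and compares only the `.text` attribute; `filter_batch` is an illustrative name. It also emits the final, possibly smaller batch so that examples at the end of the stream are not silently dropped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Set


@dataclass
class FakeExample:
    """Hypothetical stand-in for spacy.training.Example; only `.text` is used."""
    text: str


def filter_batch(size: int) -> Callable[[Iterable[FakeExample]], Iterator[List[FakeExample]]]:
    """Return a batcher that groups examples into batches of `size`,
    skipping any example whose raw text is already in the current batch."""
    def create_filtered_batches(examples: Iterable[FakeExample]) -> Iterator[List[FakeExample]]:
        batch: List[FakeExample] = []
        seen_texts: Set[str] = set()
        for eg in examples:
            if eg.text in seen_texts:
                continue  # same raw text already in this batch: drop it
            batch.append(eg)
            seen_texts.add(eg.text)
            if len(batch) == size:
                yield batch
                batch, seen_texts = [], set()  # start a fresh batch
        if batch:
            yield batch  # emit the last, possibly smaller batch
    return create_filtered_batches


batcher = filter_batch(size=2)
stream = [FakeExample("a"), FakeExample("a"), FakeExample("b"), FakeExample("c")]
print([[eg.text for eg in b] for b in batcher(stream)])  # [['a', 'b'], ['c']]
```

In a spaCy project, the outer function would typically be registered with `@registry.batchers(...)` so it can be referenced from the `[training.batcher]` block; as the paragraph notes, a realistic version would also compare the annotations, not just the text.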
@@ -839,8 +839,8 @@ called the **gold standard**. It's initialized with a [`Doc`](/api/doc) object
 that will hold the predictions, and another `Doc` object that holds the
 gold-standard annotations. It also includes the **alignment** between those two
 documents if they differ in tokenization. The `Example` class ensures that spaCy
-can rely on one **standardized format** that's passed through the pipeline.
-Here's an example of a simple `Example` for part-of-speech tags:
+can rely on one **standardized format** that's passed through the pipeline. For
+instance, let's say we want to define gold-standard part-of-speech tags:

 ```python
 words = ["I", "like", "stuff"]
@@ -852,9 +852,10 @@ reference = Doc(vocab, words=words).from_array("TAG", numpy.array(tag_ids, dtype
 example = Example(predicted, reference)
 ```

-Alternatively, the `reference` `Doc` with the gold-standard annotations can be
-created from a dictionary with keyword arguments specifying the annotations,
-like `tags` or `entities`. Using the `Example` object and its gold-standard
+As this is quite verbose, there's an alternative way to create the reference
+`Doc` with the gold-standard annotations. The function `Example.from_dict` takes
+a dictionary with keyword arguments specifying the annotations, like `tags` or
+`entities`. Using the resulting `Example` object and its gold-standard
 annotations, the model can be updated to learn a sentence of three words with
 their assigned part-of-speech tags.

@@ -879,7 +880,7 @@ example = Example.from_dict(predicted, {"tags": tags})
 Here's another example that shows how to define gold-standard named entities.
 The letters added before the labels refer to the tags of the
 [BILUO scheme](/usage/linguistic-features#updating-biluo) – `O` is a token
-outside an entity, `U` an single entity unit, `B` the beginning of an entity,
+outside an entity, `U` a single entity unit, `B` the beginning of an entity,
 `I` a token inside an entity and `L` the last token of an entity.

 ```python
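The BILUO tags described in the hunk above can be derived mechanically from token-level entity spans. The sketch below is an illustration only: `spans_to_biluo` is a hypothetical helper that takes spans as `(start, end, label)` token indices with an exclusive end and assumes the spans don't overlap. spaCy ships its own converter that works on character offsets (`offsets_to_biluo_tags` in `spacy.training`).

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_biluo(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping token-level entity spans into BILUO tags."""
    tags = ["O"] * n_tokens  # every token starts outside an entity
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token entity unit
        else:
            tags[start] = f"B-{label}"  # beginning of the entity
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # tokens inside the entity
            tags[end - 1] = f"L-{label}"  # last token of the entity
    return tags


words = ["Facebook", "released", "React", "in", "2014"]
print(spans_to_biluo(len(words), [(0, 1, "ORG"), (2, 3, "TECHNOLOGY"), (4, 5, "DATE")]))
# ['U-ORG', 'O', 'U-TECHNOLOGY', 'O', 'U-DATE']
```

The output matches the `annotations` list used for `"Facebook released React in 2014"`; a multi-token entity such as `(3, 6, "GPE")` over "New York City" would instead produce `B-GPE`, `I-GPE`, `L-GPE`.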
@@ -954,7 +955,7 @@ dictionary of annotations:
 ```diff
 text = "Facebook released React in 2014"
 annotations = {"entities": ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]}
-+ example = Example.from_dict(nlp.make_doc(text), {"entities": entities})
++ example = Example.from_dict(nlp.make_doc(text), annotations)
 - nlp.update([text], [annotations])
 + nlp.update([example])
 ```