mirror of https://github.com/explosion/spaCy.git
Correct alignment example and documentation (#11491)
* Correct example and documentation
* Added altered example.md
* Changes based on review + apply prettier
* Remote unnecessary 'the'

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
parent 6be6913ba5
commit 3f0c3ad7d3
@@ -286,10 +286,14 @@ Calculate alignment tables between two tokenizations.
 
 ### Alignment attributes {#alignment-attributes"}
 
-| Name | Description |
-| ----- | --------------------------------------------------------------------- |
-| `x2y` | The `Ragged` object holding the alignment from `x` to `y`. ~~Ragged~~ |
-| `y2x` | The `Ragged` object holding the alignment from `y` to `x`. ~~Ragged~~ |
+Alignment attributes are managed using `AlignmentArray`, which is a
+simplified version of Thinc's [Ragged](https://thinc.ai/docs/api-types#ragged)
+type that only supports the `data` and `length` attributes.
+
+| Name | Description |
+| ----- | ------------------------------------------------------------------------------------- |
+| `x2y` | The `AlignmentArray` object holding the alignment from `x` to `y`. ~~AlignmentArray~~ |
+| `y2x` | The `AlignmentArray` object holding the alignment from `y` to `x`. ~~AlignmentArray~~ |
 
 <Infobox title="Important note" variant="warning">
 
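The new paragraph describes `x2y` and `y2x` as ragged arrays: one flat `data` array plus a per-token count of entries, so the targets of token `i` start at the sum of the counts before it. The pure-Python sketch below illustrates that layout only; `flatten_alignment` and `row` are hypothetical helpers, not spaCy's `AlignmentArray`, and the sample input is the `y2x` mapping from the `"'s"` usage example in this commit.

```python
from itertools import accumulate
from typing import List, Tuple


def flatten_alignment(alignment: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Flatten per-token target lists into (data, lengths).

    Hypothetical helper illustrating the ragged layout; it is not
    spaCy's AlignmentArray implementation.
    """
    data = [target for targets in alignment for target in targets]
    lengths = [len(targets) for targets in alignment]
    return data, lengths


def row(data: List[int], lengths: List[int], i: int) -> List[int]:
    """Return the targets of token i, using prefix sums of lengths as offsets."""
    # starts[i] is the number of entries in data owned by tokens 0..i-1
    starts = [0, *accumulate(lengths)]
    return data[starts[i]:starts[i] + lengths[i]]


# y -> x alignment of ["i", "listened", "to", "obama", "'s", "podcasts", "."]
# against ["i", "listened", "to", "obama", "'", "s", "podcasts", "."]
y2x = [[0], [1], [2], [3], [4, 5], [6], [7]]
data, lengths = flatten_alignment(y2x)
print(lengths)                # [1, 1, 1, 1, 2, 1, 1]
print(data)                   # [0, 1, 2, 3, 4, 5, 6, 7]
print(row(data, lengths, 4))  # [4, 5]
```

Storing one flat array plus counts keeps the data contiguous, at the cost of needing offsets to find where a given token's targets begin.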
@@ -309,10 +313,10 @@ tokenizations add up to the same string. For example, you'll be able to align
 > spacy_tokens = ["obama", "'s", "podcast"]
 > alignment = Alignment.from_strings(bert_tokens, spacy_tokens)
 > a2b = alignment.x2y
-> assert list(a2b.dataXd) == [0, 1, 1, 2]
+> assert list(a2b.data) == [0, 1, 1, 2]
 > ```
 >
-> If `a2b.dataXd[1] == a2b.dataXd[2] == 1`, that means that `A[1]` (`"'"`) and
+> If `a2b.data[1] == a2b.data[2] == 1`, that means that `A[1]` (`"'"`) and
 > `A[2]` (`"s"`) both align to `B[1]` (`"'s"`).
 
 ### Alignment.from_strings {#classmethod tag="function"}
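The corrected note reads `a2b.data == [0, 1, 1, 2]` as "tokens 1 and 2 of `A` both align to token 1 of `B`". Going the other way means collecting, for each `B` token, the `A` indices that point at it. The sketch below assumes each `A` token has exactly one target (all lengths are 1) and that the alignment is available as plain per-token lists; `invert_alignment` is a hypothetical helper, not part of spaCy's API.

```python
from typing import List


def invert_alignment(x2y: List[List[int]], n_y: int) -> List[List[int]]:
    """Turn an x -> y alignment into the matching y -> x alignment.

    Hypothetical helper for illustration; not part of spaCy's API.
    """
    y2x: List[List[int]] = [[] for _ in range(n_y)]
    for x_index, targets in enumerate(x2y):
        for y_index in targets:
            # x tokens are visited in order, so each y2x row stays sorted
            y2x[y_index].append(x_index)
    return y2x


# a2b.data == [0, 1, 1, 2], assuming one target per A token;
# B is spacy_tokens == ["obama", "'s", "podcast"], so n_y == 3
a2b = [[0], [1], [1], [2]]
print(invert_alignment(a2b, n_y=3))  # [[0], [1, 2], [3]]
```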
@@ -1422,9 +1422,9 @@ other_tokens = ["i", "listened", "to", "obama", "'", "s", "podcasts", "."]
 spacy_tokens = ["i", "listened", "to", "obama", "'s", "podcasts", "."]
 align = Alignment.from_strings(other_tokens, spacy_tokens)
 print(f"a -> b, lengths: {align.x2y.lengths}") # array([1, 1, 1, 1, 1, 1, 1, 1])
-print(f"a -> b, mapping: {align.x2y.dataXd}") # array([0, 1, 2, 3, 4, 4, 5, 6]) : two tokens both refer to "'s"
+print(f"a -> b, mapping: {align.x2y.data}") # array([0, 1, 2, 3, 4, 4, 5, 6]) : two tokens both refer to "'s"
 print(f"b -> a, lengths: {align.y2x.lengths}") # array([1, 1, 1, 1, 2, 1, 1]) : the token "'s" refers to two tokens
-print(f"b -> a, mappings: {align.y2x.dataXd}") # array([0, 1, 2, 3, 4, 5, 6, 7])
+print(f"b -> a, mappings: {align.y2x.data}") # array([0, 1, 2, 3, 4, 5, 6, 7])
 ```
 
 Here are some insights from the alignment information generated in the example
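The `lengths`/`data` values printed in this example follow from character offsets: when both tokenizations concatenate to the same string, a token in one aligns to every token in the other whose character span overlaps its own. The pure-Python sketch below reproduces the commented arrays under that assumption. It is a simplified stand-in, not spaCy's `Alignment.from_strings` implementation, and it does not attempt any whitespace or case handling the real method may perform; `char_spans` and `align_by_chars` are hypothetical names.

```python
from typing import List, Tuple


def char_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """(start, end) character offsets of each token in "".join(tokens)."""
    spans = []
    offset = 0
    for token in tokens:
        spans.append((offset, offset + len(token)))
        offset += len(token)
    return spans


def align_by_chars(x_tokens: List[str], y_tokens: List[str]) -> Tuple[List[int], List[int]]:
    """Return (lengths, data) of the x -> y alignment.

    Assumes both tokenizations concatenate to exactly the same string.
    Hypothetical sketch, not spaCy's Alignment.from_strings.
    """
    if "".join(x_tokens) != "".join(y_tokens):
        raise ValueError("tokenizations must cover the same characters")
    y_spans = char_spans(y_tokens)
    lengths: List[int] = []
    data: List[int] = []
    for x_start, x_end in char_spans(x_tokens):
        # a y token aligns if its character span overlaps the x token's span
        targets = [j for j, (y_start, y_end) in enumerate(y_spans)
                   if y_start < x_end and x_start < y_end]
        lengths.append(len(targets))
        data.extend(targets)
    return lengths, data


other_tokens = ["i", "listened", "to", "obama", "'", "s", "podcasts", "."]
spacy_tokens = ["i", "listened", "to", "obama", "'s", "podcasts", "."]
print(align_by_chars(other_tokens, spacy_tokens))
# ([1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 4, 5, 6])
print(align_by_chars(spacy_tokens, other_tokens))
# ([1, 1, 1, 1, 2, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7])
```

The quadratic overlap scan keeps the logic easy to check; a real implementation would walk both span lists in a single pass.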
@@ -1433,10 +1433,10 @@ above:
 - The one-to-one mappings for the first four tokens are identical, which means
 they map to each other. This makes sense because they're also identical in the
 input: `"i"`, `"listened"`, `"to"` and `"obama"`.
-- The value of `x2y.dataXd[6]` is `5`, which means that `other_tokens[6]`
+- The value of `x2y.data[6]` is `5`, which means that `other_tokens[6]`
 (`"podcasts"`) aligns to `spacy_tokens[5]` (also `"podcasts"`).
-- `x2y.dataXd[4]` and `x2y.dataXd[5]` are both `4`, which means that both tokens
-4 and 5 of `other_tokens` (`"'"` and `"s"`) align to token 4 of `spacy_tokens`
+- `x2y.data[4]` and `x2y.data[5]` are both `4`, which means that both tokens 4
+and 5 of `other_tokens` (`"'"` and `"s"`) align to token 4 of `spacy_tokens`
 (`"'s"`).
 
 <Infobox title="Important note" variant="warning">
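The insights above separate tokens that map one-to-one (the first four tokens, and `"podcasts"`) from the two `other_tokens` that share `"'s"`. A small sketch of that check: a pair `(i, j)` is one-to-one when `x2y` sends token `i` only to `j` and `y2x` sends `j` only back to `i`. `one_to_one_pairs` is a hypothetical helper working on plain per-token lists built from the arrays printed in this commit's example, not a spaCy API.

```python
from typing import List, Tuple


def one_to_one_pairs(x2y: List[List[int]], y2x: List[List[int]]) -> List[Tuple[int, int]]:
    """Pairs (i, j) where x token i aligns only to y token j and vice versa.

    Hypothetical helper; the alignments are plain lists of target indices.
    """
    return [
        (i, targets[0])
        for i, targets in enumerate(x2y)
        if len(targets) == 1 and y2x[targets[0]] == [i]
    ]


# other_tokens -> spacy_tokens and back, as per-token lists
x2y = [[0], [1], [2], [3], [4], [4], [5], [6]]
y2x = [[0], [1], [2], [3], [4, 5], [6], [7]]
print(one_to_one_pairs(x2y, y2x))
# [(0, 0), (1, 1), (2, 2), (3, 3), (6, 5), (7, 6)]
```

Tokens 4 and 5 of `other_tokens` drop out because `y2x[4]` is `[4, 5]`, which matches the many-to-one case described in the last bullet.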