Adjust wording [ci skip]

Ines Montani 2019-07-17 16:06:25 +02:00
parent 57d7076a72
commit c3ead02ea5
1 changed files with 4 additions and 3 deletions

@@ -970,9 +970,10 @@ optimized for compatibility with treebank annotations. Other tools and resources
 can sometimes tokenize things differently – for example, `"I'm"` →
 `["I", "'", "m"]` instead of `["I", "'m"]`.
-In cases like that, you often want to align the tokenization so that you can
-merge annotations from different sources together, or take vectors predicted by
-a [pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
+In situations like that, you often want to align the tokenization so that you
+can merge annotations from different sources together, or take vectors predicted
+by a
+[pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
 and apply them to spaCy tokens. spaCy's [`gold.align`](/api/goldparse#align)
 helper returns a `(cost, a2b, b2a, a2b_multi, b2a_multi)` tuple describing the
 number of misaligned tokens, the one-to-one mappings of token indices in both