Add infobox

Ines Montani 2019-07-17 15:29:36 +02:00
parent 114cb18892
commit 1d5ff3e455
1 changed file with 9 additions and 0 deletions


@@ -1019,6 +1019,15 @@ above:
- The dictionary `b2a_multi` shows that there are no tokens in `spacy_tokens`
that map to multiple tokens in `other_tokens`.
<Infobox title="Important note" variant="warning">
The current implementation of the alignment algorithm assumes that both
tokenizations add up to the same string. For example, you'll be able to align
`["I", "'", "m"]` and `["I", "'m"]`, which both add up to `"I'm"`, but not
`["I", "'m"]` and `["I", "am"]`.
</Infobox>
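
As a rough illustration of the assumption described in the note above, the check below compares the concatenated text of two tokenizations. This is only a sketch: `same_underlying_string` is a hypothetical helper, not part of spaCy, and it compares the raw joined strings without any normalization, which may differ from what the library does internally.

```python
def same_underlying_string(tokens_a, tokens_b):
    """Return True if both token lists concatenate to the same string.

    Hypothetical helper for illustration only; not a spaCy function.
    """
    return "".join(tokens_a) == "".join(tokens_b)


# Both tokenizations join to "I'm", so they satisfy the assumption.
print(same_underlying_string(["I", "'", "m"], ["I", "'m"]))

# "I'm" versus "Iam": the joined strings differ, so they do not.
print(same_underlying_string(["I", "'m"], ["I", "am"]))
```

A check like this could be run before attempting an alignment, to catch token lists that were normalized or expanded (for example `"'m"` rewritten as `"am"`) and therefore no longer share the same underlying text.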
## Merging and splitting {#retokenization new="2.1"}
The [`Doc.retokenize`](/api/doc#retokenize) context manager lets you merge and