diff --git a/website/docs/usage/v2-2.md b/website/docs/usage/v2-2.md
index 6c2d3c158..e3907f8ea 100644
--- a/website/docs/usage/v2-2.md
+++ b/website/docs/usage/v2-2.md
@@ -341,6 +341,11 @@ check if all of your models are up to date, you can run the
   them). If your data contains invalid entity annotations, make sure to clean it
   and resolve conflicts. You can now also use the new `debug-data` command to
   find problems in your data.
+- Pipeline components can now overwrite IOB tags of tokens that are not yet part
+  of an entity. Once a token has an `ent_iob` value set, it won't be reset to an
+  "unset" state and will always have at least `O` assigned. `list(doc.ents)` now
+  actually keeps the annotations on the token level consistent, instead of
+  resetting `O` to an empty string.
 - The default punctuation in the `sentencizer` has been extended and now
   includes more characters common in various languages. This also means that the
   results it produces may change, depending on your text. If you want the
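
The rule in the added bullet can be illustrated with a small toy model. This is a minimal sketch in plain Python, not spaCy's implementation: `ToyToken` and `set_iob` are hypothetical names introduced only for illustration, and refusing to overwrite tokens that are already inside an entity is one reading of "tokens that are not yet part of an entity". The tag strings mirror the values of `Token.ent_iob_` (`""` for unset, `"I"`, `"O"`, `"B"`).

```python
# Toy model of the IOB rule described in the changelog entry.
# NOT spaCy code: ToyToken and set_iob are hypothetical, for illustration only.

UNSET, INSIDE, OUTSIDE, BEGIN = "", "I", "O", "B"


class ToyToken:
    def __init__(self, text):
        self.text = text
        self.ent_iob_ = UNSET  # no entity annotation has been set yet


def set_iob(token, tag):
    """Try to apply a tag from a later component; return True if it was applied."""
    if token.ent_iob_ in (BEGIN, INSIDE):
        return False  # assumed: tokens already part of an entity are left alone
    if tag == UNSET:
        return False  # a token is never moved back to the "unset" state
    token.ent_iob_ = tag
    return True


tokens = [ToyToken(t) for t in ["Apple", "is", "hiring"]]
set_iob(tokens[1], OUTSIDE)  # a first component marks "is" as outside
set_iob(tokens[0], BEGIN)    # an unset token can be tagged
set_iob(tokens[1], BEGIN)    # a token tagged "O" can still be overwritten
set_iob(tokens[1], UNSET)    # ignored: no reset to unset
set_iob(tokens[0], OUTSIDE)  # ignored: the token is already part of an entity

print([(t.text, t.ent_iob_) for t in tokens])
```

In this sketch the third token is never touched, so it stays unset; only tokens that have received a value are guaranteed to keep at least `O`.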