Add backwards incompatibility [ci skip]

Ines Montani 2019-09-18 21:21:48 +02:00
parent 6ebdc5f7d2
commit f873548f6c
1 changed files with 30 additions and 1 deletions

@@ -326,4 +326,33 @@ check if all of your models are up to date, you can run the
</Infobox>
<!-- TODO: copy from release notes once they're ready -->
- The Dutch models have been trained on a new NER corpus (custom labelled UD
instead of WikiNER), so their predictions may be very different compared to
the previous version. The results should be significantly better and more
generalizable, though.
- The `spacy download` command does **not** set the `--no-deps` pip argument
anymore by default, meaning that model package dependencies (if available)
will now also be downloaded and installed. If spaCy (which is also a model
dependency) is not installed in the current environment, e.g. if a user has
built from source, `--no-deps` is added back automatically to prevent spaCy
from being downloaded and installed again from pip.
- The built-in `biluo_tags_from_offsets` converter is now stricter and will
raise an error if entities are overlapping (instead of silently skipping
them). If your data contains invalid entity annotations, make sure to clean it
and resolve conflicts. You can now also use the new `debug-data` command to
find problems in your data.
- The default punctuation in the `sentencizer` has been extended and now
includes more characters common in various languages. This also means that the
results it produces may change, depending on your text. If you want the
previous behaviour with limited characters, set `punct_chars=[".", "!", "?"]`
on initialization.
- Lemmatization tables (rules, exceptions, index and lookups) are now part of
the `Vocab` and serialized with it. This means that serialized objects (`nlp`,
pipeline components, vocab) will now include additional data, and models
written to disk will include additional files.
- The `Serbian` language class (introduced in v2.1.8) incorrectly used the
language code `rs` instead of `sr`. This has been fixed, so `Serbian` is
now available via `spacy.lang.sr`.
- The `"sources"` in the `meta.json` have changed from a list of strings to a
list of dicts. This is mostly an internal detail, but if your code used
`nlp.meta["sources"]`, you might have to update it.
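
Since `biluo_tags_from_offsets` now raises an error on overlapping entities, it can help to check your annotations before converting them. The following is a minimal sketch, not part of spaCy: `find_overlapping_entities` is a hypothetical helper, and it assumes entities are given as `(start_char, end_char, label)` tuples with an exclusive end offset.

```python
from typing import List, Tuple

# Assumed annotation format: (start_char, end_char, label), end exclusive.
Entity = Tuple[int, int, str]


def find_overlapping_entities(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Return pairs of entity annotations whose character spans overlap.

    Hypothetical helper for illustration; it is not a spaCy function.
    """
    conflicts = []
    ordered = sorted(entities, key=lambda ent: (ent[0], ent[1]))
    for i, current in enumerate(ordered):
        for other in ordered[i + 1:]:
            if other[0] >= current[1]:
                # Spans are sorted by start, so no later span can overlap.
                break
            conflicts.append((current, other))
    return conflicts


entities = [(0, 5, "ORG"), (3, 9, "PERSON"), (10, 14, "GPE")]
conflicts = find_overlapping_entities(entities)
print(conflicts)
```

An empty result means no two spans share characters. Any reported pair needs to be resolved in your data before conversion.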
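
If your code reads `nlp.meta["sources"]` and has to work with models from before and after this change, one option is to normalize both shapes. This is only a sketch: `source_names` is a hypothetical helper, and the `"name"` key on each dict is an assumption, so check the `meta.json` of your model for the actual keys.

```python
def source_names(meta):
    """Return source names from a meta dict, for either format of "sources".

    Hypothetical helper. Handles a list of strings (old format) and a list
    of dicts (new format); the "name" key is an assumption.
    """
    names = []
    for source in meta.get("sources") or []:
        if isinstance(source, dict):
            names.append(source.get("name", ""))
        else:
            names.append(str(source))
    return names


# Example meta dicts standing in for nlp.meta (values are illustrative only).
old_meta = {"sources": ["Corpus A", "Corpus B"]}
new_meta = {"sources": [{"name": "Corpus A"}, {"name": "Corpus B"}]}
print(source_names(old_meta))
print(source_names(new_meta))
```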