spaCy/website/docs/usage/v3.md

3.8 KiB
Raw Blame History

title teaser menu
What's New in v3.0 New features, backwards incompatibilities and migration guide
Summary
summary
New Features
features
Backwards Incompatibilities
incompat
Migrating from v2.x
migrating
Migrating plugins
plugins

Summary

New Features

Backwards Incompatibilities

Removed deprecated methods, attributes and arguments

The following deprecated methods, attributes and arguments were removed in v3.0. Most of them have been deprecated for quite a while now and many would previously raise errors. Many of them were also mostly internals. If you've been working with more recent versions of spaCy v2.x, it's unlikely that your code relied on them.

Class Removed
Doc Doc.tokens_from_list, Doc.merge
Span Span.merge, Span.upper, Span.lower, Span.string
Token Token.string

Migrating from v2.x

Migration notes for plugin maintainers

Thanks to everyone who's been contributing to the spaCy ecosystem by developing and maintaining one of the many awesome plugins and extensions. We've tried to keep breaking changes to a minimum and make it as easy as possible for you to upgrade your packages for spaCy v3.

Custom pipeline components

The most common use case for plugins is providing pipeline components and extension attributes.

  • Use the @Language.factory decorator to register your component and assign it a name. This allows users to refer to your components by name and serialize pipelines referencing them. Remove all manual entries to the Language.factories.
  • Make sure your component factories take at least two named arguments: nlp (the current nlp object) and name (the instance name of the added component so you can identify multiple instances of the same component).
  • Update all references to nlp.add_pipe in your docs to use string names instead of the component functions.
### {highlight="1-5"}
from spacy.language import Language

@Language.factory("my_component", default_config={"some_setting": False})
def create_component(nlp: Language, name: str, some_setting: bool):
    return MyCoolComponent(some_setting=some_setting)


class MyCoolComponent:
    def __init__(self, some_setting):
        self.some_setting = some_setting

    def __call__(self, doc):
        # Do something to the doc
        return doc

Result in config.cfg

[components.my_component]
factory = "my_component"
some_setting = true
import spacy
from your_plugin import MyCoolComponent

nlp = spacy.load("en_core_web_sm")
- component = MyCoolComponent(some_setting=True)
- nlp.add_pipe(component)
+ nlp.add_pipe("my_component", config={"some_setting": True})

The @Language.factory decorator takes care of letting spaCy know that a component of that name is available. This means that your users can add it to the pipeline using its string name. However, this requires the decorator to be executed so users will still have to import your plugin. Alternatively, your plugin could expose an entry point, which spaCy can read from. This means that spaCy knows how to initialize my_component, even if your package isn't imported.