spaCy/website/api/_top-level/_spacy.jade

192 lines
6.1 KiB
Plaintext

//- 💫 DOCS > API > TOP-LEVEL > SPACY
+h(3, "spacy.load") spacy.load
+tag function
+tag-model
p
| Load a model via its #[+a("/usage/models#usage") shortcut link],
| the name of an installed
| #[+a("/usage/training#models-generating") model package], a unicode
| path or a #[code Path]-like object. spaCy will try resolving the load
| argument in this order. If a model is loaded from a shortcut link or
| package name, spaCy will assume it's a Python package and import it and
| call the model's own #[code load()] method. If a model is loaded from a
| path, spaCy will assume it's a data directory, read the language and
| pipeline settings off the meta.json and initialise the #[code Language]
| class. The data will be loaded in via
| #[+api("language#from_disk") #[code Language.from_disk()]].
+aside-code("Example").
nlp = spacy.load('en') # shortcut link
nlp = spacy.load('en_core_web_sm') # package
nlp = spacy.load('/path/to/en') # unicode path
nlp = spacy.load(Path('/path/to/en')) # pathlib Path
nlp = spacy.load('en', disable=['parser', 'tagger'])
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode or #[code Path]
+cell Model to load, i.e. shortcut link, package name or path.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable].
+row("foot")
+cell returns
+cell #[code Language]
+cell A #[code Language] object with the loaded model.
p
| Essentially, #[code spacy.load()] is a convenience wrapper that reads
| the language ID and pipeline components from a model's #[code meta.json],
| initialises the #[code Language] class, loads in the model data and
| returns it.
+code("Abstract example").
cls = util.get_lang_class(lang) # get Language class for ID, e.g. 'en'
nlp = cls() # initialise the Language class
for name in pipeline:
component = nlp.create_pipe(name) # create each pipeline component
nlp.add_pipe(component) # add component to pipeline
nlp.from_disk(model_data_path) # load in model data
+infobox("Deprecation note", "⚠️")
.o-block
| As of spaCy 2.0, the #[code path] keyword argument is deprecated. spaCy
| will also raise an error if no model could be loaded and never just
| return an empty #[code Language] object. If you need a blank language,
| you can use the new function #[+api("spacy#blank") #[code spacy.blank()]]
| or import the class explicitly, e.g.
| #[code from spacy.lang.en import English].
+code-new nlp = spacy.load('/model')
+code-old nlp = spacy.load('en', path='/model')
+h(3, "spacy.blank") spacy.blank
+tag function
+tag-new(2)
p
| Create a blank model of a given language class. This function is the
| twin of #[code spacy.load()].
+aside-code("Example").
nlp_en = spacy.blank('en')
nlp_de = spacy.blank('de')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell ISO code of the language class to load.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable].
+row("foot")
+cell returns
+cell #[code Language]
+cell An empty #[code Language] object of the appropriate subclass.
+h(4, "spacy.info") spacy.info
+tag function
p
| The same as the #[+api("cli#info") #[code info] command]. Pretty-print
| information about your installation, models and local setup from within
| spaCy. To get the model meta data as a dictionary instead, you can
| use the #[code meta] attribute on your #[code nlp] object with a
| loaded model, e.g. #[code nlp['meta']].
+aside-code("Example").
spacy.info()
spacy.info('en')
spacy.info('de', markdown=True)
+table(["Name", "Type", "Description"])
+row
+cell #[code model]
+cell unicode
+cell A model, i.e. shortcut link, package name or path (optional).
+row
+cell #[code markdown]
+cell bool
+cell Print information as Markdown.
+h(3, "spacy.explain") spacy.explain
+tag function
p
| Get a description for a given POS tag, dependency label or entity type.
| For a list of available terms, see
| #[+src(gh("spacy", "spacy/glossary.py")) #[code glossary.py]].
+aside-code("Example").
spacy.explain('NORP')
# Nationalities or religious or political groups
doc = nlp(u'Hello world')
for word in doc:
print(word.text, word.tag_, spacy.explain(word.tag_))
# Hello UH interjection
# world NN noun, singular or mass
+table(["Name", "Type", "Description"])
+row
+cell #[code term]
+cell unicode
+cell Term to explain.
+row("foot")
+cell returns
+cell unicode
+cell The explanation, or #[code None] if not found in the glossary.
+h(3, "spacy.set_factory") spacy.set_factory
+tag function
+tag-new(2)
p
| Set a factory that returns a custom
| #[+a("/usage/processing-pipelines") processing pipeline]
| component. Factories are useful for creating stateful components, especially ones which depend on shared data.
+aside-code("Example").
def my_factory(vocab):
def my_component(doc):
return doc
return my_component
spacy.set_factory('my_factory', my_factory)
nlp = Language(pipeline=['my_factory'])
+table(["Name", "Type", "Description"])
+row
+cell #[code factory_id]
+cell unicode
+cell
| Unique name of factory. If added to a new pipeline, spaCy will
| look up the factory for this ID and use it to create the
| component.
+row
+cell #[code factory]
+cell callable
+cell
| Callable that takes a #[code Vocab] object and returns a pipeline
| component.