=================
Loading Resources
=================

In almost all cases, you will load spaCy's resources using a language pipeline
class, e.g. :code:`spacy.en.English`. The pipeline class reads its data from a
specified directory on disk. By default, spaCy installs data into each
language's package directory and loads it from there.

Usually, this is all you will need:

    >>> from spacy.en import English
    >>> nlp = English()

If you need to replace some of the components, you may want to write your
own pipeline class. The English class itself does almost no work; it just
applies the modules in order. You can also customize the pipeline by passing
a function or class that produces a tokenizer, tagger, parser or entity
recognizer to :code:`English.__init__`:

    >>> from spacy.en import English
    >>> from my_module import MyTagger
    >>> nlp = English(Tagger=MyTagger)
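
As a rough illustration of what "applies the modules in order" means, here is
a minimal sketch of a hand-rolled pipeline class. It does not use spaCy at
all: :code:`MyPipeline`, :code:`WhitespaceTokenizer` and :code:`CapitalTagger`
are hypothetical stand-ins, and the document is modelled as a plain list of
dicts rather than a real :code:`Doc`.

```python
# Hypothetical stand-ins for illustration only; none of these names are spaCy APIs.

class WhitespaceTokenizer(object):
    """Turns a string into a toy 'document': a list of token dicts."""

    def __call__(self, text):
        return [{"text": word, "tag": None} for word in text.split()]


class CapitalTagger(object):
    """Annotates the document in place and returns None."""

    def __call__(self, doc):
        for token in doc:
            # Toy rule: capitalised words get one tag, everything else another.
            token["tag"] = "PROPN" if token["text"][:1].isupper() else "X"


class MyPipeline(object):
    """Tokenizes the text, then applies each enabled component in order."""

    def __init__(self, tokenizer, components):
        self.tokenizer = tokenizer
        # None or False entries are dropped, so that step is simply skipped.
        self.components = [c for c in components if c]

    def __call__(self, text):
        doc = self.tokenizer(text)
        for component in self.components:
            component(doc)  # each module modifies the doc in place
        return doc


nlp = MyPipeline(WhitespaceTokenizer(), [CapitalTagger(), None])
doc = nlp("Hello world")
```

The pipeline object itself holds no logic beyond the loop, which is why
swapping a single component is usually enough.
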

In more detail:

.. code::

    class English(object):
        def __init__(self,
            data_dir=path.join(path.dirname(__file__), 'data'),
            Tokenizer=Tokenizer.from_dir,
            Tagger=EnPosTagger,
            Parser=CreateParser(ArcEager),
            Entity=CreateParser(BiluoNER),
            load_vectors=True
        ):

:code:`data_dir`
  :code:`unicode path`

  The data directory. May be None, to disable any data loading (including
  the vocabulary).

:code:`Tokenizer`
  :code:`(Vocab vocab, unicode data_dir)(unicode) --> Doc`

  A class or function that creates the tokenizer.

:code:`Tagger` / :code:`Parser` / :code:`Entity`
  :code:`(Vocab vocab, unicode data_dir)(Doc) --> None`

  A class or function that creates the part-of-speech tagger /
  syntactic dependency parser / named entity recognizer.
  May be None or False, to disable the corresponding component.

:code:`load_vectors`
  :code:`bool`

  A boolean value to control whether the word vectors are loaded.
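
The :code:`(Vocab vocab, unicode data_dir)(Doc) --> None` signature can be
read as a two-step protocol: the pipeline first calls the factory with the
vocabulary and the data directory, then calls the returned object on each
document. A minimal sketch, assuming toy stand-ins rather than real spaCy
objects (:code:`SuffixTagger`, :code:`make_length_annotator`, the dict-based
tokens and the tagging rule are all invented for illustration):

```python
# Hypothetical factories for illustration only; not part of spaCy.

class SuffixTagger(object):
    """Class-style factory: built with (vocab, data_dir), then called on a doc."""

    def __init__(self, vocab, data_dir):
        self.vocab = vocab
        self.data_dir = data_dir  # may be None when data loading is disabled

    def __call__(self, doc):
        for token in doc:
            # Toy rule standing in for a real statistical model.
            token["tag"] = "VERB" if token["text"].endswith("ing") else "NOUN"
        # No return value: the doc is annotated in place.


def make_length_annotator(vocab, data_dir):
    """Function-style factory with the same (vocab, data_dir) signature."""

    def annotate(doc):
        for token in doc:
            token["length"] = len(token["text"])

    return annotate


vocab, data_dir = {}, None           # stand-ins for a Vocab and a data path
doc = [{"text": "running"}, {"text": "dog"}]

results = []
for factory in (SuffixTagger, make_length_annotator):
    component = factory(vocab, data_dir)  # step 1: create the component
    results.append(component(doc))        # step 2: apply it to the doc
```

Either form works because the pipeline only relies on the two calls, not on
the type of the object in between.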