diff --git a/docs/source/reference/loading.rst b/docs/source/reference/loading.rst
new file mode 100644
index 000000000..6d49d4a57
--- /dev/null
+++ b/docs/source/reference/loading.rst
@@ -0,0 +1,62 @@
+=================
+Loading Resources
+=================
+
+99% of the time, you will load spaCy's resources through a language pipeline
+class, e.g. :code:`spacy.en.English`. The pipeline class reads its data from
+disk, from a specified directory. By default, spaCy installs data into each
+language's package directory, and loads it from there.
+
+Usually, this is all you will need:
+
+    >>> from spacy.en import English
+    >>> nlp = English()
+
+If you need to replace some of the components, you may want to just make your
+own pipeline class --- the English class itself does almost no work; it just
+applies the modules in order. You can also provide a function or class that
+produces a tokenizer, tagger, parser or entity recognizer to
+:code:`English.__init__`, to customize the pipeline:
+
+    >>> from spacy.en import English
+    >>> from my_module import MyTagger
+    >>> nlp = English(Tagger=MyTagger)
+
+In more detail:
+
+.. code:: python
+
+    class English(object):
+        def __init__(self,
+          data_dir=path.join(path.dirname(__file__), 'data'),
+          Tokenizer=Tokenizer.from_dir,
+          Tagger=EnPosTagger,
+          Parser=CreateParser(ArcEager),
+          Entity=CreateParser(BiluoNER),
+          load_vectors=True
+        ):
+
+:code:`data_dir`
+    :code:`unicode path`
+
+    The data directory. May be None, to disable any data loading (including
+    the vocabulary).
+
+:code:`Tokenizer`
+    :code:`(Vocab vocab, unicode data_dir)(unicode) --> Tokens`
+
+    A class/function that creates the tokenizer.
+
+:code:`Tagger` / :code:`Parser` / :code:`Entity`
+    :code:`(Vocab vocab, unicode data_dir)(Tokens) --> None`
+
+    A class/function that creates the part-of-speech tagger /
+    syntactic dependency parser / named entity recognizer.
+    May be None or False, to disable the component.
+
+:code:`load_vectors`
+    :code:`bool`
+
+    A boolean value to control whether the word vectors are loaded.
+
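The factory signatures documented above follow a two-stage protocol: the object passed as :code:`Tagger` (or :code:`Parser`, :code:`Entity`) is first called with the vocabulary and data directory, and the result is then called on each :code:`Tokens` instance, annotating it in place and returning None. A minimal sketch of that protocol in plain Python (all names here are hypothetical stand-ins, not part of spaCy's API, so the example runs without spaCy installed):

```python
class MyTagger(object):
    """Toy tagger following the (vocab, data_dir)(tokens) --> None protocol."""

    def __init__(self, vocab, data_dir):
        # Stage one: the pipeline constructs the component, passing the
        # shared vocabulary and the data directory.
        self.vocab = vocab
        self.data_dir = data_dir

    def __call__(self, tokens):
        # Stage two: the pipeline calls the component on a document.
        # Annotate the tokens in place and return None.
        for token in tokens:
            token.tag = 'NN'  # placeholder annotation


class DummyToken(object):
    """Stand-in for a token, just to make the sketch self-contained."""

    def __init__(self, text):
        self.text = text
        self.tag = None


# What English.__init__ would do with Tagger=MyTagger:
tagger = MyTagger(vocab=None, data_dir='data')

# What the pipeline would do with each document:
doc = [DummyToken('hello'), DummyToken('world')]
tagger(doc)  # mutates doc in place, returns None
```

Because the component is constructed lazily from :code:`(vocab, data_dir)`, a replacement tagger can share the pipeline's vocabulary and data layout without the caller wiring those up by hand.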