

| title  | teaser              | tag   | source               | new |
| ------ | ------------------- | ----- | -------------------- | --- |
| Corpus | An annotated corpus | class | spacy/gold/corpus.py | 3   |

This class manages annotated corpora and can read training and development datasets in the DocBin (.spacy) format.

Corpus.__init__

Create a Corpus. The input data can be a file or a directory of files.

| Name    | Type       | Description                                                    |
| ------- | ---------- | -------------------------------------------------------------- |
| train   | str / Path | Training data (.spacy file or directory of .spacy files).      |
| dev     | str / Path | Development data (.spacy file or directory of .spacy files).   |
| limit   | int        | Maximum number of examples returned.                           |
| RETURNS | Corpus     | The newly constructed object.                                  |
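The train and dev arguments each accept either a single .spacy file or a directory of .spacy files, and limit caps the number of examples. The following plain-Python sketch shows one way such arguments could be resolved. It is not spaCy's implementation: collect_spacy_files and apply_limit are hypothetical helpers, and two behaviours are assumptions rather than documented facts. Subdirectories are searched recursively, and a limit of 0 or less is treated as "no limit".

```python
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, TypeVar, Union

T = TypeVar("T")


def collect_spacy_files(loc: Union[str, Path]) -> List[Path]:
    """Hypothetical helper: resolve a .spacy file or a directory into a list of .spacy files."""
    path = Path(loc)
    if path.is_file():
        # A single file is used as-is, as long as it has the .spacy extension.
        if path.suffix != ".spacy":
            raise ValueError(f"Expected a .spacy file, got {path}")
        return [path]
    if path.is_dir():
        # Assumption: subdirectories are searched recursively.
        # Results are sorted so that the file order is deterministic.
        return sorted(p for p in path.rglob("*.spacy") if p.is_file())
    raise FileNotFoundError(f"No such file or directory: {path}")


def apply_limit(examples: Iterable[T], limit: int) -> Iterator[T]:
    """Hypothetical helper: yield at most `limit` items.

    Assumption: a limit of 0 or less means "no limit".
    """
    if limit <= 0:
        return iter(examples)
    return islice(examples, limit)


if __name__ == "__main__":
    # Build a throwaway directory containing two .spacy files and one unrelated file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.spacy").touch()
        (root / "sub").mkdir()
        (root / "sub" / "b.spacy").touch()
        (root / "notes.txt").touch()
        print([p.name for p in collect_spacy_files(root)])
    print(list(apply_limit(range(10), 3)))
```

Sorting the discovered paths keeps the training data order reproducible across file systems. Filtering on the .spacy suffix means that unrelated files in the same directory, such as notes.txt in the demo, are ignored.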

Corpus.walk_corpus

Corpus.make_examples

Corpus.make_examples_gold_preproc

Corpus.read_docbin

Corpus.count_train

Corpus.train_dataset

Corpus.dev_dataset