spaCy/extra/example_data/textcat_example_data
Sofie Van Landeghem 8e7557656f
Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
CC0.txt
CC_BY-SA-3.0.txt
CC_BY-SA-4.0.txt
README.md
cooking.json
cooking.jsonl
jigsaw-toxic-comment.json
jigsaw-toxic-comment.jsonl
textcatjsonl_to_trainjson.py

README.md

Examples of textcat training data

The spaCy JSON training files were generated from the JSONL files with:

python textcatjsonl_to_trainjson.py -m en file.jsonl .
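
As a rough illustration of the JSONL input side, the sketch below parses one record per line. It is not the logic of textcatjsonl_to_trainjson.py; the record layout (a "text" string plus a "cats" dict of label scores) and the sample texts are assumptions made for the example.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of JSONL input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical records: the "text" and "cats" keys are an assumed layout,
# and the texts are invented for illustration.
sample_lines = [
    '{"text": "How do I keep bread crust crisp?", "cats": {"baking": 1.0, "not_baking": 0.0}}',
    "",
    '{"text": "Best way to sear a steak?", "cats": {"baking": 0.0, "not_baking": 1.0}}',
]

records = list(iter_jsonl(sample_lines))
# Collect the set of labels seen across all records.
labels = sorted({label for rec in records for label in rec["cats"]})
```

To read a real file, the same function could be fed an open file handle, for example `iter_jsonl(open("cooking.jsonl", encoding="utf8"))`.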

cooking.json is an example of mutually exclusive classes, with two labels:

  • baking
  • not_baking
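
A minimal sketch of what "mutually exclusive" means for a single instance, assuming each instance carries a dict that maps every label to a score of 0.0 or 1.0 (the dict layout and the helper name `is_mutually_exclusive` are hypothetical):

```python
def is_mutually_exclusive(cats: dict) -> bool:
    """Return True if exactly one label is 1.0 and every other label is 0.0."""
    values = list(cats.values())
    only_binary = all(v in (0.0, 1.0) for v in values)
    return only_binary and values.count(1.0) == 1


# Assumed annotation for one cooking instance: exactly one class is positive.
example = {"baking": 1.0, "not_baking": 0.0}
ok = is_mutually_exclusive(example)
```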

jigsaw-toxic-comment.json is an example with multiple labels per instance:

  • insult
  • obscene
  • severe_toxic
  • toxic
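
In the multilabel case, any number of labels (including none) can apply to one instance. The sketch below lists the labels that apply, again assuming a dict of label scores per instance; the helper name `active_labels` and the 0.5 threshold are illustrative choices, not part of the dataset.

```python
def active_labels(cats: dict, threshold: float = 0.5) -> list:
    """Return the sorted labels whose score is at or above the threshold."""
    return sorted(label for label, score in cats.items() if score >= threshold)


# Assumed annotation for one comment: several labels are positive at once.
comment_cats = {"insult": 1.0, "obscene": 1.0, "severe_toxic": 0.0, "toxic": 1.0}
positive = active_labels(comment_cats)

# A comment with no positive labels is also valid in a multilabel setup.
clean_cats = {"insult": 0.0, "obscene": 0.0, "severe_toxic": 0.0, "toxic": 0.0}
none_positive = active_labels(clean_cats)
```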

Data Sources

Data Licenses