spaCy/lang_data/de
Wolfgang Seeker eae35e9b27 add tokenizer files for German, add/change code to train German pos tagger
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/

- init_model.py
	- change the document frequency threshold to 0
- add train_german_tagger.py
	- expects CoNLL-09-formatted input
2016-02-18 13:24:20 +01:00
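The commit changes generate_specials.py to build the tokenizer special cases from an external abbreviation list (abbrev.de.tab). A minimal sketch of that idea follows; the file layout (one tab-separated abbreviation per line) and the output schema are assumptions for illustration, not the actual spaCy data format:

```python
import json

def load_abbrevs(lines):
    """Collect abbreviations from tab-separated lines, skipping blanks and comments."""
    abbrevs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumed layout: abbreviation in the first tab-separated column.
        abbrevs.append(line.split("\t")[0])
    return abbrevs

def build_specials(abbrevs):
    """Map each abbreviation to a single-token special case (hypothetical schema)."""
    return {abbrev: [{"orth": abbrev}] for abbrev in abbrevs}

sample = ["z.B.\tzum Beispiel", "bzw.\tbeziehungsweise", "", "# comment line"]
specials = build_specials(load_abbrevs(sample))
print(json.dumps(specials, ensure_ascii=False))
```

Keeping the abbreviations in a plain data file rather than hard-coding them in the script makes it easy to extend the German special cases without touching code.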
abbrev.de.tab, gazetteer.json, generate_specials.py, infix.txt, prefix.txt, specials.json, suffix.txt - add tokenizer files for German, add/change code to train German pos tagger (2016-02-18 13:24:20 +01:00)
lemma_rules.json - Set the German lemma rules to be an empty JSON object (2016-02-02 22:30:51 +01:00)
morphs.json
sample.txt
tag_map.json - Add missing tags to the German tag map (2016-02-02 22:30:22 +01:00)
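The train_german_tagger.py script added in the commit above expects CoNLL-09-formatted input. A minimal sketch of reading (word, tag) sequences from that format: sentences are separated by blank lines, and in the tab-separated columns FORM is column 2 and the gold POS tag is column 5 (the sample rows below are illustrative, not from the actual training data):

```python
def read_conll09(lines):
    """Yield sentences as lists of (form, pos) pairs from CoNLL-09 lines."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # Blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append((cols[1], cols[4]))  # (FORM, POS)
    if current:
        sentences.append(current)
    return sentences

sample = [
    "1\tDer\tder\tder\tART\tART",
    "2\tHund\tHund\tHund\tNN\tNN",
    "",
    "1\tEr\ter\ter\tPPER\tPPER",
]
print(read_conll09(sample))
```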