10 Commits

Author SHA1 Message Date
ines 8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
ines 417d45f5d0 Add lemmatizer data as variable on language data 2017-10-11 02:24:58 +02:00
    Don't create lookup lemmatizer within Language class and just pass in the data so it can be set on Token creation
ines 0c2343d73a Tidy up language data 2017-10-11 02:22:49 +02:00
ines ece30c28a8 Don't split hyphenated words in German 2017-09-16 20:40:15 +02:00
    This way, the tokenizer matches the tokenization in German treebanks
ines fa7e576c57 Change order of exception dicts 2017-06-03 21:52:06 +02:00
ines 0d6fa8b241 Add German norm exceptions 2017-06-03 20:54:18 +02:00
ines 924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
ines 1a05078c79 Add language-specific syntax iterators to en and de 2017-05-17 12:04:03 +02:00
ines 73b577cb01 Fix relative imports 2017-05-08 22:29:04 +02:00
ines f46ffe3e89 Move language data to /lang module 2017-05-08 20:00:40 +02:00