Reorganise English tokenizer exceptions (as discussed in #718)

Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
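
A minimal sketch of the idea, not the code from this commit: regular
exceptions are generated from a pattern (here pronoun + contraction), and
an explicit exclusion list drops generated forms that collide with
ordinary words. The names `generate_exceptions`, `PRONOUNS`, `SUFFIXES`
and `EXCLUDE`, the plain-string `ORTH`/`LEMMA` keys and the word lists are
all assumptions made for illustration.

```python
# Hypothetical stand-ins for the attribute keys used in exception entries.
ORTH = "orth"
LEMMA = "lemma"


def generate_exceptions(pronouns, suffixes, exclude=()):
    """Build tokenizer exceptions for every pronoun + contraction pair.

    Each pair is emitted lower-cased and title-cased, with and without
    the apostrophe. Any resulting string listed in `exclude` is skipped.
    """
    exceptions = {}
    for pron in pronouns:
        for suffix, lemma in suffixes.items():
            bare_suffix = suffix.replace("'", "")
            for pron_form in (pron, pron.title()):
                for suffix_form in (suffix, bare_suffix):
                    token = pron_form + suffix_form
                    if token in exclude:
                        continue  # explicitly excluded, e.g. "hell"
                    exceptions[token] = [
                        {ORTH: pron_form},
                        {ORTH: suffix_form, LEMMA: lemma},
                    ]
    return exceptions


# Illustrative data only.
PRONOUNS = ["i", "he", "she", "we", "they"]
SUFFIXES = {"'ll": "will", "'d": "would"}

# Apostrophe-less forms that are ordinary words must not be split.
_AMBIGUOUS = ["ill", "hell", "shell", "well", "shed", "wed"]
EXCLUDE = set(_AMBIGUOUS) | {word.title() for word in _AMBIGUOUS}

EXCEPTIONS = generate_exceptions(PRONOUNS, SUFFIXES, exclude=EXCLUDE)
```

With this data, "he'll" and "theyll" get generated entries, while "hell"
and "Well" are left for the tokenizer to treat as single tokens.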
Ines Montani 2017-01-03 18:26:09 +01:00
parent fb9d3bb022
commit 35b39f53c3
1 changed file with 416 additions and 1781 deletions
