Commit Graph

2 Commits

Author SHA1 Message Date
Adriane Boyd 86d01e9229 Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
Adriane Boyd 1d59fdbd39
Update Vietnamese tokenizer (#8099)
* Adapt tokenization methods from `pyvi` to preserve text encoding and
whitespace
* Add serialization support similar to Chinese and Japanese

Note: as for Chinese and Japanese, some settings are duplicated in
`config.cfg` and `tokenizer/cfg`.
2021-05-17 18:16:20 +10:00