spaCy/zh at 8c1d0d628fb196abd33859b18a597eb0414e6c55 - spaCy - Hypermine gitea

adrianeboyd 0b9a5f4074 Rework Chinese language initialization and tokenization (#4619)	2019-11-11 14:23:21 +01:00

* Rework Chinese language initialization
* Create a `ChineseTokenizer` class
* Modify jieba post-processing to handle whitespace correctly
* Modify non-jieba character tokenization to handle whitespace correctly
* Add a `create_tokenizer()` method to `ChineseDefaults`
* Load lexical attributes
* Update Chinese tag_map for UD v2
* Add very basic Chinese tests
* Test tokenization with and without jieba
* Test `like_num` attribute
* Fix try_jieba_import()
* Fix zh code formatting
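
The commit message mentions character tokenization that handles whitespace correctly. As a rough illustration of what that can mean, here is a minimal pure-Python sketch. It is not the spaCy implementation: `char_tokenize` and `detokenize` are hypothetical helper names, and the `(words, spaces)` pair is only assumed to mirror the words-plus-trailing-space representation that spaCy documents use. The point is that every character, including runs of whitespace, stays recoverable from the output.

```python
def char_tokenize(text):
    """Split text into single characters, keeping whitespace recoverable.

    Returns (words, spaces): spaces[i] is True when words[i] is followed
    by exactly one plain space that is stored as a flag, not as a token.
    Hypothetical helper for illustration only.
    """
    words = []
    spaces = []
    for ch in text:
        if ch == " " and words and not words[-1].isspace() and not spaces[-1]:
            # First plain space after a non-space token: store it as a flag.
            spaces[-1] = True
        elif ch.isspace():
            if words and words[-1].isspace() and not spaces[-1]:
                # Extend the previous whitespace token.
                words[-1] += ch
            else:
                # Start a new whitespace token.
                words.append(ch)
                spaces.append(False)
        else:
            # Every non-whitespace character becomes its own token.
            words.append(ch)
            spaces.append(False)
    return words, spaces


def detokenize(words, spaces):
    """Rebuild the original text from the (words, spaces) pair."""
    return "".join(w + (" " if s else "") for w, s in zip(words, spaces))
```

For example, `char_tokenize("我 爱你")` yields the words `["我", "爱", "你"]` with the flags `[True, False, False]`, and `detokenize` applied to that pair returns the input string unchanged.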
..
__init__.py	…
test_text.py	…
test_tokenizer.py	…