Adriane Boyd
|
30030176ee
|
Update Korean defaults for Tokenizer (#10322)
Update Korean defaults for `Tokenizer` for tokenization following UD Korean Kaist.
|
2022-02-21 10:26:19 +01:00 |
Ines Montani
|
db55577c45
|
Drop Python 2.7 and 3.5 (#4828)
* Remove unicode declarations
* Remove Python 3.5 and 2.7 from CI
* Don't require pathlib
* Replace compat helpers
* Remove OrderedDict
* Use f-strings
* Set Cython compiler language level
* Fix typo
* Re-add OrderedDict for Table
* Update setup.cfg
* Revert CONTRIBUTING.md
* Revert lookups.md
* Revert top-level.md
* Small adjustments and docs [ci skip]
|
2019-12-22 01:53:56 +01:00 |
Ines Montani
|
3d8fd4b461
|
Revert #4334
|
2019-09-29 17:32:12 +02:00 |
Ines Montani
|
c9cd516d96
|
Move tests out of package (#4334)
* Move tests out of package
* Fix typo
|
2019-09-28 18:05:00 +02:00 |
Bae Yong-Ju
|
a55f5a744f
|
Fix ValueError exception on empty Korean text. (#4245)
|
2019-09-06 10:29:40 +02:00 |
Bae Yong-Ju
|
05fbf5d976
|
Fix error when Korean text contains regexp special characters. (#4022)
|
2019-07-25 17:53:33 +02:00 |
Ines Montani
|
0b8406a05c
|
Tidy up and auto-format
|
2019-07-11 12:02:25 +02:00 |
cedar101
|
58f06e6180
|
Korean support (#3901)
* start lang/ko
* add test codes
* using natto-py
* add test_ko_tokenizer_full_tags()
* spaCy contributor agreement
* external dependency for ko
* collections.namedtuple for python version < 3.5
* case fix
* tuple unpacking
* add jongseong(final consonant)
* apply mecab option
* Remove Pipfile for now
Co-authored-by: Ines Montani <ines@ines.io>
|
2019-07-09 22:23:16 +02:00 |