Ben Eyal | d8098a8be2 | Use `regex` instead of `re` | 2017-04-20 02:22:52 +03:00 |
ines | 66c1f194f9 | Use consistent unicode declarations | 2017-03-12 13:07:28 +01:00 |
Gyorgy Orosz | b4df202bfa | Better error handling | 2017-01-14 22:24:58 +01:00 |
Gyorgy Orosz | a45f22913f | Added further abbreviations present in the Szeged corpus | 2017-01-14 22:08:55 +01:00 |
Gyorgy Orosz | 63037e79af | Fixed hyphen handling in the Hungarian tokenizer. | 2017-01-14 16:30:11 +01:00 |
Gyorgy Orosz | f77c0284d6 | Maintaining compatibility with other spacy tokenizers. | 2017-01-14 16:19:15 +01:00 |
Gyorgy Orosz | be7a7aeb1a | Reversed accidental changes. | 2017-01-14 15:59:36 +01:00 |
Gyorgy Orosz | 1be5da1ac6 | Fixed Hungarian tokenizer for numbers | 2017-01-14 15:51:59 +01:00 |
Ines Montani | 0dec90e9f7 | Use global abbreviation data languages and remove duplicates | 2017-01-08 20:36:00 +01:00 |
Gyorgy Orosz | 45e045a87b | Unicode/UTF8 compatibility for Python2 | 2016-12-24 00:21:00 +01:00 |
Gyorgy Orosz | 6add156075 | Refactored language data structure | 2016-12-20 22:28:20 +01:00 |