Commit Graph

11 Commits

Author SHA1 Message Date
Ben Eyal d8098a8be2 Use `regex` instead of `re` 2017-04-20 02:22:52 +03:00
ines 66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Gyorgy Orosz b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00
Gyorgy Orosz a45f22913f Added further abbreviations present in the Szeged corpus 2017-01-14 22:08:55 +01:00
Gyorgy Orosz 63037e79af Fixed hyphen handling in the Hungarian tokenizer. 2017-01-14 16:30:11 +01:00
Gyorgy Orosz f77c0284d6 Maintaining compatibility with other spacy tokenizers. 2017-01-14 16:19:15 +01:00
Gyorgy Orosz be7a7aeb1a Reversed accidental changes. 2017-01-14 15:59:36 +01:00
Gyorgy Orosz 1be5da1ac6 Fixed Hungarian tokenizer for numbers 2017-01-14 15:51:59 +01:00
Ines Montani 0dec90e9f7 Use global abbreviation data languages and remove duplicates 2017-01-08 20:36:00 +01:00
Gyorgy Orosz 45e045a87b Unicode/UTF8 compatibility for Python2 2016-12-24 00:21:00 +01:00
Gyorgy Orosz 6add156075 Refactored language data structure 2016-12-20 22:28:20 +01:00