2882 Commits

Author SHA1 Message Date
ines fc0d793360 Reorganise Bengali punctuation rules 2017-05-09 00:01:52 +02:00
ines e895d1afd7 Reorganise French punctuation rules 2017-05-09 00:00:54 +02:00
ines 014bda0ae3 Reorganise global punctuation rules 2017-05-09 00:00:46 +02:00
ines a91278cb32 Rename _URL_PATTERN to URL_PATTERN 2017-05-09 00:00:00 +02:00
ines 604f299cf6 Add char classes to global language data 2017-05-08 23:59:33 +02:00
ines f6f5d78cb9 Fix formatting 2017-05-08 23:59:17 +02:00
ines 6eb6306843 Fix language data imports 2017-05-08 23:58:31 +02:00
ines 3c0f85de8e Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
ines 86d9c29f30 Reorder util functions 2017-05-08 23:51:15 +02:00
ines 9a0d2fdef1 Add load_lang_class() util function 2017-05-08 23:50:45 +02:00
ines 614aa09582 Tidy up Bengali tokenizer exceptions 2017-05-08 22:29:49 +02:00
ines 73b577cb01 Fix relative imports 2017-05-08 22:29:04 +02:00
ines ae99990f63 Fix formatting 2017-05-08 22:23:48 +02:00
ines f46ffe3e89 Move language data to /lang module 2017-05-08 20:00:40 +02:00
ines 41a322c733 Fix LEMMA in exceptions and morph rules 2017-05-08 19:57:36 +02:00
ines 2edc0aee12 Update warning message 2017-05-08 19:53:36 +02:00
ines 6025cdb992 Fix string interpolation in times 2017-05-08 16:38:16 +02:00
ines b9ba58ba5c Add function to resolve load name. Warn if old 'path' keyword argument is used. 2017-05-08 16:33:37 +02:00
ines e6f1a5d0a1 Add unicode declaration 2017-05-08 16:22:17 +02:00
ines be5541bd16 Fix import and tokenizer exceptions 2017-05-08 16:20:14 +02:00
ines 2324788970 Remove bad tests 2017-05-08 16:15:27 +02:00
ines b88c4193e7 Add missing symbol 2017-05-08 16:15:20 +02:00
ines 9a5b2bdd4c Don't set morph rules without tag map 2017-05-08 16:15:12 +02:00
ines 4930f0fa8f Explicitly import TOKEN_MATCH 2017-05-08 16:11:54 +02:00
ines 50b7ec03ca Fix typo 2017-05-08 16:11:45 +02:00
ines 3ca611fe48 Fix wildcard imports 2017-05-08 15:56:29 +02:00
ines c2469b8135 Remove __all__ export 2017-05-08 15:56:22 +02:00
ines 14a9c3ee7a Fix wildcard import 2017-05-08 15:56:13 +02:00
ines deed623864 Remove comment 2017-05-08 15:56:05 +02:00
ines e7f95c37ee Merge base tokenizer exceptions 2017-05-08 15:55:52 +02:00
ines 24606d364c Remove redundant language_data.py files in languages. Originally intended to collect all components of a language, but just made things messy. Now each component is in charge of exporting itself properly. 2017-05-08 15:55:29 +02:00
ines a627d3e3b0 Reorganise Chinese language data 2017-05-08 15:54:36 +02:00
ines 7b86ee093a Reorganise Swedish language data 2017-05-08 15:54:29 +02:00
ines 50510fa947 Reorganise Portuguese language data 2017-05-08 15:52:01 +02:00
ines 279895ea83 Reorganise Dutch language data 2017-05-08 15:51:39 +02:00
ines 04ef5025bd Reorganise Norwegian language data 2017-05-08 15:51:22 +02:00
ines 5edbc725d8 Reorganise Japanese language data 2017-05-08 15:50:46 +02:00
ines 51a389d3bb Reorganise Italian language data 2017-05-08 15:50:17 +02:00
ines 1bbfa14436 Reorganise Hungarian language data 2017-05-08 15:49:56 +02:00
ines a77c9fc60d Reorganise Hebrew language data 2017-05-08 15:49:28 +02:00
ines 7f05e977fa Reorganise French language data 2017-05-08 15:49:05 +02:00
ines 0207ffdd52 Reorganise Finnish language data 2017-05-08 15:48:31 +02:00
ines 8e483ec950 Reorganise Spanish language data 2017-05-08 15:48:04 +02:00
ines c7c21b980f Reorganise English language data 2017-05-08 15:47:25 +02:00
ines 1bf9d5ec8b Reorganise German language data 2017-05-08 15:44:26 +02:00
ines 7b3a983f96 Reorganise Bengali language data 2017-05-08 15:43:50 +02:00
ines 607ba458e7 Fix whitespace 2017-05-08 15:42:31 +02:00
ines 60db497525 Add update_exc and expand_exc to util. Doesn't require separate language data util anymore. 2017-05-08 15:42:12 +02:00
ines 6e5bd4f228 Remove unused functions from deprecated 2017-05-08 15:40:16 +02:00
ines f68e420bc0 Add PRON_LEMMA and DET_LEMMA to deprecated. Will be replaced with proper values across the language data later. 2017-05-08 15:35:30 +02:00