25 Commits

Author SHA1 Message Date
ines 66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Matthew Honnibal bd4375a2e6 Remove comment 2017-02-27 11:44:26 +01:00
Matthew Honnibal e7e22d8be6 Move import within get_exceptions() function, to speed import 2017-02-27 11:34:48 +01:00
Matthew Honnibal 26446aa728 Avoid loading all French exceptions on import
Move exceptions loading behind a get_tokenizer_exceptions() function
for French, instead of loading into the top-level namespace. This
cuts import times from 0.6s to 0.2s, at the expense of making the
French data a little different from the others (there's no top-level
TOKENIZER_EXCEPTIONS variable.) The current solution feels somewhat
unsatisfying.
2017-02-25 11:55:00 +01:00
ines 0e2e331b58 Convert exceptions to Python list 2017-02-24 18:22:40 +01:00
ines f08e180a47 Make groups non-capturing
Prevents hitting the 100 named groups limit in Python
2017-02-10 13:35:02 +01:00
ines fa3b8512da Use consistent imports and exports
Bundle everything in language_data to keep it consistent with other
languages and make TOKENIZER_EXCEPTIONS importable from there.
2017-02-10 13:34:09 +01:00
ines 21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
2017-02-10 13:17:05 +01:00
ines f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque 5d706ab95d Merge tokenizer exceptions from PR #802 2017-02-09 16:30:28 +01:00
Raphaël Bournhonesque 85f951ca99 Add tokenizer exceptions for French 2017-02-02 08:36:16 +01:00
Raphaël Bournhonesque 1faaf698ca Add infixes and abbreviation exceptions (fr) 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque cf8474401b Remove unused import statement 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque 902f136f18 Add support for elision in French 2017-01-24 10:57:37 +01:00
Ines Montani 0dec90e9f7 Use global abbreviation data languages and remove duplicates 2017-01-08 20:36:00 +01:00
Ines Montani 2b2ea8ca11 Reorganise language data 2016-12-18 16:54:19 +01:00
Ines Montani e0a7b5c612 Fix formatting 2016-12-17 12:33:09 +01:00
Ines Montani 08162dce67 Move shared functions and constants to global language data 2016-12-17 12:32:48 +01:00
Ines Montani 6a60a61086 Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
Ines Montani 487ce1e20a Add encoding declaration 2016-12-17 12:25:44 +01:00
Ines Montani 1b3b043660 Add French stopwords 2016-12-08 20:12:43 +01:00
Ines Montani 8863e504eb Update French language data 2016-12-08 20:07:14 +01:00
Matthew Honnibal 3d4bd96e8a Fix infixes in french 2016-11-02 20:41:43 +01:00
Matthew Honnibal ad1c747c6b Fix stray POS in language stubs 2016-11-02 20:37:55 +01:00
Matthew Honnibal 6dbf4f7ad7 Stub out support for French, Spanish, Italian and Portuguese 2016-11-02 20:02:41 +01:00