Commit Graph

220 Commits

Author SHA1 Message Date
Ines Montani a22322187f Add missing lemmas to tokenizer exceptions (fixes #674) 2016-12-17 12:42:41 +01:00
Ines Montani 5445074cbd Expand tokenizer exceptions with unicode apostrophe (fixes #685) 2016-12-17 12:34:08 +01:00
Ines Montani e0a7b5c612 Fix formatting 2016-12-17 12:33:09 +01:00
Ines Montani 08162dce67 Move shared functions and constants to global language data 2016-12-17 12:32:48 +01:00
Ines Montani 6a60a61086 Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
Ines Montani 487ce1e20a Add encoding declaration 2016-12-17 12:25:44 +01:00
Ines Montani d8d50a0334 Add tokenizer exception for "gonna" (fixes #691) 2016-12-17 11:59:28 +01:00
Ines Montani c69b77d8aa Revert "Add exception for "gonna"". This reverts commit 280c03f67b. 2016-12-17 11:56:44 +01:00
Ines Montani 280c03f67b Add exception for "gonna" 2016-12-17 11:54:59 +01:00
Ines Montani c0c5f31950 Remove unused data and download script 2016-12-08 20:39:49 +01:00
Ines Montani 0c39654786 Remove unused import 2016-12-08 19:46:53 +01:00
Ines Montani e47ee94761 Split punctuation into its own file 2016-12-08 19:46:43 +01:00
Ines Montani 311b30ab35 Reorganize exceptions for English and German 2016-12-08 13:58:32 +01:00
Ines Montani 877f09218b Add more custom rules for abbreviations 2016-12-08 12:47:01 +01:00
Ines Montani ec44bee321 Fix capitalization on morphological features 2016-12-08 12:00:54 +01:00
Ines Montani ce979553df Resolve conflict 2016-12-07 21:16:52 +01:00
Ines Montani 0d07d7fc80 Apply emoticon exceptions to tokenizer 2016-12-07 21:11:59 +01:00
Ines Montani 71f0f34cb3 Fix formatting 2016-12-07 21:11:29 +01:00
Ines Montani 1285c4ba93 Update English language data 2016-12-07 20:33:28 +01:00
Ines Montani a662a95294 Add line breaks 2016-12-07 20:33:28 +01:00
Ines Montani e0712d1b32 Reformat language data 2016-12-07 20:33:28 +01:00
Ines Montani 4dcfafde02 Add line breaks 2016-11-24 14:57:37 +01:00
Ines Montani de747e39e7 Reformat language data 2016-11-24 13:51:32 +01:00
Mark Amery 1988fce389 Merge remote-tracking branch 'origin/master' into specify-data-path 2016-11-20 16:07:14 +00:00
Mark Amery 3871007c72 Let --data-path be specified when running download.py scripts. Resolves https://github.com/explosion/spaCy/issues/637 2016-11-20 15:48:04 +00:00
Ines Montani dad2c6cae9 Strip trailing whitespace 2016-11-20 16:45:51 +01:00
Matthew Honnibal f0917b6808 Fix Issue #376: and/or was tagged as a noun. 2016-11-04 15:21:28 +01:00
Matthew Honnibal 737816e86e Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly. 2016-11-04 15:16:20 +01:00
Matthew Honnibal 41a90a7fbb Add tokenizer exception for 'Ph.D.', to fix 592. 2016-11-03 00:03:34 +01:00
Matthew Honnibal e7414cd064 Try to fix weird install glitch. 2016-10-23 19:46:28 +02:00
Matthew Honnibal 622b0a9674 Tweak download script 2016-10-19 00:52:16 +02:00
Matthew Honnibal edc45c19d6 Update download script 2016-10-19 00:41:14 +02:00
Matthew Honnibal 8c8f5c62c6 Add LANG attribute to English and German 2016-10-18 18:52:48 +02:00
Matthew Honnibal ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Matthew Honnibal 7db956133e Move tokenizer data for German into spacy.de.language_data 2016-09-25 15:37:33 +02:00
Matthew Honnibal 95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal d7e9acdcdf Add English language data, so that the tokenizer doesn't require the data download 2016-09-25 14:49:00 +02:00
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Henning Peters 470cdf5bf9 remove deprecated LOCAL_DATA_DIR 2016-04-05 11:25:54 +02:00
Henning Peters a7d7ea3afa first idea for supporting multiple langs in download script 2016-03-24 11:19:43 +01:00
Henning Peters 9cc4f8d5b3 avoid shadowing __name__ 2016-02-15 01:33:39 +01:00
Matthew Honnibal 445164d5b4 * Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated 2016-01-19 02:54:56 +01:00
Henning Peters 5551052840 fix py2/3 issue 2016-01-16 12:44:53 +01:00
Henning Peters 235f094534 untangle data_path/via 2016-01-16 12:23:45 +01:00
Henning Peters 211913d689 add about.py, adapt setup.py 2016-01-15 18:57:01 +01:00
Henning Peters 780cb847c9 add default_model to about 2016-01-15 18:07:15 +01:00
Henning Peters 788f734513 refactored data_dir->via, add zip_safe, add spacy.load() 2016-01-15 18:01:02 +01:00
Henning Peters 9b75d872b0 fix model download 2016-01-14 12:02:56 +01:00
Matthew Honnibal 187960606f * Fix pickle problems 2015-12-28 16:54:03 +01:00
Henning Peters 32d655b6e1 bump version 2015-12-28 09:34:39 +01:00