Commit Graph

239 Commits

Author SHA1 Message Date
JM 70ff0639b5 Fixed missing vec_path declaration that was failing if 'add_vectors' was set
Added vec_path variable declaration to avoid accessing it before assignment in case 'add_vectors' is in overrides.
2016-12-20 18:21:05 +01:00
Matthew Honnibal 13a0b31279 Another tweak to GloVe path hackery. 2016-12-18 23:12:49 +01:00
Matthew Honnibal 2c6228565e Fix vector loading re glove hack 2016-12-18 23:06:44 +01:00
Matthew Honnibal 618b50a064 Fix issue #684: GloVe vectors not loaded in spacy.en.English. 2016-12-18 22:46:31 +01:00
Matthew Honnibal 2ef9d53117 Untested fix for issue #684: GloVe vectors hack should be inserted in English, not in spacy.load. 2016-12-18 22:29:31 +01:00
Matthew Honnibal 7a98ee5e5a Merge language data change 2016-12-18 17:03:52 +01:00
Ines Montani b99d683a93 Fix formatting 2016-12-18 16:58:28 +01:00
Ines Montani b11d8cd3db Merge remote-tracking branch 'origin/organize-language-data' into organize-language-data 2016-12-18 16:57:12 +01:00
Ines Montani 2b2ea8ca11 Reorganise language data 2016-12-18 16:54:19 +01:00
Matthew Honnibal 44f4f008bd Wire up lemmatizer rules for English 2016-12-18 15:50:09 +01:00
Ines Montani 1bff59a8db Update English language data 2016-12-18 15:36:53 +01:00
Ines Montani 2eb163c5dd Add lemma rules 2016-12-18 15:36:53 +01:00
Ines Montani 29ad8143d8 Add morph rules 2016-12-18 15:36:53 +01:00
Ines Montani 704c7442e0 Break language data components into their own files 2016-12-18 15:36:53 +01:00
Ines Montani 28326649f3 Fix typo 2016-12-18 13:30:03 +01:00
Matthew Honnibal 28d63ec58e Restore missing '' character in tokenizer exceptions. 2016-12-18 05:34:51 +01:00
Ines Montani a9421652c9 Remove duplicates in tag map 2016-12-17 22:44:31 +01:00
Ines Montani 577adad945 Fix formatting 2016-12-17 14:00:52 +01:00
Ines Montani bb94e784dc Fix typo 2016-12-17 13:59:30 +01:00
Ines Montani a22322187f Add missing lemmas to tokenizer exceptions (fixes #674) 2016-12-17 12:42:41 +01:00
Ines Montani 5445074cbd Expand tokenizer exceptions with unicode apostrophe (fixes #685) 2016-12-17 12:34:08 +01:00
Ines Montani e0a7b5c612 Fix formatting 2016-12-17 12:33:09 +01:00
Ines Montani 08162dce67 Move shared functions and constants to global language data 2016-12-17 12:32:48 +01:00
Ines Montani 6a60a61086 Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
Ines Montani 487ce1e20a Add encoding declaration 2016-12-17 12:25:44 +01:00
Ines Montani d8d50a0334 Add tokenizer exception for "gonna" (fixes #691) 2016-12-17 11:59:28 +01:00
Ines Montani c69b77d8aa Revert "Add exception for "gonna""
This reverts commit 280c03f67b.
2016-12-17 11:56:44 +01:00
Ines Montani 280c03f67b Add exception for "gonna" 2016-12-17 11:54:59 +01:00
Ines Montani c0c5f31950 Remove unused data and download script 2016-12-08 20:39:49 +01:00
Ines Montani 0c39654786 Remove unused import 2016-12-08 19:46:53 +01:00
Ines Montani e47ee94761 Split punctuation into its own file 2016-12-08 19:46:43 +01:00
Ines Montani 311b30ab35 Reorganize exceptions for English and German 2016-12-08 13:58:32 +01:00
Ines Montani 877f09218b Add more custom rules for abbreviations 2016-12-08 12:47:01 +01:00
Ines Montani ec44bee321 Fix capitalization on morphological features 2016-12-08 12:00:54 +01:00
Ines Montani ce979553df Resolve conflict 2016-12-07 21:16:52 +01:00
Ines Montani 0d07d7fc80 Apply emoticon exceptions to tokenizer 2016-12-07 21:11:59 +01:00
Ines Montani 71f0f34cb3 Fix formatting 2016-12-07 21:11:29 +01:00
Ines Montani 1285c4ba93 Update English language data 2016-12-07 20:33:28 +01:00
Ines Montani a662a95294 Add line breaks 2016-12-07 20:33:28 +01:00
Ines Montani e0712d1b32 Reformat language data 2016-12-07 20:33:28 +01:00
Ines Montani 4dcfafde02 Add line breaks 2016-11-24 14:57:37 +01:00
Ines Montani de747e39e7 Reformat language data 2016-11-24 13:51:32 +01:00
Mark Amery 1988fce389 Merge remote-tracking branch 'origin/master' into specify-data-path 2016-11-20 16:07:14 +00:00
Mark Amery 3871007c72 Let --data-path be specified when running download.py scripts
Resolves https://github.com/explosion/spaCy/issues/637
2016-11-20 15:48:04 +00:00
Ines Montani dad2c6cae9 Strip trailing whitespace 2016-11-20 16:45:51 +01:00
Matthew Honnibal f0917b6808 Fix Issue #376: and/or was tagged as a noun. 2016-11-04 15:21:28 +01:00
Matthew Honnibal 737816e86e Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly. 2016-11-04 15:16:20 +01:00
Matthew Honnibal 41a90a7fbb Add tokenizer exception for 'Ph.D.', to fix 592. 2016-11-03 00:03:34 +01:00
Matthew Honnibal e7414cd064 Try to fix weird install glitch. 2016-10-23 19:46:28 +02:00
Matthew Honnibal 622b0a9674 Tweak download script 2016-10-19 00:52:16 +02:00