3754 Commits

Author SHA1 Message Date
Ines Montani f324311249 Add global language data utils 2016-12-17 12:27:41 +01:00
Ines Montani 487ce1e20a Add encoding declaration 2016-12-17 12:25:44 +01:00
Ines Montani d8d50a0334 Add tokenizer exception for "gonna" (fixes #691) 2016-12-17 11:59:28 +01:00
Ines Montani c69b77d8aa Revert "Add exception for "gonna"" 2016-12-17 11:56:44 +01:00
    This reverts commit 280c03f67b.
Ines Montani 280c03f67b Add exception for "gonna" 2016-12-17 11:54:59 +01:00
Ines Montani 63024466a9 Add Portuguese stopwords 2016-12-08 20:45:07 +01:00
Ines Montani 7bfe2d4abc Update Portuguese language data 2016-12-08 20:41:41 +01:00
Ines Montani c0c5f31950 Remove unused data and download script 2016-12-08 20:39:49 +01:00
Ines Montani 0a6d529104 Remove unused data 2016-12-08 20:36:56 +01:00
Ines Montani 1b3b043660 Add French stopwords 2016-12-08 20:12:43 +01:00
Ines Montani 8863e504eb Update French language data 2016-12-08 20:07:14 +01:00
Ines Montani 7cb9f51be6 Add Italian stopwords 2016-12-08 20:05:25 +01:00
Ines Montani 470a0e0bea Update Italian language data 2016-12-08 19:52:18 +01:00
Ines Montani 1a284d342e Add Spanish language data 2016-12-08 19:47:03 +01:00
Ines Montani 0c39654786 Remove unused import 2016-12-08 19:46:53 +01:00
Ines Montani e47ee94761 Split punctuation into its own file 2016-12-08 19:46:43 +01:00
Ines Montani 70b51ed7c8 Remove time from German language data 2016-12-08 19:45:50 +01:00
Ines Montani e8ae588be9 Add emoticons 2016-12-08 19:45:18 +01:00
Ines Montani 5908c0ed9f Fix formatting 2016-12-08 19:45:11 +01:00
Ines Montani 311b30ab35 Reorganize exceptions for English and German 2016-12-08 13:58:32 +01:00
Ines Montani 66c7348cda Add update_exc util function 2016-12-08 13:58:12 +01:00
Ines Montani 1256232fad Fix formatting 2016-12-08 13:56:40 +01:00
Ines Montani 8e977cc71c Fix formatting 2016-12-08 13:56:17 +01:00
Ines Montani 0176b99004 Fix formatting 2016-12-08 12:48:02 +01:00
Ines Montani 877f09218b Add more custom rules for abbreviations 2016-12-08 12:47:01 +01:00
Ines Montani bfaa42636c Update language data for German 2016-12-08 12:01:09 +01:00
Ines Montani ec44bee321 Fix capitalization on morphological features 2016-12-08 12:00:54 +01:00
Ines Montani ce979553df Resolve conflict 2016-12-07 21:16:52 +01:00
Ines Montani 8350d65695 Change morphology and lemmatizer API 2016-12-07 21:12:49 +01:00
    Take morphology features as object instead of keyword arguments
Ines Montani 52e7d634df Remove trailing whitespace 2016-12-07 21:12:19 +01:00
Ines Montani 0d07d7fc80 Apply emoticon exceptions to tokenizer 2016-12-07 21:11:59 +01:00
Ines Montani 71f0f34cb3 Fix formatting 2016-12-07 21:11:29 +01:00
Ines Montani 9413bcd9ee Declare encoding and unicode literals 2016-12-07 21:10:34 +01:00
Ines Montani a280ff2657 Fix __all__ 2016-12-07 21:10:12 +01:00
Ines Montani ba8721953c Add missing emoticons 2016-12-07 21:09:44 +01:00
Ines Montani 1285c4ba93 Update English language data 2016-12-07 20:33:28 +01:00
Ines Montani 4a1e206064 Remove old lang_data directory 2016-12-07 20:33:28 +01:00
Ines Montani 79dce0aabe Add emoticons 2016-12-07 20:33:28 +01:00
Ines Montani a662a95294 Add line breaks 2016-12-07 20:33:28 +01:00
Ines Montani 07f0efb102 Add test for tokenizer regular expressions 2016-12-07 20:33:28 +01:00
Ines Montani e0712d1b32 Reformat language data 2016-12-07 20:33:28 +01:00
Ines Montani 5ad5408242 Update README.rst 2016-12-03 11:55:22 +01:00
Ines Montani a5707f4d05 Update README.rst 2016-12-03 11:53:38 +01:00
Matthew Honnibal 0c0f4c965d Increment version 2016-12-03 11:16:52 +01:00
Matthew Honnibal 73288497d5 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-12-02 11:06:06 +01:00
Matthew Honnibal f6e356aada Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667 2016-12-02 11:05:50 +01:00
Ines Montani 4b889855cd Merge pull request #666 from blarghmatey/patch-1 2016-12-01 12:10:30 +01:00
    Fixed minor typo
Tobias Macey 1d768d6510 Fixed minor typo 2016-12-01 06:08:33 -05:00
    The word `motto` was missing the second `t`.
Matthew Honnibal 296d33a4fc Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-26 12:36:18 +01:00
Matthew Honnibal 1f6c37c6f5 Fix create_tokenizer when nlp is None 2016-11-26 12:36:04 +01:00