Commit Graph

42 Commits

Author SHA1 Message Date
ines 71956c94db Handle deprecated language-specific model downloading 2017-03-15 17:37:55 +01:00
ines 66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Roman Inflianskas 66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
Ines Montani 0dec90e9f7 Use global abbreviation data languages and remove duplicates 2017-01-08 20:36:00 +01:00
Ines Montani 702d1eed93 Update tokenizer exceptions for German 2016-12-21 18:06:27 +01:00
Ines Montani 2b2ea8ca11 Reorganise language data 2016-12-18 16:54:19 +01:00
Ines Montani 32b36c3882 Break language data components into their own files 2016-12-18 15:40:22 +01:00
Ines Montani 0fc4e45cb3 Fix tag map for German 2016-12-18 13:30:03 +01:00
Ines Montani 69baf1c9a8 Fix tag map 2016-12-17 22:44:22 +01:00
Ines Montani fc4ad17136 Fix typo 2016-12-17 14:00:47 +01:00
Ines Montani e0a7b5c612 Fix formatting 2016-12-17 12:33:09 +01:00
Ines Montani 08162dce67 Move shared functions and constants to global language data 2016-12-17 12:32:48 +01:00
Ines Montani 6a60a61086 Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
Ines Montani 487ce1e20a Add encoding declaration 2016-12-17 12:25:44 +01:00
Ines Montani 0a6d529104 Remove unused data 2016-12-08 20:36:56 +01:00
Ines Montani 0c39654786 Remove unused import 2016-12-08 19:46:53 +01:00
Ines Montani e47ee94761 Split punctuation into its own file 2016-12-08 19:46:43 +01:00
Ines Montani 70b51ed7c8 Remove time from German language data 2016-12-08 19:45:50 +01:00
Ines Montani 311b30ab35 Reorganize exceptions for English and German 2016-12-08 13:58:32 +01:00
Ines Montani 1256232fad Fix formatting 2016-12-08 13:56:40 +01:00
Ines Montani 0176b99004 Fix formatting 2016-12-08 12:48:02 +01:00
Ines Montani bfaa42636c Update language data for German 2016-12-08 12:01:09 +01:00
Ines Montani e0712d1b32 Reformat language data 2016-12-07 20:33:28 +01:00
Mark Amery 1988fce389 Merge remote-tracking branch 'origin/master' into specify-data-path 2016-11-20 16:07:14 +00:00
Mark Amery 3871007c72 Let --data-path be specified when running download.py scripts (Resolves https://github.com/explosion/spaCy/issues/637) 2016-11-20 15:48:04 +00:00
Ines Montani 3082e49326 Update and reformat German stopwords 2016-11-20 16:45:26 +01:00
Sourav Singh 6745eac309 Update language_data.py 2016-11-20 19:52:02 +05:30
Sourav Singh 4d9aae7d6a Add German Stopwords 2016-11-19 22:47:53 +05:30
Matthew Honnibal 8c8f5c62c6 Add LANG attribute to English and German 2016-10-18 18:52:48 +02:00
Matthew Honnibal e56653f848 Add language data for German 2016-09-25 15:44:45 +02:00
Matthew Honnibal 7db956133e Move tokenizer data for German into spacy.de.language_data 2016-09-25 15:37:33 +02:00
Matthew Honnibal 95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Wolfgang Seeker 92bfbebeec remove unnecessary imports 2016-05-02 17:33:22 +02:00
Wolfgang Seeker 857454ffa0 fix indentation -.- 2016-05-02 17:10:41 +02:00
Wolfgang Seeker dae6bc05eb define German dummy lemmatizer until morphology is done 2016-05-02 16:04:53 +02:00
Henning Peters a7d7ea3afa first idea for supporting multiple langs in download script 2016-03-24 11:19:43 +01:00
Wolfgang Seeker 690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Henning Peters 9027cef3bc access model via sputnik 2015-12-07 06:01:28 +01:00
Matthew Honnibal 528e26a506 * Add rule to ensure ordinals are preserved as single tokens 2015-09-22 12:26:05 +10:00
Matthew Honnibal dbb48ce49e * Delete extra wordnets 2015-09-13 10:31:37 +10:00
Matthew Honnibal 2154a54f6b * Add spacy.de 2015-09-06 21:56:47 +02:00