Commit Graph

68 Commits

Author SHA1 Message Date
Matthew Honnibal c5902f2b4b * Upd Lemmatizer to use MockPackage. Replace from_package with load() classmethod 2015-12-29 16:56:02 +01:00
Henning Peters 8359bd4d93 strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible 2015-12-18 09:52:55 +01:00
Henning Peters 9027cef3bc access model via sputnik 2015-12-07 06:01:28 +01:00
maxirmx f07e4accd7 Fixing encoding issue #4 2015-10-21 20:45:56 +03:00
maxirmx fcbfff043f Fixing encoding issue #3 2015-10-21 15:52:34 +03:00
maxirmx fe9d2e2c4e Fixing encode issue #2 2015-10-21 15:36:21 +03:00
maxirmx e4a1726f77 Fixing encoding issue UTF-8 2015-10-21 14:16:37 +03:00
Matthew Honnibal 5332c0b697 * Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130 2015-10-09 18:54:40 +11:00
Matthew Honnibal 24ed3fc25c * Check file existence before opening in lemmatizer 2015-09-13 10:45:21 +10:00
Matthew Honnibal 631c843ed1 * Don't look for index.adv in lemmatizer 2015-09-12 06:03:44 +02:00
Matthew Honnibal 7c660c5efc * Use dict.get in lemmatizer 2015-09-10 14:51:39 +02:00
Matthew Honnibal 64d71f8893 * Fix lemmatizer 2015-09-08 15:38:03 +02:00
Matthew Honnibal f0a7c99554 * Relax rule-requirement in lemmatizer 2015-08-27 10:26:19 +02:00
Matthew Honnibal 0af139e183 * Tagger training now working. Still need to test load/save of model. Morphology still broken. 2015-08-27 09:16:11 +02:00
Matthew Honnibal c5a27d1821 * Move lemmatizer to spacy 2015-08-25 15:47:08 +02:00
Matthew Honnibal e1c1a4b868 * Tmp 2014-12-21 05:36:29 +11:00
Matthew Honnibal 99bbbb6feb * Work on morphological processing 2014-12-08 21:12:15 +11:00
Matthew Honnibal 7b68f911cf * Add WordNet lemmatizer 2014-12-08 01:39:13 +11:00