Matthew Honnibal
3cb4d455d2
Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435
2016-09-27 13:52:11 +02:00
Matthew Honnibal
fd65cf6cbb
Finish refactoring data loading
2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
2016-09-24 15:42:01 +02:00
Henning Peters
846fa49b2a
distinct load() and from_package() methods
2016-01-16 10:00:57 +01:00
Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00
Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
Matthew Honnibal
eaf2ad59f1
* Fix use of mock Package object
2015-12-31 04:13:15 +01:00
Matthew Honnibal
55bcdf8bdd
* Fix errors
2015-12-29 22:32:03 +01:00
Matthew Honnibal
aec130af56
Use util.Package class for io
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().
Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
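Note on aec130af56: the message above describes a .load() classmethod that accepts either a util.Package object or a string. A minimal sketch of that pattern follows. It is not the code from this commit: the Package methods (file_path, has_file), the exc.txt file name, and its "form lemma" line format are all hypothetical and chosen only for illustration.

```python
import os


class Package:
    """Hypothetical file-system shim: resolves file names under a data directory."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def file_path(self, *parts):
        # Join the data directory with the requested relative path.
        return os.path.join(self.data_dir, *parts)

    def has_file(self, *parts):
        return os.path.isfile(self.file_path(*parts))


class Lemmatizer:
    """Toy lemmatizer that only stores an exceptions table."""

    def __init__(self, exceptions):
        self.exceptions = exceptions

    @classmethod
    def load(cls, via):
        # Accept either a Package-like object or a plain directory string.
        package = Package(via) if isinstance(via, str) else via
        exceptions = {}
        # Assumed file name and format: one "form lemma" pair per line.
        if package.has_file("exc.txt"):
            with open(package.file_path("exc.txt"), encoding="utf8") as f:
                for line in f:
                    pieces = line.split()
                    if len(pieces) == 2:
                        exceptions[pieces[0]] = pieces[1]
        return cls(exceptions)
```

With this shape, Lemmatizer.load("/some/data/dir") and Lemmatizer.load(Package("/some/data/dir")) behave the same, so callers do not need to construct a package object first.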
Matthew Honnibal
c5902f2b4b
* Upd Lemmatizer to use MockPackage. Replace from_package with load() classmethod
2015-12-29 16:56:02 +01:00
Henning Peters
8359bd4d93
strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible
2015-12-18 09:52:55 +01:00
Henning Peters
9027cef3bc
access model via sputnik
2015-12-07 06:01:28 +01:00
maxirmx
f07e4accd7
Fixing encoding issue #4
2015-10-21 20:45:56 +03:00
maxirmx
fcbfff043f
Fixing encoding issue #3
2015-10-21 15:52:34 +03:00
maxirmx
fe9d2e2c4e
Fixing encode issue #2
2015-10-21 15:36:21 +03:00
maxirmx
e4a1726f77
Fixing encoding issue
UTF-8
2015-10-21 14:16:37 +03:00
Matthew Honnibal
5332c0b697
* Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130
2015-10-09 18:54:40 +11:00
Matthew Honnibal
24ed3fc25c
* Check file existence before opening in lemmatizer
2015-09-13 10:45:21 +10:00
Matthew Honnibal
631c843ed1
* Don't look for index.adv in lemmatizer
2015-09-12 06:03:44 +02:00
Matthew Honnibal
7c660c5efc
* Use dict.get in lemmatizer
2015-09-10 14:51:39 +02:00
Matthew Honnibal
64d71f8893
* Fix lemmatizer
2015-09-08 15:38:03 +02:00
Matthew Honnibal
f0a7c99554
* Relax rule-requirement in lemmatizer
2015-08-27 10:26:19 +02:00
Matthew Honnibal
0af139e183
* Tagger training now working. Still need to test load/save of model. Morphology still broken.
2015-08-27 09:16:11 +02:00
Matthew Honnibal
c5a27d1821
* Move lemmatizer to spacy
2015-08-25 15:47:08 +02:00
Matthew Honnibal
e1c1a4b868
* Tmp
2014-12-21 05:36:29 +11:00
Matthew Honnibal
99bbbb6feb
* Work on morphological processing
2014-12-08 21:12:15 +11:00
Matthew Honnibal
7b68f911cf
* Add WordNet lemmatizer
2014-12-08 01:39:13 +11:00