Commit Graph

477 Commits

Author SHA1 Message Date
Matthew Honnibal eaf2ad59f1 * Fix use of mock Package object 2015-12-31 04:13:15 +01:00
Matthew Honnibal a6ba43ecaf * Fix errors in packaging revision 2015-12-29 18:37:26 +01:00
Matthew Honnibal aec130af56 Use util.Package class for io
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().

Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.

Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
Matthew Honnibal f5dea1406d * Fix silly mistake in Language.__init__ 2015-12-28 18:48:57 +01:00
Matthew Honnibal 187960606f * Fix pickle problems 2015-12-28 16:54:03 +01:00
Matthew Honnibal 8c7e149ec9 * Replace kwargs argument of Language.__init__ with explicit arguments, to fix pickle bug 2015-12-28 15:56:27 +01:00
Henning Peters d8d348bb55 allow to specify version constraint within model name 2015-12-18 19:12:08 +01:00
Henning Peters cfa187aaf0 fix tests 2015-12-18 10:58:02 +01:00
Henning Peters 8359bd4d93 strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible 2015-12-18 09:52:55 +01:00
Henning Peters 345dda6f53 small fixes, add package build step 2015-12-07 06:50:26 +01:00
Henning Peters 9027cef3bc access model via sputnik 2015-12-07 06:01:28 +01:00
Matthew Honnibal 3c162dcac3 * Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc. 2015-11-07 03:24:30 +11:00
Matthew Honnibal adc7bbd6cf * Fix name of like_num in default_lex_attrs 2015-11-04 22:02:47 +11:00
Matthew Honnibal e96faf29e7 * Rename like_number to like_num, to fix inconsistency re Issue #166 2015-11-04 22:01:44 +11:00
Matthew Honnibal f18fd8c659 * Fix language.py for change in StringStore load API 2015-10-23 03:48:12 +11:00
Matthew Honnibal 2348a08481 * Load/dump strings with a json file, instead of the hacky strings file we were using. 2015-10-22 21:13:03 +11:00
Matthew Honnibal 9baf0abd59 * Save vocab after training. 2015-10-22 21:09:14 +11:00
Matthew Honnibal 20fd36a0f7 * Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve. 2015-10-13 13:44:41 +11:00
Matthew Honnibal a6ced80c0c * Fix Issue #116: Misleading handling of True value in Language.__init__. 2015-09-29 20:54:12 +10:00
Matthew Honnibal 27f988b167 * Remove the vectors option to Vocab, preferring to either load vectors from disk, or set them on the Lexeme objects. 2015-09-15 14:41:48 +10:00
Matthew Honnibal e13e47e9e5 * Add English stop words 2015-09-14 17:48:51 +10:00
Matthew Honnibal d9f1fc2112 * Add deprecation warning for unused load_vectors argument. 2015-09-09 14:31:09 +02:00
Matthew Honnibal 534e3dda3c * More work on language independent parsing 2015-08-28 03:44:54 +02:00
Matthew Honnibal c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal 0af139e183 * Tagger training now working. Still need to test load/save of model. Morphology still broken. 2015-08-27 09:16:11 +02:00
Matthew Honnibal 76996f4145 * Hack on generic Language class. Still needs work for morphology, defaults, etc 2015-08-26 19:16:09 +02:00
Matthew Honnibal f2f699ac18 * Add language base class 2015-08-25 15:37:17 +02:00