68 Commits

Author SHA1 Message Date
Matthew Honnibal 0f9b8a00a5 Unbreak data download 2017-01-09 23:40:26 +01:00
Matthew Honnibal d9a77ddf14 Return None for data path if it doesn't exist 2017-01-09 14:10:05 +01:00
Ines Montani de5aa92bc2 Handle deprecated tokenizer prefix data 2017-01-08 20:33:28 +01:00
Ines Montani 6a60a61086 Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
Ines Montani 66c7348cda Add update_exc util function 2016-12-08 13:58:12 +01:00
Ines Montani 8e977cc71c Fix formatting 2016-12-08 13:56:17 +01:00
Matthew Honnibal 6b8b05ef83 Specify that spacy.util is encoded in utf8 2016-11-02 19:58:00 +01:00
Matthew Honnibal 9efe568177 Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596 2016-11-02 12:31:34 +01:00
Matthew Honnibal 5e923b9bfa Return None in match_best_version if not path exists. 2016-10-15 14:47:29 +02:00
Matthew Honnibal ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Matthew Honnibal 95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal 82b8cc5efb Whitespace 2016-09-24 22:17:01 +02:00
Matthew Honnibal f19af6cb2c Python 3 compatible basestring 2016-09-24 22:08:43 +02:00
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal 83e364188c Mostly finished loading refactoring. Design is in place, but doesn't work yet. 2016-09-24 15:42:01 +02:00
Daylen Yang 5405e7dd73 Fix get_lang_class parsing (take 2) 2016-05-16 16:40:31 -07:00
Matthew Honnibal b240104f40 Revert "Fix get_lang_class parsing" 2016-05-17 08:04:26 +10:00
Daylen Yang 1692c2df3c Fix get_lang_class parsing. We want get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens. 2016-05-16 14:38:20 -07:00
Henning Peters ff690f76ba fix loading non-german models 2016-04-12 16:00:56 +02:00
Henning Peters c90d4a6f17 relative imports in __init__.py 2016-03-26 11:44:53 +01:00
Henning Peters b8f63071eb add lang registration facility 2016-03-25 18:54:45 +01:00
Henning Peters a7d7ea3afa first idea for supporting multiple langs in download script 2016-03-24 11:19:43 +01:00
Henning Peters eb7ae61b1c cleanup api 2016-03-08 12:59:18 +01:00
Henning Peters 9cc4f8d5b3 avoid shadowing __name__ 2016-02-15 01:33:39 +01:00
Henning Peters 235f094534 untangle data_path/via 2016-01-16 12:23:45 +01:00
Henning Peters 6d1a3af343 cleanup unused 2016-01-16 10:05:04 +01:00
Henning Peters 846fa49b2a distinct load() and from_package() methods 2016-01-16 10:00:57 +01:00
Henning Peters 211913d689 add about.py, adapt setup.py 2016-01-15 18:57:01 +01:00
Henning Peters 788f734513 refactored data_dir->via, add zip_safe, add spacy.load() 2016-01-15 18:01:02 +01:00
Henning Peters d9471f684f fix typo 2016-01-14 12:14:12 +01:00
Henning Peters 9b75d872b0 fix model download 2016-01-14 12:02:56 +01:00
Henning Peters bc229790ac integrate with sputnik 2016-01-13 19:46:17 +01:00
Matthew Honnibal eaf2ad59f1 * Fix use of mock Package object 2015-12-31 04:13:15 +01:00
Matthew Honnibal a2dfdec85d * Clean up spacy.util 2015-12-29 18:06:09 +01:00
Matthew Honnibal aec130af56 Use util.Package class for io
The previous Sputnik integration caused an API change: Vocab, Tagger, etc.
were loaded via a from_package classmethod that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance in order to acquire a Package via
sp.pool().

Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod that accepts either
util.Package objects or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.

Sputnik is now only used to download and install the data, in
spacy.en.download.
2015-12-29 18:00:48 +01:00
Matthew Honnibal 4131e45543 * Add MockPackage class, to see whether we can proxy for Sputnik in a lightweight way 2015-12-29 16:55:03 +01:00
Henning Peters d8d348bb55 allow specifying a version constraint within model name 2015-12-18 19:12:08 +01:00
Henning Peters cfa187aaf0 fix tests 2015-12-18 10:58:02 +01:00
Henning Peters 8359bd4d93 strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible 2015-12-18 09:52:55 +01:00
Henning Peters 9027cef3bc access model via sputnik 2015-12-07 06:01:28 +01:00
Matthew Honnibal dc393a5f1d Merge pull request #126 from tomtung/master: Improve slicing support for both Doc and Span 2015-10-10 14:14:57 +11:00
Matthew Honnibal 83dccf0fd7 * Use io module instead of deprecated codecs module 2015-10-10 14:13:01 +11:00
Yubing (Tom) Dong 3fd3bc79aa Refactor to remove duplicate slicing logic 2015-10-07 01:25:35 -07:00
alvations 8199012d26 changing deprecated codecs.open to io.open =) 2015-09-30 20:10:15 +02:00
Matthew Honnibal 6ab1696b15 * Remove read_encoding_freqs from util.py 2015-07-23 01:17:32 +02:00
Matthew Honnibal 317cbbc015 * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. 2015-07-19 15:18:17 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Jordan Suchow 5f0f940a1f Remove unused imports 2015-04-19 01:05:22 -07:00
Matthew Honnibal 3f1944d688 * Make PyPy work 2015-01-05 17:54:38 +11:00
Matthew Honnibal f5d41028b5 * Move around data files for test release 2015-01-03 01:59:22 +11:00
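
The split rule described in commit 1692c2df3c can be illustrated with a minimal sketch. This is not the code from the repository: the helper name `lang_code_from_name` is hypothetical, and the sketch only shows the behaviour stated in the commit message, namely that splitting the model name on "_" yields "en" for both "en" and "en_glove_cc_300_1m_vectors". That commit was later reverted and reapplied as "take 2", so the final implementation may differ.

```python
def lang_code_from_name(name):
    """Return the language code at the start of a model name.

    Hypothetical helper for illustration only; it is not part of spacy.util.
    """
    # Splitting on "_" keeps everything before the first underscore,
    # so a bare code such as "en" is returned unchanged.
    return name.split("_")[0]


print(lang_code_from_name("en"))
print(lang_code_from_name("en_glove_cc_300_1m_vectors"))
```

Both calls print the same language code, which is the property the commit message asks for.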