190 Commits

Author SHA1 Message Date
Matthew Honnibal f4aafca222 Merge changes to test_misc 2017-05-29 12:26:02 +02:00
Matthew Honnibal ff26aa6c37 Work on to/from bytes/disk serialization methods 2017-05-29 11:45:45 +02:00
ines df920ba0e7 Add tests for displaCy and util functions and fix util typo 2017-05-29 10:51:19 +02:00
Matthew Honnibal c91b121aeb Move serialization functions to util 2017-05-29 10:13:42 +02:00
Matthew Honnibal 6dad4117ad Work on serialization for models 2017-05-29 01:37:57 +02:00
ines c1983621fb Update util functions for model loading 2017-05-28 00:22:40 +02:00
ines c8543c8237 Fix formatting and docstrings and remove deprecated function 2017-05-28 00:22:40 +02:00
ines 51882c4984 Fix formatting 2017-05-26 12:37:45 +02:00
Matthew Honnibal 80cf42e33b Fix compounding and decaying utils 2017-05-25 17:15:39 -05:00
Matthew Honnibal b9cea9cd93 Add compounding and decaying functions 2017-05-25 16:16:10 -05:00
ines b5fb43fdd8 Allow sys.exit status as exits keyword arg in util.prints() 2017-05-22 12:29:15 +02:00
Matthew Honnibal 5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
Matthew Honnibal 0731971bfc Add itershuffle utility function. Maybe belongs in thinc 2017-05-21 09:05:05 -05:00
ines 3871157d84 Update spacy.util documentation 2017-05-21 01:12:09 +02:00
Matthew Honnibal 238be0f16a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-18 08:32:22 -05:00
Matthew Honnibal c214c0decb Improve env_opt reporting 2017-05-18 08:32:03 -05:00
ines 489d2fb4ba Add is_in_jupyter() helper for displaCy (see #1058) 2017-05-18 14:13:14 +02:00
ines abf0188b0a Move cupy and CudaStream to compat 2017-05-18 14:12:45 +02:00
Matthew Honnibal fc8d3a112c Add util.env_opt support: Can set hyper params through environment variables. 2017-05-18 04:36:53 -05:00
Matthew Honnibal 1d7c18e58a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-15 21:53:47 +02:00
Matthew Honnibal a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
ines c31792aaec Add displaCy visualisers (see #1058) 2017-05-14 17:50:23 +02:00
ines b462076d80 Merge load_lang_class and get_lang_class 2017-05-14 01:31:10 +02:00
ines 36bebe7164 Update docstrings 2017-05-14 01:30:29 +02:00
Matthew Honnibal 4b9d69f428 Merge branch 'v2' into develop
* Move v2 parser into nn_parser.pyx
* New TokenVectorEncoder class in pipeline.pyx
* New spacy/_ml.py module

Currently the two parsers live side-by-side, until we figure out how to
organize them.
2017-05-14 01:10:23 +02:00
Matthew Honnibal f8c02b4341 Remove cupy imports from parser, so it can work on CPU 2017-05-14 00:37:53 +02:00
ines 1694c24e52 Add docstrings, error messages and fix consistency 2017-05-13 21:22:49 +02:00
ines ee7dcf65c9 Fix expand_exc to make sure it returns combined dict 2017-05-13 21:22:25 +02:00
ines 824d09bb74 Move resolve_load_name to deprecated 2017-05-13 21:21:47 +02:00
ines c4857bc7db Remove unused argument 2017-05-12 15:37:54 +02:00
ines 86d9c29f30 Reorder util functions 2017-05-08 23:51:15 +02:00
ines 9a0d2fdef1 Add load_lang_class() util function 2017-05-08 23:50:45 +02:00
ines 2edc0aee12 Update warning message 2017-05-08 19:53:36 +02:00
ines b9ba58ba5c Add function to resolve load name
Warn if old 'path' keyword argument is used.
2017-05-08 16:33:37 +02:00
ines 607ba458e7 Fix whitespace 2017-05-08 15:42:31 +02:00
ines 60db497525 Add update_exc and expand_exc to util
Doesn't require separate language data util anymore
2017-05-08 15:42:12 +02:00
ines 95edd9e896 Let parse_package_meta take full path 2017-05-08 15:30:48 +02:00
ines 326746eb15 Add util function to resolve arg to model path
1. check if in data dir or shortcut link
2. check if installed as a pip package
3. check if string is path to model
4. check if Path or Path-like object
2017-05-08 15:29:47 +02:00
ines 94697e9afc Fix typo 2017-05-08 02:00:37 +02:00
ines c4492d260a Fix kwargs 2017-05-08 01:05:24 +02:00
ines 59c3b9d4dd Tidy up CLI and fix print functions 2017-05-07 23:25:29 +02:00
ines e34069db9f Move is_package and get_model_package_path to util 2017-05-07 23:24:51 +02:00
Ben Eyal d8098a8be2 Use `regex` instead of `re` 2017-04-20 02:22:52 +03:00
ines 97647c46cd Add docstring and todo note 2017-04-16 22:14:45 +02:00
ines 5c5f8c0a72 Check if full string is found in lang classes first
This allows users to set arbitrary strings. (Otherwise, custom lang
class "my_custom_class" would always load Burmese "my" tokenizer if one
was available.)
2017-04-16 22:14:38 +02:00
ines 1f9f867c70 Remove unused util function 2017-04-16 20:37:45 +02:00
ines ed7e19ad68 Remove unused import 2017-04-16 20:37:45 +02:00
ines 0084466a66 Remove unused utf8open util and replace os.path with ensure_path 2017-04-16 20:37:45 +02:00
ines d10bd0eaf9 Fix formatting 2017-04-16 13:42:34 +02:00
ines 31fa73293a Move read_json out to own util function 2017-04-16 13:03:28 +02:00
Matthew Honnibal e6ee7e130f Fix parse package meta 2017-04-15 13:38:53 +02:00
ines e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines 956dc36785 Move functions to deprecated 2017-04-15 12:12:31 +02:00
ines c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 75f9b4c6e2 Fix whitespace 2017-04-07 10:22:18 +02:00
ines fdec758113 Add is_windows and is_python2 utility functions 2017-03-25 14:04:02 +01:00
ines 3f20efe165 Merge branch 'develop'
# Conflicts:
#	spacy/util.py
2017-03-22 17:14:15 +01:00
Raphaël Bournhonesque f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00
ines 5aea327a5b Add util function to get raw user input 2017-03-20 22:48:56 +01:00
ines a6c0361803 Handle raw_input vs input in Python 2 and 3 2017-03-20 22:48:32 +01:00
ines adbcac6591 Fix spacing 2017-03-20 22:48:21 +01:00
ines 0eafc0f2c6 Add util functions to print data as table or markdown list 2017-03-18 13:00:14 +01:00
Matthew Honnibal adb0b7e43b Fix loading when no package found 2017-03-16 18:30:23 -05:00
ines 3d484c3faf Don't print in parse_package_meta and accept on_error callback instead
TODO: log warning for missing meta data in spacy.link, as this affects
the Language class returned by spacy.load()
2017-03-16 20:34:50 +01:00
ines 5f3f04bd0a Add util function to load and parse package meta.json 2017-03-16 17:10:05 +01:00
ines 7f920c2f75 Don't break text in when rendering print_msg 2017-03-16 17:09:50 +01:00
ines 68c04fa897 Move sys_exit() function to util 2017-03-16 17:08:58 +01:00
ines 7b2eca36e4 Revert "Fix formatting and remove unused code"
This reverts commit d7898d586f.
2017-03-16 09:58:41 +01:00
ines f5d1a39a5b Add util functions for printing and wrapping messages 2017-03-15 17:35:57 +01:00
ines d7898d586f Fix formatting and remove unused code 2017-03-15 17:35:41 +01:00
ines 66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Matthew Honnibal 0f9b8a00a5 Unbreak data download 2017-01-09 23:40:26 +01:00
Matthew Honnibal d9a77ddf14 Return None for data path if it doesn't exist 2017-01-09 14:10:05 +01:00
Ines Montani de5aa92bc2 Handle deprecated tokenizer prefix data 2017-01-08 20:33:28 +01:00
Ines Montani 6a60a61086 Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
Ines Montani 66c7348cda Add update_exc util function 2016-12-08 13:58:12 +01:00
Ines Montani 8e977cc71c Fix formatting 2016-12-08 13:56:17 +01:00
Matthew Honnibal 6b8b05ef83 Specify that spacy.util is encoded in utf8 2016-11-02 19:58:00 +01:00
Matthew Honnibal 9efe568177 Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596 2016-11-02 12:31:34 +01:00
Matthew Honnibal 5e923b9bfa Return None in match_best_version if not path exists. 2016-10-15 14:47:29 +02:00
Matthew Honnibal ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Matthew Honnibal 95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal 82b8cc5efb Whitespace 2016-09-24 22:17:01 +02:00
Matthew Honnibal f19af6cb2c Python 3 compatible basestring 2016-09-24 22:08:43 +02:00
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal 83e364188c Mostly finished loading refactoring. Design is in place, but doesn't work yet. 2016-09-24 15:42:01 +02:00
Daylen Yang 5405e7dd73 Fix get_lang_class parsing (take 2) 2016-05-16 16:40:31 -07:00
Matthew Honnibal b240104f40 Revert "Fix get_lang_class parsing" 2016-05-17 08:04:26 +10:00
Daylen Yang 1692c2df3c Fix get_lang_class parsing
We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens.
2016-05-16 14:38:20 -07:00
Henning Peters ff690f76ba fix loading non-german models 2016-04-12 16:00:56 +02:00
Henning Peters c90d4a6f17 relative imports in __init__.py 2016-03-26 11:44:53 +01:00
Henning Peters b8f63071eb add lang registration facility 2016-03-25 18:54:45 +01:00
Henning Peters a7d7ea3afa first idea for supporting multiple langs in download script 2016-03-24 11:19:43 +01:00
Henning Peters eb7ae61b1c cleanup api 2016-03-08 12:59:18 +01:00
Henning Peters 9cc4f8d5b3 avoid shadowing __name__ 2016-02-15 01:33:39 +01:00
Henning Peters 235f094534 untangle data_path/via 2016-01-16 12:23:45 +01:00
Henning Peters 6d1a3af343 cleanup unused 2016-01-16 10:05:04 +01:00
Henning Peters 846fa49b2a distinct load() and from_package() methods 2016-01-16 10:00:57 +01:00
Henning Peters 211913d689 add about.py, adapt setup.py 2016-01-15 18:57:01 +01:00
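
Note on the language-class lookup: the messages of 1692c2df3c (reverted in b240104f40 and redone in 5405e7dd73) and 5c5f8c0a72 together describe a two-step lookup. The full string is tried first, and only then the prefix before the first "_". The sketch below illustrates that order only and is not the code from spacy/util.py. The LANGUAGES registry, its string values, the `registry` parameter, the function body, and the error message are all assumptions made for the example.

```python
# Hypothetical registry for illustration only; the real project keeps its
# own mapping from language codes to Language classes.
LANGUAGES = {
    "en": "English",
    "de": "German",
    "my": "Burmese",
    "my_custom_class": "MyCustomClass",
}


def get_lang_class(name, registry=LANGUAGES):
    """Return the registered entry for a language code or model name."""
    # Step 1: prefer an exact match, so an arbitrary custom name such as
    # "my_custom_class" is not mistaken for the "my" (Burmese) entry.
    if name in registry:
        return registry[name]
    # Step 2: fall back to the prefix before the first underscore, so that
    # "en_glove_cc_300_1m_vectors" resolves to the "en" entry.
    lang = name.split("_", 1)[0]
    if lang not in registry:
        raise KeyError("Can't find language class for: %s" % name)
    return registry[lang]
```

With this order, "en" and "en_glove_cc_300_1m_vectors" resolve to the same entry, while a registered custom name always takes precedence over its two-letter prefix.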