Ben Eyal
d8098a8be2
Use `regex` instead of `re`
2017-04-20 02:22:52 +03:00
ines
97647c46cd
Add docstring and todo note
2017-04-16 22:14:45 +02:00
ines
5c5f8c0a72
Check if full string is found in lang classes first
...
This allows users to set arbitrary strings. (Otherwise, custom lang
class "my_custom_class" would always load Burmese "my" tokenizer if one
was available.)
2017-04-16 22:14:38 +02:00
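The lookup order described in commit 5c5f8c0a72 can be sketched roughly as follows. This is a minimal illustration, not spaCy's actual code: the `LANGUAGES` registry, its placeholder string values, and the error type are assumptions made for the example. The full string is tried first, and only if that fails is the part before the first underscore used as the language code.

```python
# Hypothetical registry mapping names to language classes.
# Plain strings stand in for the real classes in this sketch.
LANGUAGES = {
    "en": "English",
    "my": "Burmese",
    "my_custom_class": "MyCustomClass",
}


def get_lang_class(name, registry=LANGUAGES):
    """Resolve a language class, preferring an exact match on the full name."""
    # 1. Exact match: lets users register arbitrary strings.
    if name in registry:
        return registry[name]
    # 2. Fallback: treat the part before the first underscore as the language code.
    lang = name.split("_", 1)[0]
    if lang not in registry:
        raise RuntimeError("Language not supported: %s" % name)
    return registry[lang]
```

With this order, `get_lang_class("my_custom_class")` resolves to the custom entry rather than the Burmese one, while a model name such as `"en_glove_cc_300_1m_vectors"` still falls back to `"en"`.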
ines
1f9f867c70
Remove unused util function
2017-04-16 20:37:45 +02:00
ines
ed7e19ad68
Remove unused import
2017-04-16 20:37:45 +02:00
ines
0084466a66
Remove unused utf8open util and replace os.path with ensure_path
2017-04-16 20:37:45 +02:00
ines
d10bd0eaf9
Fix formatting
2017-04-16 13:42:34 +02:00
ines
31fa73293a
Move read_json out to own util function
2017-04-16 13:03:28 +02:00
Matthew Honnibal
e6ee7e130f
Fix parse package meta
2017-04-15 13:38:53 +02:00
ines
e1efd589c3
Fix json imports and use ujson
2017-04-15 12:13:34 +02:00
ines
956dc36785
Move functions to deprecated
2017-04-15 12:12:31 +02:00
ines
c05ec4b89a
Add compat functions and remove old workarounds
...
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00
ines
75f9b4c6e2
Fix whitespace
2017-04-07 10:22:18 +02:00
ines
fdec758113
Add is_windows and is_python2 utility functions
2017-03-25 14:04:02 +01:00
ines
3f20efe165
Merge branch 'develop'
...
# Conflicts:
# spacy/util.py
2017-03-22 17:14:15 +01:00
Raphaël Bournhonesque
f332bf05be
Remove unused import statements
2017-03-21 21:08:54 +01:00
ines
5aea327a5b
Add util function to get raw user input
2017-03-20 22:48:56 +01:00
ines
a6c0361803
Handle raw_input vs input in Python 2 and 3
2017-03-20 22:48:32 +01:00
ines
adbcac6591
Fix spacing
2017-03-20 22:48:21 +01:00
ines
0eafc0f2c6
Add util functions to print data as table or markdown list
2017-03-18 13:00:14 +01:00
Matthew Honnibal
adb0b7e43b
Fix loading when no package found
2017-03-16 18:30:23 -05:00
ines
3d484c3faf
Don't print in parse_package_meta and accept on_error callback instead
...
TODO: log warning for missing meta data in spacy.link, as this affects
the Language class returned by spacy.load()
2017-03-16 20:34:50 +01:00
ines
5f3f04bd0a
Add util function to load and parse package meta.json
2017-03-16 17:10:05 +01:00
ines
7f920c2f75
Don't break text when rendering print_msg
2017-03-16 17:09:50 +01:00
ines
68c04fa897
Move sys_exit() function to util
2017-03-16 17:08:58 +01:00
ines
7b2eca36e4
Revert "Fix formatting and remove unused code"
...
This reverts commit d7898d586f.
2017-03-16 09:58:41 +01:00
ines
f5d1a39a5b
Add util functions for printing and wrapping messages
2017-03-15 17:35:57 +01:00
ines
d7898d586f
Fix formatting and remove unused code
2017-03-15 17:35:41 +01:00
ines
66c1f194f9
Use consistent unicode declarations
2017-03-12 13:07:28 +01:00
Matthew Honnibal
0f9b8a00a5
Unbreak data download
2017-01-09 23:40:26 +01:00
Matthew Honnibal
d9a77ddf14
Return None for data path if it doesn't exist
2017-01-09 14:10:05 +01:00
Ines Montani
de5aa92bc2
Handle deprecated tokenizer prefix data
2017-01-08 20:33:28 +01:00
Ines Montani
6a60a61086
Move update_exc to global language data utils
2016-12-17 12:29:02 +01:00
Ines Montani
66c7348cda
Add update_exc util function
2016-12-08 13:58:12 +01:00
Ines Montani
8e977cc71c
Fix formatting
2016-12-08 13:56:17 +01:00
Matthew Honnibal
6b8b05ef83
Specify that spacy.util is encoded in utf8
2016-11-02 19:58:00 +01:00
Matthew Honnibal
9efe568177
Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596
2016-11-02 12:31:34 +01:00
Matthew Honnibal
5e923b9bfa
Return None in match_best_version if no path exists.
2016-10-15 14:47:29 +02:00
Matthew Honnibal
ea23b64cc8
Refactor training, with new spacy.train module. Defaults still a little awkward.
2016-10-09 12:24:24 +02:00
Matthew Honnibal
95aaea0d3f
Refactor so that the tokenizer data is read from Python data, rather than from disk
2016-09-25 14:49:53 +02:00
Matthew Honnibal
82b8cc5efb
Whitespace
2016-09-24 22:17:01 +02:00
Matthew Honnibal
f19af6cb2c
Python 3 compatible basestring
2016-09-24 22:08:43 +02:00
Matthew Honnibal
fd65cf6cbb
Finish refactoring data loading
2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
2016-09-24 15:42:01 +02:00
Daylen Yang
5405e7dd73
Fix get_lang_class parsing (take 2)
2016-05-16 16:40:31 -07:00
Matthew Honnibal
b240104f40
Revert "Fix get_lang_class parsing"
2016-05-17 08:04:26 +10:00
Daylen Yang
1692c2df3c
Fix get_lang_class parsing
...
We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens.
2016-05-16 14:38:20 -07:00
Henning Peters
ff690f76ba
Fix loading non-German models
2016-04-12 16:00:56 +02:00
Henning Peters
c90d4a6f17
relative imports in __init__.py
2016-03-26 11:44:53 +01:00