Commit Graph

3102 Commits

Author SHA1 Message Date
ines 2edc0aee12 Update warning message 2017-05-08 19:53:36 +02:00
ines 6025cdb992 Fix string interpolation in times 2017-05-08 16:38:16 +02:00
ines b9ba58ba5c Add function to resolve load name
Warn if old 'path' keyword argument is used.
2017-05-08 16:33:37 +02:00
ines e6f1a5d0a1 Add unicode declaration 2017-05-08 16:22:17 +02:00
ines be5541bd16 Fix import and tokenizer exceptions 2017-05-08 16:20:14 +02:00
ines 2324788970 Remove bad tests 2017-05-08 16:15:27 +02:00
ines b88c4193e7 Add missing symbol 2017-05-08 16:15:20 +02:00
ines 9a5b2bdd4c Don't set morph rules without tag map 2017-05-08 16:15:12 +02:00
ines 4930f0fa8f Explicitly import TOKEN_MATCH 2017-05-08 16:11:54 +02:00
ines 50b7ec03ca Fix typo 2017-05-08 16:11:45 +02:00
ines 3ca611fe48 Fix wildcard imports 2017-05-08 15:56:29 +02:00
ines c2469b8135 Remove __all__ export 2017-05-08 15:56:22 +02:00
ines 14a9c3ee7a Fix wildcard import 2017-05-08 15:56:13 +02:00
ines deed623864 Remove comment 2017-05-08 15:56:05 +02:00
ines e7f95c37ee Merge base tokenizer exceptions 2017-05-08 15:55:52 +02:00
ines 24606d364c Remove redundant language_data.py files in languages
Originally intended to collect all components of a language, but just
made things messy. Now each component is in charge of exporting itself
properly.
2017-05-08 15:55:29 +02:00
ines a627d3e3b0 Reorganise Chinese language data 2017-05-08 15:54:36 +02:00
ines 7b86ee093a Reorganise Swedish language data 2017-05-08 15:54:29 +02:00
ines 50510fa947 Reorganise Portuguese language data 2017-05-08 15:52:01 +02:00
ines 279895ea83 Reorganise Dutch language data 2017-05-08 15:51:39 +02:00
ines 04ef5025bd Reorganise Norwegian language data 2017-05-08 15:51:22 +02:00
ines 5edbc725d8 Reorganise Japanese language data 2017-05-08 15:50:46 +02:00
ines 51a389d3bb Reorganise Italian language data 2017-05-08 15:50:17 +02:00
ines 1bbfa14436 Reorganise Hungarian language data 2017-05-08 15:49:56 +02:00
ines a77c9fc60d Reorganise Hebrew language data 2017-05-08 15:49:28 +02:00
ines 7f05e977fa Reorganise French language data 2017-05-08 15:49:05 +02:00
ines 0207ffdd52 Reorganise Finnish language data 2017-05-08 15:48:31 +02:00
ines 8e483ec950 Reorganise Spanish language data 2017-05-08 15:48:04 +02:00
ines c7c21b980f Reorganise English language data 2017-05-08 15:47:25 +02:00
ines 1bf9d5ec8b Reorganise German language data 2017-05-08 15:44:26 +02:00
ines 7b3a983f96 Reorganise Bengali language data 2017-05-08 15:43:50 +02:00
ines 607ba458e7 Fix whitespace 2017-05-08 15:42:31 +02:00
ines 60db497525 Add update_exc and expand_exc to util
Doesn't require separate language data util anymore
2017-05-08 15:42:12 +02:00
Matthew Honnibal b44f7e259c Clean up unused parser code 2017-05-08 15:42:04 +02:00
ines 6e5bd4f228 Remove unused functions from deprecated 2017-05-08 15:40:16 +02:00
Matthew Honnibal 17efb1c001 Change width 2017-05-08 08:40:13 -05:00
ines f68e420bc0 Add PRON_LEMMA and DET_LEMMA to deprecated
Will be replaced with proper values across the language data later.
2017-05-08 15:35:30 +02:00
ines bd6a7cf4f6 Simplify deprecated model downloading
Only relevant for spaCy < v1.7.0.
2017-05-08 15:32:10 +02:00
ines 95edd9e896 Let parse_package_meta take full path 2017-05-08 15:30:48 +02:00
ines 326746eb15 Add util function to resolve arg to model path
1. check if in data dir or shortcut link
2. check if installed as a pip package
3. check if string is path to model
4. check if Path or Path-like object
2017-05-08 15:29:47 +02:00
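The four checks listed in this commit message can be sketched roughly as follows. This is a minimal illustration, not spaCy's actual util code: the function name `resolve_model_arg`, the `data_dir` parameter and the returned `(kind, location)` tuple are all assumptions made for the example.

```python
from pathlib import Path
import importlib.util


def resolve_model_arg(name, data_dir):
    """Return (kind, location) for a model name or path (hypothetical helper).

    The checks run in the order given in the commit message.
    """
    data_dir = Path(data_dir)
    if isinstance(name, str):
        # 1. bare name of a model directory or shortcut link in the data dir
        if Path(name).name == name and (data_dir / name).exists():
            return ("data", data_dir / name)
        # 2. name of an importable (pip-installed) package
        if name.isidentifier() and importlib.util.find_spec(name) is not None:
            return ("package", name)
        # 3. string that is itself a path to a model
        if Path(name).exists():
            return ("path", Path(name))
    # 4. Path or Path-like object
    elif hasattr(name, "exists") and name.exists():
        return ("path", name)
    raise IOError("Can't find model: {}".format(name))
```

Checking the data directory first means a shortcut link takes precedence over an installed package of the same name, which appears to be the intent of the ordering.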
Matthew Honnibal bef89ef23d Mergery 2017-05-08 08:29:36 -05:00
ines a7801e7342 Update spacy.load()
path argument is now deprecated and name can either take a model name
or path. Implement lazy loading by importing module and read Language
class name off __all__.
2017-05-08 15:27:25 +02:00
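The lazy-loading idea described here (import the module only when needed, then read the Language class name off `__all__`) might look like the sketch below. The module name `fake_lang_en` and the stand-in `English` class are hypothetical and exist only to keep the example self-contained; the sketch also assumes `__all__` lists exactly one name.

```python
import importlib
import sys
import types


def get_lang_class(module_name):
    """Import a language module on demand and return the class it exports."""
    module = importlib.import_module(module_name)
    # Assumption: __all__ holds a single entry, the Language subclass name.
    return getattr(module, module.__all__[0])


# Stand-in for a real language module (hypothetical name "fake_lang_en").
class English(object):
    lang = "en"


fake = types.ModuleType("fake_lang_en")
fake.English = English
fake.__all__ = ["English"]
sys.modules["fake_lang_en"] = fake

print(get_lang_class("fake_lang_en").lang)
```

With this approach, no language module is imported until a model of that language is requested.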
Matthew Honnibal 50ddc9fc45 Fix infinite loop bug 2017-05-08 07:54:26 -05:00
Matthew Honnibal 94e86ae00a Predict tags with encoder 2017-05-08 07:53:45 -05:00
Matthew Honnibal 56073a11ef Don't use tags when calculating token vectors 2017-05-08 07:52:24 -05:00
Matthew Honnibal a66a4a4d0f Replace einsums 2017-05-08 14:46:50 +02:00
Matthew Honnibal 8d2eab74da Use PretrainableMaxouts 2017-05-08 14:24:55 +02:00
Matthew Honnibal 807cb2e370 Add PretrainableMaxouts 2017-05-08 14:24:43 +02:00
Matthew Honnibal 2e2268a442 Precomputable hidden now working 2017-05-08 11:36:37 +02:00
ines 94697e9afc Fix typo 2017-05-08 02:00:37 +02:00
ines 0ee2a22b67 Merge branch 'pr/1024' into develop 2017-05-08 01:12:44 +02:00
ines c4492d260a Fix kwargs 2017-05-08 01:05:24 +02:00
Matthew Honnibal 10682d35ab Get pre-computed version working 2017-05-08 00:38:35 +02:00
ines b5a726c5cd Tidy up deprecated.py 2017-05-07 23:29:22 +02:00
ines 59c3b9d4dd Tidy up CLI and fix print functions 2017-05-07 23:25:29 +02:00
ines 311704674d Add path2str compat function 2017-05-07 23:24:56 +02:00
ines e34069db9f Move is_package and get_model_package_path to util 2017-05-07 23:24:51 +02:00
ines 957ba676b4 Add model files base path to about.py 2017-05-07 23:22:35 +02:00
ines 8d8dd9ceb2 Don't set default value for model 2017-05-07 23:22:21 +02:00
Matthew Honnibal 35458987e8 Checkpoint -- nearly finished reimpl 2017-05-07 23:05:01 +02:00
Matthew Honnibal 4441866f55 Checkpoint -- nearly finished reimpl 2017-05-07 22:47:06 +02:00
Matthew Honnibal 6782eedf9b Tmp GPU code 2017-05-07 11:04:24 -05:00
Matthew Honnibal e420e5a809 Tmp 2017-05-07 07:31:09 -05:00
Matthew Honnibal 12039e80ca Switch to single matmul for state layer 2017-05-07 14:26:34 +02:00
Matthew Honnibal 700979fb3c CPU/GPU compat 2017-05-07 04:01:11 +02:00
Matthew Honnibal f99f5b75dc working residual net 2017-05-07 03:57:26 +02:00
Matthew Honnibal bdf2dba9fb WIP on refactor, with hidden pre-computing 2017-05-07 02:02:43 +02:00
Matthew Honnibal b439e04f8d Learning smoothly 2017-05-06 20:38:12 +02:00
Matthew Honnibal 08bee76790 Learns things 2017-05-06 18:24:38 +02:00
Matthew Honnibal 04ae1c01f1 Learns things 2017-05-06 18:21:02 +02:00
Matthew Honnibal bcf4cd0a5f Learns things 2017-05-06 17:37:36 +02:00
Matthew Honnibal 8e48b58cd6 Gradients look correct 2017-05-06 16:47:15 +02:00
Matthew Honnibal 7e04260d38 Data running through, likely errors in model 2017-05-06 14:22:20 +02:00
Matthew Honnibal fa7c1990b6 Restore tok2vec function 2017-05-05 20:12:03 +02:00
Matthew Honnibal efe9630e1c Bug fixes 2017-05-05 20:09:50 +02:00
Matthew Honnibal ef4fa594aa Draft of NN parser, to be tested 2017-05-05 19:20:39 +02:00
Matthew Honnibal 7d1df50aec Draft up Parser model 2017-05-04 13:31:40 +02:00
Matthew Honnibal ccaf26206b Pseudocode for parser 2017-05-04 12:17:59 +02:00
ines b1f22c5a10 Fix formatting 2017-05-03 20:11:02 +02:00
ines a04b5be1b2 Add glossary for annotation scheme (closes #1034)
Can be imported as explain from spacy.glossary, or called as
spacy.explain(term)
2017-05-03 17:02:17 +02:00
Gregory Howard 929f2792a7 Renaming cls in module. cls is now a class 2017-05-03 15:41:07 +02:00
Gregory Howard 0e8c41ea4f Adding method lemmatizer for every class 2017-05-03 12:14:42 +02:00
Gregory Howard 32ca07989e adding export japanese 2017-05-03 11:07:29 +02:00
Grégory Howard f9d7144224 Merge branch 'master' into master 2017-05-03 11:04:51 +02:00
Gregory Howard f2ab7d77b4 Lazy imports language 2017-05-03 11:01:42 +02:00
Ines Montani 3ea23a3f4d Fix formatting 2017-05-03 09:44:38 +02:00
Ines Montani d730eb0c0d Raise custom ImportError if importing janome fails 2017-05-03 09:43:29 +02:00
Ines Montani 949ad6594b Add newline 2017-05-03 09:38:43 +02:00
Ines Montani d12ca587ea Add newline 2017-05-03 09:38:29 +02:00
Ines Montani 8676cd0135 Add newline 2017-05-03 09:38:07 +02:00
Yasuaki Uechi c8f83aeb87 Add basic japanese support 2017-05-03 13:56:21 +09:00
Gregory Howard c0afcd22bb Merge remote-tracking branch 'remotes/upstream/master' 2017-04-27 14:42:54 +02:00
Matthew Honnibal 31ec9e1371 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-27 13:21:39 +02:00
Matthew Honnibal 2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
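As an illustration of feature dropout in general, here is a minimal sketch. It is not the parser's implementation. It uses the textbook "inverted dropout" scaling of 1 / (1 - drop), which keeps the expected feature value unchanged; that factor is an assumption of this sketch and differs from the 1. / 0.4 stated in the message above. The function name `dropout_features` is hypothetical.

```python
import random


def dropout_features(values, drop, rng=random):
    """Zero each feature with probability `drop` and rescale the survivors."""
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    # Textbook inverted-dropout factor (an assumption of this sketch).
    scale = 1.0 / (1.0 - drop)
    return [0.0 if rng.random() < drop else v * scale for v in values]


rng = random.Random(0)
print(dropout_features([1.0, 1.0, 1.0, 1.0, 1.0], 0.4, rng))
```

Passing a seeded `random.Random` instance makes the result reproducible, which can help when comparing training runs.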
Gregory Howard 92f368f83b Removing extra spaces 2017-04-27 12:02:14 +02:00
Gregory Howard 13b6957c8e Adding unit test for tokenization in French (with title) 2017-04-27 11:53:44 +02:00
Gregory Howard 8ff4682255 correcting tokenizer exception.
Adding tests for lemmatization
2017-04-27 11:52:14 +02:00
Ines Montani 7da9cefd25 Merge pull request #1022 from luvogels/master
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani c9e592ae6c Add newline 2017-04-27 11:15:41 +02:00
Ines Montani 5942adccc2 Add newline 2017-04-27 11:15:19 +02:00
Ines Montani 4cd9269aef Add newline 2017-04-27 11:15:04 +02:00
Ines Montani ccf13ecc21 Add newline 2017-04-27 11:14:42 +02:00
Ines Montani 03d2b0cc05 Add newline 2017-04-27 11:14:26 +02:00
Gregory Howard 44cb486849 Adding unit test for tokenization in French (with title) 2017-04-27 10:59:38 +02:00
Gregory Howard ad8129cb45 Improvement of rules: now title-insensitive and with the same declaration format 2017-04-27 10:23:56 +02:00
luvogels d12a0b6431 Hooked up tokenizer tests 2017-04-26 23:21:41 +02:00
Matthew Honnibal f0e1606d27 Increment version 2017-04-26 20:25:41 +02:00
luvogels b331929a7e Merge branch 'master' of https://github.com/luvogels/spaCy 2017-04-26 19:15:48 +02:00
luvogels 8de59ce3b9 Added tokenizer tests 2017-04-26 19:10:18 +02:00
Matthew Honnibal 4d98511db7 Make Span hashable. Closes #1019 2017-04-26 19:01:05 +02:00
Matthew Honnibal 24c4c51f13 Try to make test999 less flakey 2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang 460094bf09 Update __init__.py 2017-04-26 18:27:55 +02:00
ines 527d51ac9a Fetch shortcuts from GitHub and improve error handling 2017-04-26 18:00:28 +02:00
Gregory Howard ed5f094451 Adding insensitive lemmatisation test 2017-04-25 18:07:02 +02:00
ghoward 26e31afc18 Renaming tests 2017-04-25 17:46:01 +02:00
ghoward c085c2d391 Adding some unit tests 2017-04-25 17:44:16 +02:00
ghoward 55c6910f90 Lookup table for languages in spaCy.
Need to find another name for lemmatizerlookup; I was not inspired.
Trying to use the new files in the fr language.
2017-04-24 16:39:00 +02:00
Matthew Honnibal c4be9c36fe Fix unicode header in tests 2017-04-24 10:09:01 +02:00
Matthew Honnibal 65f10b53e5 Fix test 2017-04-24 00:25:55 +02:00
Matthew Honnibal 70a43858e1 Fix flakey test 2017-04-24 00:06:30 +02:00
Matthew Honnibal 3973af2d15 Make training test less flakey 2017-04-23 22:59:34 +02:00
Matthew Honnibal 4f9657b42b Fix reporting if no dev data with train 2017-04-23 22:27:10 +02:00
Matthew Honnibal df2ac8b843 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-23 21:25:07 +02:00
Matthew Honnibal d0e19267e8 Create directory if missing in save_to_directory 2017-04-23 21:24:43 +02:00
ines 42305bc519 Remove unnecessary test 2017-04-23 21:21:41 +02:00
ines 012ea594d1 Add file for misc tests 2017-04-23 21:06:51 +02:00
ines 83f66947dc Rename test_download to test_cli 2017-04-23 21:06:50 +02:00
ines 401045433c Simplify compat.fix_text 2017-04-23 21:06:50 +02:00
Matthew Honnibal e033c86a64 Increment version 2017-04-23 21:03:43 +02:00
Matthew Honnibal d2436dc17b Update fix for Issue #999 2017-04-23 18:14:37 +02:00
Matthew Honnibal 874a3cbb07 Add test for Issue #955 2017-04-23 17:57:01 +02:00
Matthew Honnibal 60703cede5 Ensure noun chunks can't be nested. Closes #955 2017-04-23 17:56:39 +02:00
Matthew Honnibal c9ec24b257 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-23 17:07:46 +02:00
Matthew Honnibal 5d8af40445 Add test for Issue #999 2017-04-23 17:06:30 +02:00
Matthew Honnibal 4d2a659c52 Fix json dump for Python3 2017-04-23 17:05:53 +02:00
Matthew Honnibal 040751ad17 Remove xfail on Test #910 2017-04-23 16:28:55 +02:00
ines 3a9710f356 Pass dev_scores to print_progress correctly (resolves #1008)
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
Matthew Honnibal 1b12f342e4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-20 17:03:11 +02:00
Matthew Honnibal 4eef200bab Persist the actions within spacy.parser.cfg 2017-04-20 17:02:44 +02:00
ines 25c70b4cc5 Move fix_text to spacy.compat (see #1002) 2017-04-20 15:47:17 +02:00
Ines Montani 60b5243bee Merge pull request #1002 from oroszgy/model_cli_fix
Fixes for the `model` CLI
2017-04-20 15:41:03 +02:00
Gyorgy Orosz 4a06a2572c Using ftfy for handling broken encoded strings. 2017-04-20 13:34:51 +02:00
Ines Montani 3800b29046 Merge pull request #1001 from recognai/master
Add SPACE to es tag map
2017-04-20 12:16:34 +02:00
oeg f0bcd0babb fix(model): Add SPACE to es tag_map. Fixing error in morphology.pyx when SP tag is missing 2017-04-20 11:36:24 +02:00
Ben Eyal e90e8a3f10 Enable test 2017-04-20 02:25:24 +03:00
Ben Eyal 33af52599e Redefine alphabetic characters
For caseless languages (Hebrew, Bengali) all characters are both lowercase and uppercase.
2017-04-20 02:25:02 +03:00
Ben Eyal d8098a8be2 Use `regex` instead of `re` 2017-04-20 02:22:52 +03:00
oeg daaa42dd25 Merge remote-tracking branch 'upstream/master' 2017-04-19 23:30:36 +02:00
oeg 936a297241 fix(model): Fix tag map for fixing issues with tag SPACE 2017-04-19 23:30:21 +02:00
luvogels c7cec7e5e2 Update __init__.py 2017-04-19 21:06:30 +02:00
luvogels 55e8cade36 Update __init__.py 2017-04-19 21:06:30 +02:00
luvogels 03abd0c8e6 Update __init__.py 2017-04-19 21:06:30 +02:00
Leif Uwe Vogelsang 538a8d6b12 Resolved merge conflict by incorporating both suggestions. 2017-04-19 21:06:07 +02:00
Leif Uwe Vogelsang e821c48489 Norwegian language basics 2017-04-19 21:04:01 +02:00
Leif Uwe Vogelsang 3796c668d9 more norwegian 2017-04-19 21:01:32 +02:00
Leif Uwe Vogelsang bc9557b21f Norwegian language basics 2017-04-19 21:00:01 +02:00
ines 2bd89e7ade Tidy up Hebrew tests and test for punctuation (see #995) 2017-04-19 19:28:03 +02:00
ines 48da244058 Use spacy.compat.json_dumps for Python 2/3 compatibility (resolves #991) 2017-04-19 11:50:36 +02:00
ines ddd5194088 Update Language docs and docstrings 2017-04-17 01:52:13 +02:00
ines f62b740961 Use compat.json_dumps 2017-04-17 01:46:14 +02:00
ines 8e83f8e2fa Update docstrings 2017-04-17 01:40:26 +02:00
ines e2299dc389 Ensure path in save_to_directory 2017-04-17 01:40:14 +02:00
ines 82f5f1f98f Replace str with compat.unicode_ 2017-04-17 01:29:54 +02:00
ines 16a8521efa Increment version 2017-04-16 22:38:38 +02:00
Matthew Honnibal 4efd6fb9d6 Fix training 2017-04-16 15:28:27 -05:00
Matthew Honnibal 17c9fffb9e Fix naked except 2017-04-16 15:28:16 -05:00
ines 5610fdcc06 Get language name first if no model path exists
Makes sure spaCy fails early if no tokenizer exists, and allows
printing better error message.
2017-04-16 22:16:47 +02:00
ines ad168ba88c Set model name to empty string if path override exists
Required for parse_package_meta, which composes path of data_path and
model_name (needs to be fixed in the future)
2017-04-16 22:15:51 +02:00
ines 97647c46cd Add docstring and todo note 2017-04-16 22:14:45 +02:00
ines 5c5f8c0a72 Check if full string is found in lang classes first
This allows users to set arbitrary strings. (Otherwise, custom lang
class "my_custom_class" would always load Burmese "my" tokenizer if one
was available.)
2017-04-16 22:14:38 +02:00
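The lookup order described in this message can be sketched as follows. The function name `lookup_lang_class`, the plain dict registry and the split on the first underscore are illustrative assumptions, not the code from the commit.

```python
def lookup_lang_class(name, lang_classes):
    """Look up a language class, preferring an exact match on the full name."""
    # Exact match first, so a custom key such as "my_custom_class" wins.
    if name in lang_classes:
        return lang_classes[name]
    # Fall back to the language code before the first underscore (assumed rule).
    lang = name.split("_")[0]
    if lang in lang_classes:
        return lang_classes[lang]
    raise KeyError("No language class found for: {}".format(name))


# Strings stand in for the real classes to keep the example short.
classes = {"my": "Burmese", "my_custom_class": "Custom", "en": "English"}
print(lookup_lang_class("my_custom_class", classes))
print(lookup_lang_class("en_core_web_sm", classes))
```

Without the exact-match step, "my_custom_class" would be reduced to "my" and resolve to the Burmese entry, which is the problem the commit describes.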
ines 13d30b6c01 xfail lemmatizer test that's causing problems (see #546) 2017-04-16 21:18:39 +02:00
Matthew Honnibal 4931c56afc Increment version 2017-04-16 13:59:38 -05:00
ines 6145b7c153 Remove redundant Path 2017-04-16 20:53:25 +02:00
Matthew Honnibal fa89613444 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-16 13:42:56 -05:00
ines 1f9f867c70 Remove unused util function 2017-04-16 20:37:45 +02:00
ines 7670c745b6 Update spacy.load() and fix path checks 2017-04-16 20:37:45 +02:00
ines d3759dfb32 Fix docstring 2017-04-16 20:37:45 +02:00
ines ed7e19ad68 Remove unused import 2017-04-16 20:37:45 +02:00
ines 0084466a66 Remove unused utf8open util and replace os.path with ensure_path 2017-04-16 20:37:45 +02:00
Matthew Honnibal 89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
Matthew Honnibal 6a4221a6de Allow lemma to be set from Python. Re #973 2017-04-16 18:07:53 +02:00
Matthew Honnibal 137b210bcf Restore use of FTRL training 2017-04-16 18:02:42 +02:00
ines d10bd0eaf9 Fix formatting 2017-04-16 13:42:34 +02:00
ines 8191e33cf1 Update link error message with info on permissions 2017-04-16 13:32:31 +02:00
ines a3ddbc0444 Add note about --force flag to error message 2017-04-16 13:14:36 +02:00
ines e3de035814 Add meta validation to check for required settings
Complain if no "lang", "name" or "version" is found (those settings are
used in directory / package names). Package will still build without,
but it'll inevitably fail somewhere down the line.
2017-04-16 13:13:17 +02:00
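A check of this kind might look like the sketch below. The function name `validate_meta` and its return value (a list of missing keys) are assumptions for illustration; only the three required settings come from the message.

```python
def validate_meta(meta, required=("lang", "name", "version")):
    """Return the required settings that are missing or empty in `meta`."""
    return [key for key in required if not meta.get(key)]


missing = validate_meta({"lang": "en", "name": "", "version": "1.0.0"})
if missing:
    print("Missing required settings: {}".format(", ".join(missing)))
```

Returning the list instead of raising leaves the caller free to warn and continue, which matches the note that the package will still build without these settings.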
ines a7574b7572 Add more options to read in meta data in package command
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
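The fallback order for reading meta data could be sketched like this. The function name `load_meta` and the `prompt` callback (standing in for the interactive command-line prompt) are hypothetical.

```python
import json
from pathlib import Path


def load_meta(input_dir, meta_path=None, prompt=None):
    """Pick the meta data source in the order the commit message gives."""
    default_path = Path(input_dir) / "meta.json"
    if meta_path is not None:
        # 1. explicit path supplied via the meta option
        return json.loads(Path(meta_path).read_text(encoding="utf8"))
    if default_path.is_file():
        # 2. meta.json found in the input directory
        return json.loads(default_path.read_text(encoding="utf8"))
    if prompt is None:
        raise ValueError("no meta.json found and no prompt callback given")
    # 3. otherwise ask for the details interactively
    return prompt()
```

Injecting the prompt as a callable keeps the function testable without real terminal input.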
ines 13c8a42d2b Fix typos 2017-04-16 13:03:58 +02:00
ines 31fa73293a Move read_json out to own util function 2017-04-16 13:03:28 +02:00
Matthew Honnibal 45464d065e Remove print statement 2017-04-15 16:11:43 +02:00
Matthew Honnibal c76cb8af35 Fix training for new labels 2017-04-15 16:11:26 +02:00
Matthew Honnibal 4884b2c113 Refix StepwiseState 2017-04-15 16:00:28 +02:00
Matthew Honnibal e6ee7e130f Fix parse package meta 2017-04-15 13:38:53 +02:00
Matthew Honnibal 1a98e48b8e Fix StepwiseState 2017-04-15 13:35:01 +02:00
ines 0739ae7b76 Tidy up and fix formatting and imports 2017-04-15 13:05:15 +02:00
ines fefe6684cd Fix symlink function to check for Windows 2017-04-15 12:17:27 +02:00
ines 35fb4febe2 Fix whitespace 2017-04-15 12:13:45 +02:00
ines e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines 958b12dec8 Use pathlib instead of os.path 2017-04-15 12:13:00 +02:00
ines 956dc36785 Move functions to deprecated 2017-04-15 12:12:31 +02:00
ines c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines 26445ee304 Add compat module for Python2/3 and platform compatibility 2017-04-15 12:07:02 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal d13f0a7017 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-04-14 23:54:57 +02:00
Matthew Honnibal 354458484c WIP on add_label bug during NER training
Currently when a new label is introduced to NER during training,
it causes the labels to be read in in an unexpected order. This
invalidates the model.
2017-04-14 23:52:17 +02:00
Matthew Honnibal 33ba5066eb Refactor Language.end_training, making new save_to_directory method 2017-04-14 23:51:24 +02:00
ines 84341c2975 Only compile list of models if data_path exists 2017-04-14 16:48:02 +02:00
Gyorgy Orosz dd3244c08a Made json dump to produce unicode strings in py2 2017-04-13 23:30:47 +02:00
Gyorgy Orosz a9469c8173 Fixed typo 2017-04-13 15:24:14 +02:00
ines 41037f0f07 Remove unused imports 2017-04-13 13:52:11 +02:00
ines 1b92c8d5d5 Use unicode paths on Windows/Python 2 and catch other errors (resolves #970)
try/except here is quite dirty, but it'll at least make sure users see
an error message that explains what's going on
2017-04-10 17:49:51 +02:00
Matthew Honnibal 49e2de900e Add costs property to StepwiseState, to show which moves are gold. 2017-04-10 11:37:04 +02:00
Matthew Honnibal e26577b202 Increment version 2017-04-07 18:45:06 +02:00
Matthew Honnibal 40bf7ecf27 Increment version 2017-04-07 18:44:20 +02:00
Matthew Honnibal 1dca7eeb03 Add unicode declaration on new regression test 2017-04-07 18:09:23 +02:00
ines 887827fc6a Merge branch 'develop' 2017-04-07 17:36:23 +02:00
ines 444dd511c5 Fix xpassing URL test case 2017-04-07 17:36:05 +02:00
ines bf0f15e762 Add / to tokenizer infixes (resolves #891) 2017-04-07 17:30:44 +02:00
ines 00b9011a49 Fix whitespace 2017-04-07 17:29:59 +02:00
ines f9869e4dc5 Merge branch 'master' into develop 2017-04-07 17:23:40 +02:00
Matthew Honnibal 4a6204dbad Merge remote-tracking branch 'origin/develop' 2017-04-07 17:20:09 +02:00
Matthew Honnibal 0513c43bf0 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-07 17:07:10 +02:00
Matthew Honnibal cc36c308f4 Fix noun_chunk rules around coordination
Closes #693.
2017-04-07 17:06:40 +02:00
Matthew Honnibal ab846256cf Merge pull request #966 from recognai/master
Prepare Spanish language for training models, including configuration, rich-UD tag map and tests
2017-04-07 16:12:29 +02:00
Matthew Honnibal 83dca920d4 Rename test #913 -> #957, comment
Make test for #957 reference correct bug. Add comment.

Previous commit closes #957.
2017-04-07 15:54:25 +02:00
Matthew Honnibal be204ed714 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-07 15:50:14 +02:00
Matthew Honnibal e7b1ee9efd Switch to regex module for URL identification
The URL detection regex was failing on input such as 0.1.2.3, as this
input triggered excessive back-tracking in the builtin re module.
The solution was to switch to the regex module, which behaves better.

Closes #913.
2017-04-07 15:47:36 +02:00
Matthew Honnibal 5887383fc0 Add test for Issue #913: Hang from bad regex 2017-04-07 15:47:27 +02:00
ines 7ea1673072 Fix whitespace 2017-04-07 13:28:48 +02:00
ines 255650dbc2 Add connlu2json converter from explosion/spacy-dev-resources/#11 2017-04-07 13:05:12 +02:00
ines 789ce8a45e Add convert command 2017-04-07 13:04:17 +02:00
ines 9952d3b08a Fix whitespace 2017-04-07 13:02:05 +02:00
ines 47ddce6eb7 Remove unused variable 2017-04-07 13:01:48 +02:00
ines dcf8ab0c47 Merge branch 'develop' 2017-04-07 12:00:09 +02:00
ines 75f9b4c6e2 Fix whitespace 2017-04-07 10:22:18 +02:00
oeg c693d40791 feature(model): Add support for creating the Spanish model, including rich tagset, configuration, and basic tests 2017-04-06 18:48:45 +02:00
oeg 010293fb2f fix(typo): Fixes typo in method calling PseudoProjectivity.deprojectivize, failing with new train cli 2017-04-06 17:33:15 +02:00
ines 808cd6cf7f Add missing tags to verbs (resolves #948) 2017-04-03 18:12:52 +02:00
ines ad8bf1829f Import and combine Portuguese tokenizer exceptions (see #943) 2017-04-01 10:37:42 +02:00
Ines Montani f8b2d9c3b7 Merge pull request #943 from mamoit/master
Portuguese improvements
2017-04-01 10:32:00 +02:00
ines 3b667a24d4 Remove whitespace 2017-04-01 10:21:08 +02:00
ines e71a1f4bd0 Fix download commands in error messages (see #946) 2017-04-01 10:20:57 +02:00
ines 42382d5692 Fix download commands in error messages (see #946) 2017-04-01 10:19:32 +02:00
ines d4a59c254b Remove whitespace 2017-04-01 10:19:01 +02:00
Matthew Honnibal 51882ee2b8 Fix check for setting ent_id in merge 2017-03-31 19:32:01 +02:00
Miguel Almeida 4fde64c4ea Portuguese contractions and some abbreviations 2017-03-31 15:52:55 +01:00
Miguel Almeida 465b240bcb Review Portuguese stop words
Mainly to review typos and add missing masculines/feminines
2017-03-31 13:00:47 +01:00
Matthew Honnibal fc3900e5b2 Allow ent_id to be set in Token 2017-03-31 14:00:14 +02:00
Matthew Honnibal 9720103428 Improve attribute handling in doc.merge(). Still unsatisfying 2017-03-31 13:59:58 +02:00