2800 Commits

Author SHA1 Message Date
Ines Montani 949ad6594b Add newline 2017-05-03 09:38:43 +02:00
Ines Montani d12ca587ea Add newline 2017-05-03 09:38:29 +02:00
Ines Montani 8676cd0135 Add newline 2017-05-03 09:38:07 +02:00
Yasuaki Uechi c8f83aeb87 Add basic Japanese support 2017-05-03 13:56:21 +09:00
Matthew Honnibal 31ec9e1371 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-27 13:21:39 +02:00
Matthew Honnibal 2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.
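
As a rough illustration of the idea described above, here is a minimal sketch of feature dropout in plain Python. It is not the parser's actual implementation: `apply_dropout` and its dict-of-features input are hypothetical, and the rescaling uses the conventional inverted-dropout factor 1 / (1 - drop), which is an assumption and may differ from the factor quoted in this message.

```python
import random


def apply_dropout(features, drop, rng=random):
    """Hypothetical helper: drop each feature with probability `drop`.

    `features` maps feature names to float values. Surviving values are
    rescaled by 1 / (1 - drop) (assumed convention, not taken from the
    library) so the expected sum of the features stays the same.
    """
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in the range [0, 1)")
    if drop == 0.0:
        # No dropout requested: return an unchanged copy.
        return dict(features)
    scale = 1.0 / (1.0 - drop)
    # Keep a feature only when the random draw is at or above the rate.
    return {
        name: value * scale
        for name, value in features.items()
        if rng.random() >= drop
    }
```

Passing a seeded `random.Random` instance as `rng` makes the result reproducible, which helps when comparing training runs on small data sets.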

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
Ines Montani 7da9cefd25 Merge pull request #1022 from luvogels/master
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani c9e592ae6c Add newline 2017-04-27 11:15:41 +02:00
Ines Montani 5942adccc2 Add newline 2017-04-27 11:15:19 +02:00
Ines Montani 4cd9269aef Add newline 2017-04-27 11:15:04 +02:00
Ines Montani ccf13ecc21 Add newline 2017-04-27 11:14:42 +02:00
Ines Montani 03d2b0cc05 Add newline 2017-04-27 11:14:26 +02:00
luvogels d12a0b6431 Hooked up tokenizer tests 2017-04-26 23:21:41 +02:00
Matthew Honnibal f0e1606d27 Increment version 2017-04-26 20:25:41 +02:00
luvogels b331929a7e Merge branch 'master' of https://github.com/luvogels/spaCy 2017-04-26 19:15:48 +02:00
luvogels 8de59ce3b9 Added tokenizer tests 2017-04-26 19:10:18 +02:00
Matthew Honnibal 4d98511db7 Make Span hashable. Closes #1019 2017-04-26 19:01:05 +02:00
Matthew Honnibal 24c4c51f13 Try to make test999 less flakey 2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang 460094bf09 Update __init__.py 2017-04-26 18:27:55 +02:00
ines 527d51ac9a Fetch shortcuts from GitHub and improve error handling 2017-04-26 18:00:28 +02:00
Matthew Honnibal c4be9c36fe Fix unicode header in tests 2017-04-24 10:09:01 +02:00
Matthew Honnibal 65f10b53e5 Fix test 2017-04-24 00:25:55 +02:00
Matthew Honnibal 70a43858e1 Fix flakey test 2017-04-24 00:06:30 +02:00
Matthew Honnibal 3973af2d15 Make training test less flakey 2017-04-23 22:59:34 +02:00
Matthew Honnibal 4f9657b42b Fix reporting if no dev data with train 2017-04-23 22:27:10 +02:00
Matthew Honnibal df2ac8b843 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-23 21:25:07 +02:00
Matthew Honnibal d0e19267e8 Create directory if missing in save_to_directory 2017-04-23 21:24:43 +02:00
ines 42305bc519 Remove unnecessary test 2017-04-23 21:21:41 +02:00
ines 012ea594d1 Add file for misc tests 2017-04-23 21:06:51 +02:00
ines 83f66947dc Rename test_download to test_cli 2017-04-23 21:06:50 +02:00
ines 401045433c Simplify compat.fix_text 2017-04-23 21:06:50 +02:00
Matthew Honnibal e033c86a64 Increment version 2017-04-23 21:03:43 +02:00
Matthew Honnibal d2436dc17b Update fix for Issue #999 2017-04-23 18:14:37 +02:00
Matthew Honnibal 874a3cbb07 Add test for Issue #955 2017-04-23 17:57:01 +02:00
Matthew Honnibal 60703cede5 Ensure noun chunks can't be nested. Closes #955 2017-04-23 17:56:39 +02:00
Matthew Honnibal c9ec24b257 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-23 17:07:46 +02:00
Matthew Honnibal 5d8af40445 Add test for Issue #999 2017-04-23 17:06:30 +02:00
Matthew Honnibal 4d2a659c52 Fix json dump for Python3 2017-04-23 17:05:53 +02:00
Matthew Honnibal 040751ad17 Remove xfail on Test #910 2017-04-23 16:28:55 +02:00
ines 3a9710f356 Pass dev_scores to print_progress correctly (resolves #1008)
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
Matthew Honnibal 1b12f342e4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-20 17:03:11 +02:00
Matthew Honnibal 4eef200bab Persist the actions within spacy.parser.cfg 2017-04-20 17:02:44 +02:00
ines 25c70b4cc5 Move fix_text to spacy.compat (see #1002) 2017-04-20 15:47:17 +02:00
Ines Montani 60b5243bee Merge pull request #1002 from oroszgy/model_cli_fix
Fixes for the `model` CLI
2017-04-20 15:41:03 +02:00
Gyorgy Orosz 4a06a2572c Using ftfy for handling broken encoded strings. 2017-04-20 13:34:51 +02:00
Ines Montani 3800b29046 Merge pull request #1001 from recognai/master
Add SPACE to es tag map
2017-04-20 12:16:34 +02:00
oeg f0bcd0babb fix(model): Add SPACE to es tag_map. Fixing error in morphology.pyx when SP tag is missing 2017-04-20 11:36:24 +02:00
Ben Eyal e90e8a3f10 Enable test 2017-04-20 02:25:24 +03:00
Ben Eyal 33af52599e Redefine alphabetic characters
For caseless languages (Hebrew, Bengali) all characters are both lowercase and uppercase.
2017-04-20 02:25:02 +03:00
Ben Eyal d8098a8be2 Use `regex` instead of `re` 2017-04-20 02:22:52 +03:00