6075 Commits

Author SHA1 Message Date
Ines Montani 73565c6d9d Rename function arguments 2019-07-17 14:29:52 +02:00
Matthew Honnibal 394e4d8058 Add docstring for spacy.gold.align 2019-07-17 13:59:17 +02:00
Ines Montani 073013f129 Auto-format [ci skip] 2019-07-17 12:34:13 +02:00
Ines Montani 62ff128888 Add regression test for #3951 2019-07-16 14:00:00 +02:00
Ines Montani 7f551050b1 Add regression test for #3972 2019-07-16 13:07:35 +02:00
Ines Montani c0e29f7029
Merge pull request #3957 from sorenlind/danish-tokenizer-slash
Make Danish tokenizer split on forward slash
2019-07-12 18:19:22 +02:00
Matthew Honnibal ef666656b3 Fix attrs alignment 2019-07-12 17:59:47 +02:00
Matthew Honnibal c345c042b0 Fix symbol alignment 2019-07-12 17:48:38 +02:00
Ines Montani 7281026879 Increment version [ci skip] 2019-07-12 17:40:00 +02:00
Søren Lind Kristiansen 26aee70d95 Make Danish tokenizer split on forward slash 2019-07-12 15:20:42 +02:00
Matthew Honnibal 3bc4d618f9 Set version to v2.1.5 2019-07-12 13:26:12 +02:00
Sofie Van Landeghem ed774cb953 Fixing ngram bug (#3953)
* minimal failing example for Issue #3661

* referenced Issue #3661 instead of Issue #3611

* cleanup
2019-07-12 10:01:35 +02:00
Matthew Honnibal 09dc01a426 Fix #3853, and add warning 2019-07-11 14:46:47 +02:00
Matthew Honnibal 7369949d2e Add warning for #3853 2019-07-11 14:46:47 +02:00
Ines Montani 673c864a06
Fix doc.count_by functionality (#3950)
Fix doc.count_by functionality
2019-07-11 13:44:00 +02:00
Ines Montani 2426f4d44c
Fix default punctuation rules for splitting Hindi text (#3948)
Fix default punctuation rules for splitting Hindi text

Co-authored-by: yash <patadiayash@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2019-07-11 13:36:28 +02:00
svlandeg 349107daa3 cleanup 2019-07-11 13:09:22 +02:00
svlandeg 0f0f07318a counter instead of preshcounter 2019-07-11 13:05:53 +02:00
Matthew Honnibal b40b4c2c31
💫 Fix issue #3839: Incorrect entity IDs from Matcher with operators (#3949)
* Add regression test for issue #3541

* Add comment on bugfix

* Remove incorrect test

* Un-xfail test
2019-07-11 12:55:11 +02:00
Matthew Honnibal e19f4ee719 Add warning message re Issue #3853 2019-07-11 12:50:38 +02:00
Ines Montani 197cfd7ebc Merge branch 'master' into pr/3948 2019-07-11 12:18:31 +02:00
Ines Montani d166756607 Fix test 2019-07-11 12:16:43 +02:00
Ines Montani 0b8406a05c Tidy up and auto-format 2019-07-11 12:02:25 +02:00
yash 6751af3e78 Merge branch 'master' of https://github.com/yash1994/spaCy 2019-07-11 15:26:57 +05:30
yash ae2d52e323 Add default encoding utf-8 for test file 2019-07-11 15:26:27 +05:30
Ines Montani 33ca0a036a Merge branch 'master' into pr/3948 2019-07-11 11:55:54 +02:00
Matthew Honnibal 0491a8e7c8 Reformat 2019-07-11 11:49:36 +02:00
Matthew Honnibal bd3c3f342b Fix _serialize 2019-07-11 11:48:55 +02:00
yash 815f8d13dd Fix default punctuation rules for hindi text (#3625 explosion) 2019-07-11 15:00:51 +05:30
yash d5311b3c42 Add test file for issue (#3625) and spacy contributor agreement 2019-07-11 14:53:14 +05:30
svlandeg e080412385 tracked the bug down to PreshCounter.inc - still unclear what goes wrong 2019-07-11 01:53:06 +02:00
svlandeg a89fecce97 failing unit test for issue #3869 2019-07-11 00:43:55 +02:00
Matthew Honnibal a388888074 Merge branch 'master' of https://github.com/explosion/spaCy 2019-07-10 22:54:17 +02:00
Matthew Honnibal c6cb782758 Set version to 2.1.5.dev0 2019-07-10 22:54:09 +02:00
Sofie Van Landeghem c4c21cb428 more friendly textcat errors (#3946)
* more friendly textcat errors with require_model and require_labels

* update thinc version with recent bugfix
2019-07-10 19:39:38 +02:00
Matthew Honnibal b94c5443d9 Rename Binder->DocBox, and improve it. 2019-07-10 19:37:20 +02:00
Matthew Honnibal 3d18600c05 Return True from doc.is_... when no ambiguity
* Make doc.is_sentenced return True if len(doc) < 2.

* Make doc.is_nered return True if len(doc) == 0, for consistency.

Closes #3934
2019-07-10 19:21:42 +02:00
Matthew Honnibal 465456edb9 Un-xfail test #3880 2019-07-10 14:01:17 +02:00
Matthew Honnibal 87f7ec34d5 Add test for #3880 2019-07-10 13:53:55 +02:00
Ines Montani 4e04080b76 Only compare sorted patterns in test
Try to work around flaky tests on Python 3.5
2019-07-10 13:00:52 +02:00
Ines Montani 82045aac8a Merge regression tests 2019-07-10 12:49:18 +02:00
Ines Montani 40cd03fc35 Improve EntityRuler serialization 2019-07-10 12:25:45 +02:00
Ines Montani 570ab1f481 Fix handling of old entity ruler files
Old models are expected to have an `entity_ruler.jsonl` file in the top-level model directory, i.e. the path passed to from_disk by default (model path plus component name), but with the suffix ".jsonl".
2019-07-10 12:14:12 +02:00
Ines Montani 874d914a44 Tidy up test 2019-07-10 12:13:23 +02:00
Ines Montani ea2050079b Auto-format 2019-07-10 12:03:05 +02:00
Ines Montani 6ba5ddbd5f
Merge pull request #3864 from svlandeg/feature/nel-wiki
Entity linking using Wikipedia & Wikidata
2019-07-10 11:25:41 +02:00
Ines Montani 8721849423 Update Scorer.ents_per_type 2019-07-10 11:19:28 +02:00
Björn Böing 205c73a589 Update tokenizer and doc init example (#3939)
* Fix Doc.to_json hyperlink

* Update tokenizer and doc init examples

* Change "matchin rules" to "punctuation rules"

* Auto-format
2019-07-10 10:16:48 +02:00
cedar101 58f06e6180 Korean support (#3901)
* start lang/ko

* add test codes

* using natto-py

* add test_ko_tokenizer_full_tags()

* spaCy contributor agreement

* external dependency for ko

* collections.namedtuple for python version < 3.5

* case fix

* tuple unpacking

* add jongseong(final consonant)

* apply mecab option

* Remove Pipfile for now


Co-authored-by: Ines Montani <ines@ines.io>
2019-07-09 22:23:16 +02:00
Ines Montani f2ea3e3ea2 Merge branch 'master' into feature/nel-wiki 2019-07-09 21:57:47 +02:00