6112 Commits

Author SHA1 Message Date
Matthew Honnibal 89c92c65fb Update version 2019-07-28 17:56:38 +02:00
Matthew Honnibal 06eb428ed1 Make pipe base class a bit less presumptuous 2019-07-28 17:56:11 +02:00
Matthew Honnibal 16b5144095 Don't raise NotImplemented in Pipe.update 2019-07-28 17:54:11 +02:00
Ines Montani fc69da0acb
💫 Support simple training format in nlp.evaluate and add tests (#4033)
* Support simple training format in nlp.evaluate and add tests

* Update docs [ci skip]
2019-07-27 17:30:18 +02:00
Ines Montani a3723f439c Fix formatting [ci skip] 2019-07-27 16:35:42 +02:00
Ines Montani d5bce35fb1 Fix bug in Span.similarity when called via hook 2019-07-27 15:33:27 +02:00
Ines Montani 109b5e1798 Fix bug in Token.similarity when called via hook 2019-07-27 15:26:01 +02:00
Ines Montani e000b5ed82 Also support "requirements" in model.json 2019-07-27 13:34:57 +02:00
Ines Montani 307ffe472d
Support custom language factory setting in meta.json (#4031) 2019-07-27 13:17:43 +02:00
Bae Yong-Ju 05fbf5d976 Fix error when Korean text contains regexp special characters. (#4022) 2019-07-25 17:53:33 +02:00
Matthew Honnibal 73e095923f 💫 Improve error message when model.from_bytes() dies (#4014)
* Improve error message when model.from_bytes() dies

When Thinc's model.from_bytes() is called with a mismatched model, often
we get a particularly ungraceful error,

e.g. "AttributeError: FunctionLayer has no attribute G"

This is because we're trying to load the parameters for something like
a LayerNorm layer, and the model architecture has some other layer there
instead. This is obviously terrible, especially since the error *type*
is wrong.

I've changed it to raise a ValueError. The error message is still
probably a bit terse, but it's hard to be sure exactly what's gone
wrong.

* Update spacy/pipeline/pipes.pyx

* Update spacy/pipeline/pipes.pyx

* Update spacy/pipeline/pipes.pyx

* Update spacy/syntax/nn_parser.pyx

* Update spacy/syntax/nn_parser.pyx

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>


Co-authored-by: Ines Montani <ines@ines.io>
2019-07-24 11:27:34 +02:00
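The error-handling change described in commit 73e095923f can be illustrated with a minimal sketch. This is not Thinc's or spaCy's actual code: `FunctionLayer`, `LayerNorm`, `load_params`, and `safe_load_params` are hypothetical stand-ins, assumed here only to show how a misleading `AttributeError` from a mismatched architecture might be re-raised as a `ValueError`.

```python
class FunctionLayer:
    """Hypothetical stand-in for a layer with no trainable parameters."""


class LayerNorm:
    """Hypothetical stand-in for a layer that owns a parameter named G."""

    def __init__(self):
        self.G = None


def load_params(layer, params):
    """Copy saved parameters onto a layer, one attribute at a time."""
    for name, value in params.items():
        # Reading the attribute first mimics a loader that expects it to
        # exist; on a mismatched layer this raises AttributeError.
        getattr(layer, name)
        setattr(layer, name, value)


def safe_load_params(layer, params):
    """Load parameters, reporting architecture mismatches as ValueError."""
    try:
        load_params(layer, params)
    except AttributeError as err:
        # Assumed message text; the commit notes the real one is terse.
        raise ValueError(
            "Could not load parameters: the saved data does not match "
            "the current model architecture."
        ) from err
```

Chaining with `from err` keeps the original `AttributeError` available as `__cause__`, so the low-level detail is still visible in the traceback while callers see a more appropriate error type.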
Ines Montani 87fcf3141c
Merge pull request #4003 from svlandeg/feature/nel-fixes
API changes for Entity linking functionality
2019-07-23 23:17:07 +02:00
Paul O'Leary McCann c8949ce88a Remove old comment (#4012)
Norwegian used to borrow from French but that doesn't appear to have
been true for a while now, so the comment that was here is no longer
relevant.
2019-07-23 23:10:06 +02:00
Sofie Van Landeghem ba02957c80 Fix dependency copy for as_doc (#3969)
* failing unit test for issue 3962

* attempt to fix Issue #3962

* create artificial unit test example

* using length instead of self.length

* sp

* reformat with black

* find better ancestor within span and use generic 'dep'

* attach to span.root if there is no appropriate ancestor

* comment span text

* clean up ancestor code

* reconstruct dep tree to keep same number of sentences
2019-07-23 18:28:54 +02:00
svlandeg 4e7ec1ed31 return fix 2019-07-23 14:23:58 +02:00
svlandeg 400ff342cf replace assert's with custom error messages 2019-07-23 11:52:48 +02:00
svlandeg 20389e4553 format and bugfix 2019-07-22 15:08:17 +02:00
svlandeg b1911f7105 Errors.E146 for IO error when FP is null 2019-07-22 14:56:13 +02:00
svlandeg 5d544f89ba Errors.E145 for IO errors when reading KB 2019-07-22 14:36:07 +02:00
Ines Montani a32b033b8c Add regression test for #4002
Test that the PhraseMatcher can match on overwritten NORM attributes.
2019-07-22 14:18:24 +02:00
svlandeg ad65171837 Merge remote-tracking branch 'upstream/master' into feature/nel-fixes 2019-07-22 13:41:28 +02:00
svlandeg 76184374e2 test corner cases 2019-07-22 13:39:32 +02:00
svlandeg 9f8c1e71a2 fix for Issue #4000 2019-07-22 13:34:12 +02:00
svlandeg dae8a21282 rename entity frequency 2019-07-19 17:40:28 +02:00
svlandeg 41fb5204ba output tensors as part of predict 2019-07-19 14:47:36 +02:00
svlandeg 21176517a7 have gold.links correspond exactly to doc.ents 2019-07-19 12:36:15 +02:00
BreakBB 3e370cf2ba Add 'Prof.' to Englisch tokenizer_exceptions 2019-07-19 10:00:45 +02:00
svlandeg e1213eaf6a use original gold object in get_loss function 2019-07-18 13:35:10 +02:00
svlandeg ec55d2fccd filter training data beforehand (+black formatting) 2019-07-18 10:22:24 +02:00
Falak Asad ff1e73e35c Bugfix/issue 3968 (#3982)
* Fix for issue-3968

* Added contributor agreement

* Made suggested changes
2019-07-18 00:20:32 +02:00
svlandeg d833d4c358 fixes in kb and gold 2019-07-17 17:18:26 +02:00
Ines Montani 73565c6d9d Rename function arguments 2019-07-17 14:29:52 +02:00
Matthew Honnibal 394e4d8058 Add docstring for spacy.gold.align 2019-07-17 13:59:17 +02:00
Ines Montani 073013f129 Auto-format [ci skip] 2019-07-17 12:34:13 +02:00
svlandeg 4086c6ff60 get vector functionality + unit test 2019-07-17 12:17:02 +02:00
Ines Montani 62ff128888 Add regression test for #3951 2019-07-16 14:00:00 +02:00
Ines Montani 7f551050b1 Add regression test for #3972 2019-07-16 13:07:35 +02:00
svlandeg a63d15a142 code cleanup 2019-07-15 17:36:43 +02:00
svlandeg cdc589d344 small fix 2019-07-15 12:04:45 +02:00
svlandeg 60f299374f set default context width 2019-07-15 12:03:09 +02:00
svlandeg 6e809e9b8b proper error for missing cfg arguments 2019-07-15 11:42:50 +02:00
svlandeg 6026958957 tokenizer doc fix 2019-07-15 11:19:34 +02:00
Ines Montani c0e29f7029
Merge pull request #3957 from sorenlind/danish-tokenizer-slash
Make Danish tokenizer split on forward slash
2019-07-12 18:19:22 +02:00
Matthew Honnibal ef666656b3 Fix attrs alignment 2019-07-12 17:59:47 +02:00
Matthew Honnibal c345c042b0 Fix symbol alignment 2019-07-12 17:48:38 +02:00
Ines Montani 7281026879 Increment version [ci skip] 2019-07-12 17:40:00 +02:00
Søren Lind Kristiansen 26aee70d95 Make Danish tokenizer split on forward slash 2019-07-12 15:20:42 +02:00
Matthew Honnibal 3bc4d618f9 Set version to v2.1.5 2019-07-12 13:26:12 +02:00
Sofie Van Landeghem ed774cb953 Fixing ngram bug (#3953)
* minimal failing example for Issue #3661

* referenced Issue #3661 instead of Issue #3611

* cleanup
2019-07-12 10:01:35 +02:00
Matthew Honnibal 09dc01a426 Fix #3853, and add warning 2019-07-11 14:46:47 +02:00