Commit Graph

8591 Commits

Author SHA1 Message Date
ines 3c80f69ff5 Return data in cli.info and add silent option (resolves #2196) 2018-04-29 01:59:44 +02:00
ines 1c6d77610c Add remove_extension method on Doc, Token and Span (closes #2242) 2018-04-28 23:33:09 +02:00
ines a512fa60ef Remove upcoming option from docs for now 2018-04-28 23:32:18 +02:00
ines abdb853ebf Simplify underscore tests 2018-04-28 23:30:33 +02:00
ines 6fb6371670 Add collapse_phrases option to displacy (closes #2266) 2018-04-28 23:06:50 +02:00
Matt Upson 87cc6b3599 Add missing comma to NN example in docs (#2255)
Also add a completed contributor agreement.
2018-04-28 14:56:00 +02:00
Robin Linderborg 1f9904ef12 fixes #2238 (#2241)
* Remove erroneous lemma lookup år > åra in Swedish

* Add contributors agreement

* Add contrib agreement to correct directory

* Revert change to CONTRIBUTOR_AGREEMENT
2018-04-28 14:55:22 +02:00
Robin Linderborg d01f503b54 Remove incorrect lemma lookup gäng->gänga (#2252)
* Remove incorrect lemma lookup gäng->gänga
In modern Swedish, "gäng" is mostly associated with "gang" or "group of people". The removed lemma lookup lemmatized it to the verb "thread".

* Add contrib agreement to correct directory

* Revert change to CONTRIBUTOR_AGREEMENT
2018-04-28 14:54:41 +02:00
ines 4a3bea00c7 Update resources [ci skip] 2018-04-26 22:10:34 +02:00
ines 686225eadd Fix Spanish noun_chunks (resolves #2210)
Make sure 'NP' label is added to StringStore and move noun_bounds helper into a closure to allow reusing label sets
2018-04-18 18:44:01 -04:00
ines 9632595fb4 Use correct, non-deprecated merge syntax (resolves #2226) 2018-04-18 18:28:28 -04:00
Suraj Rajan 5957f15227 Fixed typos for #2222,#2223 (#2233) (closes #2222, closes #2223) 2018-04-18 14:55:26 -07:00
Pradeep Kumar Tippa df389e5b74 spacy-101 vocab doc giving valid variable names (#2236) 2018-04-18 14:54:26 -07:00
Ines Montani b4d35c7dfb
Update README.rst 2018-04-10 23:05:49 +02:00
Matthew Honnibal 97851d2c4e Increment version to v2.0.12.dev0 2018-04-10 22:20:16 +02:00
Matthew Honnibal ed39c75a92 Merge branch 'master' of https://github.com/explosion/spaCy 2018-04-10 22:19:40 +02:00
Matthew Honnibal 3836199a83 Fix loading of models when custom vectors are added 2018-04-10 22:19:20 +02:00
ines 0299d5fac8 Update argument annotations and formatting 2018-04-10 21:45:11 +02:00
ines 49b1e48bf5 Fix syntax error 2018-04-10 21:44:59 +02:00
ines ce63f8997b Update init-model docs 2018-04-10 21:42:54 +02:00
ines 70052e46e9 Fix formatting [ci skip] 2018-04-10 21:42:46 +02:00
Matthew Honnibal 0ddb152be0 Improve error message when reading vectors 2018-04-10 21:26:50 +02:00
Matthew Honnibal db50ac524e Support zipped vector files in init-model 2018-04-10 21:21:00 +02:00
ines 270fcfd925 Fix typo in package command message (closes #2200) 2018-04-10 19:14:31 +02:00
ines 24d8bf348d Revert "Add support for .zip to init_model"
This reverts commit 7ee880a0ad.
2018-04-10 19:08:06 +02:00
Matthew Honnibal 7ee880a0ad Add support for .zip to init_model 2018-04-10 14:30:04 +00:00
ines 5ecb274764 Fix indentation error and set Doc.is_tagged correctly 2018-04-10 16:14:52 +02:00
ines 0e847d7fe5 Fix typo 2018-04-09 14:51:14 +02:00
ines 987ee27af7 Return Doc if noun chunks merger component if Doc is not parsed 2018-04-09 14:51:02 +02:00
Xiaoquan Kong e2f13ec722 bugfix: `Doc.noun_chunks` call `Doc.noun_chunks_iterator` without checking (closes #2194) 2018-04-08 23:44:05 +02:00
Jens Dahl Møllerhøj e5055e3cf6 Add Danish lemmatizer (#2184)
* add danish lemmatizer

* fill contributor agreement
2018-04-07 19:07:28 +02:00
ines f86e79aa85 Update README section on tests (resolves #2191) 2018-04-06 16:32:36 +02:00
ines bccbf538ef Revert "Check if spaCy has compiled correctly and show error message"
This reverts commit 3463ded7cf.
2018-04-06 15:49:44 +02:00
ines fb4eda6616 Merge branch 'master' of https://github.com/explosion/spaCy 2018-04-06 00:38:48 +02:00
Matthew Honnibal 0c7fab4443 Set version to 2.0.11 2018-04-04 11:19:11 +02:00
Matthew Honnibal a350be0601 Fix vector-name loading fix 2018-04-04 01:31:25 +02:00
Matthew Honnibal 21047bde52 Fix syntax error in italian lemmatizer 2018-04-03 23:13:22 +02:00
Matthew Honnibal 81f4005f3d Fix loading models with pretrained vectors 2018-04-03 23:11:48 +02:00
ines 3463ded7cf Check if spaCy has compiled correctly and show error message 2018-04-03 22:18:47 +02:00
Matthew Honnibal 96b612873b Add hyper-parameter to control whether parser makes a beam update 2018-04-03 22:02:56 +02:00
ines e5f47cd82d Update errors 2018-04-03 21:40:29 +02:00
Matthew Honnibal f7e6313b43 Increment version to v2.0.11.dev0 2018-04-03 20:58:47 +02:00
ines 10462816bc Fix tests for Python 2 2018-04-03 18:51:31 +02:00
ines 62b4b527d7 Don't raise error if set_extension has getter and setter (closes #2177)
Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests.
2018-04-03 18:30:17 +02:00
ines ee3082ad29 Fix whitespace 2018-04-03 18:29:53 +02:00
ines de137fba84 Add TensorBoard examples to examples overview [ci skip] 2018-04-03 16:01:52 +02:00
ines 6d87b28f15 Add Vietnamese to language overview [ci skip] 2018-04-03 16:01:36 +02:00
Ines Montani 3141e04822
💫 New system for error messages and warnings (#2163)
* Add spacy.errors module

* Update deprecation and user warnings

* Replace errors and asserts with new error message system

* Remove redundant asserts

* Fix whitespace

* Add messages for print/util.prints statements

* Fix typo

* Fix typos

* Move CLI messages to spacy.cli._messages

* Add decorator to display error code with message

An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.

* Remove unused link in spacy.about

* Update errors for invalid pipeline components

* Improve error for unknown factories

* Add displaCy warnings

* Update formatting consistency

* Move error message to spacy.errors

* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
Matthew Honnibal abf8b16d71
Add doc.retokenize() context manager (#2172)
This patch takes a step towards #1487 by introducing the
doc.retokenize() context manager, to handle merging spans, and soon
splitting tokens.

The idea is to do merging and splitting like this:

with doc.retokenize() as retokenizer:
    for start, end, label in matches:
        retokenizer.merge(doc[start : end], attrs={'ent_type': label})

The retokenizer accumulates the merge requests, and applies them
together at the end of the block. This will allow retokenization to be
more efficient, and much less error prone.

A retokenizer.split() function will then be added, to handle splitting a
single token into multiple tokens. These methods take `Span` and `Token`
objects; if the user wants to go directly from offsets, they can append
to the .merges and .splits lists on the retokenizer.

The doc.merge() method's behaviour remains unchanged, so this patch
should be 100% backwards incompatible (modulo bugs). Internally,
doc.merge() fixes up the arguments (to handle the various deprecated styles),
opens the retokenizer, and makes the single merge.

We can later start making deprecation warnings on direct calls to doc.merge(),
to migrate people to use of the retokenize context manager.
2018-04-03 14:10:35 +02:00
ines 638068ec6c Restore contributor agreement 2018-03-31 14:06:37 +02:00