8657 Commits

Author SHA1 Message Date
Matt Upson 9a1d3b63fb Add missing default to .set_extension (#2297)
Failing to set a default, method, or getter results in a ValueError:

ValueError: [E083] Error setting extension: only one of `default`, `method`, or `getter` (plus optional `setter`) is allowed. Got: 0
2018-05-04 18:47:01 +02:00
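
The rule quoted in the E083 message above can be illustrated with a small standalone sketch. It is not spaCy's implementation: `check_extension_args` is a hypothetical helper, and treating `None` as "not supplied" is an assumption made only for this illustration.

```python
def check_extension_args(default=None, method=None, getter=None, setter=None):
    """Hypothetical helper mimicking the rule stated in the E083 message.

    Assumption: an argument left as None counts as "not supplied".
    """
    # Count how many of the three mutually exclusive options were supplied.
    # `setter` is optional and therefore not counted.
    supplied = sum(arg is not None for arg in (default, method, getter))
    if supplied != 1:
        raise ValueError(
            "only one of `default`, `method`, or `getter` "
            "(plus optional `setter`) is allowed. Got: %d" % supplied
        )
    return supplied


# Supplying a default satisfies the rule; supplying nothing does not.
check_extension_args(default=False)
try:
    check_extension_args()
except ValueError as err:
    print(err)
```

Under this sketch, a call with no arguments fails with a count of 0, which is the situation the commit fixes by adding a default.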
ines 929a01139a Order issue templates 2018-05-04 03:04:41 +02:00
Ines Montani 7f39c8896b Update issue templates (#2295)
* Update issue templates

* Update templates
2018-05-04 03:02:26 +02:00
Douglas Knox 9b49a40f4e Test and fix for Issue #2219 (#2272)
Test and fix for Issue #2219: Token.similarity() failed if single letter
2018-05-03 18:40:46 +02:00
Paul O'Leary McCann bd72fbf09c Port Japanese mecab tokenizer from v1 (#2036)
* Port Japanese mecab tokenizer from v1

This brings the Mecab-based Japanese tokenization introduced in #1246 to
spaCy v2. There isn't a JapaneseTagger implementation yet, but POS tag
information from Mecab is stored in a token extension. A tag map is also
included.

As a reminder, Mecab is required because Universal Dependencies are
based on Unidic tags, and Janome doesn't support Unidic.

Things to check:

1. Is this the right way to use a token extension?

2. What's the right way to implement a JapaneseTagger? The approach in
#1246 relied on `tag_from_strings` which is just gone now. I guess the
best thing is to just try training spaCy's default Tagger?

-POLM

* Add tagging/make_doc and tests
2018-05-03 18:38:26 +02:00
G.Pruvost cc8e804648 #2211 - Support for ssl certs config on download command (#2212)
* Add support for SSL/Certs customization on download CLI

* Add a note on SSL options for the 'download' CLI in the README

* Add contributor agreement
2018-05-03 18:37:02 +02:00
Jens Dahl Møllerhøj b9290397fb rename SP to _SP (#2289) 2018-05-03 18:33:49 +02:00
ines c9547b7b8b Update Juniper (see #2293) 2018-05-03 15:36:02 +02:00
Alex Villarreal 647f2544c5 Fix code sample for span.set_extension (#2286) 2018-05-03 00:39:22 +02:00
Alex Villarreal 13d562e1a4 Fix code sample for Doc.set_extension (#2282)
* Fix code sample for `set_extension`

The previous sample code for `set_extension` failed the assertion at the end, because `city_getter` checked whether the whole document text matched any of the city names. Now it checks whether any of the city names is contained in the document text.

* Contributor agreement
2018-05-02 10:16:05 +02:00
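
The difference between the two getter behaviours described in this commit can be sketched on plain strings, without spaCy. The city names, the sample sentence and both function names are illustrative assumptions, not the code from the documentation.

```python
# Illustrative data; the actual names used in the docs sample may differ.
CITIES = ("San Francisco", "New York", "Paris")


def city_getter_exact(text):
    # Old behaviour as described: true only if the whole text is a city name.
    return text in CITIES


def city_getter_contains(text):
    # Fixed behaviour as described: true if any city name occurs in the text.
    return any(city in text for city in CITIES)


sentence = "I like New York"
print(city_getter_exact(sentence))     # False
print(city_getter_contains(sentence))  # True
```

An assertion that the getter returns a true value for a full sentence can only pass with the containment check, which matches the failure the commit message describes.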
Mr Roboto 6f5ccda19c Addresses Issue #2228 - Deserialization fails when using tensor=False or sentiment=False (#2230)
* Fixes issue #2228

* Adds a new contributor
2018-05-01 13:40:22 +02:00
Shirish Kadam d98a90440f Added Adam project to spaCy Universe (#2275)
* Added 5hirish to contributors

* Added Adam Qas Project to spaCy Universe

* Remove $ from code example
2018-04-30 22:25:01 +02:00
ines 56e7faf16b Fix spacing 2018-04-30 22:24:40 +02:00
ines 6efb4cdf88 Use Juniper and tidy up 2018-04-30 18:48:35 +02:00
ines 45bb8d75a5 Fix overflow issues on small screens [ci skip] 2018-04-29 03:17:36 +02:00
Ines Montani 49cee4af92 💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)
* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make executable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label
2018-04-29 02:06:46 +02:00
ines 3c80f69ff5 Return data in cli.info and add silent option (resolves #2196) 2018-04-29 01:59:44 +02:00
ines 1c6d77610c Add remove_extension method on Doc, Token and Span (closes #2242) 2018-04-28 23:33:09 +02:00
ines a512fa60ef Remove upcoming option from docs for now 2018-04-28 23:32:18 +02:00
ines abdb853ebf Simplify underscore tests 2018-04-28 23:30:33 +02:00
ines 6fb6371670 Add collapse_phrases option to displacy (closes #2266) 2018-04-28 23:06:50 +02:00
Matt Upson 87cc6b3599 Add missing comma to NN example in docs (#2255)
Also add a completed contributor agreement.
2018-04-28 14:56:00 +02:00
Robin Linderborg 1f9904ef12 fixes #2238 (#2241)
* Remove erroneous lemma lookup år > åra in Swedish

* Add contributors agreement

* Add contrib agreement to correct directory

* Revert change to CONTRIBUTOR_AGREEMENT
2018-04-28 14:55:22 +02:00
Robin Linderborg d01f503b54 Remove incorrect lemma lookup gäng->gänga (#2252)
* Remove incorrect lemma lookup gäng->gänga
In modern Swedish, "gäng" is mostly associated with "gang" or "group of people". The removed lemma lookup lemmatized it to the verb "thread".

* Add contrib agreement to correct directory

* Revert change to CONTRIBUTOR_AGREEMENT
2018-04-28 14:54:41 +02:00
ines 4a3bea00c7 Update resources [ci skip] 2018-04-26 22:10:34 +02:00
ines 686225eadd Fix Spanish noun_chunks (resolves #2210)
Make sure 'NP' label is added to StringStore and move noun_bounds helper into a closure to allow reusing label sets
2018-04-18 18:44:01 -04:00
ines 9632595fb4 Use correct, non-deprecated merge syntax (resolves #2226) 2018-04-18 18:28:28 -04:00
Suraj Rajan 5957f15227 Fixed typos for #2222,#2223 (#2233) (closes #2222, closes #2223) 2018-04-18 14:55:26 -07:00
Pradeep Kumar Tippa df389e5b74 spacy-101 vocab doc giving valid variable names (#2236) 2018-04-18 14:54:26 -07:00
Ines Montani b4d35c7dfb Update README.rst 2018-04-10 23:05:49 +02:00
Matthew Honnibal 97851d2c4e Increment version to v2.0.12.dev0 2018-04-10 22:20:16 +02:00
Matthew Honnibal ed39c75a92 Merge branch 'master' of https://github.com/explosion/spaCy 2018-04-10 22:19:40 +02:00
Matthew Honnibal 3836199a83 Fix loading of models when custom vectors are added 2018-04-10 22:19:20 +02:00
ines 0299d5fac8 Update argument annotations and formatting 2018-04-10 21:45:11 +02:00
ines 49b1e48bf5 Fix syntax error 2018-04-10 21:44:59 +02:00
ines ce63f8997b Update init-model docs 2018-04-10 21:42:54 +02:00
ines 70052e46e9 Fix formatting [ci skip] 2018-04-10 21:42:46 +02:00
Matthew Honnibal 0ddb152be0 Improve error message when reading vectors 2018-04-10 21:26:50 +02:00
Matthew Honnibal db50ac524e Support zipped vector files in init-model 2018-04-10 21:21:00 +02:00
ines 270fcfd925 Fix typo in package command message (closes #2200) 2018-04-10 19:14:31 +02:00
ines 24d8bf348d Revert "Add support for .zip to init_model"
This reverts commit 7ee880a0ad.
2018-04-10 19:08:06 +02:00
Matthew Honnibal 7ee880a0ad Add support for .zip to init_model 2018-04-10 14:30:04 +00:00
ines 5ecb274764 Fix indentation error and set Doc.is_tagged correctly 2018-04-10 16:14:52 +02:00
ines 0e847d7fe5 Fix typo 2018-04-09 14:51:14 +02:00
ines 987ee27af7 Return Doc in noun chunks merger component if Doc is not parsed 2018-04-09 14:51:02 +02:00
Xiaoquan Kong e2f13ec722 bugfix: `Doc.noun_chunks` calls `Doc.noun_chunks_iterator` without checking (closes #2194) 2018-04-08 23:44:05 +02:00
Jens Dahl Møllerhøj e5055e3cf6 Add Danish lemmatizer (#2184)
* add danish lemmatizer

* fill contributor agreement
2018-04-07 19:07:28 +02:00
ines f86e79aa85 Update README section on tests (resolves #2191) 2018-04-06 16:32:36 +02:00
ines bccbf538ef Revert "Check if spaCy has compiled correctly and show error message"
This reverts commit 3463ded7cf.
2018-04-06 15:49:44 +02:00
ines fb4eda6616 Merge branch 'master' of https://github.com/explosion/spaCy 2018-04-06 00:38:48 +02:00