From 1a2542fde2bf6feea93817fef8a9c6062857c15b Mon Sep 17 00:00:00 2001
From: Pradeep Kumar Tippa
Date: Wed, 7 Feb 2018 15:41:24 +0530
Subject: [PATCH 01/24] Merge from Patch 1 (#1)

* Fixing vocab doc

Replacing "like" with "love", coffee suffix should be "fee" but not "ffe"

* Added pktippa contributor agreement
---
 .github/contributors/pktippa.md      | 106 +++++++++++++++++++++++++++
 website/usage/_spacy-101/_vocab.jade |   4 +-
 2 files changed, 108 insertions(+), 2 deletions(-)
 create mode 100644 .github/contributors/pktippa.md

diff --git a/.github/contributors/pktippa.md b/.github/contributors/pktippa.md
new file mode 100644
index 000000000..740944a72
--- /dev/null
+++ b/.github/contributors/pktippa.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [x] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Pradeep Kumar Tippa  |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 07-02-2018           |
+| GitHub username                | pktippa              |
+| Website (optional)             |                      |
diff --git a/website/usage/_spacy-101/_vocab.jade b/website/usage/_spacy-101/_vocab.jade
index 185e634fe..b81e0f33c 100644
--- a/website/usage/_spacy-101/_vocab.jade
+++ b/website/usage/_spacy-101/_vocab.jade
@@ -32,7 +32,7 @@ p
     | string to get its hash, or a hash to get its string:

 +code.
-    doc = nlp(u'I like coffee')
+    doc = nlp(u'I love coffee')
     assert doc.vocab.strings[u'coffee'] == 3197928453018144401
     assert doc.vocab.strings[3197928453018144401] == u'coffee'

@@ -70,7 +70,7 @@ p
     - var style = [0, 1, 1, 0, 0, 1, 1]
     +annotation-row(["I", "4690420944186131903", "X", "I", "I", true, false], style)
     +annotation-row(["love", "3702023516439754181", "xxxx", "l", "ove", true, false], style)
-    +annotation-row(["coffee", "3197928453018144401", "xxxx", "c", "ffe", true, false], style)
+    +annotation-row(["coffee", "3197928453018144401", "xxxx", "c", "fee", true, false], style)

 p
     | The mapping of words to hashes doesn't depend on any state.
To make sure

From 03113d67799efd2999edf85ac98f7261ae7ecb48 Mon Sep 17 00:00:00 2001
From: Pradeep Kumar Tippa
Date: Thu, 8 Feb 2018 19:34:15 +0530
Subject: [PATCH 02/24] Fixing navigating parse tree doc under dependency parse

---
 website/usage/_linguistic-features/_dependency-parse.jade | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/usage/_linguistic-features/_dependency-parse.jade b/website/usage/_linguistic-features/_dependency-parse.jade
index 188b7b8f3..d8d7cbce1 100644
--- a/website/usage/_linguistic-features/_dependency-parse.jade
+++ b/website/usage/_linguistic-features/_dependency-parse.jade
@@ -65,9 +65,9 @@ p
     - var style = [0, 1, 0, 1, 0]
     +annotation-row(["Autonomous", "amod", "cars", "NOUN", ""], style)
     +annotation-row(["cars", "nsubj", "shift", "VERB", "Autonomous"], style)
-    +annotation-row(["shift", "ROOT", "shift", "VERB", "cars, liability"], style)
+    +annotation-row(["shift", "ROOT", "shift", "VERB", "cars, liability, toward"], style)
     +annotation-row(["insurance", "compound", "liability", "NOUN", ""], style)
-    +annotation-row(["liability", "dobj", "shift", "VERB", "insurance, toward"], style)
+    +annotation-row(["liability", "dobj", "shift", "VERB", "insurance"], style)
     +annotation-row(["toward", "prep", "liability", "NOUN", "manufacturers"], style)
     +annotation-row(["manufacturers", "pobj", "toward", "ADP", ""], style)

From 24af6375db3b312539af4cb06620e23e06e5aa81 Mon Sep 17 00:00:00 2001
From: Orion Montoya
Date: Thu, 8 Feb 2018 10:49:09 -0800
Subject: [PATCH 03/24] update link to Honnibal and Johnson 2015

aclweb.org is throwing a gateway timeout on the link as `https`+`aclweb.org`, but is fine with `https`+`www.aclweb.org` (also with `http`+`aclweb.org`, but let's keep it in `https`, shall we?
---
 website/usage/_facts-figures/_benchmarks.jade | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/usage/_facts-figures/_benchmarks.jade b/website/usage/_facts-figures/_benchmarks.jade
index b530b84de..dabf58795 100644
--- a/website/usage/_facts-figures/_benchmarks.jade
+++ b/website/usage/_facts-figures/_benchmarks.jade
@@ -13,7 +13,7 @@ p
     | Their results and subsequent discussions helped us develop a novel
     | psychologically-motivated technique to improve spaCy's accuracy, which
     | we published in joint work with Macquarie University
-    | #[+a("https://aclweb.org/anthology/D/D15/D15-1162.pdf") (Honnibal and Johnson, 2015)].
+    | #[+a("https://www.aclweb.org/anthology/D/D15/D15-1162.pdf") (Honnibal and Johnson, 2015)].

 include _benchmarks-choi-2015

From fc4ae04c55e8a03a1a32e47b05dde8737b7a66fd Mon Sep 17 00:00:00 2001
From: ines
Date: Fri, 9 Feb 2018 10:23:03 +0100
Subject: [PATCH 04/24] Document LENGTH attribute in matcher

---
 website/usage/_linguistic-features/_rule-based-matching.jade | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/website/usage/_linguistic-features/_rule-based-matching.jade b/website/usage/_linguistic-features/_rule-based-matching.jade
index 7872b668f..bb75e7ad2 100644
--- a/website/usage/_linguistic-features/_rule-based-matching.jade
+++ b/website/usage/_linguistic-features/_rule-based-matching.jade
@@ -91,6 +91,10 @@ p
         +cell.u-nowrap #[code LOWER]
         +cell The lowercase form of the token text.

+    +row
+        +cell #[code LENGTH]
+        +cell The length of the token text.
+
     +row
         +cell.u-nowrap #[code IS_ALPHA], #[code IS_ASCII], #[code IS_DIGIT]
         +cell

From e9f67be04d39366697845e3670e92c9e5430380e Mon Sep 17 00:00:00 2001
From: ines
Date: Fri, 9 Feb 2018 10:23:33 +0100
Subject: [PATCH 05/24] Fix regex flag matcher example (resolves #1950)

---
 website/usage/_linguistic-features/_rule-based-matching.jade | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/website/usage/_linguistic-features/_rule-based-matching.jade b/website/usage/_linguistic-features/_rule-based-matching.jade
index bb75e7ad2..39062493a 100644
--- a/website/usage/_linguistic-features/_rule-based-matching.jade
+++ b/website/usage/_linguistic-features/_rule-based-matching.jade
@@ -339,7 +339,8 @@ p
     | flag.

 +code.
-    IS_DEFINITELY = nlp.vocab.add_flag(re.compile(r'deff?in[ia]tely').match)
+    definitely_flag = lambda text: bool(re.compile(r'deff?in[ia]tely').match(text))
+    IS_DEFINITELY = nlp.vocab.add_flag(definitely_flag)
     matcher = Matcher(nlp.vocab)
     matcher.add('DEFINITELY', None, [{IS_DEFINITELY: True}])

From ab33e274f5c71c73a24e8580f0adcb32365e5ffe Mon Sep 17 00:00:00 2001
From: ines
Date: Fri, 9 Feb 2018 10:43:33 +0100
Subject: [PATCH 06/24] Add more details on symlink error & Windows solution (resolves #1941) [ci skip]

---
 website/usage/_install/_troubleshooting.jade |  7 ++++---
 website/usage/_models/_install-basics.jade   | 10 ++++++++++
 website/usage/_models/_install.jade          |  2 +-
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/website/usage/_install/_troubleshooting.jade b/website/usage/_install/_troubleshooting.jade
index c846ff957..2135f323a 100644
--- a/website/usage/_install/_troubleshooting.jade
+++ b/website/usage/_install/_troubleshooting.jade
@@ -38,9 +38,10 @@ p
     | #[code spacy/data] directory. This means your user needs permission to do
     | this. The above error mostly occurs when doing a system-wide installation,
     | which will create the symlinks in a system directory. Run the
-    | #[code download] or #[code link] command as administrator, or use a
-    | #[code virtualenv] to install spaCy in a user directory, instead
-    | of doing a system-wide installation.
+    | #[code download] or #[code link] command as administrator (on Windows,
+    | simply right-click on your terminal or shell ans select "Run as
+    | Administrator"), or use a #[code virtualenv] to install spaCy in a user
+    | directory, instead of doing a system-wide installation.

 +h(3, "no-cache-dir") No such option: --no-cache-dir

diff --git a/website/usage/_models/_install-basics.jade b/website/usage/_models/_install-basics.jade
index 7b32e3333..3fb8fa00c 100644
--- a/website/usage/_models/_install-basics.jade
+++ b/website/usage/_models/_install-basics.jade
@@ -31,3 +31,13 @@ p
     import spacy
     nlp = spacy.load('en')
     doc = nlp(u'This is a sentence.')
+
++infobox("Important note", "⚠️")
+    | To allow loading models via convenient shortcuts like #[code 'en'], spaCy
+    | will create a symlink within the #[code spacy/data] directory. This means
+    | that your user needs the #[strong required permissions].
+    | If you've installed spaCy to a system directory and don't have admin
+    | privileges, the model linking may fail. The easiest solution
+    | is to re-run the command as admin, or use a #[code virtualenv]. For more
+    | info on this, see the
+    | #[+a("/usage/#symlink-privilege") troubleshooting guide].
diff --git a/website/usage/_models/_install.jade b/website/usage/_models/_install.jade
index 769d3f2d6..7473e41a6 100644
--- a/website/usage/_models/_install.jade
+++ b/website/usage/_models/_install.jade
@@ -132,7 +132,7 @@ p
     # set up shortcut link to load local model as "my_amazing_model"
     python -m spacy link /Users/you/model my_amazing_model

-+infobox("Important note")
++infobox("Important note", "⚠️")
     | In order to create a symlink, your user needs the #[strong required permissions].
     | If you've installed spaCy to a system directory and don't have admin
     | privileges, the #[code spacy link] command may fail. The easiest solution

From 01cc9cd9c03c050756679a4f457b98f0dc6c30bc Mon Sep 17 00:00:00 2001
From: Pradeep Kumar Tippa
Date: Fri, 9 Feb 2018 19:16:25 +0530
Subject: [PATCH 07/24] assert statement syntax fix in doc

---
 website/usage/_linguistic-features/_named-entities.jade | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/usage/_linguistic-features/_named-entities.jade b/website/usage/_linguistic-features/_named-entities.jade
index 9e55ba84e..0f32d1da3 100644
--- a/website/usage/_linguistic-features/_named-entities.jade
+++ b/website/usage/_linguistic-features/_named-entities.jade
@@ -80,7 +80,7 @@ p
     doc.ents = [netflix_ent]

     ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
-    assert ents = [(u'Netflix', 0, 7, u'ORG')]
+    assert ents == [(u'Netflix', 0, 7, u'ORG')]

 p
     | Keep in mind that you need to create a #[code Span] with the start and

From 416cd021cee39a9a61045c6438518f6eb137723a Mon Sep 17 00:00:00 2001
From: Pradeep Kumar Tippa
Date: Fri, 9 Feb 2018 19:16:59 +0530
Subject: [PATCH 08/24] Added TAG from spacy symbols which used below

---
 website/usage/_linguistic-features/_tokenization.jade | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/usage/_linguistic-features/_tokenization.jade b/website/usage/_linguistic-features/_tokenization.jade
index f149556ce..2cd3a13de 100644
--- a/website/usage/_linguistic-features/_tokenization.jade
+++ b/website/usage/_linguistic-features/_tokenization.jade
@@ -54,7 +54,7 @@ p

 +code.
     import spacy
-    from spacy.symbols import ORTH, LEMMA, POS
+    from spacy.symbols import ORTH, LEMMA, POS, TAG

     nlp = spacy.load('en')
     doc = nlp(u'gimme that') # phrase to tokenize

From 6ee5dff51c8a54c7dd9f345f10f33542888034eb Mon Sep 17 00:00:00 2001
From: Lyndon White
Date: Fri, 9 Feb 2018 23:03:35 +0800
Subject: [PATCH 09/24] Make python 3.4 compat module loading (fix #1733)

---
 spacy/compat.py | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/spacy/compat.py b/spacy/compat.py
index e50036013..3cc214b28 100644
--- a/spacy/compat.py
+++ b/spacy/compat.py
@@ -43,15 +43,15 @@ fix_text = ftfy.fix_text
 copy_array = copy_array
 izip = getattr(itertools, 'izip', zip)

-is_python2 = six.PY2
-is_python3 = six.PY3
 is_windows = sys.platform.startswith('win')
 is_linux = sys.platform.startswith('linux')
 is_osx = sys.platform == 'darwin'

+is_python2 = six.PY2
+is_python3 = six.PY3
+is_python_pre_3_5 = is_python2 or (is_python3 and sys.version_info[1]<5)

 if is_python2:
-    import imp
     bytes_ = str
     unicode_ = unicode # noqa: F821
     basestring_ = basestring # noqa: F821
@@ -60,7 +60,6 @@ if is_python2:
     path2str = lambda path: str(path).decode('utf8')

 elif is_python3:
-    import importlib.util
     bytes_ = bytes
     unicode_ = str
     basestring_ = str
@@ -111,9 +110,11 @@ def normalize_string_keys(old):

 def import_file(name, loc):
     loc = str(loc)
-    if is_python2:
+    if is_python_pre_3_5:
+        import imp
         return imp.load_source(name, loc)
     else:
+        import importlib.util
         spec = importlib.util.spec_from_file_location(name, str(loc))
         module = importlib.util.module_from_spec(spec)
         spec.loader.exec_module(module)

From 5b1bc8d10132de67b8b27d4bbdd24b3c72a479f6 Mon Sep 17 00:00:00 2001
From: Lyndon White
Date: Fri, 9 Feb 2018 23:14:29 +0800
Subject: [PATCH 10/24] Sign contributors agreement

---
 .github/contributors/oxinabox.md | 106 +++++++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)
 create mode 100644 .github/contributors/oxinabox.md

diff --git a/.github/contributors/oxinabox.md b/.github/contributors/oxinabox.md
new file mode 100644
index 000000000..2c2f723df
--- /dev/null
+++ b/.github/contributors/oxinabox.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [x] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Lyndon White         |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 9/2/2018             |
+| GitHub username                | oxinabox             |
+| Website (optional)             | white.ucc.asn.au     |

From 94ce43adf0e779a777e4feaf9b289796005cf461 Mon Sep 17 00:00:00 2001
From: Lyndon White
Date: Fri, 9 Feb 2018 23:19:11 +0800
Subject: [PATCH 11/24] squashme

---
 .github/contributors/oxinabox.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/contributors/oxinabox.md b/.github/contributors/oxinabox.md
index 2c2f723df..8e58c4ea1 100644
--- a/.github/contributors/oxinabox.md
+++ b/.github/contributors/oxinabox.md
@@ -91,7 +91,7 @@ mark both statements:
     or entity, including my employer, has or will have rights with respect to my
     contributions.

-    * [x] I am signing on behalf of my employer or a legal entity and I have the
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
     actual authority to contractually bind that entity.

 ## Contributor Details

From c63e99da8a15d8540bae20cabfecec9f3d8c30ae Mon Sep 17 00:00:00 2001
From: ines
Date: Sat, 10 Feb 2018 11:58:41 +0100
Subject: [PATCH 12/24] Fix typo in glossary (resolves #1964)

Co-Authored-By: SThomasP
---
 spacy/glossary.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spacy/glossary.py b/spacy/glossary.py
index c17cb7467..02d8815e0 100644
--- a/spacy/glossary.py
+++ b/spacy/glossary.py
@@ -115,7 +115,7 @@ GLOSSARY = {
     'ADJA': 'adjective, attributive',
     'ADJD': 'adjective, adverbial or predicative',
     'APPO': 'postposition',
-    'APRP': 'preposition; circumposition left',
+    'APPR': 'preposition; circumposition left',
     'APPRART': 'preposition with article',
     'APZR': 'circumposition right',
     'ART': 'definite or indefinite article',

From 471d3c9e233240947b2cdb3b79969bcf86abda67 Mon Sep 17 00:00:00 2001
From: 4altinok
Date: Sun, 11 Feb 2018 18:50:50 +0100
Subject: [PATCH 13/24] added lex test for is_currency

---
 spacy/tests/lang/test_attrs.py | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/spacy/tests/lang/test_attrs.py b/spacy/tests/lang/test_attrs.py
index 92ee04737..67485ee60 100644
--- a/spacy/tests/lang/test_attrs.py
+++ b/spacy/tests/lang/test_attrs.py
@@ -2,7 +2,7 @@
 from __future__ import unicode_literals

 from ...attrs import intify_attrs, ORTH, NORM, LEMMA, IS_ALPHA
-from ...lang.lex_attrs import is_punct, is_ascii, like_url, word_shape
+from ...lang.lex_attrs import is_punct, is_ascii, is_currency, like_url, word_shape

 import pytest

@@ -37,6 +37,13 @@ def test_lex_attrs_is_ascii(text, match):
     assert is_ascii(text) == match


+@pytest.mark.parametrize('text,match', [('$', True), ('£', True), ('♥', False),
+                                        ('€', True), ('¥', True), ('¢', True),
+                                        ('a', False), ('www.google.com', False), ('dog', False)])
+def test_lex_attrs_is_currency(text, match):
+    assert is_currency(text) == match
+
+
 @pytest.mark.parametrize('text,match', [
     ('www.google.com', True), ('google.com', True), ('sydney.com', True),
     ('2girls1cup.org', True), ('http://stupid', True), ('www.hi', True),

From 3deef1497ae5394dea2835d714190d0855744551 Mon Sep 17 00:00:00 2001
From: 4altinok
Date: Sun, 11 Feb 2018 18:51:09 +0100
Subject: [PATCH 14/24] removed 18 and replaced 18 with is_currency

---
 spacy/attrs.pxd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/spacy/attrs.pxd b/spacy/attrs.pxd
index 74397fa64..79a177ba9 100644
--- a/spacy/attrs.pxd
+++ b/spacy/attrs.pxd
@@ -18,9 +18,9 @@ cdef enum attr_id_t:
     IS_QUOTE
     IS_LEFT_PUNCT
     IS_RIGHT_PUNCT
+    IS_CURRENCY

-    FLAG18 = 18
-    FLAG19
+    FLAG19 = 19
     FLAG20
     FLAG21
     FLAG22

From 94fb0b75e327bfeeac9fc5547244d20066ecaf16 Mon Sep 17 00:00:00 2001
From: 4altinok
Date: Sun, 11 Feb 2018 18:51:32 +0100
Subject: [PATCH 15/24] code for is_currency

---
 spacy/attrs.pyx         | 2 +-
 spacy/lang/lex_attrs.py | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/spacy/attrs.pyx b/spacy/attrs.pyx
index 893ec0845..d4e8a38c5 100644
--- a/spacy/attrs.pyx
+++ b/spacy/attrs.pyx
@@ -21,7 +21,7 @@ IDS = {
     "IS_QUOTE": IS_QUOTE,
     "IS_LEFT_PUNCT": IS_LEFT_PUNCT,
     "IS_RIGHT_PUNCT": IS_RIGHT_PUNCT,
-    "FLAG18": FLAG18,
+    "IS_CURRENCY": IS_CURRENCY,
     "FLAG19": FLAG19,
     "FLAG20": FLAG20,
     "FLAG21": FLAG21,
diff --git a/spacy/lang/lex_attrs.py b/spacy/lang/lex_attrs.py
index c3bb4a8ff..f1279f035 100644
--- a/spacy/lang/lex_attrs.py
+++ b/spacy/lang/lex_attrs.py
@@ -69,6 +69,14 @@ def is_right_punct(text):
     return text in right_punct


+def is_currency(text):
+    # can be overwritten by lang with list of currency words, e.g. dollar, euro
+    for char in text:
+        if unicodedata.category(char) != 'Sc':
+            return False
+    return True
+
+
 def like_email(text):
     return bool(_like_email(text))

@@ -164,5 +172,6 @@ LEX_ATTRS = {
     attrs.IS_QUOTE: is_quote,
     attrs.IS_LEFT_PUNCT: is_left_punct,
     attrs.IS_RIGHT_PUNCT: is_right_punct,
+    attrs.IS_CURRENCY: is_currency,
     attrs.LIKE_URL: like_url
 }

From ed1ac2969eb585d3c1b5a63f23ae79f8c57db878 Mon Sep 17 00:00:00 2001
From: 4altinok
Date: Sun, 11 Feb 2018 18:51:48 +0100
Subject: [PATCH 16/24] added new lexical feat to lexeme

---
 spacy/lexeme.pyx | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/spacy/lexeme.pyx b/spacy/lexeme.pyx
index d136540f9..78d3bed6c 100644
--- a/spacy/lexeme.pyx
+++ b/spacy/lexeme.pyx
@@ -12,7 +12,7 @@ import numpy
 from .typedefs cimport attr_t, flags_t
 from .attrs cimport IS_ALPHA, IS_ASCII, IS_DIGIT, IS_LOWER, IS_PUNCT, IS_SPACE
 from .attrs cimport IS_TITLE, IS_UPPER, LIKE_URL, LIKE_NUM, LIKE_EMAIL, IS_STOP
-from .attrs cimport IS_BRACKET, IS_QUOTE, IS_LEFT_PUNCT, IS_RIGHT_PUNCT, IS_OOV
+from .attrs cimport IS_BRACKET, IS_QUOTE, IS_LEFT_PUNCT, IS_RIGHT_PUNCT, IS_CURRENCY, IS_OOV
 from .attrs cimport PROB
 from .attrs import intify_attrs
 from . import about
@@ -474,6 +474,14 @@ cdef class Lexeme:
         def __set__(self, bint x):
             Lexeme.c_set_flag(self.c, IS_RIGHT_PUNCT, x)

+    property is_currency:
+        """RETURNS (bool): Whether the lexeme is a currency symbol, e.g. $, €."""
+        def __get__(self):
+            return Lexeme.c_check_flag(self.c, IS_CURRENCY)
+
+        def __set__(self, bint x):
+            Lexeme.c_set_flag(self.c, IS_CURRENCY, x)
+
     property like_url:
         """RETURNS (bool): Whether the lexeme resembles a URL."""
         def __get__(self):

From edd7202a064f05b63eee40c2b9aea706a0164423 Mon Sep 17 00:00:00 2001
From: 4altinok
Date: Sun, 11 Feb 2018 18:55:32 +0100
Subject: [PATCH 17/24] added new symbol

---
 spacy/symbols.pxd | 4 ++--
 spacy/symbols.pyx | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/spacy/symbols.pxd b/spacy/symbols.pxd
index 6960681a3..cc1734e6d 100644
--- a/spacy/symbols.pxd
+++ b/spacy/symbols.pxd
@@ -17,9 +17,9 @@ cdef enum symbol_t:
     IS_QUOTE
     IS_LEFT_PUNCT
     IS_RIGHT_PUNCT
+    IS_CURRENCY

-    FLAG18 = 18
-    FLAG19
+    FLAG19 = 19
     FLAG20
     FLAG21
     FLAG22
diff --git a/spacy/symbols.pyx b/spacy/symbols.pyx
index 98e4c440d..4bc1d4228 100644
--- a/spacy/symbols.pyx
+++ b/spacy/symbols.pyx
@@ -22,8 +22,8 @@ IDS = {
     "IS_QUOTE": IS_QUOTE,
     "IS_LEFT_PUNCT": IS_LEFT_PUNCT,
     "IS_RIGHT_PUNCT": IS_RIGHT_PUNCT,
+    "IS_CURRENCY": IS_CURRENCY,

-    "FLAG18": FLAG18,
     "FLAG19": FLAG19,
     "FLAG20": FLAG20,
     "FLAG21": FLAG21,

From ca8728035dc1139988119e216b18b4b131a3d767 Mon Sep 17 00:00:00 2001
From: 4altinok
Date: Sun, 11 Feb 2018 18:55:48 +0100
Subject: [PATCH 18/24] added new lex feat to token

---
 spacy/tokens/token.pyx | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/spacy/tokens/token.pyx b/spacy/tokens/token.pyx
index 74487b515..9e4b878cf 100644
--- a/spacy/tokens/token.pyx
+++ b/spacy/tokens/token.pyx
@@ -15,7 +15,7 @@ from ..lexeme cimport Lexeme
 from .. import parts_of_speech
 from ..attrs cimport IS_ALPHA, IS_ASCII, IS_DIGIT, IS_LOWER, IS_PUNCT, IS_SPACE
 from ..attrs cimport IS_BRACKET, IS_QUOTE, IS_LEFT_PUNCT, IS_RIGHT_PUNCT
-from ..attrs cimport IS_OOV, IS_TITLE, IS_UPPER, LIKE_URL, LIKE_NUM, LIKE_EMAIL
+from ..attrs cimport IS_OOV, IS_TITLE, IS_UPPER, IS_CURRENCY, LIKE_URL, LIKE_NUM, LIKE_EMAIL
 from ..attrs cimport IS_STOP, ID, ORTH, NORM, LOWER, SHAPE, PREFIX, SUFFIX
 from ..attrs cimport LENGTH, CLUSTER, LEMMA, POS, TAG, DEP
 from ..compat import is_config
@@ -855,6 +855,11 @@ cdef class Token:
         def __get__(self):
             return Lexeme.c_check_flag(self.c.lex, IS_RIGHT_PUNCT)

+    property is_currency:
+        """RETURNS (bool): Whether the token is a currency symbol."""
+        def __get__(self):
+            return Lexeme.c_check_flag(self.c.lex, IS_CURRENCY)
+
     property like_url:
         """RETURNS (bool): Whether the token resembles a URL."""
         def __get__(self):

From bf94c13382f23ac2d320483df17786f2866d4b3b Mon Sep 17 00:00:00 2001
From: Johannes Dollinger
Date: Tue, 13 Feb 2018 12:42:23 +0100
Subject: [PATCH 19/24] Don't fix random seeds on import

---
 spacy/cli/evaluate.py | 8 ++------
 spacy/cli/train.py    | 6 +-----
 spacy/util.py         | 6 ++++++
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/spacy/cli/evaluate.py b/spacy/cli/evaluate.py
index 551689413..43edd858d 100644
--- a/spacy/cli/evaluate.py
+++ b/spacy/cli/evaluate.py
@@ -3,8 +3,6 @@ from __future__ import unicode_literals, division, print_function

 import plac
 from timeit import default_timer as timer
-import random
-import numpy.random

 from ..gold import GoldCorpus
 from ..util import prints
@@ -12,10 +10,6 @@ from .. import util
 from .. import displacy


-random.seed(0)
-numpy.random.seed(0)
-
-
 @plac.annotations(
     model=("model name or path", "positional", None, str),
     data_path=("location of JSON-formatted evaluation data", "positional",
@@ -31,6 +25,8 @@ def evaluate(model, data_path, gpu_id=-1, gold_preproc=False, displacy_path=None
     Evaluate a model.
To render a sample of parses in a HTML file, set an output directory as the displacy_path argument. """ + + util.fix_random_seed() if gpu_id >= 0: util.use_gpu(gpu_id) util.set_env_log(False) diff --git a/spacy/cli/train.py b/spacy/cli/train.py index f8363bde1..6c7b95682 100644 --- a/spacy/cli/train.py +++ b/spacy/cli/train.py @@ -6,8 +6,6 @@ from pathlib import Path import tqdm from thinc.neural._classes.model import Model from timeit import default_timer as timer -import random -import numpy.random from ..gold import GoldCorpus, minibatch from ..util import prints @@ -16,9 +14,6 @@ from .. import about from .. import displacy from ..compat import json_dumps -random.seed(0) -numpy.random.seed(0) - @plac.annotations( lang=("model language", "positional", None, str), @@ -45,6 +40,7 @@ def train(lang, output_dir, train_data, dev_data, n_iter=30, n_sents=0, """ Train a model. Expects data in spaCy's JSON format. """ + util.fix_random_seed() util.set_env_log(True) n_sents = n_sents or None output_path = util.ensure_path(output_dir) diff --git a/spacy/util.py b/spacy/util.py index 7676b33b2..b42ca3734 100644 --- a/spacy/util.py +++ b/spacy/util.py @@ -17,6 +17,7 @@ from thinc.neural._classes.model import Model import functools import cytoolz import itertools +import numpy as np from .symbols import ORTH from .compat import cupy, CudaStream, path2str, basestring_, input_, unicode_ @@ -623,3 +624,8 @@ def use_gpu(gpu_id): Model.ops = CupyOps() Model.Ops = CupyOps return device + + +def fix_random_seed(seed=0): + random.seed(0) + np.random.seed(0) From 012e874d094bbf13ed18547eb3d13e197399e955 Mon Sep 17 00:00:00 2001 From: Johannes Dollinger Date: Tue, 13 Feb 2018 12:52:48 +0100 Subject: [PATCH 20/24] Add contributor agreement for emulbreh --- .github/contributors/emulbreh.md | 106 +++++++++++++++++++++++++++++++ spacy/util.py | 6 +- 2 files changed, 109 insertions(+), 3 deletions(-) create mode 100644 .github/contributors/emulbreh.md diff --git 
a/.github/contributors/emulbreh.md b/.github/contributors/emulbreh.md new file mode 100644 index 000000000..60388d22a --- /dev/null +++ b/.github/contributors/emulbreh.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). +The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. 
This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. 
You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. 
+ +## Contributor Details + +| Field | Entry | +|------------------------------- | -------------------- | +| Name | Johannes Dollinger | +| Company name (if applicable) | | +| Title or role (if applicable) | | +| Date | 2018-02-13 | +| GitHub username | emulbreh | +| Website (optional) | | diff --git a/spacy/util.py b/spacy/util.py index b42ca3734..dc51e467d 100644 --- a/spacy/util.py +++ b/spacy/util.py @@ -17,7 +17,7 @@ from thinc.neural._classes.model import Model import functools import cytoolz import itertools -import numpy as np +import numpy.random from .symbols import ORTH from .compat import cupy, CudaStream, path2str, basestring_, input_, unicode_ @@ -627,5 +627,5 @@ def use_gpu(gpu_id): def fix_random_seed(seed=0): - random.seed(0) - np.random.seed(0) + random.seed(seed) + numpy.random.seed(seed) From cdd4b3d05c511372d888b04c70d0a985bb307b17 Mon Sep 17 00:00:00 2001 From: Claudiu-Vlad Ursache Date: Sun, 4 Feb 2018 17:48:11 +0100 Subject: [PATCH 21/24] Add contributor agreement for @ursachec --- .github/contributors/ursachec.md | 106 +++++++++++++++++++++++++++++++ 1 file changed, 106 insertions(+) create mode 100644 .github/contributors/ursachec.md diff --git a/.github/contributors/ursachec.md b/.github/contributors/ursachec.md new file mode 100644 index 000000000..45a85f166 --- /dev/null +++ b/.github/contributors/ursachec.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). +The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. 
+ +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. 
This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. 
You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. + +## Contributor Details + +| Field | Entry | +|------------------------------- | ------------------------- | +| Name | Claudiu-Vlad Ursache | +| Company name (if applicable) | | +| Title or role (if applicable) | | +| Date | 2018-02-04 | +| GitHub username | ursachec | +| Website (optional) | https://www.cvursache.com | From e28de12cbdeb932de0ec4fff901a3bc469bb90c3 Mon Sep 17 00:00:00 2001 From: Claudiu-Vlad Ursache Date: Tue, 13 Feb 2018 20:44:33 +0100 Subject: [PATCH 22/24] Ensure files opened in `from_disk` are closed Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706). 
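The pattern applied in this patch can be sketched with the standard library alone. This is only an illustration under stated assumptions: `json` stands in for `ujson`, and `read_json_closed` is a hypothetical helper, not spaCy's `util.read_json`.

```python
import json
import tempfile
from pathlib import Path


def read_json_closed(path):
    """Hypothetical helper: load JSON and close the file deterministically."""
    # The with-block closes the handle even if json.load raises,
    # unlike json.load(path.open()), which leaves closing to the garbage collector.
    with Path(path).open('r', encoding='utf8') as file_:
        return json.load(file_)


def demo():
    # Write a small meta file to a temporary directory and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        meta_path = Path(tmp) / 'meta.json'
        meta_path.write_text(json.dumps({'name': 'demo'}), encoding='utf8')
        return read_json_closed(meta_path)
```

Calling `demo()` round-trips the dictionary through disk without leaving an open file handle behind.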
--- spacy/language.py | 2 +- spacy/pipeline.pyx | 6 ++-- spacy/syntax/nn_parser.pyx | 2 +- .../serialize/test_serialize_language.py | 28 +++++++++++++++++++ spacy/vectors.pyx | 3 +- 5 files changed, 36 insertions(+), 5 deletions(-) create mode 100644 spacy/tests/serialize/test_serialize_language.py diff --git a/spacy/language.py b/spacy/language.py index a2b945c49..a61b6b09f 100644 --- a/spacy/language.py +++ b/spacy/language.py @@ -624,7 +624,7 @@ class Language(object): deserializers = OrderedDict(( ('vocab', lambda p: self.vocab.from_disk(p)), ('tokenizer', lambda p: self.tokenizer.from_disk(p, vocab=False)), - ('meta.json', lambda p: self.meta.update(ujson.load(p.open('r')))) + ('meta.json', lambda p: self.meta.update(util.read_json(p))) )) for name, proc in self.pipeline: if name in disable: diff --git a/spacy/pipeline.pyx b/spacy/pipeline.pyx index c5f8065de..dae21941e 100644 --- a/spacy/pipeline.pyx +++ b/spacy/pipeline.pyx @@ -214,7 +214,8 @@ class Pipe(object): def _load_cfg(path): if path.exists(): - return ujson.load(path.open()) + with path.open() as file_: + return ujson.load(file_) else: return {} @@ -580,7 +581,8 @@ class Tagger(Pipe): def load_model(p): if self.model is True: self.model = self.Model(self.vocab.morphology.n_tags, **self.cfg) - self.model.from_bytes(p.open('rb').read()) + with p.open('rb') as file_: + self.model.from_bytes(file_.read()) def load_tag_map(p): with p.open('rb') as file_: diff --git a/spacy/syntax/nn_parser.pyx b/spacy/syntax/nn_parser.pyx index fa91c697e..f192e3b96 100644 --- a/spacy/syntax/nn_parser.pyx +++ b/spacy/syntax/nn_parser.pyx @@ -887,7 +887,7 @@ cdef class Parser: deserializers = { 'vocab': lambda p: self.vocab.from_disk(p), 'moves': lambda p: self.moves.from_disk(p, strings=False), - 'cfg': lambda p: self.cfg.update(ujson.load(p.open())), + 'cfg': lambda p: self.cfg.update(util.read_json(p)), 'model': lambda p: None } util.from_disk(path, deserializers, exclude) diff --git 
a/spacy/tests/serialize/test_serialize_language.py b/spacy/tests/serialize/test_serialize_language.py new file mode 100644 index 000000000..1fcf8ef18 --- /dev/null +++ b/spacy/tests/serialize/test_serialize_language.py @@ -0,0 +1,28 @@ +# coding: utf-8 +from __future__ import unicode_literals + +from ..util import make_tempdir +from ...language import Language + +import pytest + + +@pytest.fixture +def meta_data(): + return { + 'name': 'name-in-fixture', + 'version': 'version-in-fixture', + 'description': 'description-in-fixture', + 'author': 'author-in-fixture', + 'email': 'email-in-fixture', + 'url': 'url-in-fixture', + 'license': 'license-in-fixture', + } + + +def test_serialize_language_meta_disk(meta_data): + language = Language(meta=meta_data) + with make_tempdir() as d: + language.to_disk(d) + new_language = Language().from_disk(d) + assert new_language.meta == language.meta diff --git a/spacy/vectors.pyx b/spacy/vectors.pyx index 079f6fc84..7daebabe6 100644 --- a/spacy/vectors.pyx +++ b/spacy/vectors.pyx @@ -347,7 +347,8 @@ cdef class Vectors: """ def load_key2row(path): if path.exists(): - self.key2row = msgpack.load(path.open('rb')) + with path.open('rb') as file_: + self.key2row = msgpack.load(file_) for key, row in self.key2row.items(): if row in self._unset: self._unset.remove(row) From cab5b775e799c2f61058400c559f9cf6d89b39ff Mon Sep 17 00:00:00 2001 From: ines Date: Thu, 15 Feb 2018 12:14:19 +0100 Subject: [PATCH 23/24] Document ENT_TYPE matcher attribute [ci skip] --- website/usage/_linguistic-features/_rule-based-matching.jade | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/website/usage/_linguistic-features/_rule-based-matching.jade b/website/usage/_linguistic-features/_rule-based-matching.jade index 39062493a..4b13bd581 100644 --- a/website/usage/_linguistic-features/_rule-based-matching.jade +++ b/website/usage/_linguistic-features/_rule-based-matching.jade @@ -121,6 +121,10 @@ p | The token's simple and extended part-of-speech tag, 
dependency | label, lemma, shape. + +row + +cell.u-nowrap #[code ENT_TYPE] + +cell The token's entity label. + +h(4, "adding-patterns-wildcard") Using wildcard token patterns +tag-new(2) From ca56fb53d17106b31190cecc1480c03894049d80 Mon Sep 17 00:00:00 2001 From: ines Date: Thu, 15 Feb 2018 12:14:30 +0100 Subject: [PATCH 24/24] Add user survey to navigation [ci skip] --- website/_includes/_navigation.jade | 3 +++ 1 file changed, 3 insertions(+) diff --git a/website/_includes/_navigation.jade b/website/_includes/_navigation.jade index e5837747f..8ce5e394b 100644 --- a/website/_includes/_navigation.jade +++ b/website/_includes/_navigation.jade @@ -10,6 +10,9 @@ nav.c-nav.u-text.js-nav(class=landing ? "c-nav--theme" : null) li.c-nav__menu__item(class=is_active ? "is-active" : null) +a(url)(tabindex=is_active ? "-1" : null)=item + li.c-nav__menu__item.u-hidden-xs + +a("https://survey.spacy.io", true) User Survey 2018 + li.c-nav__menu__item.u-hidden-xs +a(gh("spaCy"))(aria-label="GitHub") #[+icon("github", 20)]
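For the random-seed change in patches 19 and 20 of this series, the effect of seeding inside a function rather than at import time can be sketched as follows. This is a stdlib-only stand-in under stated assumptions: the patched `spacy/util.py` also calls `numpy.random.seed(seed)`, which is omitted here, and `sample` is a hypothetical helper added only for illustration.

```python
import random


def fix_random_seed(seed=0):
    # Stdlib-only stand-in; the patched helper also seeds numpy.random.
    random.seed(seed)


def sample(seed):
    # Hypothetical helper: seed explicitly, then draw a few values.
    fix_random_seed(seed)
    return [random.random() for _ in range(3)]
```

Because the seed is set only when `fix_random_seed` is called, importing the module no longer resets the global random state for unrelated code, while the same seed still reproduces the same draws.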