Commit Graph

8828 Commits

Author SHA1 Message Date
Ole Henrik Skogstrøm 0473add369 Feature/span ents (#2599)
* Created Span.ents property

* Add tests for span.ents

* Add tests for start and end of sentence
2018-08-07 13:52:32 +02:00
Xiaoquan Kong 87fa847e6e Fix Chinese language related bugs (#2634) 2018-08-07 11:26:31 +02:00
Matthew Honnibal 664cfc29bc Merge branch 'master' of https://github.com/explosion/spaCy 2018-08-07 10:49:39 +02:00
Matthew Honnibal 2278c9734e Fix spelling error #2640 2018-08-07 10:49:21 +02:00
Xiaoquan Kong f0c9652ed1 New Feature: display more detail when Error E067 (#2639)
* Fix off-by-one error

* Add verbose option

* Update verbose option

* Update documents for verbose option
2018-08-07 10:45:29 +02:00
Emil Stenström 1914c488d3 Swedish: Exceptions for single letter words ending sentence (#2615)
* Exceptions for single letter words ending sentence

Sentences ending in "i." (as in "... peka i.") or "m." (as in "...än 2000 m.") should be tokenized as two separate tokens.

* Add test
2018-08-05 14:14:30 +02:00
Matthew Honnibal 860f5bd91f Add test for issue 2626 2018-08-05 13:46:57 +02:00
Matthew Honnibal f762d52b24 Add example for Issue #2627 2018-08-05 13:33:52 +02:00
Ines Montani 6a4360e425 Update universe [ci skip] 2018-08-02 17:33:08 +02:00
Sami dbc993f5b3 Updating description and code snippet spacy-lefff (#2623)
* updating description and code snippet spacy-lefff

* contributors agreement
2018-08-02 17:25:27 +02:00
Vikas Kumar Yadav 23876dbc70 Create vikaskyadav.md (#2621) 2018-08-02 14:03:44 +02:00
Vikas Kumar Yadav d3e21aad64 Update _benchmarks.jade (#2618) 2018-08-02 00:28:28 +02:00
Brian Phillips 8227de0099 Update language.jade (#2616) 2018-07-31 12:34:42 +02:00
Ioannis Daras 055cc0de44 Bug fix to pseudocode for tokenizer customization (#2604) 2018-07-27 11:04:12 +02:00
Kaisa (Katarzyna) Korsak e531a827db Changed conllu2json to be able to extract NER tags (#2594)
* extract ner tags from conllu file if available

* fixed a bug in regex
2018-07-25 22:21:31 +02:00
Dmitry Bruhanov 07d0cc9de7 Update examples.py (#2597) 2018-07-25 22:20:24 +02:00
Andriy Mulyar e9ef51137d Fixed typo (#2596)
Changed 'The index of the first character after the span.' to 'The index of the last character after the span' in description of doc.char_span
2018-07-25 22:17:15 +02:00
Matthew Honnibal 66983d8412 Port BenDerPan's Chinese changes to v2 (finally) (#2591)
* add template files for Chinese

* add template files for Chinese, and test directory.
2018-07-25 02:47:23 +02:00
ines f2e3e039b7 Update French stop words (resolves #2540) 2018-07-24 23:41:51 +02:00
kororo b1ec827ee0 Fix typo (#2579)
Update slogan, desc and code snippet to latest version
2018-07-24 22:47:33 +02:00
ines cd687091fb Remove nl examples from widget for now [ci skip]
Restore for next spaCy version when path to example sentences is fixed
2018-07-24 22:41:20 +02:00
ines 2d8ffb8bcd Fix formatting 2018-07-24 22:40:49 +02:00
ines 1b3da8d2ae Update website for v2.0.12 [ci skip] 2018-07-24 21:04:22 +02:00
Matthew Honnibal e05bebce8e Try setting appveyor to Python2 64 2018-07-24 20:47:03 +02:00
Matthew Honnibal 6303ce3d0e Try to fix memory error by moving fr_tokenizer to module scope 2018-07-24 20:09:06 +02:00
Matthew Honnibal afe3fa4449 Merge branch 'master' of https://github.com/explosion/spaCy 2018-07-24 19:44:31 +02:00
Matthew Honnibal b2e9e958b9 Add session scoping to tokenizers to try to fix oom on Appveyor 2018-07-24 19:44:18 +02:00
Ines Montani a43ad114c2 Fix typo [ci skip] 2018-07-24 18:45:40 +02:00
Dmitry Bruhanov 27160b1516 added some widespread written jargon & dialectisms (#2584)
This jargon is not offensive but emotionally colored as funny due to its deviation from the norm for various reasons: imitating a dialect, deliberately wrong spelling emphasizing its low colloquial nature, obsolete form, foreign borrowing with native flections, etc.
Dmitry Briukhanov, Linguist & Pythonist
2018-07-24 18:44:29 +02:00
Dmitry Bruhanov 4ad7de6ca9 DimaBryuhanov.md (#2590)
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [X] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           |   Dmitry Briukhanov  |
| Company name (if applicable)   |           -          |
| Title or role (if applicable)  |           -          |
| Date                           |      7/24/2018       |
| GitHub username                |    DimaBryuhanov     |
| Website (optional)             |                      |
2018-07-24 18:43:27 +02:00
Matthew Honnibal 1a16162da9 Merge branch 'master' of https://github.com/explosion/spaCy 2018-07-21 15:57:18 +02:00
Matthew Honnibal cabce07ba6 Fix thinc version requirement 2018-07-21 15:56:33 +02:00
ines ae5ed2d698 Update docs for v2.0.12 [ci skip] 2018-07-21 15:51:44 +02:00
ines d517dd4297 Document remove_extension methods 2018-07-21 15:51:28 +02:00
ines 153f41a5cc Use better examples for Doc extension methods 2018-07-21 15:51:11 +02:00
Matthew Honnibal f0024e3b13 Add script to push a tag 2018-07-21 15:10:54 +02:00
Matthew Honnibal 90c269e1a9 Set about to v2.0.12 release 2018-07-21 15:09:42 +02:00
Matthew Honnibal 1a1c7304cf Set version to 2.0.12.dev1 2018-07-21 13:08:01 +02:00
Matthew Honnibal a723fafea3 Require thinc 6.10.3.dev1 2018-07-21 12:49:09 +02:00
ines 1ea881c80b Allow ignoring warnings and only overwrite if set explicitly 2018-07-20 22:50:19 +02:00
ines 95641f4026 Only install pathlib backport on Python < 3.4 2018-07-20 21:08:29 +02:00
Matthew Honnibal e0caf3ae8c Fix msgpack for new version 2018-07-20 17:32:00 +02:00
Matthew Honnibal 899f1cf442 Add regression test for issue 2179 2018-07-20 17:15:44 +02:00
Matthew Honnibal 9db77fd914 Fix deserialization for msgpack 2018-07-20 14:11:09 +02:00
Matthew Honnibal adde3826e2 Build against thinc 6.10.3.dev0 2018-07-20 13:34:54 +02:00
katarkor 5ca853bee0 changed tag_map, morph_rules, lemmatizer for Norwegian (#2565)
* changed tag_map, morph_rules, lemmatizer for Norwegian

* Move unicode declaration up

Hopefully fixes test failure on Python 2

* Update CONTRIBUTOR_AGREEMENT.md

* Move unicode declarations

Hopefully fixes test this time

* Revert "Merge remote-tracking branch 'origin/patch-1'"

This reverts commit f5ccd5dd0d, reversing
changes made to dd07e180ea.

* Update contributor agreement [ci skip]
2018-07-19 19:38:24 +02:00
kororo 2784babef9 Add ExcelCy into Universe list (#2572)
Hi guys,

This is my first spaCy extension. I am excited to be able to do this. Please do let me know if there are any suggestions or modifications I need to make. Feel free to use/contribute to the repo that I made.

## Description
ExcelCy is a spaCy toolkit that helps improve the data training experience. It provides easy annotation using the Excel file format. It has a helper to pre-train entity annotation with phrase and regex matcher pipes.

### Types of change
Update to Universe list in website.

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-19 19:28:33 +02:00
ines d489ffb78b Fix formatting [ci skip] 2018-07-19 13:22:25 +02:00
ines c0b62ce13c Ignore pytest cache 2018-07-19 12:30:09 +02:00
Ole Henrik Skogstrøm 6e2930a4a2 Conll(u)-bio converter (#2525)
* Started simple conllxbiluo converter

* Fix missing BIO to BILUO conversion
2018-07-18 18:55:42 +02:00