Commit Graph

395 Commits

Author SHA1 Message Date
Germán 86eb817b74 Overwrites default getter for like_num in Spanish by adding _num_words and like_num to lex_attrs.py (#3810) (closes #3803)
* (#3803) Spanish like_num returning false for number-like token

* (#3803) Spanish like_num now returning True for number-like token
2019-06-02 12:22:57 +02:00
Nirant a5d92a3035 Create NirantK.md (#3807) [ci skip] 2019-06-01 17:36:06 +02:00
Nipun Sadvilkar 1f13005751 Incorrect Token attribute ent_iob_ description (#3800)
* Incorrect Token attribute ent_iob_ description

* Add spaCy contributor agreement
2019-05-31 16:50:45 +02:00
estr4ng7d 604acb6ace Marathi Language Support (#3767)
* Adding Marathi language details and folder to it

* Adding few changes and running tests

* Adding few changes and running tests

* Update __init__.py

mh -> mr

* Rename spacy/lang/mh/__init__.py to spacy/lang/mr/__init__.py

* mh -> mr
2019-05-24 14:29:42 +02:00
Ujwal Narayan 4d550a3055 Enhancing Kannada language Resources (#3755)
* Updated stop_words.py

Added more stopwords

* Create ujwal-narayan.md

Enhancing Kannada language resources
2019-05-20 12:56:10 +02:00
Aaron Kub 719a15f23d fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:23:52 +02:00
Luca Dorigo 2663f4133c Submit contributor agreement (#3705) 2019-05-10 14:19:18 +02:00
richardpaulhudson a1e07f0d14 Request to include Holmes in spaCy Universe (#3685)
* Request to add Holmes to spaCy Universe

Dear spaCy team, I would be grateful if you would consider my Python library Holmes for inclusion in the spaCy Universe. Holmes transforms the syntactic structures delivered by spaCy into semantic structures that, together with various other techniques including ontological matching and word embeddings, serve as the basis for information extraction. Holmes supports several use cases including chatbot, structured search, topic matching and supervised document classification. I had the basic idea for Holmes around 15 years ago and now spaCy has made it possible to build an implementation that is stable and fast enough to actually be of use - thank you! At present Holmes supports English and German (I am based in Munich) but could easily be extended to support any other language with a spaCy model.

* Added
2019-05-08 02:42:03 +02:00
F0rge1cE dd1e6b0bc6 Fix offset bug in loading pre-trained word2vec. (#3689)
* Fix offset bug in loading pre-trained word2vec.

* add contributor agreement
2019-05-06 23:00:38 +02:00
张晓飞 ba1ff00370 update response after calling add_pipe (#3661)
* update response after calling add_pipe

The print_info component is appended last, so it needs to be shown at the end of the pipeline

* Create henry860916.md
2019-05-01 12:02:18 +02:00
Amit Chaudhary 167d63af31 Fix broken link to Dive Into Python 3 website (#3656)
* Fix broken link to Dive Into Python 3 website

* Sign spaCy Contributor Agreement
2019-04-29 19:44:00 +02:00
Ramiro Gómez e7e5999ddc Create yaph.md so I can contribute (#3658) 2019-04-29 19:43:06 +02:00
Brad Jascob 9afa0d6723 Update Universe Website for pyInflect (#3641) 2019-04-26 13:17:36 +02:00
Dobita21 d86848cf1f Create Dobita21.md (#3614)
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-18 12:51:54 +02:00
fizban99 57d4a8bf3d Create fizban99.md (#3601) 2019-04-17 11:22:19 +02:00
BreakBB 5b8dbe4975 Fix symlink creation to show error message on failure (#3589) (resolves #3307)
* Fix symlink creation to show error message on failure. Update tests to reflect those changes.

* Fix test to succeed on non-Windows systems.
2019-04-16 11:58:31 +02:00
Shikhar Chauhan bbf6f9f764 Change default output format from `jsonl` to `json` for cli convert (#3583) (closes #3523)
* Changing default output format from jsonl to json for cli convert

* Adding Contributor Agreement
2019-04-12 11:31:23 +02:00
Omer Celik 034a1f458b Signed agreement (#3577) 2019-04-11 11:31:27 +02:00
Ivan Tham 71710e2454 Add myself to contributors (#3575) 2019-04-11 11:31:04 +02:00
Santiago Castro 86e4b68aa9 Fix website docs for Vectors.from_glove (#3565)
* Fix website docs for Vectors.from_glove

* Add myself as a contributor
2019-04-10 15:23:27 +02:00
Piero Molino 5198aa4ae6 Added Ludwig among the projects (#3548) [ci skip]
* Added Ludwig among the projects

* Create w4nderlust.md

* Add Uber to logo wall
2019-04-07 13:01:26 +02:00
jeannefukumaru f67d881b30 fix typos in tag_map flagged by `python -m debug-data` (#3542)
## Checklist
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.


Co-authored-by: Ines Montani <ines@ines.io>
2019-04-05 12:06:09 +02:00
Yves Peirsman 951825532c Improved Dutch language resources and Dutch lemmatization (#3409)
* Improved Dutch language resources and Dutch lemmatization

* Fix conftest

* Update punctuation.py

* Auto-format

* Format and fix tests

* Remove unused test file

* Re-add deleted test

* removed redundant infix regex pattern for ','; note: brackets + simple hyphen remains

* Cleaner lemmatization files
2019-04-03 14:13:26 +02:00
Kamolsit Mongkolsrisawat dcc67f3f51 Update Thai tokenizer_exception list (#3529)
* add tokenizer_exceptions word (ก-น) from https://goo.gl/JpJ2qq

* update tokenizer_exceptions word list

* add contributor file
2019-04-03 09:13:36 +02:00
ivigamberdiev 5e5641616d Update links and http -> https (#3532)
* update links and http -> https

* SCA
2019-04-02 17:36:22 +02:00
Hiromu Hota 914b9ff3d2 Tags are joined with a comma and padded with asterisks (#3491)
## Description
Fix a bug in the test of JapaneseTokenizer.
This PR may require @polm's review.

### Types of change
Bug fix

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-28 16:17:31 +01:00
David 74e738dd4d adds textpipe to universe (#3500) [ci skip]
* Adds textpipe to universe

* signed contributor agreement

* Adjust formatting, code style and use "standalone" category
2019-03-28 15:13:19 +01:00
Samuel Kane 06a1846379 fix(util): fix decaying function output (#3495)
* fix(util): fix decaying function output

* fix(util): better test and adhere to code standards

* fix(util): correct variable name, pytestify test, update website text
2019-03-28 13:24:47 +01:00
Wannaphong Phatthiyaphaibun 297a051992 Update Thai tag map (#3480)
* Update Thai tag map

Update Thai tag map

* Create wannaphongcom.md
2019-03-25 16:53:26 +01:00
Bharat123Rox f2547f02d6 Made changes suggested by @ines 2019-03-20 07:43:19 +05:30
Bharat123Rox b5f077dcf4 Sign the Contributor Agreement and update details 2019-03-19 23:07:54 +05:30
Ines Montani f6ffbe1fd3 Fix filename 2019-03-16 13:46:58 +01:00
Ines Montani fb53eb570f Fix typo 2019-03-16 13:45:46 +01:00
Ryan Ford 00842d7f1b Merging conversion scripts for conll formats (#3405)
* merging conllu/conll and conllubio scripts

* tabs to spaces

* removing conllubio2json from converters/__init__.py

* Move not-really-CLI tests to misc

* Add converter test using no-ud data

* Fix test I broke

* removing include_biluo parameter

* fixing read_conllx

* remove include_biluo from convert.py
2019-03-15 18:14:46 +01:00
Ines Montani e77220e3ae Merge branch 'master' into develop [ci skip] 2019-03-11 12:23:24 +01:00
Ines Montani daaeeb7a2b Merge branch 'master' into develop 2019-03-07 22:07:31 +01:00
Adrien Ball 88909a9adb Fix egg fragments in direct download (#3369)
## Description
The egg fragment in the URL must be of the form `#egg=package_name==version` instead of `#egg=package_name-version`.
One of the consequences of specifying wrong egg fragments is that `pip` does not recognize the package and its version properly, and thus it re-downloads the package systematically.

I'm not sure how this should be tested properly. 
Here is what I had before the fix when running the same direct download twice:
```
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm-2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 1.6MB/s
  Generating metadata for package en-core-web-sm-2.0.0 produced metadata for project name en-core-web-sm. Fix your #egg=en-core-web-sm-2.0.0 fragments.
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm-2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 919kB/s
  Generating metadata for package en-core-web-sm-2.0.0 produced metadata for project name en-core-web-sm. Fix your #egg=en-core-web-sm-2.0.0 fragments.
Requirement already satisfied (use --upgrade to upgrade): en-core-web-sm from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0 in ./venv3/lib/python3.6/site-packages
```

And after the fix:
```
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 1.1MB/s
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Requirement already satisfied: en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0 in ./venv3/lib/python3.6/site-packages (2.0.0)
```
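
The egg-fragment rule described above can be sketched as a small URL builder. This is only an illustration: the helper name `direct_download_url` is hypothetical and not taken from spaCy, and the base URL is copied from the pip output above.

```python
# Base URL as it appears in the pip output above.
BASE_URL = "https://github.com/explosion/spacy-models/releases/download"


def direct_download_url(model: str, version: str) -> str:
    """Build a direct-download URL (hypothetical helper, not spaCy's API).

    The egg fragment uses `name==version`, the form this PR describes as
    correct, rather than `name-version`.
    """
    archive = f"{model}-{version}"
    return f"{BASE_URL}/{archive}/{archive}.tar.gz#egg={model}=={version}"


print(direct_download_url("en_core_web_sm", "2.0.0"))
```

With the `==` form, the second run in the log above reports `Requirement already satisfied` without downloading the archive again.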

### Types of change
This is an enhancement as it avoids unnecessary downloads of (potentially big) spacy models, when they have already been downloaded.

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-07 21:07:19 +01:00
Ines Montani a8f1efd2f5 Merge branch 'master' into develop 2019-03-07 00:56:31 +01:00
Daniel King 5f40229397 Don't use numpy directly for similarity (#3362)
* Don't use numpy directly for similarity

* Contributor agreement
2019-03-06 22:58:38 +00:00
Ines Montani 3fdcdec6a0 Merge branch 'master' into develop 2019-02-18 10:03:32 +01:00
Roshni Biswas e09f1347fa updates for Bengali language (#3286)
* Update morph_rules.py

* contributor agreement for roshni-b

* created example sentences
2019-02-18 10:02:28 +01:00
Ines Montani 0184a95340 Merge branch 'master' into develop 2019-02-12 18:29:24 +01:00
Akhilesh a78db10941 add kannada support (#3264)
* add kannada support

* add few more stop words

* add support for Kannada Language
2019-02-12 18:28:39 +01:00
Ines Montani 9e652afa4b Merge branch 'master' into develop 2019-02-08 13:28:09 +01:00
Stanisław Giziński 1448ad100c Improved polish tokenizer and stop words. (#2974)
* Improved stop words list

* Removed some wrong stop words from list

* Improved stop words list

* Removed some wrong stop words from list

* Improved Polish Tokenizer (#38)

* Add tests for polish tokenizer

* Add polish tokenizer exceptions

* Don't split any words containing hyphens

* Fix test case with wrong model answer

* Remove commented out line of code until better solution is found

* Add source srx' license

* Rename exception_list.py to match spaCy conventionality

* Add a brief explanation of where the exception list comes from

* Add newline after each exception

* Rename COPYING.txt to LICENSE

* Delete old files

* Add header to the license

* Agreements signed

* Stanisław Giziński agreement

* Krzysztof Kowalczyk - signed agreement

* Mateusz Olko agreement

* Add DoomCoder's contributor agreement

* Improve like number checking in polish lang


* like num tests added

* all from SI system added

* Final licence and removed splitting exceptions

* Added Polish stop words to LEX_ATTRS

* Add encoding info to pl tokenizer exceptions
2019-02-08 14:27:21 +11:00
Ines Montani e2d93e4852 Merge branch 'master' into develop 2019-02-07 21:10:08 +01:00
Ines Montani 18205c6c48 Update company name 2019-02-07 21:06:55 +01:00
Julia Makogon b41d64825a Ukrainian language added. Small fixes in Russian (#3241)
* Classes for Ukrainian; small fix in Russian.

* Contributor agreement
2019-02-07 21:05:11 +01:00
Ines Montani f7e4674423 Fix contributor agreement 2019-02-07 20:56:13 +01:00
Ines Montani 4684195822 Rename contributer_agreement.md to .github/contributors/lauraBaakman.md 2019-02-07 20:55:53 +01:00
Ines Montani 5d0b60999d Merge branch 'master' into develop 2019-02-07 20:54:07 +01:00
Amandine Périnet b34bc9d2e9 add small fix for French lemmatizer (#3206) 2019-01-31 23:44:10 +01:00
adrianeboyd 03d58f9feb Update TIGER/German dependency relations in documentation (#3204)
* Add missing dependency relations for TIGER/German

* Contributor agreement for adrianeboyd
2019-01-30 14:23:12 +01:00
Jo f9ca09caa0 Create PolyglotOpenstreetmap.md (#3198)
* Create PolyglotOpenstreetmap.md

* forgot to tick that box
2019-01-26 14:02:54 +01:00
foufaster 8b61b6a6b5 Create foufaster.md (#3179) 2019-01-21 15:45:54 +01:00
Björn Lennartsson b892b446cc Updates to Swedish Language (#3164)
* Added the same punctuation rules as danish language.

* Added abbreviations and also the possibility to have capitalized abbreviations on some. Added a few specific cases too

* Added test for long texts in swedish

* Added morph rules, infixes and suffixes to __init__.py for swedish

* Added some tests for prefixes, infixes and suffixes

* Added tests for lemma

* Renamed files to follow convention

* [sv] Removed ambiguous abbreviations

* Added more tests for tokenizer exceptions

* Added test for problem with punctuation in issue #2578

* Contributor agreement

* Removed faulty lemmatization of 'jag' ('I') as it was lemmatized to 'jaga' ('hunt')
2019-01-16 13:45:50 +01:00
Mark Neumann e599ed9ef8 Allow vectors to be optional in init-model, more robust string counting (#3155)
* more robust init-model

* key not word

* add license agreement
2019-01-14 23:48:30 +01:00
Loghi d97661d18b Tamil language support (#3154)
Tamil language support to spaCy

## Description
This PR adds support for the Tamil language in spaCy:

- Added stop words, examples and numerical attributes
- Work on other language data is in progress

### Types of change
Enhancement

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-01-14 15:32:30 +01:00
Hunter Kelly f28a1c7271 Update call to `mkdir()` to create the parents (#3139)
* Update call to `mkdir()` to create the parents

- Update the call to `output_dir.mkdir()` to also create the parents if needed

* don't automatically create parents but fail fast if cannot create directory

* add signed contributors agreement for retnuh
2019-01-11 03:02:18 +01:00
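
A minimal sketch of the "fail fast" behaviour named in the second bullet of the entry above, assuming the output directory is a `pathlib.Path`. The helper name `ensure_output_dir` is hypothetical and not taken from spaCy.

```python
import tempfile
from pathlib import Path


def ensure_output_dir(output_dir: Path) -> Path:
    """Create output_dir if it is missing, without creating its parents."""
    if not output_dir.exists():
        # Path.mkdir() defaults to parents=False, so a missing parent raises
        # FileNotFoundError immediately instead of building a whole tree.
        output_dir.mkdir()
    return output_dir


with tempfile.TemporaryDirectory() as tmp:
    ensure_output_dir(Path(tmp) / "model")  # parent exists: directory is created
    try:
        ensure_output_dir(Path(tmp) / "missing" / "model")  # parent is absent
    except FileNotFoundError as err:
        print("failed fast:", type(err).__name__)
```

Passing `parents=True` to `mkdir()` would instead create the intermediate directories, which is the behaviour the first bullet proposed and the second bullet backed away from.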
Amandine Périnet ee24e2534d French lemmatization: adding lemmas for adverbs and irregular lemmas for function words (#3131)
* adding adverbs and irregular cases for empty words

* adding adverbs and irregular cases for empty words

* adding adverbs and irregular cases for empty words

* updating contributor agreement for amperinet
2019-01-10 15:41:15 +01:00
Mathieu Morey f07b577fbd Support CUDA 10 (#3126)
* ENH support CUDA 10

* Update _instructions.jade
2019-01-09 03:10:45 +01:00
Amandine Périnet eef11a7a2c French lemmatization: correcting wrong lemmas in the lookup dictionary (#3104)
* modifying French lookup that contained wrong lemmas

* correcting wrong line breaks on hyphen

* adding contributor agreement for amperinet@

* correcting a typo
2019-01-07 14:15:19 +01:00
alvations 9972716e01 Create alvations.md (#3119) 2019-01-05 13:11:06 +01:00
Álvaro Abella Bascarán 9bc4cc1352 Fix issue 2396 (#3089)
* Test on #2396: bug in Doc.get_lca_matrix()

* reimplementation of Doc.get_lca_matrix(), (closes #2396)

* reimplement Span.get_lca_matrix(), and call it from Doc.get_lca_matrix()

* tests Span.get_lca_matrix() as well as Doc.get_lca_matrix()

* implement _get_lca_matrix as a helper function in doc.pyx; call it from Doc.get_lca_matrix and Span.get_lca_matrix

* use memory view instead of np.ndarray in _get_lca_matrix (faster)

* fix bug when calling Span.get_lca_matrix; return lca matrix as np.array instead of memoryview

* cleaner conditional, add comment
2018-12-29 18:05:52 +01:00
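
For readers unfamiliar with what a lowest-common-ancestor (LCA) matrix contains, here is a pure-Python sketch. It is not the Cython implementation from this commit. It assumes the parse is given as a list `heads` in which `heads[i]` is the index of token i's head and a root points at itself, and it uses -1 for token pairs with no common ancestor; both conventions are assumptions of the sketch.

```python
def lca_matrix(heads):
    """Return m where m[i][j] is the index of the LCA of tokens i and j."""
    n = len(heads)

    def chain(i):
        # Token i followed by its ancestors, ending at the root of its tree.
        path = [i]
        while heads[path[-1]] != path[-1]:
            path.append(heads[path[-1]])
        return path

    chains = [chain(i) for i in range(n)]
    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        on_path_i = set(chains[i])
        for j in range(n):
            # The first node on j's upward path that also lies on i's path
            # is the lowest common ancestor.
            for node in chains[j]:
                if node in on_path_i:
                    matrix[i][j] = node
                    break
    return matrix


# "I like New York": I -> like, like is the root, New -> York, York -> like
for row in lca_matrix([1, 1, 3, 1]):
    print(row)
```

The sketch assumes `heads` describes trees with no cycles. A token counts as its own ancestor here, so the diagonal holds each token's own index.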
Álvaro Abella Bascarán 6fe276f85d Fix issue 2396 (#3089)
* Test on #2396: bug in Doc.get_lca_matrix()

* reimplementation of Doc.get_lca_matrix(), (closes #2396)

* reimplement Span.get_lca_matrix(), and call it from Doc.get_lca_matrix()

* tests Span.get_lca_matrix() as well as Doc.get_lca_matrix()

* implement _get_lca_matrix as a helper function in doc.pyx; call it from Doc.get_lca_matrix and Span.get_lca_matrix

* use memory view instead of np.ndarray in _get_lca_matrix (faster)

* fix bug when calling Span.get_lca_matrix; return lca matrix as np.array instead of memoryview

* cleaner conditional, add comment
2018-12-29 18:02:26 +01:00
Jari Bakken e172f2478e Add three missing tags from the `nb` tag map (#3085)
* Contributors agreement for jarib

* Add tags from the UD/NORNE dataset that is missing in the nb tag map. Relates to #3082.
2018-12-27 14:48:40 +01:00
Will Price 4a6af0852a Improve random prefix generation in displaCy arcs (#3096)
* Improve random prefix generation in displaCy arcs

* Add @willprice contributor agreement
2018-12-27 14:46:02 +01:00
Özcan Kasal b573ebca77 trilyon forgotten (#3083)
* trilyon forgotten

* contributor added
2018-12-27 14:44:23 +01:00
Ken 5f0c5fbfa4 issue #3012: add test (#3021)
* issue #3012: add test

* add contributor agreement

* Make test work without models and fix typos

ten.pos_ instead of ten.orth_ and comparison against "10" instead of integer 10
2018-12-18 15:02:49 +01:00
Kirill Bulygin 2fb004832f Fix the first `nlp` call for `ja` (closes #2901) (#3065)
* Fix the first `nlp` call for `ja` (closes #2901)

* Add unicode declaration, formatting and use relative import
2018-12-18 15:01:06 +01:00
Kirill Bulygin 10189d9092 Fix the first `nlp` call for `ja` (closes #2901) (#3065)
* Fix the first `nlp` call for `ja` (closes #2901)

* Add unicode declaration, formatting and use relative import
2018-12-18 14:53:50 +01:00
Brixjohn 52f3c95004 Added alpha support for Tagalog language (#3062)
I have added alpha support for the Tagalog language from the Philippines. It is the basis for the country's national language, Filipino. I based the format heavily on the EN and ES languages.

I have provided several words in the lemmatizer lookup table, added stop words from a source, translated numeric words to their Tagalog counterparts, added some tokenizer exceptions, and kept the tag map the same as for English.

While the alpha language passed the preliminary testing that you provided, I think it needs more data to be useful for most cases.

* Added alpha support for Tagalog language

* Edited contributor template

* Included SCA; Reverted templates

* Fixed SCA template

* Fixed changes in SCA template
2018-12-18 13:08:38 +01:00
Amandine Périnet 361554f629 Lemmatization of Adjectives - French : adding rules and vocabulary (#3045)
* modifying FR lemmatisation for Adjectives

* adding contributor agreement for amperinet

* correcting some errors in vocabulary files
2018-12-16 18:11:07 +01:00
Aki Ariga 7fcd6419ff Update the documentation's Unidic link to the latest version URL (#3022)
* Update Unidic link to the latest version in documentation

This patch improves on #3017. The Unidic link pointed to an old version, so it now points to the latest version.

* Add contributor agreement

* Use more specific link for unidic-cwj
2018-12-07 17:24:48 +01:00
Amandine Périnet 2457318b7a Lemmatization of Verbs - French : adding rules and vocabulary (#3006)
* updating rules and vocabulary for French lemmatization of verbs

* updating the file with French auxiliary verb

* updating rules and vocabulary for French lemmatization of verbs

* adding contributor agreement for amperinet

* adding rules for words with inclusive parentheses wrongly tokenized
2018-12-06 15:49:28 +01:00
Beate Sildnes f0d7e206ec Updated wordforms for Norwegian lemmatizer (#3007)
* Updated wordforms for Norwegian lemmatizer

Upload of updated lists of wordforms for the Norwegian lemmatizer (nouns, verbs, adverbs, adjectives and lookup).

* Add spaCy contributor agreement for user beatesi

*  Updated wordforms for Norwegian lemmatizer
2018-12-06 15:46:18 +01:00
Gavriel Loria ae5601beae Initialize trues to 0.0 in training example (#3004)
* added contributor agreement

* if there are no true positives, precision should be 0.0
2018-12-03 01:33:22 +01:00
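
The point of the second bullet in the entry above can be illustrated with a small precision helper. This is a sketch, not the code from the training example: the function name is hypothetical, and returning 0.0 when nothing was predicted positive is an assumption of the sketch.

```python
def precision(true_positives: float, false_positives: float) -> float:
    """Fraction of positive predictions that were correct."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # Nothing was predicted positive; report 0.0 rather than dividing
        # by zero (a convention chosen for this sketch).
        return 0.0
    return true_positives / predicted_positives


print(precision(0.0, 5.0))  # no true positives
print(precision(3.0, 1.0))
```

Starting the true-positive counter at exactly 0.0 means a model with no correct positive predictions scores a precision of 0.0.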
wxv 06820ef6e7 Fix is_ascii documentation and create contributor file (#2988)
Proposed in #2933
2018-11-30 15:57:58 +01:00
Sofie 585de273cd Fix small typo bug in French regexp + relevant unit test (#2980)
* additional unit test for new entr word not in other lists

* bugfix - unit test works

* use _latin_lower instead of alpha_lower for french

* revert back to ALPHA_LOWER (following the code for languages)

* contributor agreement
2018-11-29 20:16:13 +01:00
Adam Schwalm 00566949de Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
2018-11-28 19:49:33 +01:00
Marc Puig 98fe1ab259 Catalan Language Support (#2940)
* Catalan language Support

* Adding Catalan to documentation
2018-11-26 15:25:47 +01:00
Shawn Cicoria 7601ae0cff fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948

* Update spacy/compat.py

Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
2018-11-24 15:34:23 +01:00
Francisco Aranda be99f1cac5 Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component

* chore: include spaCy contributor agreement
2018-11-13 23:54:46 +01:00
mikelibg 75e7d503b7 Removed space in docs + added contributor info (#2909)
* - removed unneeded space in documentation

* - added contributor info
2018-11-08 14:18:25 +01:00
Bram Vanroy 071789467e Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements

## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)

### Types of change
Documentation

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-24 15:19:17 +02:00
JKhakpour 74a30d883c Add Persian(Farsi) language support (#2797) 2018-10-13 15:31:49 +02:00
Marina Lysyuk b76fe08308 Correcting lang/ru/examples.py (#2845)
* Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement

* Correct some grammatical inaccuracies in lang\ru\examples.py

* Move contributor agreement to separate file
2018-10-13 15:19:43 +02:00
Jacopo Farina 42c42376a3 Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page

* Add contribution agreement
2018-10-12 14:59:45 +02:00
Przemysław Hojnacki 966b583d5e agreement of contributor, may I introduce a tiny pl language contribution (#2799)
* Contributors agreement

* Contributors agreement

* Contributors agreement
2018-09-27 12:25:22 +02:00
Charles-Axel Dein 94ad3c55f1 Add charlax's contributor agreement (#2805) 2018-09-27 12:24:42 +02:00
darindf 8227566805 Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way.  Use the resize function

* added spaCy Contributor Agreement
2018-09-26 21:31:03 +02:00
Keshan 9a016d17c2 Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.

* Adding contributor agreement

* Updating contributor agreement
2018-09-25 12:18:25 +02:00
John Stewart 2d15859d2a Fixed spaCy+Keras example (#2763)
* bug fixes in keras example

* created contributor agreement
2018-09-15 13:06:39 +02:00
Andrew Ongko 81564cc4e8 Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list

* add exception token

* removing lines containing spaces, as they don't matter since we use the .split() method in the end; added new tokens to the exceptions

* add tokenizer exceptions list

* combining base_norms with norm_exceptions

* adding norm_exception

* fix double key in lemmatizer

* remove unused import on punctuation.py

* reformat stop_words to reduce number of lines, improve readability

* updating tokenizer exception

* implement is_currency for lang/id

* adding orth_first_upper in tokenizer_exceptions

* update the norm_exception list

* remove bunch of abbreviations

* adding contributors file
2018-09-14 12:30:32 +02:00
Filipe Caixeta fe515085f3 Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words

* Add words to portuguese language _num_words
2018-09-14 12:30:16 +02:00
Grivaz aeba99ab0d Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment

* Introduce bulk merge to increase performance on many span merges

* Sign contributor agreement

* Implement pull request suggestions
2018-09-10 16:41:42 +02:00
tyburam 476472d181 Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement

* Added polish version of english lex_attrs
2018-09-10 11:53:57 +02:00
Sainath Adapa 77139bc03c Basic support for Telugu language (#2751) 2018-09-10 11:53:18 +02:00
Maxim Kupfer 97e2874225 added contributor agreement for mbkupfer (#2738) 2018-09-10 11:32:03 +02:00
Piotr Żelasko bdb2165bd1 Less norm computations in token similarity (#2730)
* Less norm computations in token similarity

* Contributor agreement
2018-09-05 21:50:23 +02:00
Aniruddha Adhikary 4530ddcc51 update bengali token rules for hyphen and digits (#2731) 2018-09-05 21:49:00 +02:00
Nathaniel J. Smith 26849874ad When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option

This is necessary to allow testing an installed spacy by running:

  pytest --pyargs spacy

* Add contributor agreement
2018-09-03 09:57:52 +02:00
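
A sketch of why the default matters. pytest's `Config.getoption(name, default)` raises `ValueError` for an option that was never registered unless a default is supplied. The option name `--slow`, the helper `slow_tests_enabled`, and the stub class below are illustrative assumptions; the stub only mimics that lookup behaviour so the example runs without pytest installed.

```python
_NOTSET = object()


class StubConfig:
    """Stand-in for pytest's Config object (illustration only)."""

    def __init__(self, options):
        self._options = dict(options)

    def getoption(self, name, default=_NOTSET):
        if name in self._options:
            return self._options[name]
        if default is _NOTSET:
            # Mirrors pytest: an unknown option without a default is an error.
            raise ValueError(f"no option named {name!r}")
        return default


def slow_tests_enabled(config) -> bool:
    # Passing a default keeps this working even when the option was never
    # registered with the option parser.
    return bool(config.getoption("--slow", default=False))


print(slow_tests_enabled(StubConfig({"--slow": True})))
print(slow_tests_enabled(StubConfig({})))
```

In a real `conftest.py` the same call would be made on the test item's `config` attribute.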
Arya Prabhudesai db2c2b286c Create aryaprabhudesai.md (#2681) 2018-08-20 18:56:14 +02:00
Wojciech Łukasiewicz 3953e967a0 Use correct variable name in the examples (#2664)
* correct naming

* add contributor agreement
2018-08-13 22:21:24 +02:00
Aashish Gangwani 6eebfc7bf4 Added numbers to ../lang/hi/lex_attrs.py (#2629)
I have added numbers to the Hindi lex_attrs.py file according to the Indian numbering system (https://en.wikipedia.org/wiki/Indian_numbering_system). Here are their English translations:
'शून्य' => zero
'एक' => one
'दो' => two
'तीन' => three
'चार' => four
'पांच' => five
'छह' => six
'सात' => seven
'आठ' => eight
'नौ' => nine
'दस' => ten
'ग्यारह' => eleven
'बारह' => twelve
'तेरह' => thirteen
'चौदह' => fourteen
'पंद्रह' => fifteen
'सोलह' => sixteen
'सत्रह' => seventeen
'अठारह' => eighteen
'उन्नीस' => nineteen
'बीस' => twenty
'तीस' => thirty
'चालीस' => forty
'पचास' => fifty
'साठ' => sixty
'सत्तर' => seventy
'अस्सी' => eighty
'नब्बे' => ninety
'सौ' => hundred
'हज़ार' => thousand
'लाख' => hundred thousand
'करोड़' => ten million
'अरब' => billion
'खरब' => hundred billion

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-08-08 16:06:11 +02:00
Emil Stenström 3834f4146d Add abbreviations from UD_Swedish-Talbanken (#2613)
* Add abbreviations from UD_Swedish-Talbanken

* Add contributor agreement.
2018-08-07 13:53:17 +02:00
Sami dbc993f5b3 Updating description and code snippet spacy-lefff (#2623)
* updating description and code snippet spacy-lefff

* contributors agreement
2018-08-02 17:25:27 +02:00
Vikas Kumar Yadav 23876dbc70 Create vikaskyadav.md (#2621) 2018-08-02 14:03:44 +02:00
Dmitry Bruhanov 4ad7de6ca9 DimaBryuhanov.md (#2590)
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [X] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           |   Dmitry Briukhanov  |
| Company name (if applicable)   |           -          |
| Title or role (if applicable)  |           -          |
| Date                           |      7/24/2018       |
| GitHub username                |    DimaBryuhanov     |
| Website (optional)             |                      |
2018-07-24 18:43:27 +02:00
katarkor 5ca853bee0 changed tag_map, morph_rules, lemmatizer for Norwegian (#2565)
* changed tag_map, morph_rules, lemmatizer for Norwegian

* Move unicode declaration up

Hopefully fixes test failure on Python 2

* Update CONTRIBUTOR_AGREEMENT.md

* Move unicode declarations

Hopefully fixes test this time

* Revert "Merge remote-tracking branch 'origin/patch-1'"

This reverts commit f5ccd5dd0d, reversing
changes made to dd07e180ea.

* Update contributor agreement [ci skip]
2018-07-19 19:38:24 +02:00
kororo 2784babef9 Add ExcelCy into Universe list (#2572)
Hi guys,

This is my first spaCy extension. I am excited to be able to do this. Please do let me know if there are any suggestions or modifications I need to make. Feel free to use or contribute to the repo that I made.

## Description
ExcelCy is a spaCy toolkit that helps improve the data training experience. It provides easy annotation using the Excel file format, and it has a helper to pre-train entity annotations with phrase and regex matcher pipes.

### Types of change
Update to the Universe list on the website.

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-19 19:28:33 +02:00
Ioannis Daras 6ed18412d0 Greek language optimizations (#2558)
* Greek language optimizations

* Add encoding on files containing Greek words

* Add encoding on files containing Greek words
2018-07-18 18:51:38 +02:00
Xiang Ji 19a5ef1c58 Fix venv command examples (#2560) [ci skip]
* Fix venv command examples

The documentation refers to `venv`, which is part of the Python 3 standard library.
However, the command examples are written as if they still used `virtualenv`,
which is a package independent of `venv`:

- `venv` doesn't need to be installed via `pip`. In fact, `pip install venv` would
return an error.
- The correct way to invoke `venv` is `python3 -m venv`, not `venv`, which would
return command not found.

See https://docs.python.org/3/library/venv.html

I suspect the documentation simply replaced all occurrences of `virtualenv` with
`venv`. However they are different modules and are used differently.
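
The `python3 -m venv` invocation described above has a programmatic counterpart in the standard library's `venv` module. A minimal sketch, assuming a hypothetical helper name `create_env` and skipping the pip bootstrap to keep it quick:

```python
import venv
from pathlib import Path


def create_env(target_dir):
    """Create a virtual environment, roughly like `python3 -m venv target_dir`."""
    # with_pip=False skips bootstrapping pip (an assumption made for speed);
    # the command-line form installs pip by default.
    venv.create(target_dir, with_pip=False)
    # Every environment created by venv contains a pyvenv.cfg file.
    return Path(target_dir) / "pyvenv.cfg"
```

No `pip install` step is involved, because `venv` ships with Python 3.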

* Update comment [ci skip]
2018-07-18 10:31:24 +02:00
Tero K f35980f865 Enhancement/lang fi examples (#2547)
* Added a file with examples in Finnish

* added contributor agreement
2018-07-15 09:50:27 +02:00
Eleni170 6042723535 Add support for Greek language (#2535)
* Add contributor agreement

* Support for Greek language

* Fix missing el_tokenizer
2018-07-10 13:48:38 +02:00
Bùi Trung Chí 9af46b4f1b Fix loading tokenizer with custom prefix search (#2495)
* Add contributor agreement

* Fix loading tokenizer with custom prefix search
2018-07-04 12:56:07 +02:00
Muhammad Irfan f33c703066 Add Urdu Language Support (#2430)
* added Urdu language support.

* added Urdu language tests.

* modified conftest.py for Urdu language support.

* added spacy contributor agreement.
2018-06-22 11:14:03 +02:00
himkt 14d9007efd fix wrong indexing (#2416)
* fix wrong indexing

* add agreement
2018-06-19 10:20:57 +02:00
Aliia E 428bae66b5 Add Tatar Language Support (#2444)
* add Tatar lang support

* add Tatar letters

* add Tatar tests

* sign contributor agreement

* sign contributor agreement [x]

* remove comments from Language class

* remove all template comments
2018-06-19 10:17:53 +02:00
Cory Hurst 446f5ec41b Silent keyword in info function in init (#2459)
* Pass through "silent" kwarg to the wrapper in the spacy module init.
reference issue  #2196

* Pass through "silent" kwarg to the wrapper in the spacy module init.
reference issue  #2196

* contributor agreement
2018-06-18 12:24:21 +02:00
Daniel Ruf d6d688914f chore: cache dependencies (#2418)
* chore: cache dependencies

* chore: add CLA
2018-06-11 00:22:41 +02:00
himkt 1a568f2e08 fix wrong documentations (#2423) 2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi d66292f767 fix UD data file extensions (#2425)
* fix UD data files extension

* add contributor agreement for msklvsk
2018-06-08 14:26:11 +02:00
Nour Shalabi a169b79092 Additions to Arabic stop words. (#2422)
* Additions to Arabic stop words.

* Create nourshalabi.md
2018-06-08 02:33:23 +02:00
Maciej c7d53348d7 Fix bug in CLI iob and ner converter (#2392) (fixes #2385)
* issue_2385 add tests for iob_to_biluo converter function

* issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter

* issue_2385 add test to fix b char bug

* add contributor agreement

* fill contributor agreement
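
The conversion described above, accepting either IOB or BILUO tags, can be sketched as follows. This is an illustrative version rather than the code from the commit: it assumes well-formed tags of the form `O`, `B-X`, `I-X`, `L-X` or `U-X`, and it does not handle IOB1-style entities that start with `I-`.

```python
def iob_to_biluo(tags):
    """Convert a list of IOB tags to BILUO; BILUO input passes through unchanged."""
    biluo = []
    for i, tag in enumerate(tags):
        # "O" and tags that are already BILUO-only (L-, U-) are kept as they are.
        if tag == "O" or tag[:2] in ("L-", "U-"):
            biluo.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is inside the same label.
        continues = next_tag in ("I-" + label, "L-" + label)
        if prefix == "B":
            # A B- tag with no continuation is a single-token entity.
            biluo.append(tag if continues else "U-" + label)
        else:
            # An I- tag with no continuation is the last token of the entity.
            biluo.append(tag if continues else "L-" + label)
    return biluo
```

Because `L-` and `U-` tags are passed through and `B-`/`I-` tags are only rewritten at entity boundaries, running the function on BILUO input returns the same sequence.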
2018-05-30 12:28:44 +02:00
ansgar-t 9732988951 escape html in displacy.render (#2378) (closes #2361)
## Description
Fix for issue #2361 :
replace &, <, >, " with &amp;, &lt;, &gt;, &quot; before rendering the SVG
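
The replacement described above can be sketched as a small helper. The name `escape_html` and its exact shape are assumptions for illustration, not necessarily what the commit added:

```python
def escape_html(text):
    """Replace &, <, > and " with their HTML entities."""
    # The ampersand must be replaced first, otherwise the entities
    # produced by the later replacements would be escaped again.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    return text
```

The standard library's `html.escape` covers the same characters and, by default, also escapes single quotes.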

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
(As discussed in the comments to #2361)
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-05-28 18:36:41 +02:00
Samuel Pouyt d85494bfae Added agreement (#2374) 2018-05-26 18:19:08 +02:00
James Messinger 4515e96e90 Better formatting for `spacy train` CLI (#2357)
* Better formatting for `spacy train` CLI

Changed to use fixed-spaces rather than tabs to align table headers and data.

### Before:
```
Itn.    P.Loss  N.Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %
0       4618.857        2910.004        76.172  79.645  67.987  88.732  88.261  100.000 4436.9  6376.4
1       4671.972        3764.812        74.481  78.046  62.374  82.680  88.377  100.000 4672.2  6227.1
2       4742.756        3673.473        71.994  77.380  63.966  84.494  90.620  100.000 4298.0  5983.9
```

### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```
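
The switch from tabs to fixed-width columns can be sketched as below. The helper name `format_row` and the column widths are assumptions chosen to reproduce the first few columns of the "After" output, not the code from the commit:

```python
def format_row(values, widths):
    """Left-align each value in a fixed-width column padded with spaces."""
    cells = [str(value).ljust(width) for value, width in zip(values, widths)]
    # Strip the padding of the last column so lines carry no trailing spaces.
    return "".join(cells).rstrip()


widths = [6, 10, 10, 8]  # hypothetical widths for the first four columns
print(format_row(["Itn.", "Dep Loss", "NER Loss", "UAS"], widths))
print(format_row([0, "4618.857", "2910.004", "76.172"], widths))
```

Padding with spaces keeps headers and data aligned regardless of the terminal's tab width, which is what breaks the "Before" output when a value is longer than one tab stop.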

* Added contributor file
2018-05-25 13:08:45 +02:00
Aristo Rinjuang 432ede04af adding more words and rephrasing (#2351)
* adding more words and rephrasing

* adding a contributor

* tokenizer bugs solved
2018-05-24 11:40:57 +02:00
Shantam Raj 1a4682dd0b Update _training.jade (#2340)
* Update _training.jade

Correcting grammar. Replacing "The" with "To".

* Create armsp.md

* Update armsp.md
2018-05-21 11:09:33 +02:00
Tahar Zanouda 00417794d3 Add Arabic language (#2314)
* added support for Arabic lang

* added Arabic language support

* updated conftest
2018-05-15 00:27:19 +02:00
vishnumenon ae3719ece5 Fix the code for FACILITY entities (#2324)
* Fix the code for FACILITY entities

As far as I can tell, the default models all use "FAC" rather than "FACILITY"

* Added my Contributor Agreement

* Rename vishnumenon to vishnumenon.md
2018-05-12 15:19:17 +02:00
Jani Monoses 42b34832e4 Update Romanian stopword list (#2316)
* Contributor agreement for janimo

* Update Romanian stopword list

Include the correct spellings of all the words already in the repo
that are using cedillas (ş and ţ) instead of commas (ș and ț).

Add another unrelated spelling fix.

See https://github.com/stopwords-iso/stopwords-ro/pull/1 and
https://github.com/stopwords-iso/stopwords-ro/pull/2
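
The cedilla-to-comma correction described above can be sketched as a character translation. This is an illustration only: the helper name is hypothetical, and it assumes the text uses precomposed characters rather than combining marks.

```python
# Map the cedilla forms to the comma-below forms that are correct for Romanian.
_CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
})


def fix_romanian_diacritics(text):
    """Replace cedilla letters with their comma-below counterparts."""
    return text.translate(_CEDILLA_TO_COMMA)
```

Applying it to an existing stop word list yields the correctly spelled variants, which can then be added alongside the originals.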
2018-05-10 12:16:56 +02:00
Lucas Abbade 18af53014f Adding my contributor agreement (#2315)
* Create LRAbbade.md

* Update LRAbbade.md
2018-05-09 21:25:05 +02:00
mauryaland 5368ba028a Update stop_words.py for French language (#2310)
* Add contraction forms of some common stopwords

All the added stop words contain an apostrophe (' or ’).

* Adds contributor agreement mauryaland

* Update mauryaland.md
2018-05-09 12:04:38 +02:00
ines 37facf9b4d Add config for no-response [ci skip] 2018-05-07 22:04:54 +02:00
ines a685fff875 Merge branch 'master' of https://github.com/explosion/spaCy 2018-05-07 18:58:57 +02:00
ines e2241c797c Add lock-threads configuration [ci skip] 2018-05-07 18:54:22 +02:00
B! 414f5270b3 B Cavello's signed Contributor Agreement v2 (#2302)
This time hopefully created in the right spot. (Sorry about that!)
2018-05-07 17:48:54 +02:00
ines 929a01139a Order issue templates 2018-05-04 03:04:41 +02:00
Ines Montani 7f39c8896b Update issue templates (#2295)
* Update issue templates

* Update templates
2018-05-04 03:02:26 +02:00
Douglas Knox 9b49a40f4e Test and fix for Issue #2219 (#2272)
Test and fix for Issue #2219: Token.similarity() failed if single letter
2018-05-03 18:40:46 +02:00
G.Pruvost cc8e804648 #2211 - Support for ssl certs config on download command (#2212)
* Add support for SSL/Certs customization on download CLI

* Add a note on SSL options for the 'download' CLI in the README

* Add contributor agreement
2018-05-03 18:37:02 +02:00
Alex Villarreal 13d562e1a4 Fix code sample for Doc.set_extension (#2282)
* Fix code sample for `set_extension`

The previous sample code for `set_extension` fails the assertion at the end because `city_getter` checked whether the whole document text matches one of the city names. It now checks whether any of the city names is contained in the document text.
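
The difference between the two checks can be sketched without spaCy. Here `doc` is a stand-in object with a `text` attribute, and `city_getter_old` is a hypothetical reconstruction of the earlier behaviour. In spaCy the getter would be registered with `Doc.set_extension("has_city", getter=city_getter)`.

```python
from types import SimpleNamespace

CITIES = ("New York", "Paris", "Berlin")


def city_getter_old(doc):
    # Earlier behaviour (reconstructed): true only if the whole text is a city name.
    return doc.text in CITIES


def city_getter(doc):
    # Fixed behaviour: true if any city name occurs somewhere in the text.
    return any(city in doc.text for city in CITIES)


# Stand-in for a spaCy Doc; only the .text attribute is needed here.
doc = SimpleNamespace(text="I like New York")
```

For this text, the old getter returns `False` and the fixed one returns `True`, which is why the assertion in the sample only passes with the substring check.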

* Contributor agreement
2018-05-02 10:16:05 +02:00
Mr Roboto 6f5ccda19c Addresses Issue #2228 - Deserialization fails when using tensor=False or sentiment=False (#2230)
* Fixes issue #2228

* Adds a new contributor
2018-05-01 13:40:22 +02:00
Shirish Kadam d98a90440f Added Adam project to spaCy Universe (#2275)
* Added 5hirish to contributors

* Added Adam Qas Project to spaCy Universe

* Remove $ from code example
2018-04-30 22:25:01 +02:00
Matt Upson 87cc6b3599 Add missing comma to NN example in docs (#2255)
Also add a completed contributor agreement.
2018-04-28 14:56:00 +02:00
Robin Linderborg d01f503b54 Remove incorrect lemma lookup gäng->gänga (#2252)
* Remove incorrect lemma lookup gäng->gänga
In modern Swedish, "gäng" is mostly associated with "gang" or "group of people". The removed lemma lookup lemmatized it to the verb "thread".

* Add contrib agreement to correct directory

* Revert change to CONTRIBUTOR_AGREEMENT
2018-04-28 14:54:41 +02:00
Jens Dahl Møllerhøj e5055e3cf6 Add Danish lemmatizer (#2184)
* add danish lemmatizer

* fill contributor agreement
2018-04-07 19:07:28 +02:00
ines 638068ec6c Restore contributor agreement 2018-03-31 14:06:37 +02:00
Suraj Rajan 1cdbb7c97c [2032] - Changed python set to cpp stl set (#2170)
Changed python set to cpp stl set #2032 

## Description

Changed the Python set to a C++ STL set. The STL set works better here because its methods run in logarithmic time. Finding the minimum in the C++ set takes constant time, as opposed to the worst-case linear time of a Python set. Operations such as find, count, insert and delete run in either constant or logarithmic time, which makes the C++ set a better option for managing vectors.
Reference : http://www.cplusplus.com/reference/set/set/

### Types of change
Enhancement for `Vectors` for faster initialisation of word vectors (fastText)
2018-03-31 13:28:25 +02:00
Katrin Leinweber 6f84e32253 Formalise citation info (#2167)
* Create CITATION file

* Add Katrinleinweber contributor agreement
2018-03-30 10:34:14 +02:00
Viet Trung Tran ea2af94cd9 Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155)
* support for Vietnamese

* Contributor Agreement for adding Vietnamese support on spaCy
2018-03-29 12:19:51 +02:00
ines 6173c4aaa6 Port over contributor agreements 2018-03-24 17:17:37 +01:00
Aaron Marquez c7926f72eb add contributor agreement for @enerrio 2018-02-15 12:43:04 -08:00
Claudiu-Vlad Ursache cdd4b3d05c Add contributor agreement for @ursachec 2018-02-13 20:49:42 +01:00
Johannes Dollinger 012e874d09 Add contributor agreement for emulbreh 2018-02-13 13:40:33 +01:00
Lyndon White 94ce43adf0 squashme 2018-02-09 23:19:11 +08:00
Lyndon White 5b1bc8d101 Sign contributors agreement 2018-02-09 23:14:29 +08:00
Pradeep Kumar Tippa f1911ef73a Added pktippa contributor agreement 2018-02-07 15:37:28 +05:30
sayf eddine hammemi 35272eade8 Accept contributor agreement. 2018-02-04 20:48:45 +01:00
Adam Binford 1a2c2f7d7f Fixed auto linking after download and added simple test to check 2018-01-29 14:25:21 -05:00
Matthew Honnibal cb7110c22e Merge pull request #1882 from ohenrik/nb_lemma_and_tag_map
Add norwegian bokmål ('nb') lemmatizer and tag_map
2018-01-29 18:18:50 +01:00
Thomas Opsomer f35895d81b add contributor agreement 2018-01-28 20:12:05 +01:00
Ole Henrik Skogstrøm bbc758526c Added contributors agreement 2018-01-25 11:05:29 +01:00
Ali Zarezade c27c7bf0e0 add contributors.md 2018-01-23 13:47:30 +03:30
Avadh Patel 5029d65738 Signed contributor agreement
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-17 06:33:37 -06:00
Ines Montani 36f426fe0a Merge pull request #1808 from fucking-signup/master
Fix issue #1769
2018-01-12 21:12:02 +00:00
Ines Montani b52f5fb05d Merge pull request #1830 from Babylonpartners/external-release
Signed the contributor agreement
2018-01-11 19:00:30 +00:00
Sasho Savkov 84d65873d2 Renamed the file 2018-01-11 17:49:29 +00:00
Sasho Savkov a1d2d1f263 Signed the contributor agreement
Looking forward to contributing some code :)
2018-01-11 17:46:31 +00:00
pbnsilva 78383f38a6 Adds contributor agreement 2018-01-11 17:40:12 +01:00
Kit dba6adea65 Add contributor agreement 2018-01-08 03:08:57 +01:00
Kevin Humphreys 6173b697a7 add agreement 2018-01-03 13:00:14 -08:00
zqhZY 29898946cd add contributors.md 2017-12-28 18:04:52 +08:00
Ines Montani 97f100f69f Merge pull request #1742 from kimfalk/master
Two corrections in the da lan.
2017-12-20 21:02:00 +00:00
Ines Montani d682a8803e Merge pull request #1672 from cbilgili/master
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
ines 5e5d47fe50 Add contributor agreement (see #1672) 2017-12-20 22:00:12 +01:00
Kim FalkJørgensen fc7cf85af5 agreeing to the contributor agreement. 2017-12-19 15:31:52 +01:00
Martin Andrews 67de1ad11e Create mdda.md 2017-12-18 18:09:27 +08:00
Ines Montani 1a400ac874 Rename d99kris to d99kris.md 2017-12-17 13:44:55 +01:00
Kristofer Berggren cacdf4ad19 Add d99kris to contributors
Add myself (d99kris) to spaCy Contributor Agreement, for PR https://github.com/explosion/spaCy/pull/1731
2017-12-17 20:43:23 +08:00
Bri-Will afd9fc9d36 Adds contributor agreement for Bri-Will 2017-12-11 14:38:37 -08:00
Isaac Sijaranamual f32c6630cb Adds contributor agreement IsaacHaze 2017-12-10 23:15:06 +01:00
Ines Montani 51d3ab2137 Revert contributor agreement to empty form 2017-12-07 16:22:30 +01:00
Canbey Bilgili 86ac8ea5ba Adds Canbey Bilgili's Contributor Agreement 2017-12-01 17:27:41 +03:00
Matthew Honnibal 6bc0f4d29f Merge pull request #1611 from fsonntag/master
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal f9ed9ea529 Merge pull request #1624 from GreenRiverRUS/russian
Add support for Russian
2017-11-29 23:10:01 +01:00
Hugo 88d829f60c CLA 2017-11-29 10:25:20 +02:00
Vadim Mazaev 49b4e2c158 Added contributor agreement 2017-11-26 22:14:08 +03:00
Søren Lind Kristiansen b91986b726 Add contributor agreement. 2017-11-24 15:29:54 +01:00
markulrich c9b63c0dfc Use correct local parameter in example MyComponent (and added markulrich.md contributor file) 2017-11-22 15:59:08 -08:00
Burton DeWilde 833c66c9b2 Add contributor agreement 2017-11-20 11:28:31 -06:00
cclauss 31085dcbb6 Create cclauss.md 2017-11-20 14:57:30 +01:00
Felix Sonntag ada4712250 Add contributor agreement 2017-11-19 16:30:35 +01:00
Motoki Wu 7b5b49eef0 added contributor agreement 2017-11-17 17:27:20 -08:00
Martino Mensio 239a0f391d added contributor agreement 2017-11-17 16:30:09 +01:00
Ines Montani 339675c9fb Merge pull request #1565 from DuyguA/patch-2
added contributor agreement for DuyguA
2017-11-13 16:21:50 +01:00
Duygu Altinok c263c3acce added contributor agreement for DuyguA 2017-11-13 15:45:13 +01:00
Abhinav Sharma 4dd34058a2 Create abhi18av.md 2017-11-13 17:23:05 +05:30