Punitvara
b2b7e1f37a
This PR adds Gujarati Language class along with (#5355)
* This PR adds Gujarati Language class along with
- stop words
* Add test for gu tokenizer
2020-04-27 11:07:37 +02:00
sabiqueqb
fc91660aa2
Gh 5339 language class for Malayalam (#5342)
* Initialize Malayalam Language class
* Add lex_attrs and examples for Malayalam
* Add spaCy Contributor Agreement
* Add test for ml tokenizer
2020-04-27 09:45:08 +02:00
Mike
481574cbc8
[minor doc change] embedding vis. link is broken in `website/docs/usage/examples.md` (#5325)
* The embedding vis. link is broken
The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share?
* contributor agreement
* Update Mlawrence95.md
* Update website/docs/usage/examples.md
Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-21 20:35:12 +02:00
laszabine
fb73d4943a
Amend documentation to Language.evaluate (#5319)
* Specified usage of arguments to Language.evaluate
* Created contributor agreement
2020-04-16 20:00:18 +02:00
Jakob Jul Elben
663333c3b2
Fixes #5413 (#5315)
* Fix 5314
* Add contributor
* Resolve requested changes
Co-authored-by: Jakob Jul Elben <jakob@datamaga.com>
2020-04-16 13:29:02 +02:00
Sébastien Harinck
dac70f29eb
contrib: add contributor agreement for user sebastienharinck (#5316)
2020-04-16 11:32:09 +02:00
Paolo Arduin
1ca32d8f9c
Matcher support for Span as well as Doc (#5113)
* Matcher support for Span, as well as Doc #5056
* Removes an unused import
* Signed contributor agreement
* Code optimization and better test
* Add error message for bad Matcher call argument
* Fix merging
2020-04-15 13:51:33 +02:00
Thomas Thiebaud
1eef60c658
Add spacy_fastlang to universe (#5271)
* Add spacy_fastlang to universe
* Sign SCA
2020-04-15 13:50:46 +02:00
Paolo Arduin
8ce408d2e1
Comparison predicate handling for `!=` (#5282)
* Fix #5281
* Optim test
2020-04-14 19:14:15 +02:00
Marek Grzenkowicz
6a8a52650f
[Closes #5292] Fix typo in option name "--n-save_every" (#5293)
* Sign contributor agreement for chopeen
* Fix typo in option name and close #5292
2020-04-11 23:35:01 +02:00
Umar Butler
8952effcc4
Fixed Typo in Warning (#5284)
* Fixed typo in cli warning
Fixed a typo in the warning shown when exactly two labels, not designated as binary, are provided to textcat.
* Created and signed contributor form
2020-04-09 15:46:15 +02:00
Leander Fiedler
b63871ceff
issue5230: added contributors agreement
2020-04-06 21:04:06 +02:00
vincent d warmerdam
f329d5663a
add "whatlies" to spaCy universe (#5252)
* Add "whatlies"
We're releasing it on our side officially on the 16th of April. If possible, let's announce around the same time :)
* sign contributor thing
* Added fancy gif as the image
* Update universe.json
Spelling error and spaCy clarification.
2020-04-06 11:29:30 +02:00
YohannesDatasci
beef184e53
Armenian language support (#5246)
* add Armenian language and test cases
* agreement submission
2020-04-03 13:02:18 +02:00
Michael Leichtfried
2b14997b68
Remove duplicated branch in if/else-if statement (#5234)
* Remove duplicated branch in if-elif-statement
* Add contributor agreement for leicmi
2020-04-02 14:47:42 +02:00
Jacob Lauritzen
0b76212831
Extend and fix Danish examples (#5227)
* Extend and fix Danish examples
This PR fixes two examples, adds additional examples translated from the English version, and adds punctuation.
The two changed examples are:
* "fortov" changed to "fortovet", which is more [used](https://www.google.com/search?client=firefox-b-d&sxsrf=ALeKk0143gEuPe4IbIUpzBBt-oU10OMVqA%3A1585549036477&ei=7I6BXuvJHMGOrwSqi46oCQ&q=l%C3%B8behjul+p%C3%A5+fortov&oq=l%C3%B8behjul+p%C3%A5+fortov&gs_lcp=CgZwc3ktYWIQAzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQR1DT8xZY0_MWYK_0FmgAcAZ4AIABAIgBAJIBAJgBAKABAaoBB2d3cy13aXo&sclient=psy-ab&ved=0ahUKEwjr7964xsHoAhVBx4sKHaqFA5UQ4dUDCAo&uact=5 ) and more natural. The Swedish and Norwegian examples also use this version of the word.
* "stor by" changed to "storby". In Danish we have a specific noun for a large, metropolitan city, which is different from just describing a city as "large". In this sentence it is much more natural to describe London as a "storby". Google even corrects a search for "London stor by" to "London storby".
* Sign contrib agreement
2020-04-02 10:42:35 +02:00
Nikhil Saldanha
4f27a24f5b
Add kannada examples (#5162)
* Add example sentences for Kannada
* sign contributor agreement
2020-03-29 13:54:42 +02:00
Tom Milligan
e904958115
Limit to cupy-cuda v8, so as not to pull in v9 automatically. (#5194)
2020-03-29 13:52:08 +02:00
Tiljander
e53232533b
Describing priority rules for overlapping matches (#5197)
* Describing priority rules for overlapping matches
* Create Tiljander.md
* Describing priority rules for overlapping matches
* Update website/docs/api/entityruler.md
Co-Authored-By: Ines Montani <ines@ines.io>
Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 13:13:22 +01:00
Ines Montani
3fc2309c48
Merge pull request #5174 from Baciccin/master
Add Ligurian language
2020-03-24 16:33:59 +01:00
Philip Gillißen
128acb9ee1
Update guerda.md
2020-03-24 10:42:30 +01:00
Philip Gillißen
5d067bcc5e
Add SCA for guerda
2020-03-24 10:42:10 +01:00
Baciccin
3b53617a69
Add Ligurian language
2020-03-19 21:37:01 -07:00
Ines Montani
17bd9ed84f
Merge pull request #5153 from pinealan/fix/website-docs
Fix website typos and weird sentences
2020-03-16 15:03:01 +01:00
Alan Chan
1ae01684cf
Fill in contributor agreement
2020-03-15 03:45:20 +08:00
nihil
9cde7eb08c
add spacy_syllables to universe + sign contributor agreement
2020-03-13 18:09:42 +01:00
Himanshu Garg
27d1300bdb
Create merrcury.md
2020-03-10 15:11:07 +05:30
Mark Abraham
0345135167
Tokenizer to_disk and from_disk now ensure paths (#5116)
* Tokenizer to_disk and from_disk now ensure strings are converted to paths
Fixes #5115
* Sign contributor agreement
2020-03-08 13:25:56 +01:00
David Pollack
80004930ed
fix typo in svg file
2020-03-05 17:04:33 +01:00
Tom Keefe
ddf63b97a8
make idx available via to_array (#5030)
2020-02-22 14:13:06 +01:00
Jan Jessewitsch
c7e4fe9c5c
Fix/Improve German stop words (#5024)
* Fix German stop words
Two stop words ("einige" and "einigen") were stuck together.
Remove three nouns that may serve as stop words in a specific context (e.g. religious or news) but are not applicable for general use.
* Create Jan-711.md
2020-02-17 18:59:22 +01:00
Filip Bednárik
d4f4060bf3
Add Slovak language tools implementation (#4943)
* Add correct stopwords for Slovak language
* Add SNK Tags
* Disable formatting lint for TAGS
* Add example sentences for Slovak language
* Add Slovak numerals in base form
* Add lex_attrs to sk init
* Add contributor agreement
2020-02-03 13:03:59 +01:00
Tyler Couto
9fa9d7f2cb
Fix for Issue 4665 - conllu2json (#4953)
* Fix for Issue 4665 - conllu2json
- Allowing HEAD to be an underscore
* Added contributor agreement
2020-02-03 13:01:48 +01:00
Paco Nathan
49fefb6139
Submitting `PyTextRank` for inclusion in the spaCy uniVerse (#4942)
* submitting PyTextRank for consideration of including in the spaCy uniVerse
* including SCA
2020-01-28 11:37:54 +01:00
Anastasiia Iurshina
1830a12578
Fixes typos (#4843)
* Fixes typos
* Fixes typo
* Contributor agreement
2019-12-29 14:24:13 +01:00
Ivan Echevarria
ef13e0c038
Add n_process to Language.pipe documentation (#4842) [ci skip]
* Add n_process to documentation
* Auto-format and add default [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2019-12-29 14:23:33 +01:00
Al Johri
fd4a7bd2b7
sign contributor agreement for AlJohri (#4839) [ci skip]
2019-12-29 14:17:28 +01:00
Olamilekan Wahab
a741de7cf6
Adding support for Yoruba Language (#4614)
* Adding Support for Yoruba
* test text
* Updated test string.
* Fixing encoding declaration.
* Adding encoding to stop_words.py
* Added contributor agreement and removed iranlowo.
* Added removed test files and removed iranlowo to keep project bare.
* Returned CONTRIBUTING.md to default state.
* Added deleted conftest entries
* Tidy up and auto-format
* Revert CONTRIBUTING.md
Co-authored-by: Ines Montani <ines@ines.io>
2019-12-21 14:11:50 +01:00
Nicolai Bjerre Pedersen
de5453cdcb
Fix link to user hooks in docs (#4778)
* Fix link to user hooks in docs
* Update mr_bjerre.md
Mistake in contributor agreement
* Apparently hard to get it right (wrong name of sca)
2019-12-06 19:17:12 +01:00
Antti Ajanki
e626a011cc
Improvements to the Finnish language data (#4738)
* Enable lex_attrs on Finnish
* Copy the Danish tokenizer rules to Finnish
Specifically, don't break hyphenated compound words
* Contributor agreement
* A new file for Finnish tokenizer rules instead of including the Danish ones
2019-12-03 12:55:28 +01:00
Matt Maybeno
c9f1e99787
Agnostic vocab array fix (#4680)
* Use get_array_module instead of numpy
* add contributor agreement
2019-11-23 14:59:52 +01:00
GuiGel
8f7ab70870
Bugfix/fix entity ruler from disk (#4670)
* fix EntityRuler from_disk bug
* add contributor file
* Test EntityRuler PhraseMatcher deserialization (#4651)
* newline at end of file
* fix copy paste error
* serializing the EntityRuler by itself
* Add unicode declarations for Python 2 and auto-format
2019-11-21 16:26:37 +01:00
Elijah Rippeth
5ad5c4b44a
Add initial Korean support (#4660)
* add hangul and jamo char classes.
* add initial Korean lexical attributes.
* add contributor agreement
2019-11-18 12:56:07 +01:00
Christoph Purschke
433748e867
Fix basic language support for Luxembourgish (by adding punctuation.py) (#4648)
* Update __init__.py
* Create punctuation.py
* Update tokenizer_exceptions.py
* Create questoph.md
* Update questoph.md
* Update test_text.py
* Update test_text.py
* Update test_text.py
* Update test_text.py
2019-11-15 16:16:47 +01:00
Priscilla de Abreu Lopes
39e79fcc86
Bugfix/dep matcher issue 4590 (#4601)
* add contributor agreement for prilopes
* add test for issue #4590
* fix on_match params for DependencyMatcher (#4590)
2019-11-07 12:01:06 +01:00
Neel Kamath
6c036ab57d
Add "spaCy Server" to spaCy Universe (#4553)
* Add "spaCy Server" to spaCy Universe
* Accept the spaCy Contributor Agreement
2019-10-30 13:20:46 +01:00
Ines Montani
1185702993
Port over contributor agreement from spacy-lookups-data [ci skip]
2019-10-25 13:06:10 +02:00
Zhuoru Lin
10d88b09bb
Bugfix/fix wikidata train entity linker (#4509)
* Fix labels_discard NoneType iteration error
* Contributor agreement for Zhuoru Lin
* Enhance EntityLinker.predict() to handle the case where labels_discard is None.
2019-10-24 12:52:59 +02:00
gustavengstrom
050e2445a8
Adding noun_chunks to the Swedish language model (sv) (#4422)
* Create syntax_iterators.py
Replica of spacy/lang/fr/syntax_iterators.py
* Added import statements for SYNTAX_ITERATORS
* Create gustavengstrom.md
* Added "dobj" to list of labels in noun_chunks method and a test_noun_chunks method to the Swedish language model.
* Delete README-checkpoint.md
Co-authored-by: Gustav <gustav@davcon.se>
Co-authored-by: Ines Montani <ines@ines.io>
2019-10-21 12:57:06 +02:00
Pepe Berba
7772d5d3c5
Update `vocab.get_vector` docs to include features on fastText ngram (#4464)
* Update `vocab.get_vector`
* Added contrib agreement
2019-10-20 01:28:18 +02:00
Peter Gilles
428887b8f2
Initial commit: New language Luxembourgish (lb) (#4424)
* new language: Luxembourgish (lb)
* update
* update
* Update and rename .github/CONTRIBUTOR_AGREEMENT.md to .github/contributors/PeterGilles.md
* Update and rename .github/contributors/PeterGilles.md to .github/CONTRIBUTOR_AGREEMENT.md
* Update norm_exceptions.py
* Delete README.md
* moved test_lemma.py
* deactivated 'lemma_lookup = LOOKUP'
* update
* Update conftest.py
* update
* tests updated
* import unicode_literals
* Update spacy/tests/lang/lb/test_text.py
Co-Authored-By: Ines Montani <ines@ines.io>
* Create PeterGilles.md
2019-10-14 12:27:50 +02:00
Ben Taylor
1db79a33cb
most_similar() returns the k most similar vectors (#4364)
* most_similar returns the n most similar vectors
* updated most_similar comment
* add bintay contributor agreement
* sign bintay contributor agreement
* fix most_similar documentation typo
* fixed error in prune_vectors
* updated prune_vectors test
2019-10-03 14:09:44 +02:00
Rahul Soni
ed620daa5c
Fix example sentences in Hindi for grammatical errors (#4343)
* Fix grammar for Hindi
* Fix grammar for Hindi
* Submit contributor agreement
2019-09-30 23:32:49 +02:00
Ines Montani
159b72ed4c
Delete main.yml
2019-09-29 15:58:59 +02:00
Ines Montani
539a7b53cd
Update main.yml
2019-09-29 15:55:26 +02:00
Ines Montani
b7913c8eca
Update main.yml
2019-09-29 15:40:07 +02:00
Ines Montani
eb2b60069e
Update main.yml
2019-09-29 15:33:53 +02:00
Ines Montani
70295f9e59
Update main.yml
2019-09-29 15:32:11 +02:00
Ines Montani
b503270b09
Update main.yml
2019-09-29 15:30:31 +02:00
Ines Montani
52ea244830
Fix workflows
2019-09-29 15:30:13 +02:00
Ines Montani
e9acfaec52
Revert "Revert "Rename workflows to _workflows""
This reverts commit 051fac51ee.
2019-09-29 15:29:02 +02:00
Ines Montani
051fac51ee
Revert "Rename workflows to _workflows"
This reverts commit ba0027c936.
2019-09-29 15:28:59 +02:00
Ines Montani
7164c687e9
Revert "Merge branch 'master' of https://github.com/explosion/spaCy "
This reverts commit 41aab59dbf, reversing changes made to ba0027c936.
2019-09-29 15:28:31 +02:00
Ines Montani
41aab59dbf
Merge branch 'master' of https://github.com/explosion/spaCy
2019-09-29 15:26:32 +02:00
Ines Montani
ba0027c936
Rename workflows to _workflows
2019-09-29 15:26:23 +02:00
Ines Montani
80f67f6065
Update build.yml
2019-09-29 15:24:28 +02:00
Ines Montani
e787e6d47f
Update build.yml
2019-09-29 15:15:34 +02:00
Ines Montani
b2f41e2a9b
Update build.yml
2019-09-29 15:06:19 +02:00
Ines Montani
8b02fff097
Update build.yml
2019-09-29 14:55:43 +02:00
Ines Montani
ace0d5c580
Update build.yml
2019-09-29 14:52:01 +02:00
Ines Montani
d32fb03401
Update build.yml
2019-09-29 14:48:21 +02:00
Ines Montani
a5c0130b50
Update and rename pythonpackage.yml to build.yml
2019-09-29 14:43:48 +02:00
EarlGreyT
1e9e2d8aa1
fix typo in first token (#4327)
* fix typo in first token
The head of 'in' is 'review', which has an offset of 4, not 44
* added contributor agreement
2019-09-27 14:49:36 +02:00
Jaydeep Borkar
6a06a3fa6a
Update stop_words.py and add name in contributors (#4325)
* Update stop_words.py and add name in contributors
* add jaydeepborkar.md in contributors directory
* Reset template [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2019-09-27 11:57:27 +02:00
Em Zhan
aafa091541
Fix typo in documentation (#4322)
* Fix typo 'probj' instead of 'pobj'
* Add spaCy contributor agreement for zqianem
2019-09-25 19:42:18 +02:00
Sean Löfgren
31c683d87d
add return_matches and as_tuples back to Matcher.pipe (#4303)
* add contributor agreement [ci skip]
* add return_matches and as_tuples back to Matcher.pipe
2019-09-18 22:00:33 +02:00
Moshe Hazoom
72463b062f
Improve speed of _merge method (#4300)
* make merge more efficient
* fix offsets
* merge works with relative indices
* remove printing
* Add the SCA
* fix SCA date
* more cythonize _retokenize.pyx
* more cythonize _retokenize.pyx
* fix only declaration in _retokenize.pyx
* switch back to absolute head
* switch back to absolute head
* fix comment
* merge from origin repo
2019-09-18 21:34:34 +02:00
tamuhey
71909cdf22
Fix iss4278 (#4279)
* fix: len(tuple) == 2
* (#4278) add fail test
* add contributor's agreement
2019-09-12 10:44:49 +02:00
Mihai Gliga
25aecd504f
adding Romanian tag_map (#4257)
* adding Romanian tag_map
* added SCA file
* forgotten import
2019-09-09 11:53:09 +02:00
Ines Montani
bcd1b12f43
Add contributor agreement [ci skip]
2019-08-30 17:02:43 +02:00
Andrei-Marius Avram
199589228e
Added RONEC to spaCy Universe (#4151)
* Added RONEC to spaCy Universe
* Added contributor file
* Corrected date from .github/contributors/avramandrei.md
* Convert tabs to spaces
* Remove duplicate keys
Can only have one GitHub link unfortunately
* Also add models category
* Adjust ID
This is used to generate the URL, so a simpler string is better
2019-08-20 14:46:07 +02:00
Ivan Šarić
434f6fa6c1
Issue #1107 - adds examples.py for Croatian language (#4143)
* adds contributor agreement for isaric
* adds examples.py for Croatian language
2019-08-18 23:04:41 +02:00
yanaiela
ec0beccaf1
Custom entity render (#4117)
* customizable template for entities display, allowing to pass additional parameters along each entity
* contributor agreement
* simpler naming for the additional parameters given to the span entities renderer
Co-Authored-By: Ines Montani <ines@ines.io>
* change of default parameter, as suggested
Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-16 18:39:25 +02:00
Ziming He
eea7d4f4a8
biluo_tags_from_offsets throws exception for overlapping entities (#4021)
* Check whether two entities overlap
- biluo_gold_biluo_overlap now throws an exception when the entities passed in overlap
- added unit test
* SCA agreement
2019-08-15 18:13:32 +02:00
AJ Rader
2f3648700c
Correction of default lemmatizer lookup in English (Issue #4104) (#4110)
* pytest file for issue4104 established
* edited default lookup english lemmatizer for spun; fixes issue 4102
* eliminated parameterization and sorted dictionary dependency in issue 4104 test
* added contributor agreement
2019-08-15 11:39:10 +02:00
Ines Montani
5196dbd89d
Delete wip.yml [ci skip]
2019-08-13 13:31:21 +02:00
Ines Montani
35c865024b
Fix file name [ci skip]
2019-08-12 18:39:54 +02:00
Ines Montani
3a39154804
Create wip.yaml [ci skip]
2019-08-12 17:26:31 +02:00
黎谢鹏
250a54414b
update lang/zh (#4103)
* update lang/zh
* update lang/zh
2019-08-12 10:37:48 +02:00
ICLR&D
87e40b17a0
Add entry for Blackstone in universe.json (#4101)
* Add entry for Blackstone in universe.json
Add an entry for the Blackstone project. Checked JSON is valid.
* Create ICLRandD.md
* Fix indentation (tabs to spaces)
It looks like during validation, the JSON file automatically changed spaces to tabs. This caused the diff to show *everything* as changed, which is obviously not true. This hopefully fixes that.
* Try to fix formatting for diff
* Fix diff
Co-authored-by: Ines Montani <ines@ines.io>
2019-08-09 17:16:51 +02:00
Jeno
15be09ceb0
Raise error if annotation dict in simple training style has unexpected keys #4074 (#4079)
* adding enhancement #4074.
* modified behavior to strictly require top level dictionary keys - issue #4074
* pass expected keys to error message and add links as expected top level key
2019-08-06 11:01:25 +02:00
Pavle Vidanović
e1a935d71c
Stopwords for Serbian language. (#4078)
* Serbian stopwords added. (Cyrillic alphabet)
* spaCy Contribution agreement included.
* Test initialize updated
2019-08-05 10:22:27 +02:00
veer-bains
874bd8c8dd
Fixed syntax error in lang/ko when using python 2 (#4082) (closes #4068)
* fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py
* fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py
* Update __init__.py
* Create veer-bains.md
* Update __init__.py
fixed syntax errors in variable datatype assignment when calling spacy.blank("ko") with python 2.7
2019-08-05 10:19:32 +02:00
Anastassia
33b14724a5
Update gold corpus code to properly ingest a directory of jsonl… (#4067)
* Update gold corpus code to properly ingest a directory of jsonlines files
In response to: https://github.com/explosion/spaCy/issues/3975
* Update spacy/gold.pyx
Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-02 09:58:51 +02:00
Mohammed Daudali
23ec07debd
Correct typo for AllenAI url on homepage (#4050)
* Typo fix for AllenAI url
Changed incorrect home page url for AllenAI from appenai.org to allenai.org
* Sign contributor agreement
* Change date format
2019-07-31 00:16:33 +02:00
Bae Yong-Ju
05fbf5d976
Fix error when Korean text contains regexp special characters. (#4022)
2019-07-25 17:53:33 +02:00
Falak Asad
ff1e73e35c
Bugfix/issue 3968 (#3982)
* Fix for issue-3968
* Added contributor agreement
* Made suggested changes
2019-07-18 00:20:32 +02:00
pmbaumgartner
931e87f927
contributor agreement
2019-07-14 20:46:06 -04:00
yash
d5311b3c42
Add test file for issue (#3625) and spacy contributor agreement
2019-07-11 14:53:14 +05:30
cedar101
58f06e6180
Korean support (#3901)
* start lang/ko
* add test codes
* using natto-py
* add test_ko_tokenizer_full_tags()
* spaCy contributor agreement
* external dependency for ko
* collections.namedtuple for python version < 3.5
* case fix
* tuple unpacking
* add jongseong(final consonant)
* apply mecab option
* Remove Pipfile for now
Co-authored-by: Ines Montani <ines@ines.io>
2019-07-09 22:23:16 +02:00