Commit Graph

319 Commits

Author SHA1 Message Date
Olamilekan Wahab a741de7cf6 Adding support for Yoruba Language (#4614)
* Adding Support for Yoruba

* test text

* Updated test string.

* Fixing encoding declaration.

* Adding encoding to stop_words.py

* Added contributor agreement and removed iranlowo.

* Added removed test files and removed iranlowo to keep project bare.

* Returned CONTRIBUTING.md to default state.

* Added deleted conftest entries

* Tidy up and auto-format

* Revert CONTRIBUTING.md

Co-authored-by: Ines Montani <ines@ines.io>
2019-12-21 14:11:50 +01:00
Nicolai Bjerre Pedersen de5453cdcb Fix link to user hooks in docs (#4778)
* Fix link to user hooks in docs

* Update mr_bjerre.md

Mistake in contributor agreement

* Apparently hard to get it right (wrong name of sca)
2019-12-06 19:17:12 +01:00
Antti Ajanki e626a011cc Improvements to the Finnish language data (#4738)
* Enable lex_attrs on Finnish

* Copy the Danish tokenizer rules to Finnish

Specifically, don't break hyphenated compound words

* Contributor agreement

* A new file for Finnish tokenizer rules instead of including the Danish ones
2019-12-03 12:55:28 +01:00
Matt Maybeno c9f1e99787 Agnostic vocab array fix (#4680)
* Use get_array_module instead of numpy

* add contributor agreement
2019-11-23 14:59:52 +01:00
GuiGel 8f7ab70870 Bugfix/fix entity ruler from disk (#4670)
* fix EntityRuler from_disk bug

* add contributor file

* Test EntityRuler PhraseMatcher deserialization (#4651)

* newline at end of file

* fix copy paste error

* serializing the EntityRuler by itself

* Add unicode declarations for Python 2 and auto-format
2019-11-21 16:26:37 +01:00
Elijah Rippeth 5ad5c4b44a Add initial Korean support (#4660)
* add hangul and jamo char classes.

* add initial Korean lexical attributes.

* add contributor agreement
2019-11-18 12:56:07 +01:00
Christoph Purschke 433748e867 Fix basic language support for Luxembourgish (by adding punctuation.py) (#4648)
* Update __init__.py

* Create punctuation.py

* Update tokenizer_exceptions.py

* Create questoph.md

* Update questoph.md

* Update test_text.py

* Update test_text.py

* Update test_text.py

* Update test_text.py
2019-11-15 16:16:47 +01:00
Priscilla de Abreu Lopes 39e79fcc86 Bugfix/dep matcher issue 4590 (#4601)
* add contributor agreement for prilopes

* add test for issue #4590

* fix on_match params for DependencyMatcher (#4590)
2019-11-07 12:01:06 +01:00
Neel Kamath 6c036ab57d Add "spaCy Server" to spaCy Universe (#4553)
* Add "spaCy Server" to spaCy Universe

* Accept the spaCy Contributor Agreement
2019-10-30 13:20:46 +01:00
Ines Montani 1185702993 Port over contributor agreement from spacy-lookups-data [ci skip] 2019-10-25 13:06:10 +02:00
Zhuoru Lin 10d88b09bb Bugfix/fix wikidata train entity linker (#4509)
* Fix labels_discard Nonetype iteration error

* Contributor agreement for Zhuoru Lin

* Enhance EntityLinker.predict() to handle labels_discard is None case.
2019-10-24 12:52:59 +02:00
gustavengstrom 050e2445a8 Adding noun_chunks to the Swedish language model (sv) (#4422)
* Create syntax_iterators.py

Replica of spacy/lang/fr/syntax_iterators.py

* Added import statements for SYNTAX_ITERATORS

* Create gustavengstrom.md

* Added "dobj" to list of labels in noun_chunks method and a test_noun_chunks method to the Swedish language model.

* Delete README-checkpoint.md

Co-authored-by: Gustav <gustav@davcon.se>
Co-authored-by: Ines Montani <ines@ines.io>
2019-10-21 12:57:06 +02:00
Pepe Berba 7772d5d3c5 Update `vocab.get_vector` docs to include features on Fasttext ngram (#4464)
* Update `vocab.get_vector`

* Added contrib agreement
2019-10-20 01:28:18 +02:00
Peter Gilles 428887b8f2 Initial commit: New language Luxembourgish (lb) (#4424)
* new language: Luxembourgish (lb)

* update

* update

* Update and rename .github/CONTRIBUTOR_AGREEMENT.md to .github/contributors/PeterGilles.md

* Update and rename .github/contributors/PeterGilles.md to .github/CONTRIBUTOR_AGREEMENT.md

* Update norm_exceptions.py

* Delete README.md

* moved test_lemma.py

* deactivated 'lemma_lookup = LOOKUP'

* update

* Update conftest.py

* update

* tests updated

* import unicode_literals

* Update spacy/tests/lang/lb/test_text.py

Co-Authored-By: Ines Montani <ines@ines.io>

* Create PeterGilles.md
2019-10-14 12:27:50 +02:00
Ben Taylor 1db79a33cb most_similar() return the k most similar vectors (#4364)
* most_similar return n-most similar vectors

* updated most_similar comment

* add bintay contributor agreement

* sign bintay contributor agreement

* fix most_similar documentation typo

* fixed error in prune_vectors

* updated prune_vectors test
2019-10-03 14:09:44 +02:00
Rahul Soni ed620daa5c Fix example sentences in Hindi for grammatical errors (#4343)
* Fix grammar for hindi

* Fix grammar for hindi

* Submit contributor agreement
2019-09-30 23:32:49 +02:00
Ines Montani 159b72ed4c Delete main.yml 2019-09-29 15:58:59 +02:00
Ines Montani 539a7b53cd Update main.yml 2019-09-29 15:55:26 +02:00
Ines Montani b7913c8eca Update main.yml 2019-09-29 15:40:07 +02:00
Ines Montani eb2b60069e Update main.yml 2019-09-29 15:33:53 +02:00
Ines Montani 70295f9e59 Update main.yml 2019-09-29 15:32:11 +02:00
Ines Montani b503270b09 Update main.yml 2019-09-29 15:30:31 +02:00
Ines Montani 52ea244830 Fix workflows 2019-09-29 15:30:13 +02:00
Ines Montani e9acfaec52 Revert "Revert "Rename workflows to _workflows""
This reverts commit 051fac51ee.
2019-09-29 15:29:02 +02:00
Ines Montani 051fac51ee Revert "Rename workflows to _workflows"
This reverts commit ba0027c936.
2019-09-29 15:28:59 +02:00
Ines Montani 7164c687e9 Revert "Merge branch 'master' of https://github.com/explosion/spaCy"
This reverts commit 41aab59dbf, reversing
changes made to ba0027c936.
2019-09-29 15:28:31 +02:00
Ines Montani 41aab59dbf Merge branch 'master' of https://github.com/explosion/spaCy 2019-09-29 15:26:32 +02:00
Ines Montani ba0027c936 Rename workflows to _workflows 2019-09-29 15:26:23 +02:00
Ines Montani 80f67f6065 Update build.yml 2019-09-29 15:24:28 +02:00
Ines Montani e787e6d47f Update build.yml 2019-09-29 15:15:34 +02:00
Ines Montani b2f41e2a9b Update build.yml 2019-09-29 15:06:19 +02:00
Ines Montani 8b02fff097 Update build.yml 2019-09-29 14:55:43 +02:00
Ines Montani ace0d5c580 Update build.yml 2019-09-29 14:52:01 +02:00
Ines Montani d32fb03401 Update build.yml 2019-09-29 14:48:21 +02:00
Ines Montani a5c0130b50 Update and rename pythonpackage.yml to build.yml 2019-09-29 14:43:48 +02:00
EarlGreyT 1e9e2d8aa1 fix typo in first token (#4327)
* fix typo in first token

The head of 'in' is review which has an offset of 4 and not 44

* added contributor agreement
2019-09-27 14:49:36 +02:00
Jaydeep Borkar 6a06a3fa6a Update stop_words.py and add name in contributors (#4325)
* Update stop_words.py and add name in contributors

* add jaydeepborkar.md in contributors directory

* Reset template [ci skip]

Co-authored-by: Ines Montani <ines@ines.io>
2019-09-27 11:57:27 +02:00
Em Zhan aafa091541 Fix typo in documentation (#4322)
* Fix typo 'probj' instead of 'pobj'

* Add spaCy contributor agreement for zqianem
2019-09-25 19:42:18 +02:00
Sean Löfgren 31c683d87d add return_matches and as_tuples back to Matcher.pipe (#4303)
* add contributor agreement [ci skip]

* add return_matches and as_tuples back to Matcher.pipe
2019-09-18 22:00:33 +02:00
Moshe Hazoom 72463b062f Improve speed of _merge method (#4300)
* make merge more efficient

* fix offsets

* merge works with relative indices

* remove printing

* Add the SCA

* fix SCA date

* more cythonize _retokenize.pyx

* more cythonize _retokenize.pyx

* fix only declaration in _retokenize.pyx

* switch back to absolute head

* switch back to absolute head

* fix comment

* merge from origin repo
2019-09-18 21:34:34 +02:00
tamuhey 71909cdf22 Fix iss4278 (#4279)
* fix: len(tuple) == 2

* (#4278) add fail test

* add contributor's agreement
2019-09-12 10:44:49 +02:00
Mihai Gliga 25aecd504f adding Romanian tag_map (#4257)
* adding Romanian tag_map

* added SCA file

* forgotten import
2019-09-09 11:53:09 +02:00
Ines Montani bcd1b12f43 Add contributor agreement [ci skip] 2019-08-30 17:02:43 +02:00
Andrei-Marius Avram 199589228e Added RONEC to spaCy Universe (#4151)
* Added RONEC to spaCy Universe

* Added contributor file

* Corrected date from .github/contributors/avramandrei.md

* Convert tabs to spaces

* Remove duplicate keys

Can only have one GitHub link unfortunately

* Also add models category

* Adjust ID

This is used to generate the URL, so a simpler string is better
2019-08-20 14:46:07 +02:00
Ivan Šarić 434f6fa6c1 Issue #1107 - adds examples.py for Croatian language (#4143)
* adds contributor agreement for isaric

* adds examples.py for croatian language
2019-08-18 23:04:41 +02:00
yanaiela ec0beccaf1 Custom entity render (#4117)
* customizable template for entities display, allowing to pass additional parameters along each entity

* contributor agreement

* simpler naming for the additional parameters given to the span entities renderer

Co-Authored-By: Ines Montani <ines@ines.io>

* change of default parameter, as suggested

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-16 18:39:25 +02:00
Ziming He eea7d4f4a8 biluo_tags_from_offsets throw exception for overlapping entities (#4021)
* Check whether two entities overlap

- biluo_gold_biluo_overlap now throw exception when entities passed in have overlaps
- added unit test

* SCA agreement
2019-08-15 18:13:32 +02:00
AJ Rader 2f3648700c Correction of default lemmatizer lookup in English (Issue # 4104) (#4110)
* pytest file for issue4104 established

* edited default lookup english lemmatizer for spun; fixes issue 4102

* eliminated parameterization and sorted dictionary dependency in issue 4104 test

* added contributor agreement
2019-08-15 11:39:10 +02:00
Ines Montani 5196dbd89d Delete wip.yml [ci skip] 2019-08-13 13:31:21 +02:00
Ines Montani 35c865024b Fix file name [ci skip] 2019-08-12 18:39:54 +02:00