Commit Graph

14728 Commits

Author SHA1 Message Date
Adriane Boyd 8547514aa4
Remove labels from textcat component config example (#8815) 2021-07-27 13:14:38 +02:00
Paul O'Leary McCann 67ecdcc3ac
Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 12:08:46 +02:00
Ines Montani 7f21c7dfa2
Merge pull request #8794 from explosion/autoblack
Auto-format code with black
2021-07-27 12:17:15 +10:00
Ines Montani 34c401f04f
Merge pull request #8801 from polm/fix/respect-no-skip (fixes #8796)
Respect the no_skip value
2021-07-27 12:16:47 +10:00
Ines Montani 134cb06af3
Merge pull request #8808 from kevinlu1248/master [ci skip]
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:16 +10:00
Ines Montani 9bf0d6f2fd
Merge pull request #8806 from Ledenel/master [ci skip]
fix typo
2021-07-27 12:14:22 +10:00
Kevin Lu 4a8e9e4e4e
Update data-formats.md 2021-07-25 22:58:53 -07:00
Ledenel 413f745c68 fix broken example in spaCy universe Chatterbot 2021-07-25 15:53:32 +00:00
Paul O'Leary McCann 284b530c63 Respect the no_skip value
Seems like the logic for this was just left out. See #8796.
2021-07-24 15:31:17 +09:00
explosion-bot a58ab6ea22 Auto-format code with black 2021-07-23 08:04:09 +00:00
Adriane Boyd 6bbc2b1956
Reload train corpus in debug data after initialize (#8776) 2021-07-21 22:38:40 +02:00
Adriane Boyd d48c01a6f7
Remove extraneous grc test file (#8768) 2021-07-20 15:51:15 +02:00
Sofie Van Landeghem ffaead8fe0
bump to 3.1.1 2021-07-19 14:48:27 +02:00
Sofie Van Landeghem 83e27d262e
negative tag annotation (#8731)
* unit test to unlearn tag via negative annotation

* bump thinc to 8.0.8
2021-07-19 14:39:11 +02:00
Adriane Boyd 0e4b96c97e
Update lexeme ranks for loaded vectors (#8640)
Update the ranks for any lexemes that have been added to the vocab
before the vectors are added to the model.
2021-07-19 18:25:54 +10:00
Adriane Boyd e532c69475
Update Language.replace_pipe for disabled components (#8729)
* Fix the index where the replacement is inserted to account for
disabled components
* Allow `Language.replace_pipe` to replace disabled components
2021-07-19 18:06:12 +10:00
Paul O'Leary McCann d717593eb7
Merge pull request #8754 from KennethEnevoldsen/patch-1
[minor] removed outdated spacy version for spacymoji
2021-07-18 19:17:33 +09:00
Paul O'Leary McCann ac67639eaf
Merge pull request #8755 from KennethEnevoldsen/patch-2
fixed GitHub link and thumbnail
2021-07-18 19:14:57 +09:00
Kenneth Enevoldsen 5d6aed0773
fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 10:22:00 +02:00
Ines Montani f90482d077 Tidy up and auto-format 2021-07-18 15:44:56 +10:00
Ines Montani 313f55e560 Fix JSON [ci skip] 2021-07-18 13:21:33 +10:00
Ines Montani 51e5903d6f
Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:18:42 +10:00
Kenneth Enevoldsen 8546948fba
removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-17 15:19:43 +02:00
Kenneth Enevoldsen a0e0ccdb46
Update website/meta/universe.json
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-17 07:14:46 +02:00
Ines Montani c0f436efbc
Merge pull request #8735 from explosion/autoblack 2021-07-17 13:46:17 +10:00
Ines Montani 483f3175cb Tidy up [ci skip] 2021-07-17 13:43:15 +10:00
Ines Montani 15e6578f7d
Adjust formatting 2021-07-17 10:49:13 +10:00
Mario Šaško 1ba2e8a646
Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:15:52 +02:00
explosion-bot eff3d1088b Auto-format code with black 2021-07-16 08:03:36 +00:00
Adriane Boyd f5acc48111
Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:41:36 +02:00
Adriane Boyd ac45c7c045
Add pre-commit to ignored requirements (#8728) 2021-07-15 16:41:15 +02:00
jmyerston 993b0fab0e
Added ancient Greek language support (#8606)
* Add ancient Greek language support

Initial commit

* Contributor Agreement

* grc tokenizer test added and files formatted with black, unnecessary import removed

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Commas in lists fixed. __init__.py added to test

* Update lex_attrs.py

* Update stop_words.py

* Update stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
Sofie Van Landeghem 77859beb99
spacy.ngram_range_suggester.v1 (#8699) 2021-07-15 10:01:22 +02:00
Julien Rossi e117573822
Adding noun_chunks to the DUTCH language model (nl) (#8529)
* implement noun_chunks for dutch language

* copy/paste FR and SV syntax iterators to accommodate UD tags
* added tests with dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define noun chunk as a single Noun-Phrase
* includes some corner cases debugging (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

* fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
Ines Montani 2a8eeed5da
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker aafb89df78 Update universe.json code_example 2021-07-13 10:22:49 +02:00
KennethEnevoldsen e5127992a0 added agreement 2021-07-13 10:11:02 +02:00
Kenneth Enevoldsen 94ce904e10
added missing comma 2021-07-13 09:59:34 +02:00
Kenneth Enevoldsen a81fcc81b0
added dacy to universe 2021-07-13 09:54:08 +02:00
Adriane Boyd f9fd2889b7
Use 0-vector for OOV lexemes (#8639) 2021-07-13 14:48:12 +10:00
Edward 8233359225
Fix preservation of spacy package meta (#8663)
* update package meta with existing_meta and nlp_meta

* Add spaCy contributor agreement

* Added more info when creating readme
2021-07-12 11:18:52 +02:00
Paul O'Leary McCann 1c70c87daf
Fix autoblack
The conditional needs double equals.
2021-07-10 16:02:39 +09:00
Ines Montani 616f4de034
Merge pull request #8674 from polm/fix/autoblack-no-forks [ci skip]
Make the autoblack job not run on forks
2021-07-10 16:41:59 +10:00
Paul O'Leary McCann b8cdbb4bb6 Make the autoblack job not run on forks
The autoblack job is an occasional cleanup job. If it runs on forks and
those PRs are accepted the git history will be weird and that doesn't
help anyone.

The way to make the job not run on forks is a little non-obvious but
based on this thread.

https://github.com/prisma/prisma/issues/3539
2021-07-10 15:38:20 +09:00
Ines Montani d4fecdfb82
Merge pull request #8665 from rynoV/patch-1 [ci skip] 2021-07-10 10:52:15 +10:00
Ines Montani 50000d37e4
Avoid double parentheses [ci skip] 2021-07-10 10:52:01 +10:00
Calum Sieppert e2d53aa1a6
Typo fixes 2021-07-09 10:25:56 -06:00
Adriane Boyd d8805a1073
Fix ru/uk lemmatizer mp with spawn (#8657)
Use an instance variable instead of a class variable for the morphological
analyzer so that multiprocessing with spawn is possible.
2021-07-09 15:36:56 +02:00
Adriane Boyd b8e720fdb9
Fix Azerbaijani init, extend lang init tests (#8656)
* Extend langs in initialize tests

* Fix az init
2021-07-09 15:36:35 +02:00
Ines Montani 1c0ed22d1e
Merge pull request #8573 from julien-talkair/code-quality-pre-commit 2021-07-09 23:09:24 +10:00