Commit Graph

15176 Commits

Author SHA1 Message Date
Adriane Boyd e750c1760c
Restore tokenization timing in Language.evaluate (#9305)
Restore tokenization timing steps that were accidentally removed in #6765.
2021-09-27 20:44:14 +02:00
Sofie Van Landeghem a361df00cd
Raise E983 early on in docbin init (#9247)
* raise E983 early on in docbin init

* catch situation before error is raised

* add more info on the spacy debug command
2021-09-27 20:43:03 +02:00
Adriane Boyd effae12cbd
Update slow readers test to use textcat_multilabel (#9300) 2021-09-27 20:04:02 +02:00
Adriane Boyd fe5f5d6ac6
Update Catalan tokenizer (#9297)
* Update Makefile

For more recent python version

* updated for bsc changes

New tokenization changes

* Update test_text.py

* updating tests and requirements

* changed failed test in test/lang/ca

changed failed test in test/lang/ca

* Update .gitignore

deleted stashed changes line

* back to python 3.6 and remove transformer requirements

As per request

* Update test_exception.py

Change the test

* Update test_exception.py

Remove test print

* Update Makefile

For more recent python version

* updated for bsc changes

New tokenization changes

* updating tests and requirements

* Update requirements.txt

Removed spacy-transformers from requirements

* Update test_exception.py

Added final punctuation to ensure consistency

* Update Makefile

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Format

* Update test to check all tokens

Co-authored-by: cayorodriguez <crodriguezp@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-27 14:42:30 +02:00
Adriane Boyd 200121a035
Merge pull request #9296 from adrianeboyd/chore/update-develop-from-master-v3.1-2
Update develop from master
2021-09-27 11:19:00 +02:00
Adriane Boyd 12ab49342c Sync requirements in setup.cfg 2021-09-27 09:16:31 +02:00
Adriane Boyd 03f234b739 Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
github-actions[bot] 4da2af4e0e
Auto-format code with black (#9284)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-24 10:46:43 +02:00
Jette16 5eced281d8
Add universe test (#9278)
* Added test for universe.json

* Added contributor agreement

* Ran black on test_universe_json.py
2021-09-23 14:31:42 +02:00
Ines Montani 6bb0324b81 Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
Ines Montani beb4a8c524
Merge pull request #9199 from shigapov/master (resolves #9129) 2021-09-23 19:41:53 +10:00
Philip Vollet d2adfe1efa
Add projects to spaCy Universe (#9269)
* Added spaCy Universe projects

* Added user license agreement Philip Vollet

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-23 10:56:45 +02:00
Ines Montani 57b5fc1995
Apply suggestions from code review
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-23 17:58:32 +10:00
Sofie Van Landeghem 3fc3b7a13a
avoid crash when unicode in title (#9254) 2021-09-22 21:01:34 +02:00
Rumesh Madhusanka 68264b4cee
Updating the stop word list for Sinhala language (#9270) 2021-09-22 20:43:42 +02:00
Adriane Boyd 2f0bb77920
Accept Doc input in pipelines (#9069)
* Accept Doc input in pipelines

Allow `Doc` input to `Language.__call__` and `Language.pipe`, which
skips `Language.make_doc` and passes the doc directly to the pipeline.

* ensure_doc helper function

* avoid running multiple processes on GPU

* Update spacy/tests/test_language.py

Co-authored-by: svlandeg <svlandeg@github.com>
2021-09-22 09:41:05 +02:00
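
The behaviour described in 2f0bb77920 (a `Doc` passed to the pipeline skips `make_doc`) might look roughly like the sketch below. It is an illustration only, not spaCy's code: `ToyDoc`, `make_doc`, `ensure_doc` and `run_pipeline` are hypothetical stand-ins so the example runs without spaCy installed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union


@dataclass
class ToyDoc:
    """Hypothetical stand-in for spaCy's Doc: just a list of words."""
    words: List[str]


def make_doc(text: str) -> ToyDoc:
    # Stand-in tokenizer (assumption): plain whitespace splitting.
    return ToyDoc(words=text.split())


def ensure_doc(doc_like: Union[str, ToyDoc]) -> ToyDoc:
    """Pass an existing ToyDoc through untouched; tokenize raw strings."""
    if isinstance(doc_like, ToyDoc):
        return doc_like
    if isinstance(doc_like, str):
        return make_doc(doc_like)
    raise ValueError(f"Expected str or ToyDoc, got {type(doc_like).__name__}")


def run_pipeline(
    doc_like: Union[str, ToyDoc],
    components: Sequence[Callable[[ToyDoc], ToyDoc]],
) -> ToyDoc:
    # Tokenization happens only when the input is not already a document.
    doc = ensure_doc(doc_like)
    for component in components:
        doc = component(doc)
    return doc
```

With this shape, a caller that has already tokenized text (for example with custom rules) can hand the resulting document straight to the pipeline, and the tokenization is left as it was.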
Daniël de Kok 17802836be
Allow overriding vars in the project assets subcommand (#9248)
This change makes the `project assets` subcommand accept variables to
override as well, making the interface more similar to `project run`.
2021-09-21 10:49:45 +02:00
Adriane Boyd 00bdb31150
Fix vector for 0-length span (#9244) 2021-09-20 20:22:49 +02:00
svlandeg ec621e6853 Merge remote-tracking branch 'upstream/master' into spacy.io 2021-09-20 15:54:00 +02:00
svlandeg e0e3e9653b Revert "raise E983 early on in docbin init"
This reverts commit f3f7afa21f.
2021-09-20 15:52:02 +02:00
svlandeg f3f7afa21f raise E983 early on in docbin init 2021-09-20 15:49:31 +02:00
github-actions[bot] 015d439eb6
Auto-format code with black (#9234)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-20 08:49:19 +02:00
Edward 79c7c62970 Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:35:00 +02:00
Edward 8bda39f088
Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:32:44 +02:00
Paul O'Leary McCann c4f0800fb8
Validate pos values when creating Doc (#9148)
* Validate pos values when creating Doc

* Add clear error when setting invalid pos

This also changes the error language slightly.

* Fix variable name

* Update spacy/tokens/doc.pyx

* Test that setting invalid pos raises an error

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 13:28:05 +02:00
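
As a rough illustration of the kind of check c4f0800fb8 describes (rejecting invalid `pos` values with a clear error), here is a minimal sketch. It assumes the 17 Universal Dependencies UPOS tags as the valid set; the set spaCy actually accepts may differ, and `UPOS_TAGS` and `validate_pos` are hypothetical names, not spaCy API.

```python
from typing import List, Sequence

# The 17 Universal Dependencies coarse-grained POS tags (assumed valid set).
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def validate_pos(words: Sequence[str], pos: Sequence[str]) -> List[str]:
    """Return pos as a list, or raise ValueError naming the first bad tag."""
    if len(words) != len(pos):
        raise ValueError(
            f"Got {len(pos)} pos values for {len(words)} words"
        )
    for word, tag in zip(words, pos):
        if tag not in UPOS_TAGS:
            raise ValueError(
                f"Invalid pos value {tag!r} for word {word!r}; "
                f"expected one of {sorted(UPOS_TAGS)}"
            )
    return list(pos)
```

Failing at construction time, with the offending value in the message, is usually easier to debug than a lookup error surfacing later in the pipeline.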
Jozef Harag 865cfbc903
feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters (#9202)
* feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters

* update versioning in docs

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-09-16 12:26:41 +02:00
Sofie Van Landeghem 00836c2d7d
Update spacy/displacy/templates.py 2021-09-16 09:23:21 +02:00
Sofie Van Landeghem 4bf2606adf
Update spacy/displacy/render.py
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-16 09:22:38 +02:00
Paul O'Leary McCann fd99438fb2 Make docs consistent (fix #9126) 2021-09-16 15:56:19 +09:00
Paul O'Leary McCann 1d57d78758 Make docs consistent (fix #9126) 2021-09-16 15:54:12 +09:00
Paul O'Leary McCann 9ceb8f413c
StringStore/Vocab dev docs (#9142)
* First take at StringStore/Vocab docs

Things to check:

1. The mysterious vocab members
2. How to make table of contents? Is it autogenerated?
3. Anything I missed / needs more detail?

* Update docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updates based on review feedback

* Minor fix

* Move example code down

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 12:50:22 +09:00
Ines Montani 20f63e7154
Only include runtime-relevant config in package CLI dependency detection (#9211) 2021-09-15 23:16:01 +02:00
Paul O'Leary McCann cd75f96501
Remove two attributes marked for removal in 3.1 (#9150)
* Remove two attributes marked for removal in 3.1

* Add back unused ints with changed names

* Change data_dir to _unused_object

This is still kept in the type definition, but I removed it from the
serialization code.

* Put serialization code back for now

Not sure how this interacts with old serialized models yet.
2021-09-15 23:07:21 +02:00
Adriane Boyd d74870d38c
Prepare for v3.1.3 (#9200)
* Update thinc and spacy-legacy requirements

* Set version to v3.1.3
2021-09-14 11:03:51 +02:00
Paul O'Leary McCann 0f01f46e02
Update Cython string types (#9143)
* Replace all basestring references with unicode

`basestring` was a compatibility type introduced by Cython to make
dealing with utf-8 strings in Python2 easier. In Python3 it is
equivalent to the unicode (or str) type.

I replaced all references to basestring with unicode, since that was
used elsewhere, but we could also just replace them with str, which
should also be equivalent.

All tests pass locally.

* Replace all references to unicode type with str

Since we only support python3 this is simpler.

* Remove all references to unicode type

This removes all references to the unicode type across the codebase and
replaces them with `str`, which makes it more drastic than the prior
commits. In order to make this work importing `unicode_literals` had to
be removed, and one explicit unicode literal also had to be removed (it
is unclear why this is necessary in Cython with language level 3, but
without doing it there were errors about implicit conversion).

When `unicode` is used as a type in comments it was also edited to be
`str`.

Additionally `coding: utf8` headers were removed from a few files.
2021-09-13 17:02:17 +02:00
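
The reasoning in 0f01f46e02 rests on the fact that, in Python 3, `str` is the Unicode text type and there is no built-in `basestring` or `unicode`. A small pure-Python check of that premise (it does not exercise Cython's own type handling; `is_text` is a hypothetical helper):

```python
import builtins


def is_text(value: object) -> bool:
    """In Python 3 a single isinstance check against str covers all text."""
    return isinstance(value, str)


# Python 3 has no built-in basestring or unicode names.
HAS_BASESTRING = hasattr(builtins, "basestring")
HAS_UNICODE = hasattr(builtins, "unicode")

# Text literals are Unicode without any prefix or __future__ import.
SAMPLE = "Daniël"
SAMPLE_IS_TEXT = is_text(SAMPLE)            # str holds Unicode code points
BYTES_IS_TEXT = is_text(SAMPLE.encode("utf-8"))  # bytes is a separate type
```

Because `bytes` and `str` are distinct in Python 3, code that only supports Python 3 can annotate text as `str` everywhere and drop `unicode_literals` imports.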
j-frei 5d0cc0d2ab Correct parser.py use_upper param info (#9180) 2021-09-13 09:29:11 +02:00
Renat Shigapov d5cc009faf
Merge branch 'explosion:master' into master 2021-09-13 08:43:48 +02:00
Renat Shigapov e61d93f8c3
add NEL-visualisation to manual-usage 2021-09-13 08:38:58 +02:00
Renat Shigapov f4b5c4209d
specify kb_id and kb_url for URL visualisation 2021-09-13 08:15:07 +02:00
Renat Shigapov 7562fb5354
add links to entities into the TPL_ENT-template 2021-09-13 08:06:54 +02:00
Paul O'Leary McCann 9c4e84d4a1 Minor typo fix in docs 2021-09-11 14:23:11 +09:00
Paul O'Leary McCann f89e1c34c9
Minor typo fix in docs 2021-09-11 14:22:05 +09:00
Renat Shigapov 2e2d0e8701 added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:25:25 +09:00
mylibrar d621df6422 Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:25:17 +09:00
Renat Shigapov 646f3a54db
added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:16:51 +09:00
mylibrar ee28aac68e
Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:13:13 +09:00
j-frei 462b009648
Correct parser.py use_upper param info (#9180) 2021-09-10 16:19:58 +02:00
Renat Shigapov c1927fe994
fix misprint in tags 2021-09-09 15:37:34 +02:00
Renat Shigapov 8940e0baca
add agreement 2021-09-09 15:33:29 +02:00
Renat Shigapov ea58294076
add spaCyOpenTapioca to universe 2021-09-09 15:13:18 +02:00