Commit Graph

14854 Commits

Author SHA1 Message Date
Sofie Van Landeghem 5e8e8525f0
fix W108 filter (#9438)
* remove text argument from W108 to enable 'once' filtering

* include the option of partial POS annotation

* fix typo

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-12 19:56:44 +02:00
Lj Miranda 6425b9a1c4
Include JsonlCorpus from the imports (#9431) 2021-10-12 15:39:14 +02:00
Ryn Daniels bb6623bb2d
Merge pull request #9426 from explosion/rfd-bot-config
Add allowed_teams to the explosion-bot config
2021-10-12 09:50:39 +02:00
Ryn Daniels 2fb420ec23 Add allowed_teams to the explosion-bot config 2021-10-11 18:20:48 +02:00
Ryn Daniels f64e39fa49
Install explosionbot as a github action (#9420) 2021-10-11 15:43:27 +02:00
Paul O'Leary McCann efe5beefe0
Add test for case where parser overwrite annotations (#9406)
* Add test for case where parser overwrite annotations

* Move test to its own file

Also add note about how other tokens modify results.

* Fix xfail decorator
2021-10-11 14:57:45 +02:00
Paul O'Leary McCann b53e39455e
Fix UD POS docs links (fix #9013) (#9407)
* Fix UD POS docs links (fix #9013)

The previous link seems to have been for UD v1.

* Fix link
2021-10-11 11:51:19 +02:00
Paul O'Leary McCann fd759a881b
Fix inconsistent lemmas (#9405)
* Add util function to unique lists and preserve order

* Use unique function instead of list(set())

list(set()) has the issue that it's not consistent between runs of the
Python interpreter, so order can vary.

list(set()) calls were left in a few places where they were behind calls
to sorted(). I think in this case the calls to list() can be removed,
but this commit doesn't do that.

* Use the existing pattern for this
2021-10-11 11:38:45 +02:00
Adriane Boyd fd7edbc645
Fix types descriptions of sm and sent models (#9401) 2021-10-11 11:17:18 +02:00
Adriane Boyd a5231cb044
Remove traces of lexemes from vocab serialization (#9400) 2021-10-11 11:13:35 +02:00
Jette16 3b144a3a51 Add universe test (#9278)
* Added test for universe.json

* Added contributor agreement

* Ran black on test_universe_json.py
2021-10-11 11:08:46 +02:00
Ines Montani 5003a9c3c7
Move core training logic in CLI into standalone function (#9398) 2021-10-11 10:56:14 +02:00
Paul O'Leary McCann 2a7e327310
Fix Dependency Matcher Ordering Issue (#9337)
* Fix inconsistency

This makes the failing test pass, so that behavior is consistent whether
patterns are added in one call or two.

The issue is that the hash for patterns depended on the index of the
pattern in the list of current patterns, not the list of total patterns,
so a second call would get identical match ids.

* Add illustrative test case

* Add failing test for remove case

Patterns are not removed from the internal matcher on calls to remove,
which causes spurious weird matches (or misses).

* Fix removal issue

Remove patterns from the internal matcher.

* Check that the single add call also gets no matches
2021-10-11 10:26:13 +02:00
Paul O'Leary McCann 5dbe4e8392 Update new issue config with Python 3.10 info
Also adds note that Install issues go to Discussions.
2021-10-11 15:41:32 +09:00
Paul O'Leary McCann 48ba4e60f4
Add new style citation file (#9388) 2021-10-07 17:47:39 +02:00
Sofie Van Landeghem f87ae3cb7d
Doc fixes in convert API (#9350)
* add more info on the spacy debug command

* formatting
2021-10-06 13:13:18 +09:00
Adriane Boyd 4192e71599
Sync vocab in vectors and components sourced in configs (#9335)
Since a component may reference anything in the vocab, share the full
vocab when loading source components and vectors (which will include
`strings` as of #8909).

When loading a source component from a config, save and restore the
vocab state after loading source pipelines, in particular to preserve
the original state without vectors, since `[initialize.vectors]
= null` skips rather than resets the vectors.

The vocab references are not synced for components loaded with
`Language.add_pipe(source=)` because the pipelines are already loaded
and not necessarily with the same vocab. A warning could be added in
`Language.create_pipe_from_source` that it may be necessary to save and
reload before training, but it's a rare enough case that this kind of
warning may be too noisy overall.
2021-10-04 12:19:02 +02:00
Paul O'Leary McCann 6e833b617a
Updating Troubleshooting Docs (#9329)
* Add link to Discussions FAQ

* Remove old FAQ entries

I think these are no longer relevant.

- no-cache-dir: affected pip versions are *very* old now
- narrow unicode: not an issue from py3.3+
- utf-8 osx: upstream bug closed in 2019

Some of the other issues are also maybe not frequent.
2021-10-01 12:28:22 +02:00
github-actions[bot] 42a76c758f
Auto-format code with black (#9346)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-10-01 11:17:11 +02:00
Adriane Boyd b3192ddea3
Sync thinc install dep in setup, fix test packaging (#9336)
* Sync thinc install dep in setup

* Add __init__.py to include package tests in package

* Include *.toml in package
2021-09-30 19:02:10 +02:00
Paul O'Leary McCann 78a88f7de7 Fix invalid json 2021-09-30 15:23:55 +09:00
Martin Vallone a14ab7e882
Adding PhruzzMatcher to spaCy universe (#9321)
* Adding PhruzzMatcher to spaCy universe

* Fixes to make the package work properly
2021-09-30 13:46:53 +09:00
Adriane Boyd e750c1760c
Restore tokenization timing in Language.evaluate (#9305)
Restore tokenization timing steps that were accidentally removed in #6765.
2021-09-27 20:44:14 +02:00
Sofie Van Landeghem a361df00cd
Raise E983 early on in docbin init (#9247)
* raise E983 early on in docbin init

* catch situation before error is raised

* add more info on the spacy debug command
2021-09-27 20:43:03 +02:00
Adriane Boyd effae12cbd
Update slow readers test to use textcat_multilabel (#9300) 2021-09-27 20:04:02 +02:00
github-actions[bot] 4da2af4e0e
Auto-format code with black (#9284)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-24 10:46:43 +02:00
Ines Montani 6bb0324b81 Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
Ines Montani beb4a8c524
Merge pull request #9199 from shigapov/master (resolves #9129) 2021-09-23 19:41:53 +10:00
Philip Vollet d2adfe1efa
Add projects to spaCy Universe (#9269)
* Added spaCy Universe projects

* Added user license agreement Philip Vollet

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-23 10:56:45 +02:00
Ines Montani 57b5fc1995
Apply suggestions from code review
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-23 17:58:32 +10:00
Sofie Van Landeghem 3fc3b7a13a
avoid crash when unicode in title (#9254) 2021-09-22 21:01:34 +02:00
Daniël de Kok 17802836be
Allow overriding vars in the project assets subcommand (#9248)
This change makes the `project assets` subcommand accept variables to
override as well, making the interface more similar to `project run`.
2021-09-21 10:49:45 +02:00
Adriane Boyd 00bdb31150
Fix vector for 0-length span (#9244) 2021-09-20 20:22:49 +02:00
github-actions[bot] 015d439eb6
Auto-format code with black (#9234)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-20 08:49:19 +02:00
Edward 8bda39f088
Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:32:44 +02:00
Paul O'Leary McCann c4f0800fb8
Validate pos values when creating Doc (#9148)
* Validate pos values when creating Doc

* Add clear error when setting invalid pos

This also changes the error language slightly.

* Fix variable name

* Update spacy/tokens/doc.pyx

* Test that setting invalid pos raises an error

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 13:28:05 +02:00
Jozef Harag 865cfbc903
feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters (#9202)
* feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters

* update versioning in docs

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-09-16 12:26:41 +02:00
Sofie Van Landeghem 00836c2d7d
Update spacy/displacy/templates.py 2021-09-16 09:23:21 +02:00
Sofie Van Landeghem 4bf2606adf
Update spacy/displacy/render.py
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-16 09:22:38 +02:00
Paul O'Leary McCann 1d57d78758 Make docs consistent (fix #9126) 2021-09-16 15:54:12 +09:00
Paul O'Leary McCann 9ceb8f413c
StringStore/Vocab dev docs (#9142)
* First take at StringStore/Vocab docs

Things to check:

1. The mysterious vocab members
2. How to make table of contents? Is it autogenerated?
3. Anything I missed / needs more detail?

* Update docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updates based on review feedback

* Minor fix

* Move example code down

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 12:50:22 +09:00
Ines Montani 20f63e7154
Only include runtime-relevant config in package CLI dependency detection (#9211) 2021-09-15 23:16:01 +02:00
Adriane Boyd d74870d38c
Prepare for v3.1.3 (#9200)
* Update thinc and spacy-legacy requirements

* Set version to v3.1.3
2021-09-14 11:03:51 +02:00
Renat Shigapov d5cc009faf
Merge branch 'explosion:master' into master 2021-09-13 08:43:48 +02:00
Renat Shigapov e61d93f8c3
add NEL-visualisation to manual-usage 2021-09-13 08:38:58 +02:00
Renat Shigapov f4b5c4209d
specify kb_id and kb_url for URL visualisation 2021-09-13 08:15:07 +02:00
Renat Shigapov 7562fb5354
add links to entities into the TPL_ENT-template 2021-09-13 08:06:54 +02:00
Paul O'Leary McCann f89e1c34c9
Minor typo fix in docs 2021-09-11 14:22:05 +09:00
Renat Shigapov 646f3a54db
added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:16:51 +09:00
mylibrar ee28aac68e
Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:13:13 +09:00