Commit Graph

8810 Commits

Author SHA1 Message Date
Sofie Van Landeghem 5e8e8525f0
fix W108 filter (#9438)
* remove text argument from W108 to enable 'once' filtering

* include the option of partial POS annotation

* fix typo

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-12 19:56:44 +02:00
Lj Miranda 6425b9a1c4
Include JsonlCorpus from the imports (#9431) 2021-10-12 15:39:14 +02:00
Paul O'Leary McCann efe5beefe0
Add test for case where parser overwrites annotations (#9406)
* Add test for case where parser overwrites annotations

* Move test to its own file

Also add note about how other tokens modify results.

* Fix xfail decorator
2021-10-11 14:57:45 +02:00
Paul O'Leary McCann fd759a881b
Fix inconsistent lemmas (#9405)
* Add util function to unique lists and preserve order

* Use unique function instead of list(set())

list(set()) has the issue that it's not consistent between runs of the
Python interpreter, so order can vary.

list(set()) calls were left in a few places where they were behind calls
to sorted(). I think in this case the calls to list() can be removed,
but this commit doesn't do that.

* Use the existing pattern for this
2021-10-11 11:38:45 +02:00
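The ordering problem described in the commit above can be sketched as follows. This is a minimal illustration, not the helper used in the commit; the function name `unique_preserving_order` is hypothetical.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def unique_preserving_order(items: Iterable[T]) -> List[T]:
    """Drop duplicates while keeping the first occurrence of each item."""
    # dict keys keep insertion order (guaranteed since Python 3.7),
    # whereas the iteration order of a set of strings can change
    # between interpreter runs because of hash randomization.
    return list(dict.fromkeys(items))


lemmas = ["run", "ran", "run", "running", "ran"]
deduped = unique_preserving_order(lemmas)
print(deduped)  # ['run', 'ran', 'running']
```

When the result is passed straight to `sorted()`, the order of the intermediate collection does not matter, which is why the remaining `list(set())` calls behind `sorted()` are harmless.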
Adriane Boyd a5231cb044
Remove traces of lexemes from vocab serialization (#9400) 2021-10-11 11:13:35 +02:00
Jette16 3b144a3a51
Add universe test (#9278)
* Added test for universe.json

* Added contributor agreement

* Ran black on test_universe_json.py
2021-10-11 11:08:46 +02:00
Ines Montani 5003a9c3c7
Move core training logic in CLI into standalone function (#9398) 2021-10-11 10:56:14 +02:00
Paul O'Leary McCann 2a7e327310
Fix Dependency Matcher Ordering Issue (#9337)
* Fix inconsistency

This makes the failing test pass, so that behavior is consistent whether
patterns are added in one call or two.

The issue is that the hash for patterns depended on the index of the
pattern in the list of current patterns, not the list of total patterns,
so a second call would get identical match ids.

* Add illustrative test case

* Add failing test for remove case

Patterns are not removed from the internal matcher on calls to remove,
which causes spurious weird matches (or misses).

* Fix removal issue

Remove patterns from the internal matcher.

* Check that the single add call also gets no matches
2021-10-11 10:26:13 +02:00
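The indexing issue described in the commit above can be illustrated with a small sketch. It is not the DependencyMatcher's code: the functions `make_keys_buggy` and `make_keys_fixed` are hypothetical, and plain strings stand in for the hashed match ids.

```python
from typing import List


def make_keys_buggy(name: str, new_patterns: List[str]) -> List[str]:
    # The index only counts patterns passed in this call, so every
    # call starts again at 0.
    return [f"{name}:{i}" for i, _ in enumerate(new_patterns)]


def make_keys_fixed(name: str, existing: int, new_patterns: List[str]) -> List[str]:
    # Offset the index by the number of patterns already registered,
    # so keys are unique across calls.
    return [f"{name}:{existing + i}" for i, _ in enumerate(new_patterns)]


# Two separate calls with one pattern each produce the same key.
first = make_keys_buggy("FOUNDED", ["pattern_a"])
second = make_keys_buggy("FOUNDED", ["pattern_b"])
print(first == second)  # True: the ids collide

# With the offset, two calls give the same keys as a single call.
fixed_first = make_keys_fixed("FOUNDED", 0, ["pattern_a"])
fixed_second = make_keys_fixed("FOUNDED", len(fixed_first), ["pattern_b"])
one_call = make_keys_fixed("FOUNDED", 0, ["pattern_a", "pattern_b"])
print(fixed_first + fixed_second == one_call)  # True
```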
Adriane Boyd 4192e71599
Sync vocab in vectors and components sourced in configs (#9335)
Since a component may reference anything in the vocab, share the full
vocab when loading source components and vectors (which will include
`strings` as of #8909).

When loading a source component from a config, save and restore the
vocab state after loading source pipelines, in particular to preserve
the original state without vectors, since `[initialize.vectors] = null`
skips rather than resets the vectors.

The vocab references are not synced for components loaded with
`Language.add_pipe(source=)` because the pipelines are already loaded
and not necessarily with the same vocab. A warning could be added in
`Language.create_pipe_from_source` that it may be necessary to save and
reload before training, but it's a rare enough case that this kind of
warning may be too noisy overall.
2021-10-04 12:19:02 +02:00
github-actions[bot] 42a76c758f
Auto-format code with black (#9346)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-10-01 11:17:11 +02:00
Adriane Boyd b3192ddea3
Sync thinc install dep in setup, fix test packaging (#9336)
* Sync thinc install dep in setup

* Add __init__.py to include package tests in package

* Include *.toml in package
2021-09-30 19:02:10 +02:00
Adriane Boyd e750c1760c
Restore tokenization timing in Language.evaluate (#9305)
Restore tokenization timing steps that were accidentally removed in #6765.
2021-09-27 20:44:14 +02:00
Sofie Van Landeghem a361df00cd
Raise E983 early on in docbin init (#9247)
* raise E983 early on in docbin init

* catch situation before error is raised

* add more info on the spacy debug command
2021-09-27 20:43:03 +02:00
Adriane Boyd effae12cbd
Update slow readers test to use textcat_multilabel (#9300) 2021-09-27 20:04:02 +02:00
github-actions[bot] 4da2af4e0e
Auto-format code with black (#9284)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-24 10:46:43 +02:00
Ines Montani 6bb0324b81
Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
Ines Montani beb4a8c524
Merge pull request #9199 from shigapov/master (resolves #9129) 2021-09-23 19:41:53 +10:00
Ines Montani 57b5fc1995
Apply suggestions from code review
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-23 17:58:32 +10:00
Sofie Van Landeghem 3fc3b7a13a
avoid crash when unicode in title (#9254) 2021-09-22 21:01:34 +02:00
Daniël de Kok 17802836be
Allow overriding vars in the project assets subcommand (#9248)
This change makes the `project assets` subcommand accept variables to
override as well, making the interface more similar to `project run`.
2021-09-21 10:49:45 +02:00
Adriane Boyd 00bdb31150
Fix vector for 0-length span (#9244) 2021-09-20 20:22:49 +02:00
github-actions[bot] 015d439eb6
Auto-format code with black (#9234)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-20 08:49:19 +02:00
Paul O'Leary McCann c4f0800fb8
Validate pos values when creating Doc (#9148)
* Validate pos values when creating Doc

* Add clear error when setting invalid pos

This also changes the error language slightly.

* Fix variable name

* Update spacy/tokens/doc.pyx

* Test that setting invalid pos raises an error

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 13:28:05 +02:00
Jozef Harag 865cfbc903
feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters (#9202)
* feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters

* update versioning in docs

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-09-16 12:26:41 +02:00
Sofie Van Landeghem 00836c2d7d
Update spacy/displacy/templates.py 2021-09-16 09:23:21 +02:00
Sofie Van Landeghem 4bf2606adf
Update spacy/displacy/render.py
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-16 09:22:38 +02:00
Ines Montani 20f63e7154
Only include runtime-relevant config in package CLI dependency detection (#9211) 2021-09-15 23:16:01 +02:00
Adriane Boyd d74870d38c
Prepare for v3.1.3 (#9200)
* Update thinc and spacy-legacy requirements

* Set version to v3.1.3
2021-09-14 11:03:51 +02:00
Renat Shigapov d5cc009faf
Merge branch 'explosion:master' into master 2021-09-13 08:43:48 +02:00
Renat Shigapov f4b5c4209d
specify kb_id and kb_url for URL visualisation 2021-09-13 08:15:07 +02:00
Renat Shigapov 7562fb5354
add links to entities into the TPL_ENT-template 2021-09-13 08:06:54 +02:00
j-frei 462b009648
Correct parser.py use_upper param info (#9180) 2021-09-10 16:19:58 +02:00
Adriane Boyd aba6ce3a43
Handle spacy-legacy in package CLI for dependencies (#9163)
* Handle spacy-legacy in package CLI for dependencies

* Implement legacy backoff in spacy registry.find

* Remove unused import

* Update and format test
2021-09-08 11:46:40 +02:00
github-actions[bot] 584fae5807
Auto-format code with black (#9130)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-03 10:47:03 +02:00
Kevin Humphreys ca93504660
Pass alignments to Matcher callbacks (#9001)
* pass alignments to callbacks

* refactor for single callback loop

* Update spacy/matcher/matcher.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-02 12:58:05 +02:00
Sofie Van Landeghem 8895e3c9ad
matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:26:33 +02:00
Robyn Speer d60b748e3c
Fix surprises when asking for the root of a git repo (#9074)
* Fix surprises when asking for the root of a git repo

In the case of the first asset I wanted to get from git, the data I
wanted was the entire repository. I tried leaving "path" blank, which
gave a less-than-helpful error, and then I tried `path: "/"`, which
started copying my entire filesystem into the project. The path I should
have used was "".

I've made two changes to make this smoother for others:

- The 'path' within a git clone defaults to ""
- If the path points outside of the tmpdir that the git clone goes
into, we fail with an error

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* use a descriptive error instead of a default

plus some minor fixes from PR review

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* check for None values in assets

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
2021-09-01 22:52:08 +02:00
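The two behaviours described in the commit above (an empty default path, and a failure when the path escapes the clone directory) can be sketched as follows. This is a minimal sketch using only `pathlib`, not the project's implementation; the function name `resolve_inside` and the error text are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_inside(clone_dir: Path, subpath: str = "") -> Path:
    """Resolve subpath within clone_dir, refusing paths that escape it."""
    root = clone_dir.resolve()
    # Joining with an absolute path such as "/" discards the root
    # entirely, which is how a whole filesystem could end up selected.
    target = (root / subpath).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        raise ValueError(
            f"Path '{subpath}' points outside the cloned repository"
        ) from None
    return target


with tempfile.TemporaryDirectory() as tmp:
    clone = Path(tmp)
    # The empty default selects the whole repository.
    print(resolve_inside(clone) == clone.resolve())
    try:
        resolve_inside(clone, "/")
    except ValueError as err:
        print(err)
```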
Paul O'Leary McCann f803a84571
Fix inference of epoch_resume (#9084)
* Fix inference of epoch_resume

When an epoch_resume value is not specified individually, it can often
be inferred from the filename. The value inference code was there but
the value wasn't passed back to the training loop.

This also adds a specific error in the case where no epoch_resume value
is provided and it can't be inferred from the filename.

* Add new error

* Always use the epoch resume value if specified

Before this, the value in the filename was used if found.
2021-09-01 14:17:42 +09:00
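The precedence described in the commit above (explicit value first, then the filename, otherwise an error) can be sketched as follows. The commit does not state the filename format, so the `model<N>.bin` pattern, the function name `get_epoch_resume`, and the error text are all assumptions made for illustration.

```python
import re
from typing import Optional


def get_epoch_resume(filename: str, epoch_resume: Optional[int] = None) -> int:
    """Pick the epoch to resume from, preferring an explicit value."""
    # An explicitly provided value always wins, including 0.
    if epoch_resume is not None:
        return epoch_resume
    # Assumed naming scheme: the epoch number is embedded as model<N>.bin.
    match = re.search(r"model(\d+)\.bin$", filename)
    if match is None:
        raise ValueError(
            f"No epoch_resume given and none could be inferred from '{filename}'"
        )
    return int(match.group(1))


print(get_epoch_resume("model7.bin"))      # 7
print(get_epoch_resume("model7.bin", 3))   # 3
```

Whether training should continue at the parsed number or at the following epoch is a separate choice that this sketch leaves open.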
Adriane Boyd 1e9b4b55ee
Pass overrides to subcommands in workflows (#9059)
* Pass overrides to subcommands in workflows

* Add missing docstring
2021-08-30 09:23:54 +02:00
Sofie Van Landeghem 1e974de837
config is not Optional (#9024) 2021-08-27 11:44:31 +02:00
github-actions[bot] fb9c31fbda
Auto-format code with black (#9065)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-27 11:42:27 +02:00
Sofie Van Landeghem 4d39430b82
Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:50:35 +02:00
Sofie Van Landeghem 94fb840443
fix docs for Span constructor arguments (#9023) 2021-08-25 16:06:22 +02:00
David Strouk 31e9b126a0
Fix verbs list in lang/fr/tokenizer_exceptions.py (#9033) 2021-08-25 15:55:09 +02:00
Ines Montani 4cd052e81d
Include component factories in third-party dependencies resolver (#9009)
* Include component factories in third-party dependencies resolver

* Increment catalogue and update test
2021-08-25 14:58:01 +02:00
Sofie Van Landeghem e1f88de729
bump to 3.1.2 (#9008) 2021-08-20 12:41:09 +02:00
Sofie Van Landeghem 4d52d7051c
Fix spancat training on nested entities (#9007)
* overfitting test on non-overlapping entities

* add failing overfitting test for overlapping entities

* failing test for list comprehension

* remove test that was put in separate PR

* bugfix

* cleanup
2021-08-20 12:37:50 +02:00
Paul O'Leary McCann 9cc3dc2b67
Add glossary entry for _SP (#8983) 2021-08-20 12:04:02 +02:00
Sofie Van Landeghem de025beb5f
Warn and document spangroup.doc weakref (#8980)
* test for error after Doc has been garbage collected

* warn about using a SpanGroup when the Doc has been garbage collected

* add warning to the docs

* rephrase slightly

* raise error instead of warning

* update

* move warning to doc property
2021-08-20 11:06:19 +02:00
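The weak reference behaviour that the commit above guards against can be reproduced with the standard `weakref` module. This is a generic sketch, not the SpanGroup implementation; the classes `Document` and `Group` and the error message are hypothetical stand-ins.

```python
import gc
import weakref


class Document:
    def __init__(self, text: str) -> None:
        self.text = text


class Group:
    """Holds only a weak reference to its parent document."""

    def __init__(self, doc: Document) -> None:
        self._doc_ref = weakref.ref(doc)

    @property
    def doc(self) -> Document:
        doc = self._doc_ref()
        # The weak reference returns None once the document is gone.
        if doc is None:
            raise RuntimeError("The parent document has been garbage collected")
        return doc


doc = Document("hello world")
group = Group(doc)
print(group.doc.text)  # hello world

# Dropping the last strong reference lets the document be collected.
del doc
gc.collect()
try:
    group.doc
except RuntimeError as err:
    print(err)
```

Raising from the property, rather than warning at construction time, reports the problem at the point where the missing document is actually needed.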
Adriane Boyd 6722dc3dc5
Fix allow_overlap default for spancat scoring (#8970)
* Remove irrelevant default options
2021-08-18 09:56:56 +02:00