Commit Graph

14768 Commits

Author SHA1 Message Date
Ines Montani d94ddd5686
Auto-detect package dependencies in spacy package (#8948)
* Auto-detect package dependencies in spacy package

* Add simple get_third_party_dependencies test

* Import packages_distributions explicitly

* Inline packages_distributions

* Fix docstring [ci skip]

* Relax catalogue requirement

* Move importlib_metadata to spacy.compat with note

* Include license information [ci skip]
2021-08-17 14:05:13 +02:00
Sofie Van Landeghem 0a6b68848f
Fix making span_group (#8975)
* fix _make_span_group

* fix imports
2021-08-17 10:36:34 +02:00
Ines Montani 593a22cf2d
Add development docs for Language and code conventions (#8745)
* WIP: add dev docs for Language / config [ci skip]

* Add section on initialization [ci skip]

* Fix wording [ci skip]

* Add code conventions WIP [ci skip]

* Update code convention docs [ci skip]

* Update contributing guide and conventions [ci skip]

* Update Code Conventions.md [ci skip]

* Clarify sourced components + vectors

* Apply suggestions from code review [ci skip]

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update wording and add link [ci skip]

* restructure slightly + extended index

* remove paragraph that breaks flow and is repeated in more detail later

* fix anchors

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-08-17 09:38:15 +02:00
Paul O'Leary McCann 9391998c77
Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:37:21 +02:00
Ines Montani a894fe0440
Merge pull request #8951 from HLasse/master 2021-08-16 11:41:32 +10:00
Lasse 839ea0f987 change tags formatting to match 2021-08-13 14:40:08 +02:00
Lasse 70ab596f61 Merge branch 'master' of https://github.com/HLasse/spaCy 2021-08-13 14:35:21 +02:00
Lasse 195e4e48c3 add textdescriptives to universe 2021-08-13 14:35:18 +02:00
github-actions[bot] 92071326d8
Auto-format code with black (#8950)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-13 11:48:38 +02:00
Adriane Boyd 8448c7dbc5
Update da trf recommendation (#8921)
Update the da trf recommendation to the same model used in the
pretrained pipelines.
2021-08-12 13:54:02 +02:00
Ines Montani 6260f044cc
Merge pull request #8938 from explosion/docs/prodigy-v1-11-project [ci skip]
Update Prodigy project template for v1.11
2021-08-12 21:16:49 +10:00
Ines Montani 4f769ff913 Update Prodigy project template for v1.11 [ci skip] 2021-08-12 13:46:20 +10:00
Paul O'Leary McCann e227d24d43
Allow passing in array vars for speedup (#8882)
* Allow passing in array vars for speedup

This fixes #8845. Not sure about the docstring changes here...

* Update docs

Types maybe need more detail? Maybe not?

* Run prettier on docs

* Update spacy/tokens/span.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-10 15:13:53 +02:00
Paul O'Leary McCann 6029cfc391
Add scores to output in spancat (#8855)
* Add scores to output in spancat

This exposes the scores as an attribute on the SpanGroup. Includes a
basic test.

* Add basic doc note

* Vectorize score calcs

* Add "annotation format" section

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Clean up doc section

* Ran prettier on docs

* Get arrays off the gpu before iterating over them

* Remove int() calls

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-08-10 13:47:49 +02:00
Ines Montani a1e9f19460
Merge pull request #8910 from DuyguA/patch-1 [ci skip]
updated unv json for new book
2021-08-09 23:12:50 +10:00
Duygu Altinok 380b2817cf
updated unv json for new book 2021-08-09 12:39:22 +02:00
Paul O'Leary McCann cac298471f
Fix #8902 (bad link in docs)
typo fix
2021-08-08 22:04:00 +09:00
Eduard Zorita 439f30faad
Add stub files for main cython classes (#8427)
* Add stub files for main API classes

* Add contributor agreement for ezorita

* Update types for ndarray and hash()

* Fix __getitem__ and __iter__

* Add attributes of Doc and Token classes

* Overload type hints for Span.__getitem__

* Fix type hint overload for Span.__getitem__

Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>
2021-08-07 12:30:03 +02:00
github-actions[bot] 56d4d87aeb
Auto-format code with black (#8895)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-06 13:38:06 +02:00
Kabir Khan 1dfffe5fb4
No output info message in train (#8885)
* Add info message that no output directory was provided in train

* Update train.py

* Fix logging
2021-08-05 09:21:22 +02:00
Adriane Boyd fa2e7a4bbf
Fix spancat tests on GPU (#8872)
* Fix spancat tests on GPU

* Fix more spancat tests
2021-08-04 14:29:43 +02:00
Paul O'Leary McCann 77d698dcae
Fix check for RIGHT_ATTRS in dep matcher (#8807)
* Fix check for RIGHT_ATTRs in dep matcher

If a non-anchor node does not have RIGHT_ATTRS, the dep matcher throws
an E100, which says that non-anchor nodes must have LEFT_ID, REL_OP, and
RIGHT_ID. It specifically does not say RIGHT_ATTRS is required.

A blank RIGHT_ATTRS is also valid, and patterns with one will be
excepted. While not normal, sometimes a REL_OP is enough to specify a
non-anchor node - maybe you just want the head of another node
unconditionally, for example.

This change just sets RIGHT_ATTRS to {} if not present. Alternatively
changing E100 to state RIGHT_ATTRS is required could also be reasonable.

* Fix test

This test was written on the assumption that if `RIGHT_ATTRS` isn't
present an error will be raised. Since the proposed changes make it so
an error won't be raised this is no longer necessary.

* Revert test, update error message

Error message now lists missing keys, and RIGHT_ATTRS is required.

* Use list of required keys in error message

Also removes unused key param arg.
2021-08-04 09:20:41 +02:00
Adriane Boyd 941a591f3c
Pass excludes when serializing vocab (#8824)
* Pass excludes when serializing vocab

Additional minor bug fix:

* Deserialize vocab in `EntityLinker.from_disk`

* Add test for excluding strings on load

* Fix formatting
2021-08-03 14:42:44 +02:00
Adriane Boyd 175847f92c
Support list values and INTERSECTS in Matcher (#8784)
* Support list values and IS_INTERSECT in Matcher

* Support list values as token attributes for set operators, not just as
pattern values.

* Add `IS_INTERSECT` operator.

* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.

* Rename IS_INTERSECT to INTERSECTS
2021-08-02 19:39:26 +02:00
Adriane Boyd fbbbda1954
Fix start/end chars for empty and out-of-bounds spans (#8816) 2021-08-02 19:07:19 +02:00
Adriane Boyd 9ad3b8cf8d
Only add sourced vectors hashes to meta if necessary (#8830) 2021-08-02 18:22:35 +02:00
Nick Sorros 0485cdefcc
Add logger debug for project push and pull (#8860)
* Add logger debug for project push and pull

* Sign contributor agreement
2021-08-02 18:13:53 +02:00
themrmax de076194c4
Make ConsoleLogger flush after each logging line (#8810)
This is necessary to avoid "logging blackouts" when running training on Kubernetes pods
2021-08-02 14:33:38 +02:00
Ines Montani 30f20496d5
Merge pull request #8840 from polm/docs/evaluate-speed [ci skip] 2021-07-30 09:10:15 +10:00
Ines Montani 65d163fab5
Adjust formatting [ci skip] 2021-07-30 09:10:04 +10:00
Ines Montani 3a701d3645
Merge pull request #8841 from adrianeboyd/docs/ent-id-sep [ci skip]
Fix formatting of ent_id_sep in EntityRuler API docs
2021-07-30 09:09:25 +10:00
Ines Montani f08be084fb
Merge pull request #8844 from thomashacker/bugfix/fix-doc-transformer-typo [ci skip]
Fix typo in Tok2VecTransformer example config
2021-07-30 09:08:59 +10:00
thomashacker 02258916c8 Fix example config typo for transformer architecture 2021-07-29 11:19:40 +02:00
Adriane Boyd 15b12f3e35 Fix formatting of ent_id_sep in EntityRuler API docs 2021-07-29 10:10:12 +02:00
Paul O'Leary McCann a60cb13910 Update speed entry in metrics table 2021-07-29 16:35:19 +09:00
Paul O'Leary McCann e125313a50 Revert "Add note about SPEED in output"
This reverts commit c92d268176.
2021-07-29 16:34:08 +09:00
Ines Montani 0a1e299d30
Merge pull request #8814 from polm/docs/migrate-lexeme-tables [ci skip] 2021-07-29 17:18:02 +10:00
Paul O'Leary McCann c92d268176 Add note about SPEED in output
In #8823 it was pointed out that the `SPEED` value wasn't documented
anywhere.
2021-07-29 15:03:07 +09:00
Paul O'Leary McCann 8867e60fbb
Update website/docs/usage/v3.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-29 14:56:56 +09:00
Adriane Boyd 8547514aa4
Remove labels from textcat component config example (#8815) 2021-07-27 13:14:38 +02:00
Paul O'Leary McCann 76ac95923a Add note to migration guide about lexeme tables (fix #7290)
This just adds the resolution from #6388 to the docs.
2021-07-27 19:19:25 +09:00
Paul O'Leary McCann 67ecdcc3ac
Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 12:08:46 +02:00
Ines Montani 7f21c7dfa2
Merge pull request #8794 from explosion/autoblack
Auto-format code with black
2021-07-27 12:17:15 +10:00
Ines Montani 34c401f04f
Merge pull request #8801 from polm/fix/respect-no-skip (fixes #8796)
Respect the no_skip value
2021-07-27 12:16:47 +10:00
Ines Montani 134cb06af3
Merge pull request #8808 from kevinlu1248/master [ci skip]
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:16 +10:00
Ines Montani 9bf0d6f2fd
Merge pull request #8806 from Ledenel/master [ci skip]
fix typo
2021-07-27 12:14:22 +10:00
Kevin Lu 4a8e9e4e4e
Update data-formats.md 2021-07-25 22:58:53 -07:00
Ledenel 413f745c68 fix broken example in spaCy universe Chatterbot 2021-07-25 15:53:32 +00:00
Paul O'Leary McCann 284b530c63 Respect the no_skip value
Seems like the logic for this was just left out. See #8796.
2021-07-24 15:31:17 +09:00
explosion-bot a58ab6ea22 Auto-format code with black 2021-07-23 08:04:09 +00:00