Commit Graph

8109 Commits

Author SHA1 Message Date
Rohit Gupta f4ca5623d2
Make checkpointing on train epoch end condition dynamic (#15300)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-09 14:27:53 +00:00
Akihiro Nitta a00dfc850d
CI: Use new syntax for setting github output (#15415)
Use new syntax for setting github output

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-11-09 14:07:52 +00:00
Adrian Wälchli 3dcb85a85a
Undo marking tests that don't need it as standalone (#15355)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-11-09 14:03:25 +00:00
Ethan Harris 733695d037
[App] Add `start_with_flow` flag to works (#15591)
* Initial commit

* Update cloud runner

* Add `start_with_flow` flag

* Update CHANGELOG.md

* Update src/lightning_app/core/work.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update cloud runner

* Revert, not needed

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-09 08:54:22 -05:00
dependabot[bot] fc78d8d6e5
Update fastapi requirement from <0.83.0 to <0.87.0 in /requirements (#15564)
Updates the requirements on [fastapi](https://github.com/tiangolo/fastapi) to permit the latest version.
- [Release notes](https://github.com/tiangolo/fastapi/releases)
- [Commits](https://github.com/tiangolo/fastapi/compare/0.1.11...0.86.0)

---
updated-dependencies:
- dependency-name: fastapi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-09 08:31:43 +00:00
David Gilbertson b04a7aab9c
Docs: Update tutorial to match PyTorchProfiler changes (#15440)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-09 01:28:05 +01:00
Adrian Wälchli f2449ac5ab
Fix wandb test writing artifacts to cwd (#15551)
2022-11-08 19:13:49 +00:00
Carlos Mocholí e33d09a1a8
Reuse assistant code to create the mirror package (#15365)
2022-11-08 18:52:24 +00:00
Kaszanas 35b66fd890
Fixed typo Havana -> Habana for HPUs (#15589)
Fixed typo Havana -> Habana

HPUs are accelerators built by Habana Labs.
2022-11-08 19:09:54 +01:00
Rohit Gupta 1a8f2e8516
Support DDP with LRFinder (#15304)
* Support DDP for LRFinder
* Apply suggestions from code review
* rank 0 is the decision maker

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-08 17:55:15 +00:00
Lorenzo Collodi 750f62f6c3
Fix bagua strategy raising `AttributeError` during manual optimization (#13137)
* fix: fix bagua manual backward
* Update bagua module
* Simplify test case
* Fix type annotations

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <akihiro@pytorchlightning.ai>
2022-11-08 17:46:15 +00:00
Mansy 3247054067
Enable quick start e2e test again and run on cloud without installing dependencies 😎 (#15546)
* Enable quick start e2e test to run without installing dependencies
* yaml formatting
* clone the repo
2022-11-08 17:47:38 +01:00
Adrian Wälchli bc2cf451ee
Checkpoint migration for loop's internal state (#15500)
2022-11-08 16:17:05 +00:00
Jirka Borovec 175603ca3f
Merge the slow and regular test workflows (#15331)
* extend matrix

* drop

* group-check

* groups

* cat

* typo

* cat2type

* cat2type

* env vars

* ''

* Rename to slow. Fix timeout

* Examples are GPU only

* str

* Extra step

* ''

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-11-08 10:00:08 -05:00
Jirka Borovec d5003b1c07
prune installation artifact (#15558)
* prune installation artifact

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-11-08 09:54:38 -05:00
Adrian Wälchli e4611ef98e
Support fused Adam with mixed precision (#15555)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-11-08 09:53:30 -05:00
Carlos Mocholí bf4653e421
Simplify codeowners (#15585)
* Simplify PL CODEOWNERS

* Add William

* Update apps too
2022-11-08 09:08:28 -05:00
Carlos Mocholí 5bec46ddce
Remove MANIFEST reference in docs job (#15584)
2022-11-08 14:32:25 +01:00
fabio fumarola 5cd8001240
Update docs for `seed_everything` in LightningCLI (#15308)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-11-08 13:29:30 +00:00
Rohit Gupta 1cd66b6d7c
Add test to verify that lowering gpus on restart works with sharded spawn (#15317)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-11-08 13:03:29 +00:00
thomas chaton f9a65731cd
[App] Expose Run Work Executor (#15561)
2022-11-08 12:55:31 +00:00
Carlos Mocholí 3ea8903a32
Fix which groups require docs builds (#15581)
2022-11-08 13:14:17 +01:00
Adrian Wälchli 7767fd36b6
Fix result transfer in multiprocessing launcher on multi-node (#15567)
* Fix result transfer in multiprocessing launcher on multi-node

* add simple test

* add comment

* update test

* changelog

* use tempfile

* fix

* assert None

* unused import

* add comment
2022-11-08 13:07:58 +01:00
Rohit Gupta 0886e6352e
Added a check to validate that wrapped FSDP models are used while initializing optimizers (#15301)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-08 02:10:35 +00:00
Bryn Lloyd 18f7f2d395
Use _PATH in annotations and convert to str if Path (#15560)
Co-authored-by: Bryn Lloyd <lloyd@itis.swiss>
2022-11-07 21:04:37 -05:00
Luca Antiga 01f57a9cbd
Reuse existing commands when running connect more than once (#15471)
* Reuse connection if it matches a connection from an active terminal
* Remove unused import
* Include both name and id in the check
* Fix messages and tests
* Add test
* Handle monkeypatching more cleanly
* Remove unused imports

Co-authored-by: Luca Antiga <luca@lightning.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-11-07 20:01:35 +00:00
Sherin Thomas 94c300c2eb
[App] Re-wording build config warning in the docs (#15570)
* build config commands
* Apply suggestions from code review
2022-11-07 21:00:54 +01:00
moghadas76 d5ffdfac2a
Fix: Revert lightning_lite.utilities.rank_zero_only to preserve backward compatibility (#15536)
* Fix: Revert `lightning_lite.utilities.rank_zero_only` to preserve backward compatibility

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-11-07 19:23:30 +00:00
Carlos Mocholí 04e1e925da
Update governance and codeowners
2022-11-07 11:12:24 -05:00
Kushashwa Ravi Shrimali a557952fab
Move `krshrimali` to Alumni (#15568)
Move myself to Alumni
2022-11-07 09:13:00 -05:00
Akihiro Nitta 8d19297642
Revert "Fix PL docs build on readthedocs.org (#15511)" (#15565)
This reverts commit e818e823e3.
2022-11-07 11:10:02 +01:00
Akihiro Nitta e03809f881
Fix `tests_pytorch` import error in legacy checkpoint CI (#15566)
Fix tests_pytorch import error
2022-11-07 11:09:25 +01:00
thomas chaton 820233176b
[App] Fixed Multi Node and add examples (#15557)
2022-11-07 09:36:41 +00:00
Adrian Wälchli 96c574425d
Update path type annotation for load_from_checkpoint (#15540)
2022-11-07 08:27:07 +00:00
dependabot[bot] df64dad58c
Bump pytest-cov from 3.0.0 to 4.0.0 in /requirements (#15563)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 3.0.0 to 4.0.0.
- [Release notes](https://github.com/pytest-dev/pytest-cov/releases)
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v3.0.0...v4.0.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-07 08:19:01 +00:00
dependabot[bot] f492277bfc
Update uvicorn requirement from <=0.18.2 to <0.19.1 in /requirements (#15562)
Updates the requirements on [uvicorn](https://github.com/encode/uvicorn) to permit the latest version.
- [Release notes](https://github.com/encode/uvicorn/releases)
- [Changelog](https://github.com/encode/uvicorn/blob/master/CHANGELOG.md)
- [Commits](https://github.com/encode/uvicorn/compare/0.0.1...0.19.0)

---
updated-dependencies:
- dependency-name: uvicorn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-07 08:17:07 +00:00
Raphael Randschau 5ff610cbea
Add basic SSH documentation for CLI (#15316)
* add basic ssh documentation

* rename workflow ssh debugging

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

* add more details about ssh command

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

* add more motivation to the audience section

* fix sphinx errors

* Update docs/source-app/workflows/index.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

* add details how to get app id

* add docs about component name

* add more context to the audience section

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* address adrians comment about order

* add one-time notice

* fix headers

* wording

* update to match ssh params

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

* Update docs/source-app/workflows/ssh/index.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

* drop verification

* fix merge conflict error

* remove symlink

* fix doctree

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-06 07:07:50 -08:00
William Falcon 877c0bfe2c
Docs 3/n (#15554)
* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* remove source-lit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-11-06 05:55:57 -05:00
geoffrey-g-delhomme 7bdfced27c
Let metadata `score` be serializable by wandb (#15544)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-05 14:51:49 +00:00
Carlos Mocholí 12d6e44796
Grep for potential errors in standalone tests (#15341)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-05 04:29:38 +01:00
Carlos Mocholí 7a8cf4eb10
Print the logs when TPU tests fail (#15533)
2022-11-05 03:08:22 +01:00
thomas chaton d48aa03207
Slightly safer multi node (#15538)
update

Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2022-11-04 22:05:11 -04:00
Adrian Wälchli dcfaa065ab
Improve the checkpoint upgrade utility script (#15333)
2022-11-04 21:41:32 +00:00
Bryn Lloyd fe8488d2a7
load_from_checkpoint returns the expected type (#15496)
Co-authored-by: Bryn Lloyd <lloyd@itis.swiss>
2022-11-04 21:00:45 +00:00
Yuxuan Lu ee8a57da0f
Fix usage of fs.listdir in CheckpointConnector (#15413)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-11-04 20:21:52 +00:00
Adrian Wälchli 62d040c383
Fix ReduceOp type hint in ColossalAI strategy (#15535)
2022-11-04 19:34:34 +00:00
thomas chaton ecc8ac07c6
[App] Introduce Multi Node Component (#15524)
2022-11-04 17:41:59 +00:00
Carlos Mocholí 0c63534b7e
remove source-lit docs 2 (#15527)
2022-11-04 18:01:04 +01:00
Adrian Wälchli 39c6ec9ce3
Only load global step when fitting (#15532)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-11-04 16:58:24 +00:00
Carlos Mocholí f392180c38
Do not modify PACKAGE_NAME on install (#15493)
* Do not modify PACKAGE_NAME on install

* Fix ci pkg action

* Required

* Typos

* Apply suggestions from code review

* Undo defaults

* Cleanup

* Implement idea

* Fuck

* Apps mock fix

* Fix app-pytest with PKG_NAME=app

* Justus suggestion

* Debug Windows

* Update setup.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Revert "Debug Windows"

This reverts commit 9fe3ba3665.

* SSH action

* Crazy bug

* Revert "SSH action"

This reverts commit 5061e8e7d6.

* Package import step

* Avoid env conflict

* Debug

* Whitespace

* Try removing existing lite build

* This should be redundant now

* Add back env now that source-lit is gone

* Remove download artifact

* checkgroup

* TODOs suggested by Jirka

* _

* Revert "_". These are local variables, do not need protected

This reverts commit 8340b85991.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-04 17:51:03 +01:00