Commit Graph

8176 Commits

Author SHA1 Message Date
thomas chaton b6ebc7b5f5
[App] Add env variables to deactivate pull and push of the App State (#15367) 2022-10-28 13:26:08 +00:00
PaulLerner 6b0b6b8903
Update auto_scale_batch_size error message and docstring with LightningDataModule (#15351) 2022-10-28 12:58:50 +00:00
Justus Schock 1c33d57b0a
Fix unpickle redirection (#15382) 2022-10-28 12:35:35 +00:00
Neven Miculinic 478ca8c3a0
ENG-1404: Hide region CLI flag for cluster creation (#15277)
Hide region CLI flag
2022-10-28 12:03:59 +00:00
Carlos Mocholí 6b8f394001
App tests hang on Windows with Python 3.9 (#15385)
App tests hang on Windows with Python 3.8
2022-10-28 13:01:55 +02:00
Adrian Wälchli 5eafa52596
Fix resetting internal bars in RichProgressBar after each trainer stage (#15377) 2022-10-28 06:20:45 -04:00
Carlos Mocholí 53ee014bf1
Create required group for app examples (#15332) 2022-10-28 11:20:30 +02:00
Jirka Borovec d01970f5cd
CI: switch cloud e2e tests to Prod (#15369) 2022-10-27 17:54:36 +00:00
Jirka Borovec 9b35079c36
docker: drop pt 1.9 (#15345)
* docker: drop pt 1.9

* Missed some

* Last one

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2022-10-27 18:17:36 +02:00
thomas chaton df4b705768
Add JustPy Frontend (#15002)
* update

* update

* update

* update

* changelog

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-27 11:48:26 -04:00
Luca Furst c1a3cb5381
[App] Format client error ApiExceptions without a traceback (#15130)
## What does this PR do?

The lightning CLI uses the lightning APIs to create, read, update, and delete resources such as apps and clusters.

If an API call fails due to a client error (e.g. invalid input, unauthorized, unauthenticated), the API returns an HTTP status code in the 400s, along with a custom message in the response body.

The CLI (usually) does not handle these exceptions directly. As a result, the exception bubbles up through the click framework (which does not recognize the exception type) to the Python runtime, which prints a long traceback and dumps the raw exception to the user.

This PR fixes the user experience. When our API fails on a client error, the CLI will display the message from the server without a traceback.
2022-10-27 09:44:22 -04:00
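
A minimal sketch of the behaviour described in #15130 above, not the actual implementation. `ApiException`, `format_client_error` and `run_cli` are hypothetical stand-ins introduced only for illustration. The sketch assumes the exception carries the HTTP status and the server's message.

```python
import sys
from typing import Callable, Optional


class ApiException(Exception):
    """Hypothetical stand-in for the API client's exception type."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


def format_client_error(exc: ApiException) -> Optional[str]:
    """Return a short user-facing message for 4xx errors, otherwise None."""
    if 400 <= exc.status < 500:
        return f"Error: {exc.message}"
    return None


def run_cli(command: Callable[[], None]) -> int:
    """Run a command and print client errors without a traceback."""
    try:
        command()
    except ApiException as exc:
        message = format_client_error(exc)
        if message is None:
            # Not a client error: re-raise so the full traceback is kept.
            raise
        print(message, file=sys.stderr)
        return 1
    return 0
```

Only status codes in the 400s are reformatted here. Anything else is re-raised, on the assumption that a traceback is still useful when the failure is not caused by user input.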
Adrian Wälchli 9b694df351
Fix import for OrderedDict in Python 3.7.0 (#15359)
* weird
2022-10-27 15:39:19 +02:00
Jirka Borovec 863dfb24a7
fixing publish pypi (#15361) 2022-10-27 15:16:09 +02:00
kimpty d956a123bd
Update train_model_basic.rst (#15352) 2022-10-27 09:13:11 -04:00
Jirka Borovec 889fc50e8a
Fix missing secrets in legacy checkpoint workflow (#15358) 2022-10-27 13:45:36 +02:00
Jirka Borovec 95ae393ca8
LAI: creating mirror package (#15105)
* placeholder

* mirror + prune

* makedir

* setup

* ci

* ci

* name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci clean

* empty

* py

* parallel

* doctest

* flake8

* ci

* typo

* replace

* clean

* Apply suggestions from code review

* re.sub

* fix UI path

* full replace

* ui path?

* replace

* updates

* regex

* ci

* fix

* ci

* path

* ci

* replace

* Update .actions/setup_tools.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* also convert lightning_lite tests for PL tests to adapt mocking paths

* fix app example test

* update logger propagation for PL tests

* update logger propagation for PL tests

* Apply suggestions from code review

* Revert "update logger propagation for PL tests"

This reverts commit c1a5e119c7.

* playwright

* py

* update import in tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try edit import in overwrite

* debug code

* rev playwright

* Revert "try edit import in overwrite"

This reverts commit c02f766521.

* ci: adjust examples

* adjust examples cloud

* mock lightning_app

* Install assistant dependencies

* lightning

* setup

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

* disable cache

* move doctest to install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* )

* echo ./

* ci

* lru

* revert disabling cache, prints

* ci

* prune ci jobs

* prune ci jobs

* training loop standalone tests

* add sys modules cleanup fixture

* make use of fixture

* revert standalone

* ci e2e

* fix imports in lightning

* fix imports of lightning in tests

* Revert "make use of fixture"

This reverts commit c15efdd205.

* Revert other commits for fixtures

* revert use of fixture

* py3.9

* fix mocking

* fix paths

* hack mocking

* docs

* Apply suggestions from code review

* rev suggestion

* Minor changes to the parametrizations

* Update checkgroup with the new and changed jobs

* include frontend dir

* cli

* fix imports and entry point

* Revert standalone

* rc1

* e2e on staging

* Revert "Revert standalone"

This reverts commit 9df96685b8.

* groups

* to

* ci: pt ver

* docker

* Apply suggestions from code review

* Copy over changes from previous commit to other groups

* Add back changes from bad merge

* Uppercase step name everywhere

* update

* ci

* ci: lai oldest

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: manskx <ahmed.mansy156@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2022-10-27 12:32:49 +02:00
Adrian Wälchli 6426d43235
Fix exception when creating a multiprocessing Pool after importing Lightning (#15292) 2022-10-27 10:16:55 +00:00
Carlos Mocholí 105f26873c
Do not trigger PyTorch GPU tests on Lite test changes (#15348) 2022-10-27 09:57:16 +00:00
thomas chaton 5da74847e2
[App] Show logs command to be standalone and re-usable (#15343) 2022-10-27 09:05:48 +00:00
dependabot[bot] 27a1f5c811
Update pandas requirement from <1.5.1,>1.0 to >1.0,<1.5.2 in /requirements (#15263)
Update pandas requirement in /requirements

Updates the requirements on [pandas](https://github.com/pandas-dev/pandas) to permit the latest version.
- [Release notes](https://github.com/pandas-dev/pandas/releases)
- [Changelog](https://github.com/pandas-dev/pandas/blob/main/RELEASE.md)
- [Commits](https://github.com/pandas-dev/pandas/compare/v1.0.1...v1.5.1)

---
updated-dependencies:
- dependency-name: pandas
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-26 21:28:27 +00:00
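
The effect of the bound change in #15263 above (from `<1.5.1,>1.0` to `>1.0,<1.5.2`) can be illustrated with a small hand-rolled comparison. This is a simplified sketch and not how pip resolves versions: `parse` and `allowed` are hypothetical helpers, and they handle only plain numeric versions such as `1.5.1`.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.5' into (1, 5, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '1.0' equals '1.0.0'
    return tuple(parts)


def allowed(version: str, lower: str, upper: str) -> bool:
    """Check the exclusive range: lower < version < upper."""
    return parse(lower) < parse(version) < parse(upper)


# Old requirement: >1.0,<1.5.1 excludes pandas 1.5.1.
old_ok = allowed("1.5.1", "1.0", "1.5.1")
# New requirement: >1.0,<1.5.2 permits pandas 1.5.1.
new_ok = allowed("1.5.1", "1.0", "1.5.2")
```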
Carlos Mocholí 02074f16c7
Fix PyTorch versions in Lite CI (#15338)
* replace oldest in lite

* Fix PyTorch versions in Lite CI

* This will be moved to install pkg workflow in the mirror PR

* 1.13 fixes

* Windows fix

* sorting

Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-26 15:09:08 -04:00
otaj 537da82059
Replace oldest versions also for Lite (#15337) 2022-10-26 18:22:46 +00:00
thomas chaton 2bfaf3d558
[App] Resolve Research Studio Bugs (#15313) 2022-10-26 16:36:58 +00:00
Justus Schock 6ee1f6c4b7
New skip conditions for unpickle-patching tests (#15329)
* New running conditions for tests
* found one more mistake
2022-10-26 18:33:22 +02:00
Adrian Wälchli ac89d70d4a
Fix pickling issues with rich progress bar (#15319) 2022-10-26 15:25:11 +00:00
Adrian Wälchli 38a9e69543
Extend the detection of interactive mode (#15293)
* extend interactive mode detection
* update test names
* changelog
* test
2022-10-26 15:24:11 +00:00
dependabot[bot] 3cccaec60b
Pin test requirements to their current latest versions (#15157)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-26 14:31:25 +00:00
Carlos Mocholí 54e76683fa
Drop "full" suffix in CI (#15320) 2022-10-26 16:24:16 +02:00
thomas chaton d1263f5d74
Resolve Boring App Race Condition (#15324) 2022-10-26 12:56:03 +00:00
Adrian Wälchli 0f9156374d
Mark internal Lite APIs as protected (#15307)
* mark internal lite apis as protected
* formatting
* docs update

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-10-26 12:51:50 +00:00
Akihiro Nitta bf12b673ec
Enable and fix recent legacy checkpoint tests in CI (#12491) 2022-10-26 11:24:53 +00:00
Carlos Mocholí 102685c3ab
Remove non-existent Lite checks (#15325) 2022-10-26 13:19:20 +02:00
Justus Schock 629912298a
add cloudio pickle patching for unified package (#15309) 2022-10-26 10:44:55 +00:00
Carlos Mocholí 5202b86287
Narrower CI timeouts (#15231)
Narrow CI timeouts
2022-10-26 10:07:09 +00:00
Carlos Mocholí e0f6db7ec3
Fix typo in OS name (#15318) 2022-10-26 10:41:41 +02:00
Luca Furst 488c2ac69c
Prevent bug when launching apps on multiple clusters (#15226)
Stops a bug when cross-launching an app between clusters.

Currently the platform does not allow running multiple app instances. If you have `app-1` running on `cluster-1` and try to run it on `cluster-2`, the CLI will succeed but the app will never start.

This PR prevents this disconnect. The app should not be uploaded / released if it won't run. An error is presented to the user explaining what happened and how to proceed (specify a different `--name`: e.g. `app-2`).

Once the platform supports multiple app instances / running individual apps on multiple clusters, this PR can be reverted.
2022-10-25 23:36:57 +00:00
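
The guard described in #15226 above could look roughly like the following. This is a sketch under stated assumptions, not the CLI's real code: `AppAlreadyRunningError`, `check_cluster_conflict` and the `running_apps` mapping (app name to the cluster it runs on) are hypothetical, and re-launching on the same cluster is assumed to be allowed.

```python
from typing import Dict


class AppAlreadyRunningError(Exception):
    """Hypothetical error raised before anything is uploaded or released."""


def check_cluster_conflict(
    app_name: str, target_cluster: str, running_apps: Dict[str, str]
) -> None:
    """Refuse to launch an app that already runs on a different cluster."""
    current = running_apps.get(app_name)
    if current is not None and current != target_cluster:
        raise AppAlreadyRunningError(
            f"App '{app_name}' is already running on '{current}' and cannot "
            f"also run on '{target_cluster}'. Specify a different --name."
        )


# Example from the commit message: app-1 runs on cluster-1.
running = {"app-1": "cluster-1"}
check_cluster_conflict("app-2", "cluster-2", running)  # a new name is fine
```

Running the check before the upload step matches the stated intent that the app should not be uploaded or released if it will not run.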
Carlos Mocholí 8b4d71c93f
Update pl CPU testing matrix (#15312)
* Update pl CPU testing matrix

* Remove standalone comment, could be confused

* Heuristic for PyTorch latest

* partition rest

* ckpgoup

* These do not exist

* This ALSO does not exist
2022-10-25 15:18:16 -04:00
Jirka Borovec feda39bd18
drop GH e2e cloud & add cron for Azure (#15306)
* Fix GH e2e cloud
* drop gha
* cron
2022-10-25 15:18:01 -04:00
Raphael Randschau 13baad56e4
Add support for custom cloud compute configurations for Flows (#14831)
* use more recent lightning cloud launcher

* allow LightningApp to use custom cloud compute for flows

* feedback from adrian

* adjust other cloud tests

* update

* update

* update comments

* Update src/lightning_app/core/app.py

Co-authored-by: Sherin Thomas <sherin@grid.ai>

* Close profiler when `StopIteration` is raised (#14945)

* Find last checkpoints on restart (#14907)


Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Remove unused gcsfs dependency (#14962)

* Update hpu mixed precision link (#14974)

Signed-off-by: Jerome <janand@habana.ai>

* Bump version of fsspec (#14975)

fsspec verbump

* Fix TPU test CI (#14926)

* Fix TPU test CI

* +x first

* Lite first to uncover errors faster

* Fixes

* One more

* Simplify XLALauncher wrapping to avoid pickle error

* debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Debug commit successful. Trying local definitions

* Require tpu for mock test

* ValueError: The number of devices must be either 1 or 8, got 4 instead

* Fix mock test

* Simplify call, rely on defaults

* Skip OSError for now. Maybe upgrading will help

* Simplify launch tests, move some to lite

* Stricter typing

* RuntimeError: Accessing the XLA device before processes have spawned is not allowed.

* Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed."

This reverts commit f65107ebf3.

* Alternative boring solution to the reverted commit

* Fix failing test on CUDA machine

* Workarounds

* Try latest mkl

* Revert "Try latest mkl"

This reverts commit d06813aa67.

* Wrong exception

* xfail

* Mypy

* Comment change

* Spawn launch refactor

* Accept that we cannot lazy init now

* Fix mypy and launch test failures

* The base dockerfile already includes mkl-2022.1.0 - what if we use it?

* try a different mkl version

* Revert mkl version changes

Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Trainer: fix support for non-distributed PyTorch (#14971)

* Trainer: fix non-distributed use
* Update CHANGELOG

* fixes typing errors in rich_progress.py (#14963)

* revert default cloud compute rename

* allow LightningApp to use custom cloud compute for flows

* feedback from adrian

* update

* resolve merge with master conflict

* remove preemptible

* update CHANGELOG

* add basic flow cloud compute documentation

* fix docs build

* add missing symlink

* try to fix sphinx

* another attempt for docs

* fix new test

Signed-off-by: Jerome <janand@habana.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Sherin Thomas <sherin@grid.ai>
Co-authored-by: Ziyad Sheebaelhamd <47150407+ziyadsheeba@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jerome Anand <88475913+jerome-habana@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: DP <10988155+donlapark@users.noreply.github.com>
2022-10-25 11:29:15 -07:00
Sherin Thomas 53d2c0684e
Pick queue type only if specified (#15295)
2022-10-25 22:17:56 +05:30
Jirka Borovec 9c2164a1ad
Run all tests in master (#15288)
* example full tests on master

* Modify checkgroup

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-25 12:34:04 -04:00
thomas chaton 48993cb5ed
Resolve App e2es (#15302)
* update

* prune

* examples

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-10-25 09:54:38 -04:00
thomas chaton bd658441bd
[App] testing `lightning` in `lightning-app` package (#15286)
* hot fix

* update

* update

* update

* cmd

* pkg

* update

* Apply suggestions from code review

* update

* update

* Apply suggestions from code review

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* cleaning

* Apply suggestions from code review

* update

* update

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-25 09:24:20 -04:00
Carlos Mocholí 777a12aa69
Allow sharing secrets on TPU tests (#15289) 2022-10-25 09:23:39 -04:00
Carlos Mocholí 7b3de1215f
Remove examples and loggers from develop dependencies (#15282)
* Remove examples and loggers from develop dependencies

* remove more references

* Fix mypy

* Keep logger file for docs mocking

* Simpler fix

* Fix docs build

* Global testsetup

* Matching files

* Undo change

* loggers as info

* Clarify

* Update requirements/pytorch/loggers.info

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-10-25 09:23:26 -04:00
otaj 76e462a0be
Do not lose references of trainer in test (#15272)
* Fix reference error

* Skip flaky hanging test

* .

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-25 09:23:15 -04:00
Jirka Borovec 9a725ce71c
fix requirements for package (#15285)
* fix requirements

* https

* verbose

Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2022-10-25 12:52:42 +02:00
Sherin Thomas dec2373391
Better handling connection interruption (#15267)
* config fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-10-25 12:58:52 +05:30
dependabot[bot] 47a2a62aed
Use `google-github-actions/get-gke-credentials@v0` (#15264)
* Bump google-github-actions/get-gke-credentials from 0.2.1 to 0.8.2

Bumps [google-github-actions/get-gke-credentials](https://github.com/google-github-actions/get-gke-credentials) from 0.2.1 to 0.8.2.
- [Release notes](https://github.com/google-github-actions/get-gke-credentials/releases)
- [Changelog](https://github.com/google-github-actions/get-gke-credentials/blob/main/CHANGELOG.md)
- [Commits](fb08709ba2...45e9605d68)

---
updated-dependencies:
- dependency-name: google-github-actions/get-gke-credentials
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update .github/workflows/tpu-tests.yml

* Don't use deprecated credentials input

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-10-24 21:42:08 +00:00
Adrian Wälchli 9e71132124
Include link to bug report template in GitHub bug issue (#15270)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-24 21:35:43 +00:00