* Reuse connection if it matches a connection from an active terminal
* Remove unused import
* Include both name and id in the check
* Fix messages and tests
* Add test
* Handle monkeypatching more cleanly
* Remove unused imports
Co-authored-by: Luca Antiga <luca@lightning.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Do not modify PACKAGE_NAME on install
* Fix ci pkg action
* Required
* Typos
* Apply suggestions from code review
* Undo defaults
* Cleanup
* Implement idea
* Fuck
* Apps mock fix
* Fix app-pytest with PKG_NAME=app
* Justus suggestion
* Debug Windows
* Update setup.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Revert "Debug Windows"
This reverts commit 9fe3ba3665.
* SSH action
* Crazy bug
* Revert "SSH action"
This reverts commit 5061e8e7d6.
* Package import step
* Avoid env conflict
* Debug
* Whitespace
* Try removing existing lite build
* This should be redundant now
* Add back env now that source-lit is gone
* Remove download artifact
* checkgroup
* TODOs suggested by Jirka
* _
* Revert "_". These are local variables and do not need to be protected
This reverts commit 8340b85991.
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* implement ssh command
* add tests that ssh command is available
* most SSH command tests
* update changelog
* Update src/lightning_app/cli/lightning_cli.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* adrian feedback pt1
* Update src/lightning_app/cli/lightning_cli.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* more feedback
* rework to support app name
* update tests based on different interface
* Update src/lightning_app/cli/lightning_cli.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update src/lightning_app/cli/lightning_cli.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update src/lightning_app/cli/lightning_cli.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update src/lightning_app/cli/lightning_cli.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update tests with changed expectation
* fix tests that broke with introduction of shutil
* fix too long line
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
## What does this PR do?
Removes the ability to specify `--instance-types` when creating clusters.
Instead, all clusters will be able to use every instance type supported by the platform.
* add CLI commands for adding/removing resources
As discussed with Adrian, we want to adopt `lightning add` and `lightning remove` for SSH keys, as the resource already exists.
* implement ssh-key management
* one parameter for public key, optional name
* handle the case where a private key file was provided
* make ssh key-mgmt support classes protected
* re-order add ssh-key args
* change type signatures of add_key
* rename test cases
* update changelog
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
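A sketch of how a single public-key argument with an optional name, including the case where a private key file was provided, might be handled. `resolve_ssh_key` is a hypothetical helper, not the code from this PR. It assumes that a private key passed by mistake has its public half next to it as `<file>.pub` (the default layout written by `ssh-keygen`), and that the key's comment field is a reasonable default name.

```python
from pathlib import Path
from typing import Optional, Tuple

# Prefixes of OpenSSH public-key lines (assumption: covers the common key types).
_PUBLIC_KEY_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def resolve_ssh_key(public_key: str, name: Optional[str] = None) -> Tuple[str, str]:
    """Return (name, key_line) for a hypothetical `lightning add ssh-key` helper.

    `public_key` is either the key text itself or a path to a key file.
    """
    value = public_key.strip()
    if not value.startswith(_PUBLIC_KEY_PREFIXES):
        # Not key text, so treat the argument as a path to a key file.
        path = Path(value).expanduser()
        value = path.read_text().strip()
        if value.startswith("-----BEGIN") and "PRIVATE KEY" in value.splitlines()[0]:
            # A private key was passed by mistake: use the matching ".pub" file instead.
            pub_path = path.with_name(path.name + ".pub")
            if not pub_path.is_file():
                raise ValueError(f"{path} is a private key and {pub_path} does not exist")
            value = pub_path.read_text().strip()
    fields = value.split()
    if len(fields) < 2 or not fields[0].startswith(_PUBLIC_KEY_PREFIXES):
        raise ValueError("not an OpenSSH public key")
    # Fall back to the key's comment field (e.g. "user@host") when no name is given.
    key_name = name or (fields[2] if len(fields) > 2 else "ssh-key")
    return key_name, value
```

Accepting both raw key text and a file path keeps the command to one positional parameter, as the commit above describes.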
## What does this PR do?
The lightning CLI uses the lightning APIs to create, read, update, and delete resources such as apps and clusters.
If an API call fails due to a client error (e.g. invalid input, unauthorized, unauthenticated), the API returns an HTTP status code in the 400s, along with a custom message in the response body.
The CLI (usually) does not handle these exceptions directly. As a result, the exception bubbles up through the click framework (which does not recognize the exception type) to the Python runtime, which prints a long traceback and dumps the raw exception to the user.
This PR improves the user experience: when our API fails with a client error, the CLI displays the message from the server without a traceback.
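A minimal sketch of that behavior using only the standard library. `ApiClientError`, `friendly_client_errors`, and `delete_cluster` are hypothetical names, not the actual lightning CLI code; the sketch assumes the API client raises an exception that carries the HTTP status and the server's message.

```python
import functools
import sys
from typing import Any, Callable


class ApiClientError(Exception):
    """Hypothetical error raised by the API client for a non-2xx response."""

    def __init__(self, status: int, body_message: str) -> None:
        super().__init__(f"HTTP {status}: {body_message}")
        self.status = status
        self.body_message = body_message


def friendly_client_errors(func: Callable[..., Any]) -> Callable[..., Any]:
    """Show 4xx API errors as a one-line message instead of a traceback."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except ApiClientError as ex:
            if 400 <= ex.status < 500:
                # Client error: print only the server's message and exit non-zero.
                print(f"ERROR: {ex.body_message}", file=sys.stderr)
                raise SystemExit(1) from None
            # Anything else (e.g. 5xx server errors) is re-raised unchanged.
            raise

    return wrapper


@friendly_client_errors
def delete_cluster(name: str) -> None:
    # Hypothetical command body: pretend the API rejected the request.
    raise ApiClientError(404, f"cluster {name!r} not found")
```

In a click-based CLI, raising `click.ClickException(message)` from the handler gives a similar result, since click prints `Error: <message>` and exits with status 1 instead of showing a traceback.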
* placeholder
* mirror + prune
* makedir
* setup
* ci
* ci
* name
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci clean
* empty
* py
* parallel
* doctest
* flake8
* ci
* typo
* replace
* clean
* Apply suggestions from code review
* re.sub
* fix UI path
* full replace
* ui path?
* replace
* updates
* regex
* ci
* fix
* ci
* path
* ci
* replace
* Update .actions/setup_tools.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* also convert lightning_lite tests for PL tests to adapt mocking paths
* fix app example test
* update logger propagation for PL tests
* update logger propagation for PL tests
* Apply suggestions from code review
* Revert "update logger propagation for PL tests"
This reverts commit c1a5e119c7.
* playwright
* py
* update import in tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* try edit import in overwrite
* debug code
* rev playwright
* Revert "try edit import in overwrite"
This reverts commit c02f766521.
* ci: adjust examples
* adjust examples cloud
* mock lightning_app
* Install assistant dependencies
* lightning
* setup
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Apply suggestions from code review
* disable cache
* move doctest to install
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* )
* echo ./
* ci
* lru
* revert disabling cache, prints
* ci
* prune ci jobs
* prune ci jobs
* training loop standalone tests
* add sys modules cleanup fixture
* make use of fixture
* revert standalone
* ci e2e
* fix imports in lightning
* fix imports of lightning in tests
* Revert "make use of fixture"
This reverts commit c15efdd205.
* Revert other commits for fixtures
* revert use of fixture
* py3.9
* fix mocking
* fix paths
* hack mocking
* docs
* Apply suggestions from code review
* rev suggestion
* Minor changes to the parametrizations
* Update checkgroup with the new and changed jobs
* include frontend dir
* cli
* fix imports and entry point
* Revert standalone
* rc1
* e2e on staging
* Revert "Revert standalone"
This reverts commit 9df96685b8.
* groups
* to
* ci: pt ver
* docker
* Apply suggestions from code review
* Copy over changes from previous commit to other groups
* Add back changes from bad merge
* Uppercase step name everywhere
* update
* ci
* ci: lai oldest
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: manskx <ahmed.mansy156@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Fixes a bug when cross-launching an app between clusters.
Currently, the platform does not allow running multiple instances of the same app. If you have `app-1` running on `cluster-1` and try to run it on `cluster-2`, the CLI will succeed but the app will never start.
This PR prevents this disconnect: the app is not uploaded or released if it won't run. Instead, an error explains to the user what happened and how to proceed (specify a different `--name`, e.g. `app-2`).
Once the platform supports multiple app instances / running individual apps on multiple clusters, this PR can be reverted.
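The pre-upload check described above could look roughly like this sketch. `AppInstance`, `CrossClusterLaunchError`, and `ensure_launchable` are hypothetical; the sketch assumes the CLI can list existing app instances with their cluster IDs before uploading, and that relaunching on the same cluster remains allowed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AppInstance:
    """Hypothetical record of an app instance already known to the platform."""

    name: str
    cluster_id: str


class CrossClusterLaunchError(RuntimeError):
    """Raised before upload/release when the launch could never start."""


def ensure_launchable(name: str, target_cluster: str, instances: Iterable[AppInstance]) -> None:
    """Refuse to launch `name` on `target_cluster` if it is already on another cluster."""
    for instance in instances:
        if instance.name == name and instance.cluster_id != target_cluster:
            raise CrossClusterLaunchError(
                f"App {name!r} is already running on cluster {instance.cluster_id!r}, and the "
                f"platform cannot also run it on {target_cluster!r}. Re-run with a different --name."
            )
```

Running the check before any upload is what keeps the CLI from reporting success for an app that will never start.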
* use more recent lightning cloud launcher
* allow LightningApp to use custom cloud compute for flows
* feedback from adrian
* adjust other cloud tests
* update
* update
* update comments
* Update src/lightning_app/core/app.py
Co-authored-by: Sherin Thomas <sherin@grid.ai>
* Close profiler when `StopIteration` is raised (#14945)
* Find last checkpoints on restart (#14907)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Remove unused gcsfs dependency (#14962)
* Update hpu mixed precision link (#14974)
Signed-off-by: Jerome <janand@habana.ai>
* Bump version of fsspec (#14975)
fsspec verbump
* Fix TPU test CI (#14926)
* Fix TPU test CI
* +x first
* Lite first to uncover errors faster
* Fixes
* One more
* Simplify XLALauncher wrapping to avoid pickle error
* debug
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Debug commit successful. Trying local definitions
* Require tpu for mock test
* ValueError: The number of devices must be either 1 or 8, got 4 instead
* Fix mock test
* Simplify call, rely on defaults
* Skip OSError for now. Maybe upgrading will help
* Simplify launch tests, move some to lite
* Stricter typing
* RuntimeError: Accessing the XLA device before processes have spawned is not allowed.
* Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed."
This reverts commit f65107ebf3.
* Alternative boring solution to the reverted commit
* Fix failing test on CUDA machine
* Workarounds
* Try latest mkl
* Revert "Try latest mkl"
This reverts commit d06813aa67.
* Wrong exception
* xfail
* Mypy
* Comment change
* Spawn launch refactor
* Accept that we cannot lazy init now
* Fix mypy and launch test failures
* The base dockerfile already includes mkl-2022.1.0 - what if we use it?
* try a different mkl version
* Revert mkl version changes
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
* Trainer: fix support for non-distributed PyTorch (#14971)
* Trainer: fix non-distributed use
* Update CHANGELOG
* fixes typing errors in rich_progress.py (#14963)
* revert default cloud compute rename
* allow LightningApp to use custom cloud compute for flows
* feedback from adrian
* update
* resolve merge with master conflict
* remove preemptible
* update CHANGELOG
* add basic flow cloud compute documentation
* fix docs build
* add missing symlink
* try to fix sphinx
* another attempt for docs
* fix new test
Signed-off-by: Jerome <janand@habana.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Sherin Thomas <sherin@grid.ai>
Co-authored-by: Ziyad Sheebaelhamd <47150407+ziyadsheeba@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jerome Anand <88475913+jerome-habana@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: DP <10988155+donlapark@users.noreply.github.com>
* renamed Mount argument
* fix tests
* Apply suggestions from code review
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
* updated examples as well
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
* added mount class and configured it into compute config
* added mount to the cloud runtime dispatcher
* raise an error if an S3 bucket is passed to a drive, telling the user to use mounts instead
* added example for app
* updated tests
* updated tests
* addressed code review comments
* fix bug
* bugfix
* updates
* code review comments
* updates
* fixed tests after rename
* fix tests
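A minimal sketch of the S3 check on drives and of a mount description. `check_drive_id` and `MountSpec` are hypothetical stand-ins for the `Drive` and `Mount` classes touched by these commits; their real signatures are not shown in these notes, so the field names here are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountSpec:
    """Hypothetical mount: an S3 source exposed at `mount_path` on the machine."""

    source: str
    mount_path: str

    def __post_init__(self) -> None:
        # Assumption: mounts only accept S3 URLs in this sketch.
        if not self.source.startswith("s3://"):
            raise ValueError(f"unsupported mount source {self.source!r}; expected s3://...")


def check_drive_id(drive_id: str) -> str:
    """Reject S3 URLs for drives and point the user to mounts instead."""
    if drive_id.startswith("s3://"):
        raise ValueError(
            f"{drive_id!r} is an S3 bucket; drives do not accept S3 URLs. "
            "Attach it to the compute config as a mount instead."
        )
    return drive_id
```

Failing early with a pointer to mounts is friendlier than letting an S3 URL reach the drive machinery and fail later.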