Commit Graph

3161 Commits

Author SHA1 Message Date
Adrian Wälchli 7767fd36b6
Fix result transfer in multiprocessing launcher on multi-node (#15567)
* Fix result transfer in multiprocessing launcher on multi-node

* add simple test

* add comment

* update test

* changelog

* use tempfile

* fix

* assert None

* unused import

* add comment
2022-11-08 13:07:58 +01:00
Rohit Gupta 0886e6352e
Added a check to validate that wrapped FSDP models are used while initializing optimizers (#15301)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-08 02:10:35 +00:00
Luca Antiga 01f57a9cbd
Reuse existing commands when running connect more than once (#15471)
* Reuse connection if it matches a connection from an active terminal
* Remove unused import
* Include both name and id in the check
* Fix messages and tests
* Add test
* Handle monkeypatching more cleanly
* Remove unused imports

Co-authored-by: Luca Antiga <luca@lightning.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-11-07 20:01:35 +00:00
moghadas76 d5ffdfac2a
Fix: Revert lightning_lite.utilities.rank_zero_only to preserve backward compatibility (#15536)
* Fix: Revert  to preserve backward compatibility

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-11-07 19:23:30 +00:00
Akihiro Nitta e03809f881
Fix `tests_pytorch` import error in legacy checkpoint CI (#15566)
Fix tests_pytorch import error
2022-11-07 11:09:25 +01:00
thomas chaton 820233176b
[App] Fixed Multi Node and add examples (#15557)
2022-11-07 09:36:41 +00:00
geoffrey-g-delhomme 7bdfced27c
Let metadata `score` be serializable by wandb (#15544)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-05 14:51:49 +00:00
Carlos Mocholí 12d6e44796
Grep for potential errors in standalone tests (#15341)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-05 04:29:38 +01:00
Adrian Wälchli dcfaa065ab
Improve the checkpoint upgrade utility script (#15333)
2022-11-04 21:41:32 +00:00
Yuxuan Lu ee8a57da0f
Fix usage of fs.listdir in CheckpointConnector (#15413)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-11-04 20:21:52 +00:00
Adrian Wälchli 62d040c383
Fix ReduceOp type hint in ColossalAI strategy (#15535)
2022-11-04 19:34:34 +00:00
thomas chaton ecc8ac07c6
[App] Introduce Multi Node Component (#15524)
2022-11-04 17:41:59 +00:00
Adrian Wälchli 39c6ec9ce3
Only load global step when fitting (#15532)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-11-04 16:58:24 +00:00
Carlos Mocholí f392180c38
Do not modify PACKAGE_NAME on install (#15493)
* Do not modify PACKAGE_NAME on install

* Fix ci pkg action

* Required

* Typos

* Apply suggestions from code review

* Undo defaults

* Cleanup

* Implement idea

* Fuck

* Apps mock fix

* Fix app-pytest with PKG_NAME=app

* Justus suggestion

* Debug Windows

* Update setup.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Revert "Debug Windows"

This reverts commit 9fe3ba3665.

* SSH action

* Crazy bug

* Revert "SSH action"

This reverts commit 5061e8e7d6.

* Package import step

* Avoid env conflict

* Debug

* Whitespace

* Try removing existing lite build

* This should be redundant now

* Add back env now that source-lit is gone

* Remove download artifact

* checkgroup

* TODOs suggested by Jirka

* _

* Revert "_". These are local variables, do not need protected

This reverts commit 8340b85991.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-04 17:51:03 +01:00
Luca Antiga 5cf50363c3
Fix handling of script arguments in tracer (#15518)
* Don't assume script args start with double dash

* add changelog

Co-authored-by: Luca Antiga <luca@lightning.ai>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-04 16:03:46 +01:00
thomas chaton 1824ad0636
[App] Hot Fix: Missing root flow in app.flows (#15531)
2022-11-04 15:00:56 +00:00
thomas chaton 984daa6fa2
[App] Add start method to the LightningWork (#15523)
2022-11-04 12:53:48 +01:00
thomas chaton 921dc1cd9a
[App] Resolve inconsistency where the `flow.flows` property isn't recursive leading to flow overrides (#15466)
* update

* update

* update

* update

* update

* resolve attachment

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2022-11-03 22:10:47 +00:00
Luca Antiga 16960276ff
Periodically sync database to the drive (#15441)
2022-11-03 21:07:14 +00:00
Raphael Randschau f53c0a7053
Add SSH command to CLI (#15310)
* implement ssh command

* add tests that ssh command is available

* most SSH command tests

* update changelog

* Update src/lightning_app/cli/lightning_cli.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* adrian feedback pt1

* Update src/lightning_app/cli/lightning_cli.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* more feedback

* rework to support app name

* update tests based on different interface

* Update src/lightning_app/cli/lightning_cli.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update src/lightning_app/cli/lightning_cli.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update src/lightning_app/cli/lightning_cli.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update src/lightning_app/cli/lightning_cli.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update tests with changed expectation

* fix tests that broke with introduction of shutil

* fix too long line

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-03 08:32:08 -07:00
Adrian Wälchli 9c20cad40e
Fix srun detection causing permission error on non-SLURM platforms (#15485)
* improve srun detection
* changelog
* try catch is obsolete

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-11-03 03:14:15 +01:00
Ethan Harris 555257a4ba
[App] Auto-upgrade / detect environment mis-match from the CLI (#15434)
* Add auto-upgrade from the CLI and check for current env

* No longer require `python -m` in docs

* Tabs -> spaces

* Ignore pre-releases

* Test + docs
2022-11-02 17:13:36 -04:00
Adrian Wälchli e52d6c5b35
Fix TensorBoardLogger's validation of example input when logging graph (#15323)
2022-11-02 21:10:15 +00:00
Adrian Wälchli 94f7d2319a
Introduce checkpoint migration (#15237)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-11-02 15:14:04 +00:00
Adrian Wälchli 6aa6423d65
Launch options for Lightning Lite (#14992)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-11-02 14:56:22 +00:00
github-actions[bot] 6d9efbe4f0
Adding test for legacy checkpoint created with 1.8.0 (#15450)
* [create-pull-request] automated change

* Add legacy to checkgroup

Co-authored-by: akihironitta <akihironitta@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-11-01 16:47:17 +00:00
Sitcebelly 94bed87a34
Implement freeze batchnorm with freezing track running stats (#15063)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-01 16:11:42 +00:00
Rohit Gupta 61ae35c378
Use sklearn in runif (#15426)
* Use sklearn in runif
* test by removing sklearn dep
* remove repeated code
* seed
2022-11-01 11:40:32 +00:00
Mansy 19df40d899
Allow force run app on cloud if loading locally errors (#15019)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-10-31 18:22:54 +00:00
Luca Furst ec156ad6e3
[App] Remove --instance-types from cluster creation (#15314)
## What does this PR do?

Removes the ability to specify `--instance-types` when creating clusters.

Instead, all clusters will be able to use every instance type supported by the platform.
2022-10-31 18:19:26 +00:00
Jirka Borovec 3e5e5079e7
set default work version v1.12 (#15431)
2022-10-31 18:45:18 +01:00
Wouter Zwerink c287b5d668
neptune.init deprecation fix (#15393)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-10-31 11:10:44 -04:00
Adrian Wälchli 0f957b5a86
Fix DataLoader re-instantiation when attribute is array (#15409)
2022-10-31 16:09:29 +01:00
Rohit Gupta 773cb3e8c8
Fix skipped tests due to sklearn (#15311)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-10-31 13:58:34 +05:30
Dmitry Frolov 11196b1707
Fix default CloudCompute for flows (#15371)
* Fix default CloudCompute for flows
* Unit test added
2022-10-29 08:36:59 +01:00
Adrian Wälchli 6b0d41cb8a
Fix issues when RichProgressBar disabled (#15376)
2022-10-29 00:52:35 +00:00
Ethan Harris b340859df9
[App] Add `ServeStreamlit` work (#15400)
* Init

* Updates

* Add test for model building

* Imports

* Fix

* Typing

* Ignore serve streamlit in mypy
2022-10-28 16:33:53 -04:00
Carlos Mocholí 2fd1af0449
Deprecate `AllGatherGrad` (#15364)
2022-10-28 19:51:27 +00:00
Raphael Randschau adc8238597
Add SSH key management to CLI (#15291)
* add cli commands for adding/removing resources

as discussed with Adrian, we want to adopt "lightning add" and "lightning remove" for ssh-keys,
as the resource already exists.

* implement ssh-key management
* one parameter for public key, optional name
* handle the case where a private key file was provided
* make ssh key-mgmt support classes protected
* re-order add ssh-key args
* change types signatures of add_key
* rename test cases
* update changelog

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-10-28 20:42:26 +02:00
Raphael Randschau 273269ef1b
allow e2e test image to be changed via env variable (#15200)
As we patch our base images, this e2e image needs to be updated frequently as well.
Instead of changing it with a PR each time, this PR makes the e2e container image version configurable through an environment variable.


Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-10-28 15:40:59 +00:00
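
The change described in #15200 above amounts to reading the image reference from the environment with a fallback. A minimal sketch, assuming a hypothetical variable name `E2E_IMAGE` and a placeholder default tag; neither is taken from the actual commit:

```python
import os
from typing import Mapping, Optional

# Hypothetical names for illustration only; the real variable and image
# used by the project are not shown in this log.
E2E_IMAGE_ENV_VAR = "E2E_IMAGE"
DEFAULT_E2E_IMAGE = "example/e2e-base:latest"


def resolve_e2e_image(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the image set in the environment, or the default if unset."""
    env = os.environ if env is None else env
    return env.get(E2E_IMAGE_ENV_VAR, DEFAULT_E2E_IMAGE)
```

With this shape, a patched base image can be picked up by exporting the variable in CI rather than by editing the test code.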
thomas chaton 2e72a4c801
Add support for functions (#15098)
2022-10-28 15:06:45 +00:00
Ethan Harris bbf7848a5f
[App] Fix cluster logic (#15383)
2022-10-28 15:35:21 +01:00
Ethan Harris e9a6b83437
[App] Reduce import depths and add test (#15330)
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-10-28 13:57:35 +00:00
thomas chaton b6ebc7b5f5
[App] Add env variables to deactivate pull and push of the App State (#15367)
2022-10-28 13:26:08 +00:00
Adrian Wälchli 5eafa52596
Fix resetting internal bars in RichProgressBar after each trainer stage (#15377)
2022-10-28 06:20:45 -04:00
thomas chaton df4b705768
Add JustPy Frontend (#15002)
* update

* update

* update

* update

* changelog

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-27 11:48:26 -04:00
Luca Furst c1a3cb5381
[App] Format client error ApiExceptions without a traceback (#15130)
## What does this PR do?

The lightning CLI uses the lightning APIs to create, read, update, and delete resources such as apps and clusters.

If an API call fails due to client error (e.g. invalid input, unauthorized, unauthenticated) the API will return an HTTP status code in the 400s, along with a custom message in the response body.

The CLI (usually) does not handle these exceptions directly. As a result, the exception bubbles up through the click framework (which does not recognize the exception type) to the Python runtime, which produces a long traceback and dumps the raw exception to the user.

This PR fixes the user experience. When our API fails on a client error, the CLI will display the message from the server without a traceback.
2022-10-27 09:44:22 -04:00
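
The behavior described in #15130 above can be sketched as a wrapper that catches client errors at the top level and prints only the server's message. This is an illustrative sketch, not the project's implementation: `ApiException`, `run_cli`, and `create_cluster` are hypothetical stand-ins, and the sketch deliberately avoids the click framework so that it stays self-contained.

```python
import sys
from typing import Callable, Optional, TextIO


class ApiException(Exception):
    """Hypothetical stand-in for the error an API client raises."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


def run_cli(command: Callable[[], None], stderr: Optional[TextIO] = None) -> int:
    """Run a command and report client errors (4xx) without a traceback."""
    stream = stderr if stderr is not None else sys.stderr
    try:
        command()
    except ApiException as err:
        if 400 <= err.status < 500:
            # Client error: show only the message returned by the server.
            print(f"Error: {err.message}", file=stream)
            return 1
        # Other failures (e.g. 5xx) still propagate with a full traceback.
        raise
    return 0


def create_cluster() -> None:
    # Simulates the API rejecting invalid input with a 400 response.
    raise ApiException(400, "cluster name is invalid")
```

Calling `run_cli(create_cluster)` writes a single `Error: ...` line to standard error and returns a non-zero exit code, while server-side errors keep their traceback for debugging.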
Jirka Borovec 95ae393ca8
LAI: creating mirror package (#15105)
* placeholder

* mirror + prune

* makedir

* setup

* ci

* ci

* name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci clean

* empty

* py

* parallel

* doctest

* flake8

* ci

* typo

* replace

* clean

* Apply suggestions from code review

* re.sub

* fix UI path

* full replace

* ui path?

* replace

* updates

* regex

* ci

* fix

* ci

* path

* ci

* replace

* Update .actions/setup_tools.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* also convert lightning_lite tests for PL tests to adapt mocking paths

* fix app example test

* update logger propagation for PL tests

* update logger propagation for PL tests

* Apply suggestions from code review

* Revert "update logger propagation for PL tests"

This reverts commit c1a5e119c7.

* playwright

* py

* update import in tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try edit import in overwrite

* debug code

* rev playwright

* Revert "try edit import in overwrite"

This reverts commit c02f766521.

* ci: adjust examples

* adjust examples cloud

* mock lightning_app

* Install assistant dependencies

* lightning

* setup

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

* disable cache

* move doctest to install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* )

* echo ./

* ci

* lru

* revert disabling cache, prints

* ci

* prune ci jobs

* prune ci jobs

* training loop standalone tests

* add sys modules cleanup fixture

* make use of fixture

* revert standalone

* ci e2e

* fix imports in lightning

* fix imports of lightning in tests

* Revert "make use of fixture"

This reverts commit c15efdd205.

* Revert other commits for fixtures

* revert use of fixture

* py3.9

* fix mocking

* fix paths

* hack mocking

* docs

* Apply suggestions from code review

* rev suggestion

* Minor changes to the parametrizations

* Update checkgroup with the new and changed jobs

* include frontend dir

* cli

* fix imports and entry point

* Revert standalone

* rc1

* e2e on staging

* Revert "Revert standalone"

This reverts commit 9df96685b8.

* groups

* to

* ci: pt ver

* docker

* Apply suggestions from code review

* Copy over changes from previous commit to other groups

* Add back changes from bad merge

* Uppercase step name everywhere

* update

* ci

* ci: lai oldest

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: manskx <ahmed.mansy156@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2022-10-27 12:32:49 +02:00
thomas chaton 5da74847e2
[App] Show logs command to be standalone and re-usable (#15343)
2022-10-27 09:05:48 +00:00
Carlos Mocholí 02074f16c7
Fix PyTorch versions in Lite CI (#15338)
* replace oldest in lite

* Fix PyTorch versions in Lite CI

* This will be moved to install pkg workflow in the mirror PR

* 1.13 fixes

* Windows fix

* sorting

Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-26 15:09:08 -04:00