2823 Commits

Author SHA1 Message Date
Anton Shevtsov c55fe7105b
Prefix seed_everything log messages with rank info (#14031)
Co-authored-by: Anton Shevtsov <aeshevtsov@avito.ru>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-09 15:40:30 +02:00
Adrian Wälchli 0cfc53d6b4
Fix regression on default value for `find_unused_parameters` (#14095) 2022-08-09 13:56:02 +05:30
thomas chaton 55ae812dbf
Resolve increased time. (#14074) 2022-08-08 15:48:50 +02:00
Carlos Mocholí d072e4451a
Fix dtype inference during gradient norm computation (#14051) 2022-08-08 11:35:06 +00:00
Rick Izzo b4ade232c8
Fix: Start Lightning App on Cloud if Repo Begins With Name "Lightning" (#14025) 2022-08-08 11:13:25 +00:00
Carlos Mocholí aaeff90254
Remove deprecated `DistributedType` and `DeviceType` enum classes (#14045) 2022-08-08 10:07:54 +02:00
Rohit Gupta b25275ccc2
Cast to fp16 before moving to device with deepspeed (#14000) 2022-08-05 22:15:15 +00:00
Raphael Randschau 26d69ceada
[CLI] add support for listing apps (#13987)
* add support for listing apps

* update changelog with correct PR number

* add tests for pagination

* fix wrong mock on test_cli

* ensure all enum values are accounted for

* make AppManager and AppList protected, add limit to pagination calls

* add restarting transition w/ tests

* add state transition not yet run with tests
2022-08-05 13:42:00 -07:00
Carlos Mocholí 91dd6a68fb
Remove meta device utilities in favor of torchdistx (#13868) 2022-08-05 12:20:27 +00:00
Adrian Wälchli 3d5c3d24f9
Remove unused auto_collect_arguments class method (#14015) 2022-08-05 08:49:00 +00:00
Rohit Gupta a4e4cab7a6
Deprecate `amp_level` from `Trainer` (#13898)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-05 08:31:19 +00:00
Adam Bobowski 12a061f2aa
[App] Increased DeepDiff's verbose level to properly handle dict changes (#13960) 2022-08-05 07:57:00 +00:00
Carlos Mocholí b88b700745
Remove the deprecated DDP2 strategy (#14026) 2022-08-04 20:27:35 +00:00
Rohit Gupta f5bd6e6f5f
Cast only floating types with IPUs (#13983) 2022-08-04 19:46:07 +00:00
Raphael Randschau 341c63c2b9
[CLI] add support to run app on a specific cluster (#13894)
Add `--cluster-id` flag which can be passed to `lightning run app` if the `--cloud` flag is present.
This allows you to run your Lightning AI apps on Lightning AI BYOC clusters running on your own cloud provider infrastructure.

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
2022-08-04 10:48:29 -07:00
Adrian Wälchli ef0623ec64
Remove deprecated training type plugins (#14011)
* Remove deprecated training type plugins

* update changelog

* DDP2Plugin

* Update src/pytorch_lightning/CHANGELOG.md
2022-08-04 18:00:00 +02:00
Mansy b8739a0167
Deprecate sheety API (#14004)
* deprecate sheety

Co-authored-by: manskx <ahmed.mansy156@gmail.com>
2022-08-04 16:25:41 +02:00
Rohit Gupta e78bf2044b
Raise an error if batch transfer hooks are overridden with IPUAccelerator (#13961)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-04 12:04:42 +00:00
Adam J. Stewart d748dae548
Fix erroneous warning for unset `max_epochs` (#13262)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-03 19:17:21 +00:00
Adrian Wälchli e6a8283e9c
Organize accelerator tests (#13986) 2022-08-03 13:49:55 +00:00
thomas chaton 5479c60b22
Reduce state size (#13970) 2022-08-03 13:47:16 +00:00
Adrian Wälchli 4ce97f37a2
Validate the model input of trainer methods (#13892)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-03 13:38:42 +00:00
Adrian Wälchli ce025bf954
Lazy import check for hydra dependency (#13812) 2022-08-03 04:27:16 -04:00
Raphael Randschau 2919dcf7ee
[CLI] add support for cluster management (#13835) 2022-08-02 10:31:09 +02:00
Jerome Anand b3203d93d0
Added support for HPU device stats monitor (#13819)
* Added support for HPU device stats monitor

Signed-off-by: Jerome <janand@habana.ai>

* Update changelog

Signed-off-by: Jerome <janand@habana.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update reference

Signed-off-by: Jerome <janand@habana.ai>

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* fix alignment

* add descriptions

* Update hpu_intermediate.rst

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-02 13:31:31 +05:30
Adrian Wälchli eb233ea12d
Snapshot selected globals and restore them in spawned process (#13921)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-01 22:21:46 +00:00
Rohit Gupta 0f6caffa57
Fix deepspeed default precision plugin `amp_level` to O2 (#13897)
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-29 20:36:51 +00:00
thomas chaton aefb9ab43f
(app) Introduce LightningTrainingComponent (#13830) 2022-07-29 16:44:52 +02:00
Adrian Wälchli caaf35689c
Improvements to standalone scripts (#13840)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-28 23:33:22 +00:00
Adrian Wälchli 7708ce22b2
Update GitHub links to PL repo (#13849)
* update lightning links in docs

* update links in chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update src/pytorch_lightning/README.md

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update src/pytorch_lightning/README.md

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update

* painful

* badges

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update badges

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-28 22:08:07 +02:00
HMellor 07b39c257b
Cast on host instead of IPU when using `precision=16` (#13880)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-28 19:26:41 +00:00
Adrian Wälchli 25203d4c81
Organize model summary utilities (#13893) 2022-07-28 19:23:29 +02:00
Carlos Mocholí 406cea7146
Support DeepSpeed <0.7.0 (#13859)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-07-28 14:38:51 +00:00
Carlos Mocholí 1299e4f984
Run GPU tests with PyTorch 1.12 (#13716)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-07-28 19:37:57 +05:30
Carlos Mocholí 511875e567
Support DeepSpeed >=0.6.0, <0.6.5 (#13863)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-07-27 18:57:52 +02:00
Adrian Wälchli fff62f0ae5
Fix TPU testing and collect all tests (#11098)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-07-27 15:40:40 +00:00
otaj 95f5f170f5
Allowed custom `BatchSampler`s when instantiated in `*_dataloader` hook (#13640)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-07-27 15:32:50 +00:00
Adrian Wälchli 2a24b906ac
Add batch size script argument for standalone tests (#13841)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-07-27 12:36:22 +00:00
nitinramvelraj b37e466f28
Change tests/README.md to reflect repo structure change (#13437)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-27 10:37:29 +00:00
otaj 4c7b9f0b11
Disallow batch sampler with multiple IPU devices (#13854)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-27 15:20:43 +05:30
Anton Shevtsov 41f45b475e
Check if the scheduler already has `reduce_on_plateau` (#13838)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-27 09:10:57 +00:00
Adrian Wälchli c3911700d1
Fix error handling in learning rate finder (#13845)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-27 04:32:39 -04:00
Rohit Gupta faf7ff57c0
Add support for async checkpointing (#13658) 2022-07-26 21:13:19 +05:30
thomas chaton 4c35867b61
[App] Introduce Commands (#13602) 2022-07-25 17:13:46 +00:00
Adrian Wälchli a8d7b4476c
Fix PyTorch spelling errors (#13774)
* Fix PyTorch spelling errors

* more
2022-07-25 12:51:16 -04:00
Justus Schock 227871982d
Merge different gpu backends with accelerator='gpu' (#13642)
* Rename GPUAccelerator to CUDAAccelerator

* Add back GPUAccelerator and deprecate it

* Remove temporary registration

* accelerator connector reroute

* accelerator_connector tests

* update enums

* lite support + tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move "gpu" support up before actual accelerator flag checks

* Stupid arguments

* fix tests

* change exception type

* fix registry test

* pre-commit

* CI: debug HPU flow (#13419)

* Update the hpu-tests.yml to pull docker from vault
* fire & sudo
* habana-gaudi-hpus
* Check the driver status on gaudi server (#13718)

Co-authored-by: arao <arao@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>

* Update typing-extensions requirement from <4.2.1,>=4.0.0 to >=4.0.0,<4.3.1 in /requirements (#13529)

Update typing-extensions requirement in /requirements

Updates the requirements on [typing-extensions](https://github.com/python/typing_extensions) to permit the latest version.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.0.0...4.3.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit suggestions (#13540)

updates:
- [github.com/psf/black: 22.3.0 → 22.6.0](https://github.com/psf/black/compare/22.3.0...22.6.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [FIX] Native FSDP precision + tests (#12985)

* Simplify fetching's loader types (#13111)

* Include app templates to the lightning and app packages (#13731)

* Include app templates to the package

Co-authored-by: mansy <mansy@lightning.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Fix mypy typing errors in pytorch_lightning/callbacks/model_checkpoint.py (#13617)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Fix typos initialize in docs (#13557)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Fix main progress bar counter when `val_check_interval=int` and `check_val_every_n_epoch=None` (#12832)

* Fix mypy errors attributed to `pytorch_lightning.loggers.tensorboard.py` (#13688)

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Fix mypy errors attributed to `pytorch_lightning.loggers.mlflow` (#13691)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>

* fix mypy errors for loggers/wandb.py (#13483)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Fix gatekeeper minimum check (#13769)

* changelog

* changelog

* fix order

* move up again

* add missing test

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: arao <arao@habana.ai>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Mansy <ahmed.mansy156@gmail.com>
Co-authored-by: mansy <mansy@lightning.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Lee Jungwon <33821003+BongYang@users.noreply.github.com>
Co-authored-by: Nathaniel D'Amours <88633026+NathanielDamours@users.noreply.github.com>
Co-authored-by: Justin Goheen <26209687+JustinGoheen@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
Co-authored-by: Gautier Dagan <s2234411@ed.ac.uk>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-25 14:46:45 +00:00
Mauricio Villegas 1b31039c58
Update LightningCLI test for new support in latest release of jsonargparse (#13805) 2022-07-25 09:25:42 +00:00
Adrian Wälchli 81f149e9d4
Rename spawn-based launchers (#13743) 2022-07-23 11:48:15 -04:00
Adrian Wälchli fa886f2a58
Lazy import check for neptune dependency (#13477)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-23 14:06:26 +00:00
Adrian Wälchli d24978baa3
Add ddp_notebook alias for ddp_fork (#13744)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-23 09:06:35 -04:00