312 Commits

Author SHA1 Message Date
ritsuki1227 6855f653bb
Set `MLFlowLogger` status to FAILED when training raises an error (#12292)
Co-authored-by: Ritsuki Yamada <ritsuki.yamada@uzabase.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-20 07:43:32 -04:00
awaelchli c0ff7a1b77 Add backward-compatibility for LightningLite in PL (#14735) 2022-09-20 13:31:56 +02:00
awaelchli e3e71670e6 Move src/pytorch_lightning/lite to src/lightning_lite (#14735) 2022-09-20 13:31:56 +02:00
Carlos Mocholí 810643bca2
Surface Neptune installation problems to the user (#14715) 2022-09-20 10:19:51 +00:00
Mauricio Villegas 3064c28ce1
Added args parameter to LightningCLI to ease running from within Python (#14596)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-19 17:38:30 +00:00
Carlos Mocholí e9c571d39f
Move accelerator-specific parsing functions with their accelerators (#14753)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-18 22:48:45 +00:00
Adrian Wälchli 4f9c7793e7
Fix TensorBoardLogger creating redundant experiment when finalizing (#14762)
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
2022-09-18 16:27:15 -04:00
Adrian Wälchli 35c65b0287
Fix test suite when running on MPS-enabled hardware (#14708) 2022-09-16 19:21:36 +00:00
Adrian Wälchli 47f0d336f1
Standalone Lite: Update LightningLite (#14726) 2022-09-16 17:25:27 +00:00
Carlos Mocholí 8c01c89d74
Remove deprecated `NeptuneLogger` code (#14727) 2022-09-16 16:26:15 +00:00
Adrian Wälchli 5bef75648e
Remove deprecated `torch_distributed_backend` logic (#14693)
* Remove deprecated torch_distributed_backend logic
* changelog
* mention deprecated
* imports

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-16 17:27:36 +02:00
Adrian Wälchli 619e76f22d
Remove silent behavior when `num_slurm_tasks` does not correspond to number of processes in Trainer (#14300)
* simplify logic
* remove hpc
* update
* add changelog
* more tests
* update test

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-16 11:00:09 +00:00
Carlos Mocholí 5ff78f0753
Use the setter in the children recursively (#14724) 2022-09-15 13:58:12 +00:00
Adrian Wälchli 8b3d6d8feb
Add easy access to `state_dict` in Lite module wrapper (#14629)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-14 19:29:23 -04:00
Manan Goel 48e783dd0d
Added support for downloading wandb artifacts in the WandbLogger (#14551)
* Added functions to the WandbLogger to download and use artifacts without having to access the experiment object
* Updated CHANGELOG.md
* Added suggested changes
* Delete test_script

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-14 14:11:52 +00:00
Adrian Wälchli 6333caabb0
Standalone Lite: Strategy base classes and registry (#14662)
* add accelerator implementations to lite

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix imports

* rename registry argument

* fix test

* fix tests

* remove duplicated test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* deprecation

* deprecations

* flake8

* fixes

* add mps to runif

* fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove more

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local import

* undo device stats :(

* fix import

* stupid typehints

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more refactors :(

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* rename init_device to setup_device

* remove unused import

* make uppercase to differentiate from class

* trick test after moving import locally

* add base classes and registry

* reg

* registry

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tests

* update to other branches

* resolve todo(lite)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add very basic unit tests

* fix name assignment

* Update src/lightning_lite/strategies/parallel.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* remove deprecated property

* remove pre- and post backward for now

* protecting the registry utility function

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-09-14 09:15:21 -04:00
otaj 616304831a
Remove deprecated `BaseProfiler` and `AbstractProfiler` (#14404)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-09-13 14:52:09 +00:00
Adrian Wälchli 19a1274093
Better error message when dataloader and datamodule is None (V2) (#14637) 2022-09-13 12:26:03 +00:00
Adrian Wälchli 1ee3d1eb72
Avoid warning when cloning tensor in self.log (#14599)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-13 16:23:46 +05:30
Adrian Wälchli 4bd135a6f6
Remove deprecated `LoggerCollection` (#14283)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-12 21:46:46 +00:00
Max Ehrlich e5998e6bf2
Make the SLURM Preemption/Timeout Signal Configurable (#14626)
* Add parameter to change the preemption signal
* Make the signal connector use the custom signal from SLURMEnvironment

Signed-off-by: Max Ehrlich <max.ehr@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-12 19:24:35 +00:00
Adrian Wälchli 925edbca07
Remove the deprecated `weights_save_path` Trainer argument (#14424)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-12 19:02:38 +00:00
Mauricio Villegas 1680a76819
Removed from_argparse_args tests in test_cli.py (#14597) 2022-09-12 18:25:29 +00:00
Adrian Wälchli d013bcc5bf
Standalone Lite: Accelerators (#14578)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-12 16:00:14 +00:00
Carlos Mocholí cf3428784f
Set `running_torchscript` recursively (#14657)
* Set `running_torchscript` recursively

* CHANGELOG
2022-09-12 14:39:40 +00:00
Carlos Mocholí e859546b96
Integrate lightning_utilities `is_overridden` (#14620) 2022-09-12 15:16:57 +02:00
awaelchli cbbd148089 Add back-compatibility for checkpoint io plugins in pl/plugins/io (#14519) 2022-09-12 08:28:46 -04:00
awaelchli 463439e624 Move checkpoint io plugins from pl/plugins/io to lite/plugins/io (#14519) 2022-09-12 08:28:46 -04:00
Adrian Wälchli 024e7b8204
Standalone Lite: Cluster Environments (#14509) 2022-09-12 12:20:08 +02:00
Vasilis Vryniotis 7e9e441843
Use TorchVision's Multi-weight Support and Model Registration API on Lightning (#14567)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-09 20:04:57 +00:00
Adrian Wälchli 95374440ce
Move device parser tests inside Lite (#14586) 2022-09-07 21:22:46 +00:00
Adrian Wälchli d2459df2ff
Standalone Lite: Remaining Utilities (#14492)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Felonious-Spellfire <felonious.spellfire@gmail.com>
2022-09-07 15:25:23 +00:00
Carlos Mocholí bcad90141a
Remove old test artifacts (#14574) 2022-09-07 10:09:59 -04:00
Carlos Mocholí 8c4184c105
Integrate with `lightning_utilities.core.enums` (#14558) 2022-09-07 15:14:14 +02:00
Carlos Mocholí 5216c51096
Integrate `lightning_utilities.core.rank_zero` (#14556) 2022-09-07 09:21:48 +00:00
Carlos Mocholí 273a9ed8c1
Integrate `lightning_utilities.core.apply_func` (#14537) 2022-09-06 13:52:54 +00:00
Carlos Mocholí 44216fdd69
Integrate `lightning_utilities.core.imports` (#14475)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-06 12:56:20 +00:00
Carlos Mocholí 8a4a3b6766
Mark the lite `DeviceDtypeModuleMixin` as protected (#14548) 2022-09-06 14:17:15 +02:00
Rohit Gupta 8c6119fbce
Add auto wrapping support for `DDPFullyShardedStrategy` (#14383) 2022-09-05 19:07:26 +00:00
awaelchli 7f148b2c47 Deprecate pl/utilities/apply_func (#14516) 2022-09-05 20:30:42 +02:00
awaelchli 9fea2ed9d5 move pl/utilities/apply_func.py to lite/utilities/apply_func.py (#14516) 2022-09-05 20:30:42 +02:00
awaelchli cfea2be137 Deprecate pl/utilities/cloud_io.py (#14515) 2022-09-05 18:30:31 +02:00
awaelchli def6548596 move pl/utilities/cloud_io.py to lite/utilities/cloud_io.py (#14515) 2022-09-05 18:30:31 +02:00
awaelchli 165427a506 Deprecate pl/utilities/xla_device (#14514) 2022-09-05 17:36:02 +02:00
awaelchli 75d5a2d046 move pl/utilities/xla_device.py to lite/utilities/xla_device.py (#14514) 2022-09-05 17:36:02 +02:00
awaelchli c2879c20da Deprecate pl/core/mixins/device_dtype_mixin and update imports (#14511) 2022-09-05 16:31:00 +02:00
awaelchli cefe2fa123 Move test_dtype_device_mixin to lite (#14511) 2022-09-05 16:31:00 +02:00
Rohit Gupta ce702fd40e
Squeeze tensor while logging (#14489)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-05 14:01:51 +00:00
Tianshu Wang 23f0e20209
Fixed `WandbLogger` `save_dir` not being set after creation (#12748) (#14326)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-05 10:12:43 +00:00
Roberto de Moura Estevão Filho ed0164a3d2
Estimate stepping batches with max_steps if max_epochs is not set (#14317)
Co-authored-by: Roberto Estevão <robertode@microsoft.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-05 09:05:21 +00:00
Carlos Mocholí 4235eff712
Use a standalone test symlink for Lite (#14502) 2022-09-04 20:57:28 +02:00
Adrian Wälchli 291dc1b615
Standalone Lite CI setup (#14451)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-01 22:13:12 +00:00
Carlos Mocholí e0c2c3e677
Clean up fairscale imports (#14476) 2022-09-01 18:08:40 +02:00
Adrian Wälchli 28e18881a9
Mark stage argument in hooks as required (#14064)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-09-01 15:47:40 +02:00
Rohit Gupta e90ac769d6
Reset dataloaders on failure in tuner (#14372) 2022-08-31 21:00:18 +00:00
Carlos Mocholí 2e3d85af84
Remove deprecated rank zero utilities (#14471)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-31 18:29:11 +00:00
Anner 626827c872
update rng state save/load test to also run on cuda gpu (#14396)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-31 16:36:35 +00:00
Carlos Mocholí a1dd718781
Remove deprecated support for passing the warning category positionally (#14470) 2022-08-31 17:34:56 +02:00
Carlos Mocholí 291267c3bf
Unify rank zero messaging utilities (#14116) 2022-08-30 09:51:30 +00:00
ananthsub d0d1818d50
Update `has_len_all_ranks` to use `Strategy.root_device` (#12144)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-29 20:23:34 +00:00
Carlos Mocholí f202e84f4b
Remove the legacy `get_deprecated_arg_names` (#14415) 2022-08-29 14:53:57 +02:00
Krishna Kalyan 1a3fe39571
Removed deprecated `Trainer.num_processes` property in favour of `Trainer.num_devices` (#14423)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-28 23:59:24 +02:00
Krishna Kalyan 5cbe1f48d2
Removed the deprecated `Trainer.data_parallel_device_ids` function in favour of `Trainer.device_ids` (#14422)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-28 18:07:00 +00:00
Krishna Kalyan cea9a72d9d
Removed the deprecated `trainer.lr_schedulers` (#14408)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-28 18:06:09 +00:00
otaj 1e04951206
Remove deprecated `TrainerCallbackHookMixin` (#14401)
* remove deprecated callback hook

* changelog
2022-08-28 10:56:37 +00:00
Rohit Gupta f3574176e2
Change `trainer.should_stop` to not stop in between an epoch and run until `min_steps/min_epochs` only (#13890) 2022-08-27 12:12:24 +00:00
Adrian Wälchli 250c06e406
Remove deprecated HPC model hooks (#14315)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 20:59:32 +00:00
Carlos Mocholí 3ba0f56b18
Remove support for the deprecated torchtext legacy (#14375) 2022-08-26 20:01:51 +00:00
Tianshu Wang 8950613552
save checkpoints and profiler output to the first logger (#14325)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 17:23:54 +00:00
Carlos Mocholí d4bcafad7a
Remove the deprecated loop output format (#14373) 2022-08-26 16:56:56 +00:00
Justin Goheen ed84d04bcf
Fix mypy errors attributed to `pytorch_lightning.core.datamodule` (#13693)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: otaj <ota@lightning.ai>
2022-08-26 16:26:26 +00:00
Adrian Wälchli fafd254678
Fix device parser logic to avoid creating CUDA context (#14319)
* let environment disable forking

* add helper function and error messages

* tests

* changelog

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 15:41:38 +00:00
Björn Barz 0102d0d4d4
Fix restoring trainer after `lr_find()` (#14113)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-26 15:19:08 +00:00
Justin Goheen 94e567e6f0
Fix mypy errors attributed to `pytorch_lightning.trainer.connectors.data_connector.py` (#13806)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-08-26 13:28:27 +00:00
Adrian Wälchli e2221a0b3e
Raise an error when resuming training with Apex (#14341) 2022-08-26 13:11:24 +00:00
Rohit Gupta 6d00f31f0c
Add auto wrapping for `DDPFullyShardedNativeStrategy` (#14252) 2022-08-26 09:01:48 +00:00
Christian Schell 70deac2cd4
Reset epoch progress with batch size scaler (#13846)
Co-authored-by: Christian Schell <christian.schell@uni-wuerzburg.de>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-26 14:12:00 +05:30
Adrian Wälchli e67842dcba
Support sharded optimizer state dumping outside of sharded strategies (#14208) 2022-08-26 07:58:21 +00:00
Justus Schock a01e016fff
Remove mps config for test (#14379)
* Remove mps config for test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-08-26 02:47:37 -04:00
Anner 33a5ed9879
Add torch.cuda rng state to seed save/load (#14384)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-26 05:26:00 +00:00
Tanmoy 807435885e
Fix `LightningDataModule` hparams parsing (#12806)
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-25 18:57:48 +00:00
Jirka Borovec 99ba95a38e
fix imports of collections.abc for py3.10 (#14345)
fix collections.abc for py3.10

Co-authored-by: Sherin Thomas <sherin@grid.ai>
2022-08-23 11:52:58 -04:00
Carlos Mocholí 7a617ec90e
Add back support for logging in the gradient clipping hooks (#14298)
* Add back support for logging in the gradient clipping hooks

* Docs and CHANGELOG

* Fix tests
2022-08-22 09:19:53 -04:00
Rohit Gupta db1835a82c
Fix an issue to avoid the impact of sanity check on `reload_dataloaders_every_n_epochs` for validation (#13964) 2022-08-21 23:55:03 +05:30
Kaushik B a8c6e69b43
Fix wrong num padding for RichProgressBar (#14296) 2022-08-19 09:40:44 +05:30
Rohit Gupta d9c6090170
Deprecate `on_colab_kaggle` func (#14247)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-18 18:34:21 +00:00
Adrian Wälchli 326f7565b0
Forward extra keyword arguments in `LightningDataModule.from_datasets` (#14185)
Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-18 14:06:39 +00:00
Adrian Wälchli 7879628a3a
Fix access to logger attribute when multiple loggers are used (#14234)
* Fix access to logger attribute when multiple loggers are used

* add changelog
2022-08-18 08:55:08 -04:00
Rohit Gupta e949362a6b
Enable `on_before_batch_transfer` for `DPStrategy` and `IPUAccelerator` (#14023)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-18 12:12:29 +00:00
Adrian Wälchli 2e59c49592
Update defaults for WandbLogger's run name and project name (#14145) 2022-08-17 16:31:20 +00:00
otaj 44cdbcab04
Allowed setting attributes on `DataLoader` and `BatchSampler` when instantiated inside `*_dataloader` hooks (#14212) 2022-08-17 11:42:54 -04:00
Rohit Gupta 48c23e5716
Use fsdp module to initialize precision scaler for fsdp native (#14092)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
2022-08-13 07:52:06 +00:00
Rohit Gupta c8e22b4572
Avoid raising the sampler warning if num_replicas=1 (#14097)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-08-12 08:44:21 +00:00
Adrian Wälchli 807f9d8c96
Replace unwrapping logic in strategies (#13738)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-12 08:24:04 +00:00
Rohit Gupta 6789a066b5
Avoid false positive warning about using `sync_dist` when using torchmetrics (#14143)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-12 12:52:24 +05:30
Rohit Gupta 2d9e00fab6
Profile batch transfer and gradient clipping hooks (#14069) 2022-08-11 23:21:53 +00:00
Adrian Wälchli 56533368af
Remove DeepSpeed version restriction from Lite (#13967) 2022-08-11 16:17:56 +00:00
Adrian Wälchli 3b18da3eaf
Fix saving hyperparameters in a composition where parent is not a LM or LDM (#14151)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-11 15:49:46 +00:00
Carlos Mocholí 3dc08b1ef5
Fix flaky test caused by weak reference (#14157) 2022-08-11 09:33:19 +02:00
Adrian Wälchli a7cebf2416
Fix entry point test for Python 3.10 (#14154) 2022-08-11 01:32:32 +02:00
Adrian Wälchli 4008f9cd41
Convert subprocess test to standalone test (#14101) 2022-08-10 17:15:12 -04:00
otaj f132d44821
Fix a bug that caused spurious `AttributeError` when multiple `DataLoader` classes are imported (#14117) 2022-08-10 16:09:50 +00:00
Carlos Mocholí 9b61b1c482
Remove duplicated test classes (#14122)
Remove duplicated classes
2022-08-10 17:21:05 +02:00
Adrian Wälchli dc8ff5ed26
Fix device placement when `.cuda()` called without specifying index (#14128) 2022-08-10 05:23:20 -04:00
Adam Reeve 975a4fc2f1
Support checkpoint save and load with Stochastic Weight Averaging (#9938)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-09 23:18:21 +00:00
Adrian Wälchli 06c255c5c1
Skip ddp fork tests on windows (#14121) 2022-08-09 22:54:10 +00:00
Carlos Mocholí d85085479d
Reset all results on epoch end (#14061) 2022-08-09 23:01:11 +05:30
Rohit Gupta ac369f5570
Fix incorrect `precision="mixed"` being used with `DeepSpeedStrategy` and `IPUStrategy` (#14041)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-09 21:25:23 +05:30
Anton Shevtsov c55fe7105b
Prefix seed_everything log messages with rank info (#14031)
Co-authored-by: Anton Shevtsov <aeshevtsov@avito.ru>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-09 15:40:30 +02:00
Adrian Wälchli 0cfc53d6b4
Fix regression on default value for `find_unused_parameters` (#14095) 2022-08-09 13:56:02 +05:30
Carlos Mocholí d072e4451a
Fix dtype inference during gradient norm computation (#14051) 2022-08-08 11:35:06 +00:00
Carlos Mocholí aaeff90254
Remove deprecated `DistributedType` and `DeviceType` enum classes (#14045) 2022-08-08 10:07:54 +02:00
Rohit Gupta b25275ccc2
Cast to fp16 before moving to device with deepspeed (#14000) 2022-08-05 22:15:15 +00:00
Carlos Mocholí 91dd6a68fb
Remove meta device utilities in favor of torchdistx (#13868) 2022-08-05 12:20:27 +00:00
Adrian Wälchli 3d5c3d24f9
Remove unused auto_collect_arguments class method (#14015) 2022-08-05 08:49:00 +00:00
Rohit Gupta a4e4cab7a6
Deprecate `amp_level` from `Trainer` (#13898)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-05 08:31:19 +00:00
Carlos Mocholí b88b700745
Remove the deprecated DDP2 strategy (#14026) 2022-08-04 20:27:35 +00:00
Rohit Gupta f5bd6e6f5f
Cast only floating types with IPUs (#13983) 2022-08-04 19:46:07 +00:00
Adrian Wälchli ef0623ec64
Remove deprecated training type plugins (#14011)
* Remove deprecated training type plugins

* update changelog

* DDP2Plugin

* Update src/pytorch_lightning/CHANGELOG.md
2022-08-04 18:00:00 +02:00
Rohit Gupta e78bf2044b
Raise an error if batch transfer hooks are overridden with IPUAccelerator (#13961)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-04 12:04:42 +00:00
Adam J. Stewart d748dae548
Fix erroneous warning for unset `max_epochs` (#13262)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-03 19:17:21 +00:00
Adrian Wälchli e6a8283e9c
Organize accelerator tests (#13986) 2022-08-03 13:49:55 +00:00
Adrian Wälchli 4ce97f37a2
Validate the model input of trainer methods (#13892)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-03 13:38:42 +00:00
Adrian Wälchli ce025bf954
Lazy import check for hydra dependency (#13812) 2022-08-03 04:27:16 -04:00
Jerome Anand b3203d93d0
Added support for HPU device stats monitor (#13819)
* Added support for HPU device stats monitor

Signed-off-by: Jerome <janand@habana.ai>

* Update changelog

Signed-off-by: Jerome <janand@habana.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update reference

Signed-off-by: Jerome <janand@habana.ai>

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* fix alignment

* add descriptions

* Update hpu_intermediate.rst

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-02 13:31:31 +05:30
Adrian Wälchli eb233ea12d
Snapshot selected globals and restore them in spawned process (#13921)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-01 22:21:46 +00:00
Rohit Gupta 0f6caffa57
Fix deepspeed default precision plugin `amp_level` to O2 (#13897)
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-29 20:36:51 +00:00
Adrian Wälchli caaf35689c
Improvements to standalone scripts (#13840)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-28 23:33:22 +00:00
HMellor 07b39c257b
Cast on host instead of IPU when using `precision=16` (#13880)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-28 19:26:41 +00:00
Adrian Wälchli 25203d4c81
Organize model summary utilities (#13893) 2022-07-28 19:23:29 +02:00
Carlos Mocholí 406cea7146
Support DeepSpeed <0.7.0 (#13859)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-07-28 14:38:51 +00:00
Carlos Mocholí 1299e4f984
Run GPU tests with PyTorch 1.12 (#13716)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-07-28 19:37:57 +05:30
Carlos Mocholí 511875e567
Support DeepSpeed >=0.6.0, <0.6.5 (#13863)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-07-27 18:57:52 +02:00
Adrian Wälchli fff62f0ae5
Fix TPU testing and collect all tests (#11098)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-07-27 15:40:40 +00:00
otaj 95f5f170f5
Allowed custom `BatchSampler`s when instantiated in `*_dataloader` hook (#13640)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-07-27 15:32:50 +00:00
Adrian Wälchli 2a24b906ac
Add batch size script argument for standalone tests (#13841)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-07-27 12:36:22 +00:00
otaj 4c7b9f0b11
Disallow batch sampler with multiple IPU devices (#13854)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-27 15:20:43 +05:30
Anton Shevtsov 41f45b475e
Check if the scheduler already has `reduce_on_plateau` (#13838)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-27 09:10:57 +00:00
Adrian Wälchli c3911700d1
Fix error handling in learning rate finder (#13845)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-27 04:32:39 -04:00
Rohit Gupta faf7ff57c0
Add support for async checkpointing (#13658) 2022-07-26 21:13:19 +05:30
Adrian Wälchli a8d7b4476c
Fix PyTorch spelling errors (#13774)
* Fix PyTorch spelling errors

* more
2022-07-25 12:51:16 -04:00
Justus Schock 227871982d
Merge different gpu backends with accelerator='gpu' (#13642)
* Rename GPUAccelerator to CUDAAccelerator

* Add back GPUAccelerator and deprecate it

* Remove temporary registration

* accelerator connector reroute

* accelerator_connector tests

* update enums

* lite support + tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move "gpu" support up before actual accelerator flag checks

* Stupid arguments

* fix tests

* change exception type

* fix registry test

* pre-commit

* CI: debug HPU flow (#13419)

* Update the hpu-tests.yml to pull docker from vault
* fire & sudo
* habana-gaudi-hpus
* Check the driver status on gaudi server (#13718)

Co-authored-by: arao <arao@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>

* Update typing-extensions requirement from <4.2.1,>=4.0.0 to >=4.0.0,<4.3.1 in /requirements (#13529)

Update typing-extensions requirement in /requirements

Updates the requirements on [typing-extensions](https://github.com/python/typing_extensions) to permit the latest version.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.0.0...4.3.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit suggestions (#13540)

updates:
- [github.com/psf/black: 22.3.0 → 22.6.0](https://github.com/psf/black/compare/22.3.0...22.6.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [FIX] Native FSDP precision + tests (#12985)

* Simplify fetching's loader types (#13111)

* Include app templates to the lightning and app packages (#13731)

* Include app templates to the package

Co-authored-by: mansy <mansy@lightning.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Fix mypy typing errors in pytorch_lightning/callbacks/model_checkpoint.py (#13617)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Fix typos initialize in docs (#13557)


Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Fix main progress bar counter when `val_check_interval=int` and `check_val_every_n_epoch=None` (#12832)

* Fix mypy errors attributed to `pytorch_lightning.loggers.tensorboard.py` (#13688)

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Fix mypy errors attributed to `pytorch_lightning.loggers.mlflow` (#13691)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>

* fix mypy errors for loggers/wandb.py (#13483)


Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Fix gatekeeper minimum check (#13769)

* changelog

* changelog

* fix order

* move up again

* add missing test

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: arao <arao@habana.ai>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Mansy <ahmed.mansy156@gmail.com>
Co-authored-by: mansy <mansy@lightning.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Lee Jungwon <33821003+BongYang@users.noreply.github.com>
Co-authored-by: Nathaniel D'Amours <88633026+NathanielDamours@users.noreply.github.com>
Co-authored-by: Justin Goheen <26209687+JustinGoheen@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
Co-authored-by: Gautier Dagan <s2234411@ed.ac.uk>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-25 14:46:45 +00:00
Mauricio Villegas 1b31039c58
Update LightningCLI test for new support in latest release of jsonargparse (#13805) 2022-07-25 09:25:42 +00:00
Adrian Wälchli 81f149e9d4
Rename spawn-based launchers (#13743) 2022-07-23 11:48:15 -04:00
Adrian Wälchli fa886f2a58
Lazy import check for neptune dependency (#13477)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-23 14:06:26 +00:00
Adrian Wälchli d24978baa3
Add ddp_notebook alias for ddp_fork (#13744)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-23 09:06:35 -04:00
Jinyoung Lim ae9803137a
Add logging messages to notify when `FitLoop` stopping conditions are met (#9749)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-23 12:07:47 +00:00
Carlos Mocholí 4f53e7132f
Promote the CLI out of utilities (#13767) 2022-07-23 12:07:29 +00:00
Adrian Wälchli f6f06d4e42
Set default strategy to ddp_fork in interactive environments (#13746) 2022-07-22 19:34:30 +00:00
Carlos Mocholí 9f51c07604
Support setting the trainer reference recursively for ensembles (#13638)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2022-07-22 19:58:46 +02:00