ritsuki1227
6855f653bb
Set `MLFlowLogger` status to FAILED when training raises an error ( #12292 )
...
Co-authored-by: Ritsuki Yamada <ritsuki.yamada@uzabase.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-20 07:43:32 -04:00
awaelchli
c0ff7a1b77
Add backward-compatibility for LightningLite in PL ( #14735 )
2022-09-20 13:31:56 +02:00
awaelchli
e3e71670e6
Move src/pytorch_lightning/lite to src/lightning_lite ( #14735 )
2022-09-20 13:31:56 +02:00
Carlos Mocholí
810643bca2
Surface Neptune installation problems to the user ( #14715 )
2022-09-20 10:19:51 +00:00
Mauricio Villegas
3064c28ce1
Added args parameter to LightningCLI to ease running from within Python ( #14596 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-19 17:38:30 +00:00
Carlos Mocholí
e9c571d39f
Move accelerator-specific parsing functions with their accelerators ( #14753 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-18 22:48:45 +00:00
Adrian Wälchli
4f9c7793e7
Fix TensorBoardLogger creating redundant experiment when finalizing ( #14762 )
...
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
2022-09-18 16:27:15 -04:00
Adrian Wälchli
35c65b0287
Fix test suite when running on MPS-enabled hardware ( #14708 )
2022-09-16 19:21:36 +00:00
Adrian Wälchli
47f0d336f1
Standalone Lite: Update LightningLite ( #14726 )
2022-09-16 17:25:27 +00:00
Carlos Mocholí
8c01c89d74
Remove deprecated `NeptuneLogger` code ( #14727 )
2022-09-16 16:26:15 +00:00
Adrian Wälchli
5bef75648e
Remove deprecated `torch_distributed_backend` logic ( #14693 )
...
* Remove deprecated torch_distributed_backend logic
* changelog
* mention deprecated
* imports
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-16 17:27:36 +02:00
Adrian Wälchli
619e76f22d
Remove silent behavior when `num_slurm_tasks` does not correspond to number of processes in Trainer ( #14300 )
...
* simplify logic
* remove hpc
* update
* add changelog
* more tests
* update test
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-16 11:00:09 +00:00
Carlos Mocholí
5ff78f0753
Use the setter in the children recursively ( #14724 )
2022-09-15 13:58:12 +00:00
Adrian Wälchli
8b3d6d8feb
Add easy access to `state_dict` in Lite module wrapper ( #14629 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-14 19:29:23 -04:00
Manan Goel
48e783dd0d
Added support for downloading wandb artifacts in the WandbLogger ( #14551 )
...
* Added functions to the WandbLogger to download and use artifacts without having to access the experiment object
* Updated CHANGELOG.md
* Added suggested changes
* Delete test_script
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-14 14:11:52 +00:00
Adrian Wälchli
6333caabb0
Standalone Lite: Strategy base classes and registry ( #14662 )
...
* add accelerator implementations to lite
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix imports
* rename registry argument
* fix test
* fix tests
* remove duplicated test
* fix tests
* deprecation
* deprecations
* flake8
* fixes
* add mps to runif
* fix tests
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove more
* local import
* undo device stats :(
* fix import
* stupid typehints
* more refactors :(
* fix
* rename init_device to setup_device
* remove unused import
* make uppercase to differentiate from class
* trick test after moving import locally
* add base classes and registry
* reg
* registry
* tests
* update to other branches
* resolve todo(lite)
* add very basic unit tests
* fix name assignment
* Update src/lightning_lite/strategies/parallel.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* remove deprecated property
* remove pre- and post backward for now
* protecting the registry utility function
* remove unused import
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-09-14 09:15:21 -04:00
otaj
616304831a
Remove deprecated `BaseProfiler` and `AbstractProfiler` ( #14404 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-09-13 14:52:09 +00:00
Adrian Wälchli
19a1274093
Better error message when dataloader and datamodule is None (V2) ( #14637 )
2022-09-13 12:26:03 +00:00
Adrian Wälchli
1ee3d1eb72
Avoid warning when cloning tensor in self.log ( #14599 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-13 16:23:46 +05:30
Adrian Wälchli
4bd135a6f6
Remove deprecated `LoggerCollection` ( #14283 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-12 21:46:46 +00:00
Max Ehrlich
e5998e6bf2
Make the SLURM Preemption/Timeout Signal Configurable ( #14626 )
...
* Add parameter to change the preemption signal
* Make the signal connector use the custom signal from SLURMEnvironment
Signed-off-by: Max Ehrlich <max.ehr@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-12 19:24:35 +00:00
Adrian Wälchli
925edbca07
Remove the deprecated `weights_save_path` Trainer argument ( #14424 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-12 19:02:38 +00:00
Mauricio Villegas
1680a76819
Removed from_argparse_args tests in test_cli.py ( #14597 )
2022-09-12 18:25:29 +00:00
Adrian Wälchli
d013bcc5bf
Standalone Lite: Accelerators ( #14578 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-12 16:00:14 +00:00
Carlos Mocholí
cf3428784f
Set `running_torchscript` recursively ( #14657 )
...
* Set `running_torchscript` recursively
* CHANGELOG
2022-09-12 14:39:40 +00:00
Carlos Mocholí
e859546b96
Integrate lightning_utilities `is_overridden` ( #14620 )
2022-09-12 15:16:57 +02:00
awaelchli
cbbd148089
Add back-compatibility for checkpoint io plugins in pl/plugins/io ( #14519 )
2022-09-12 08:28:46 -04:00
awaelchli
463439e624
Move checkpoint io plugins from pl/plugins/io to lite/plugins/io ( #14519 )
2022-09-12 08:28:46 -04:00
Adrian Wälchli
024e7b8204
Standalone Lite: Cluster Environments ( #14509 )
2022-09-12 12:20:08 +02:00
Vasilis Vryniotis
7e9e441843
Use TorchVision's Multi-weight Support and Model Registration API on Lightning ( #14567 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-09 20:04:57 +00:00
Adrian Wälchli
95374440ce
Move device parser tests inside Lite ( #14586 )
2022-09-07 21:22:46 +00:00
Adrian Wälchli
d2459df2ff
Standalone Lite: Remaining Utilities ( #14492 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Felonious-Spellfire <felonious.spellfire@gmail.com>
2022-09-07 15:25:23 +00:00
Carlos Mocholí
bcad90141a
Remove old test artifacts ( #14574 )
2022-09-07 10:09:59 -04:00
Carlos Mocholí
8c4184c105
Integrate with `lightning_utilities.core.enums` ( #14558 )
2022-09-07 15:14:14 +02:00
Carlos Mocholí
5216c51096
Integrate `lightning_utilities.core.rank_zero` ( #14556 )
2022-09-07 09:21:48 +00:00
Carlos Mocholí
273a9ed8c1
Integrate `lightning_utilities.core.apply_func` ( #14537 )
2022-09-06 13:52:54 +00:00
Carlos Mocholí
44216fdd69
Integrate `lightning_utilities.core.imports` ( #14475 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-06 12:56:20 +00:00
Carlos Mocholí
8a4a3b6766
Mark the lite `DeviceDtypeModuleMixin` as protected ( #14548 )
2022-09-06 14:17:15 +02:00
Rohit Gupta
8c6119fbce
Add auto wrapping support for `DDPFullyShardedStrategy` ( #14383 )
2022-09-05 19:07:26 +00:00
awaelchli
7f148b2c47
Deprecate pl/utilities/apply_func ( #14516 )
2022-09-05 20:30:42 +02:00
awaelchli
9fea2ed9d5
move pl/utilities/apply_func.py to lite/utilities/apply_func.py ( #14516 )
2022-09-05 20:30:42 +02:00
awaelchli
cfea2be137
Deprecate pl/utilities/cloud_io.py ( #14515 )
2022-09-05 18:30:31 +02:00
awaelchli
def6548596
move pl/utilities/cloud_io.py to lite/utilities/cloud_io.py ( #14515 )
2022-09-05 18:30:31 +02:00
awaelchli
165427a506
Deprecate pl/utilities/xla_device ( #14514 )
2022-09-05 17:36:02 +02:00
awaelchli
75d5a2d046
move pl/utilities/xla_device.py to lite/utilities/xla_device.py ( #14514 )
2022-09-05 17:36:02 +02:00
awaelchli
c2879c20da
Deprecate pl/core/mixins/device_dtype_mixin and update imports ( #14511 )
2022-09-05 16:31:00 +02:00
awaelchli
cefe2fa123
Move test_dtype_device_mixin to lite ( #14511 )
2022-09-05 16:31:00 +02:00
Rohit Gupta
ce702fd40e
Squeeze tensor while logging ( #14489 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-05 14:01:51 +00:00
Tianshu Wang
23f0e20209
Fixed `WandbLogger` `save_dir` is not set after creation ( #12748 ) ( #14326 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-05 10:12:43 +00:00
Roberto de Moura Estevão Filho
ed0164a3d2
Estimate stepping batches with max_steps if max_epochs is not set ( #14317 )
...
Co-authored-by: Roberto Estevão <robertode@microsoft.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-05 09:05:21 +00:00
Carlos Mocholí
4235eff712
Use a standalone test symlink for Lite ( #14502 )
2022-09-04 20:57:28 +02:00
Adrian Wälchli
291dc1b615
Standalone Lite CI setup ( #14451 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-01 22:13:12 +00:00
Carlos Mocholí
e0c2c3e677
Clean up fairscale imports ( #14476 )
2022-09-01 18:08:40 +02:00
Adrian Wälchli
28e18881a9
Mark stage argument in hooks as required ( #14064 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-09-01 15:47:40 +02:00
Rohit Gupta
e90ac769d6
Reset dataloaders on failure in tuner ( #14372 )
2022-08-31 21:00:18 +00:00
Carlos Mocholí
2e3d85af84
Remove deprecated rank zero utilities ( #14471 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-31 18:29:11 +00:00
Anner
626827c872
update rng state save/load test to also run on cuda gpu ( #14396 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-31 16:36:35 +00:00
Carlos Mocholí
a1dd718781
Remove deprecated support for passing the warning category positionally ( #14470 )
2022-08-31 17:34:56 +02:00
Carlos Mocholí
291267c3bf
Unify rank zero messaging utilities ( #14116 )
2022-08-30 09:51:30 +00:00
ananthsub
d0d1818d50
Update `has_len_all_ranks` to use `Strategy.root_device` ( #12144 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-29 20:23:34 +00:00
Carlos Mocholí
f202e84f4b
Remove the legacy `get_deprecated_arg_names` ( #14415 )
2022-08-29 14:53:57 +02:00
Krishna Kalyan
1a3fe39571
Removed deprecated `Trainer.num_processes` property in favour of `Trainer.num_devices` ( #14423 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-28 23:59:24 +02:00
Krishna Kalyan
5cbe1f48d2
Removed the deprecated `Trainer.data_parallel_device_ids` function in favour of `Trainer.device_ids` ( #14422 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-28 18:07:00 +00:00
Krishna Kalyan
cea9a72d9d
Removed the deprecated `trainer.lr_schedulers` ( #14408 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-28 18:06:09 +00:00
otaj
1e04951206
Remove deprecated `TrainerCallbackHookMixin` ( #14401 )
...
* remove deprecated callback hook
* changelog
2022-08-28 10:56:37 +00:00
Rohit Gupta
f3574176e2
Change `trainer.should_stop` to not stop in between an epoch and run until `min_steps/min_epochs` only ( #13890 )
2022-08-27 12:12:24 +00:00
Adrian Wälchli
250c06e406
Remove deprecated HPC model hooks ( #14315 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 20:59:32 +00:00
Carlos Mocholí
3ba0f56b18
Remove support for the deprecated torchtext legacy ( #14375 )
2022-08-26 20:01:51 +00:00
Tianshu Wang
8950613552
save checkpoints and profiler output to the first logger ( #14325 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 17:23:54 +00:00
Carlos Mocholí
d4bcafad7a
Remove the deprecated loop output format ( #14373 )
2022-08-26 16:56:56 +00:00
Justin Goheen
ed84d04bcf
Fix mypy errors attributed to `pytorch_lightning.core.datamodule` ( #13693 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: otaj <ota@lightning.ai>
2022-08-26 16:26:26 +00:00
Adrian Wälchli
fafd254678
Fix device parser logic to avoid creating CUDA context ( #14319 )
...
* let environment disable forking
* add helper function and error messages
* tests
* changelog
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 15:41:38 +00:00
Björn Barz
0102d0d4d4
Fix restoring trainer after `lr_find()` ( #14113 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-26 15:19:08 +00:00
Justin Goheen
94e567e6f0
Fix mypy errors attributed to `pytorch_lightning.trainer.connectors.data_connector.py` ( #13806 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-08-26 13:28:27 +00:00
Adrian Wälchli
e2221a0b3e
Raise an error when resuming training with Apex ( #14341 )
2022-08-26 13:11:24 +00:00
Rohit Gupta
6d00f31f0c
Add auto wrapping for `DDPFullyShardedNativeStrategy` ( #14252 )
2022-08-26 09:01:48 +00:00
Christian Schell
70deac2cd4
Reset epoch progress with batch size scaler ( #13846 )
...
Co-authored-by: Christian Schell <christian.schell@uni-wuerzburg.de>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-26 14:12:00 +05:30
Adrian Wälchli
e67842dcba
Support sharded optimizer state dumping outside of sharded strategies ( #14208 )
2022-08-26 07:58:21 +00:00
Justus Schock
a01e016fff
Remove mps config for test ( #14379 )
...
* Remove mps config for test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-08-26 02:47:37 -04:00
Anner
33a5ed9879
Add torch.cuda rng state to seed save/load ( #14384 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-26 05:26:00 +00:00
Tanmoy
807435885e
Fix `LightningDataModule` hparams parsing ( #12806 )
...
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-25 18:57:48 +00:00
Jirka Borovec
99ba95a38e
fix imports of collections.abc for py3.10 ( #14345 )
...
fix collections.abc for py3.10
Co-authored-by: Sherin Thomas <sherin@grid.ai>
2022-08-23 11:52:58 -04:00
Carlos Mocholí
7a617ec90e
Add back support for logging in the gradient clipping hooks ( #14298 )
...
* Add back support for logging in the gradient clipping hooks
* Docs and CHANGELOG
* Fix tests
2022-08-22 09:19:53 -04:00
Rohit Gupta
db1835a82c
Fix an issue to avoid the impact of sanity check on `reload_dataloaders_every_n_epochs` for validation ( #13964 )
2022-08-21 23:55:03 +05:30
Kaushik B
a8c6e69b43
Fix wrong num padding for RichProgressBar ( #14296 )
2022-08-19 09:40:44 +05:30
Rohit Gupta
d9c6090170
Deprecate `on_colab_kaggle` func ( #14247 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-18 18:34:21 +00:00
Adrian Wälchli
326f7565b0
Forward extra keyword arguments in `LightningDataModule.from_datasets` ( #14185 )
...
Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-18 14:06:39 +00:00
Adrian Wälchli
7879628a3a
Fix access to logger attribute when multiple loggers are used ( #14234 )
...
* Fix access to logger attribute when multiple loggers are used
* add changelog
2022-08-18 08:55:08 -04:00
Rohit Gupta
e949362a6b
Enable `on_before_batch_transfer` for `DPStrategy` and `IPUAccelerator` ( #14023 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-18 12:12:29 +00:00
Adrian Wälchli
2e59c49592
Update defaults for WandbLogger's run name and project name ( #14145 )
2022-08-17 16:31:20 +00:00
otaj
44cdbcab04
Allowed setting attributes on `DataLoader` and `BatchSampler` when instantiated inside `*_dataloader` hooks ( #14212 )
2022-08-17 11:42:54 -04:00
Rohit Gupta
48c23e5716
Use fsdp module to initialize precision scaler for fsdp native ( #14092 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
2022-08-13 07:52:06 +00:00
Rohit Gupta
c8e22b4572
Avoid raising the sampler warning if num_replicas=1 ( #14097 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-08-12 08:44:21 +00:00
Adrian Wälchli
807f9d8c96
Replace unwrapping logic in strategies ( #13738 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-12 08:24:04 +00:00
Rohit Gupta
6789a066b5
Avoid false positive warning about using `sync_dist` when using torchmetrics ( #14143 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-12 12:52:24 +05:30
Rohit Gupta
2d9e00fab6
Profile batch transfer and gradient clipping hooks ( #14069 )
2022-08-11 23:21:53 +00:00
Adrian Wälchli
56533368af
Remove DeepSpeed version restriction from Lite ( #13967 )
2022-08-11 16:17:56 +00:00
Adrian Wälchli
3b18da3eaf
Fix saving hyperparameters in a composition where parent is not a LM or LDM ( #14151 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-11 15:49:46 +00:00
Carlos Mocholí
3dc08b1ef5
Fix flaky test caused by weak reference ( #14157 )
2022-08-11 09:33:19 +02:00
Adrian Wälchli
a7cebf2416
Fix entry point test for Python 3.10 ( #14154 )
2022-08-11 01:32:32 +02:00
Adrian Wälchli
4008f9cd41
Convert subprocess test to standalone test ( #14101 )
2022-08-10 17:15:12 -04:00
otaj
f132d44821
Fix a bug that caused spurious `AttributeError` when multiple `DataLoader` classes are imported ( #14117 )
2022-08-10 16:09:50 +00:00
Carlos Mocholí
9b61b1c482
Remove duplicated test classes ( #14122 )
...
Remove duplicated classes
2022-08-10 17:21:05 +02:00
Adrian Wälchli
dc8ff5ed26
Fix device placement when `.cuda()` called without specifying index ( #14128 )
2022-08-10 05:23:20 -04:00
Adam Reeve
975a4fc2f1
Support checkpoint save and load with Stochastic Weight Averaging ( #9938 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-09 23:18:21 +00:00
Adrian Wälchli
06c255c5c1
Skip ddp fork tests on windows ( #14121 )
2022-08-09 22:54:10 +00:00
Carlos Mocholí
d85085479d
Reset all results on epoch end ( #14061 )
2022-08-09 23:01:11 +05:30
Rohit Gupta
ac369f5570
Fix incorrect `precision="mixed"` being used with `DeepSpeedStrategy` and `IPUStrategy` ( #14041 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-09 21:25:23 +05:30
Anton Shevtsov
c55fe7105b
Prefix seed_everything log messages with rank info ( #14031 )
...
Co-authored-by: Anton Shevtsov <aeshevtsov@avito.ru>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-09 15:40:30 +02:00
Adrian Wälchli
0cfc53d6b4
Fix regression on default value for `find_unused_parameters` ( #14095 )
2022-08-09 13:56:02 +05:30
Carlos Mocholí
d072e4451a
Fix dtype inference during gradient norm computation ( #14051 )
2022-08-08 11:35:06 +00:00
Carlos Mocholí
aaeff90254
Remove deprecated `DistributedType` and `DeviceType` enum classes ( #14045 )
2022-08-08 10:07:54 +02:00
Rohit Gupta
b25275ccc2
Cast to fp16 before moving to device with deepspeed ( #14000 )
2022-08-05 22:15:15 +00:00
Carlos Mocholí
91dd6a68fb
Remove meta device utilities in favor of torchdistx ( #13868 )
2022-08-05 12:20:27 +00:00
Adrian Wälchli
3d5c3d24f9
Remove unused auto_collect_arguments class method ( #14015 )
2022-08-05 08:49:00 +00:00
Rohit Gupta
a4e4cab7a6
Deprecate `amp_level` from `Trainer` ( #13898 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-05 08:31:19 +00:00
Carlos Mocholí
b88b700745
Remove the deprecated DDP2 strategy ( #14026 )
2022-08-04 20:27:35 +00:00
Rohit Gupta
f5bd6e6f5f
Cast only floating types with IPUs ( #13983 )
2022-08-04 19:46:07 +00:00
Adrian Wälchli
ef0623ec64
Remove deprecated training type plugins ( #14011 )
...
* Remove deprecated training type plugins
* update changelog
* DDP2Plugin
* Update src/pytorch_lightning/CHANGELOG.md
2022-08-04 18:00:00 +02:00
Rohit Gupta
e78bf2044b
Raise an error if batch transfer hooks are overridden with IPUAccelerator ( #13961 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-04 12:04:42 +00:00
Adam J. Stewart
d748dae548
Fix erroneous warning for unset `max_epochs` ( #13262 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-03 19:17:21 +00:00
Adrian Wälchli
e6a8283e9c
Organize accelerator tests ( #13986 )
2022-08-03 13:49:55 +00:00
Adrian Wälchli
4ce97f37a2
Validate the model input of trainer methods ( #13892 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-03 13:38:42 +00:00
Adrian Wälchli
ce025bf954
Lazy import check for hydra dependency ( #13812 )
2022-08-03 04:27:16 -04:00
Jerome Anand
b3203d93d0
Added support for HPU device stats monitor ( #13819 )
...
* Added support for HPU device stats monitor
Signed-off-by: Jerome <janand@habana.ai>
* Update changelog
Signed-off-by: Jerome <janand@habana.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestions from code review
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Update reference
Signed-off-by: Jerome <janand@habana.ai>
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* fix alignment
* add descriptions
* Update hpu_intermediate.rst
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-02 13:31:31 +05:30
Adrian Wälchli
eb233ea12d
Snapshot selected globals and restore them in spawned process ( #13921 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-01 22:21:46 +00:00
Rohit Gupta
0f6caffa57
Fix deepspeed default precision plugin `amp_level` to O2 ( #13897 )
...
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-29 20:36:51 +00:00
Adrian Wälchli
caaf35689c
Improvements to standalone scripts ( #13840 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-28 23:33:22 +00:00
HMellor
07b39c257b
Cast on host instead of IPU when using `precision=16` ( #13880 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-28 19:26:41 +00:00
Adrian Wälchli
25203d4c81
Organize model summary utilities ( #13893 )
2022-07-28 19:23:29 +02:00
Carlos Mocholí
406cea7146
Support DeepSpeed <0.7.0 ( #13859 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-07-28 14:38:51 +00:00
Carlos Mocholí
1299e4f984
Run GPU tests with PyTorch 1.12 ( #13716 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-07-28 19:37:57 +05:30
Carlos Mocholí
511875e567
Support DeepSpeed >=0.6.0, <0.6.5 ( #13863 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-07-27 18:57:52 +02:00
Adrian Wälchli
fff62f0ae5
Fix TPU testing and collect all tests ( #11098 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-07-27 15:40:40 +00:00
otaj
95f5f170f5
Allowed custom `BatchSampler`s when instantiated in `*_dataloader` hook ( #13640 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-07-27 15:32:50 +00:00
Adrian Wälchli
2a24b906ac
Add batch size script argument for standalone tests ( #13841 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-07-27 12:36:22 +00:00
otaj
4c7b9f0b11
Disallow batch sampler with multiple IPU devices ( #13854 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-27 15:20:43 +05:30
Anton Shevtsov
41f45b475e
Check if the scheduler already has `reduce_on_plateau` ( #13838 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-27 09:10:57 +00:00
Adrian Wälchli
c3911700d1
Fix error handling in learning rate finder ( #13845 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-27 04:32:39 -04:00
Rohit Gupta
faf7ff57c0
Add support for async checkpointing ( #13658 )
2022-07-26 21:13:19 +05:30
Adrian Wälchli
a8d7b4476c
Fix PyTorch spelling errors ( #13774 )
...
* Fix PyTorch spelling errors
* more
2022-07-25 12:51:16 -04:00
Justus Schock
227871982d
Merge different gpu backends with accelerator='gpu' ( #13642 )
...
* Rename GPUAccelerator to CUDAAccelerator
* Add back GPUAccelerator and deprecate it
* Remove temporary registration
* accelerator connector reroute
* accelerator_connector tests
* update enums
* lite support + tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* typo
* move "gpu" support up before actual accelerator flag checks
* Stupid arguments
* fix tests
* change exception type
* fix registry test
* pre-commit
* CI: debug HPU flow (#13419 )
* Update the hpu-tests.yml to pull docker from vault
* fire & sudo
* habana-gaudi-hpus
* Check the driver status on gaudi server (#13718 )
Co-authored-by: arao <arao@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
* Update typing-extensions requirement from <4.2.1,>=4.0.0 to >=4.0.0,<4.3.1 in /requirements (#13529 )
Update typing-extensions requirement in /requirements
Updates the requirements on [typing-extensions](https://github.com/python/typing_extensions ) to permit the latest version.
- [Release notes](https://github.com/python/typing_extensions/releases )
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md )
- [Commits](https://github.com/python/typing_extensions/compare/4.0.0...4.3.0 )
---
updated-dependencies:
- dependency-name: typing-extensions
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit suggestions (#13540 )
updates:
- [github.com/psf/black: 22.3.0 → 22.6.0](https://github.com/psf/black/compare/22.3.0...22.6.0 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [FIX] Native FSDP precision + tests (#12985 )
* Simplify fetching's loader types (#13111 )
* Include app templates to the lightning and app packages (#13731 )
* Include app templates to the package
Co-authored-by: mansy <mansy@lightning.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Fix mypy typing errors in pytorch_lightning/callbacks/model_checkpoint.py (#13617 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Fix typos initialize in docs (#13557 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Fix main progress bar counter when `val_check_interval=int` and `check_val_every_n_epoch=None` (#12832 )
* Fix mypy errors attributed to `pytorch_lightning.loggers.tensorboard.py` (#13688 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Fix mypy errors attributed to `pytorch_lightning.loggers.mlflow` (#13691 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
* fix mypy errors for loggers/wandb.py (#13483 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
* Fix gatekeeper minimum check (#13769 )
* changelog
* changelog
* fix order
* move up again
* add missing test
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: arao <arao@habana.ai>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Mansy <ahmed.mansy156@gmail.com>
Co-authored-by: mansy <mansy@lightning.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Lee Jungwon <33821003+BongYang@users.noreply.github.com>
Co-authored-by: Nathaniel D'Amours <88633026+NathanielDamours@users.noreply.github.com>
Co-authored-by: Justin Goheen <26209687+JustinGoheen@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
Co-authored-by: Gautier Dagan <s2234411@ed.ac.uk>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-25 14:46:45 +00:00
Mauricio Villegas
1b31039c58
Update LightningCLI test for new support in latest release of jsonargparse ( #13805 )
2022-07-25 09:25:42 +00:00
Adrian Wälchli
81f149e9d4
Rename spawn-based launchers ( #13743 )
2022-07-23 11:48:15 -04:00
Adrian Wälchli
fa886f2a58
Lazy import check for neptune dependency ( #13477 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-23 14:06:26 +00:00
Adrian Wälchli
d24978baa3
Add ddp_notebook alias for ddp_fork ( #13744 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-23 09:06:35 -04:00
Jinyoung Lim
ae9803137a
Add logging messages to notify when `FitLoop` stopping conditions are met ( #9749 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-23 12:07:47 +00:00
Carlos Mocholí
4f53e7132f
Promote the CLI out of utilities ( #13767 )
2022-07-23 12:07:29 +00:00
Adrian Wälchli
f6f06d4e42
Set default strategy to ddp_fork in interactive environments ( #13746 )
2022-07-22 19:34:30 +00:00
Carlos Mocholí
9f51c07604
Support setting the trainer reference recursively for ensembles ( #13638 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2022-07-22 19:58:46 +02:00