Commit Graph

2433 Commits

Author SHA1 Message Date
Carlos Mocholí 5ad5ba54c0
Refactor fetching function (#11516) 2022-01-20 20:06:58 +01:00
Carlos Mocholí 075b8801c9
Fix checkpoint values when saving and resetting the tuner state (#11518) 2022-01-20 18:54:40 +00:00
Aki Nitta 3c3ba39a06
Tests: Fail on FutureWarning (#11541) 2022-01-20 12:52:34 +00:00
Carlos Mocholí 7295457a7b
[CLI] Save only the configuration used (#11532) 2022-01-20 12:35:43 +00:00
Rafał Jankowski e78d658c8d
Remove access to `_short_id` in NeptuneLogger (#11517) 2022-01-20 12:07:42 +00:00
Maaz Karim 16a04b29eb
Mark SignalConnector as protected (#11513)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-01-20 08:39:59 +01:00
ananthsub 1bd6fc979e
Remove `Strategy.on_tpu` property (#11536) 2022-01-20 08:25:26 +01:00
ananthsub f41d1e5e5e
Remove `Strategy.on_gpu` (#11537) 2022-01-19 21:27:12 +00:00
Rohit Gupta f7f835fa0e
improve simple profiler output (#11414) 2022-01-18 19:58:34 +00:00
Carlos Mocholí 62818dbace
Use a dataclass as the scheduler config (#11443) 2022-01-18 20:23:32 +01:00
Mauricio Villegas 6397ac4ffd
[CLI] Add unit test with a model that has a parameter with `lazy_instance` default. (#11509) 2022-01-18 16:36:54 +01:00
Jv Kyle Eclarin c85946531d
Update `tests/callbacks/*.py` to use `devices` instead of `gpus` or `ipus` (#11387)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-01-18 19:13:01 +05:30
Carlos Mocholí 344ab1e0a5
Move the `lightning_optimizers` ownership to the `Strategy` (#11444) 2022-01-18 12:58:56 +01:00
Rohit Gupta 033dba1494
Disable attaching samplers when using `IterableDataset` (#11507) 2022-01-17 23:33:57 +01:00
Carlos Mocholí 9cf9ded73b
Simplify data fetching (#11466) 2022-01-17 14:46:55 +00:00
Jv Kyle Eclarin 2012816645
Update `tests/core/*.py` to use `devices` instead of `gpus` or `ipus` (#11433)
* update tests for v2
* Update
* Pass devices to kwargs
* add accelerator to kwargs
* Fix testing with cpu on GPU env

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-01-17 18:15:02 +09:00
Jv Kyle Eclarin 3dee2759ee
update tests for v2 (#11487) 2022-01-16 21:43:37 +05:30
Jv Kyle Eclarin 5dc8002d46
update tests for v2 (#11486) 2022-01-16 21:43:07 +05:30
Jv Kyle Eclarin f97359a8c2
update tests for v2 (#11485) 2022-01-16 21:42:18 +05:30
Carlos Mocholí 18bbb39eef
Set `Loop.restarting` recursively (#11442)
* Set `Loop.restarting` recursively
* Docs
* CHANGELOG
* Update pytorch_lightning/loops/epoch/training_epoch_loop.py
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-01-14 19:25:23 +09:00
Jv Kyle Eclarin 36d42aaa88
Update `tests/utilities/*.py` to use `devices` instead of `gpus` or `ipus` (#11458) 2022-01-13 14:52:54 +01:00
Carlos Mocholí 5914fb748f
Add typing to accelerators/gpu.py (#11333) 2022-01-12 19:44:51 +00:00
Carlos Mocholí f5bbc2cf17
Avoid in-place ops during logging result updates (#11401)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-01-12 09:09:36 +01:00
Jv Kyle Eclarin d2d284fd6e
Update `tests/checkpointing/*.py` to use `devices` instead of `gpus` or `ipus` (#11408)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-12 05:47:01 +00:00
Aki Nitta 8dc36c3745
Fix inconsistent exceptions raised with no `rich` installed (#11360)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:55:51 +00:00
Rohit Gupta 82c8875f33
Add `LightningModule.lr_scheduler_step` (#10249)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:53:49 +00:00
Aki Nitta ba7193721a
Fix `torch.distributed._*` import statements in tests (#11416)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:53:14 +00:00
Carlos Mocholí 9771040621
Add typing to `TQDMProgressBar` (#11369) 2022-01-12 01:07:30 +00:00
edward-io 6107ce8e0d
Add DETAIL logs for batch use cases (#11008) 2022-01-12 01:22:48 +01:00
Jv Kyle Eclarin 3e0569fccc
Update test_pruning.py to use `devices` instead of `gpus` or `ipus` (#11341) 2022-01-09 17:15:29 +09:00
Jv Kyle Eclarin b56d8677ad
Update test_pruning.py to use `devices` instead of `gpus` or `ipus` (#11339) 2022-01-08 17:24:29 +01:00
Jv Kyle Eclarin 4710a8128b
Update test_gpu_stats_monitor.py to use `devices` instead of `gpus` or `ipus` (#11340) 2022-01-08 01:38:25 +00:00
Carlos Mocholí 59a7ba7605
Move `epoch_{start,end}` hooks from `TrainingEpochLoop` to `FitLoop` (#11201) 2022-01-06 15:13:18 +00:00
Carlos Mocholí 8a549a550c
Integrate progress tracking into the progress bar (#11213) 2022-01-06 14:29:48 +01:00
Adrian Wälchli 9c8f52ccd1
Fix restoring lr scheduler states with deepspeed strategy (#11322)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-01-06 12:34:16 +00:00
Carlos Mocholí 5693a94c32
Extend the deprecation of `Trainer(resume_from_checkpoint)` (#11334) 2022-01-06 13:18:37 +01:00
Kaushik B e15579a4f3
Rename `_distrib_type` to `_strategy_type` (#11328)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-06 06:32:50 +00:00
Abhishek Saroha 43c140c8e5
Fix frozen dataclass instance error in `apply_to_collection` (#10927)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-05 23:29:03 +01:00
Danielle Pintz 5b59c951e2
Deprecate `TrainerDataLoadingMixin` and move logic to `DataConnector` (#11282)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-05 21:23:57 +01:00
Carlos Mocholí dcffca73d4
Parametrize deepspeed hook test (#11308) 2022-01-05 19:38:25 +00:00
Kaushik B 70c975a9f3
Fix exception message for FSDP running on CPU (#11325) 2022-01-05 18:02:31 +01:00
Andrew Tritt dbf1acd5a5
Modify LSFEnvironment to use more reliable environment variable (#10825)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-05 12:45:25 +00:00
Kaushik B 93223ff5ce
Introduce StrategyRegistry (#11233)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-01-05 17:14:18 +05:30
Carlos Mocholí 5ac129e95a
Rename ttp -> strategy (#11312) 2022-01-05 12:12:25 +01:00
Carlos Mocholí 33c3490685
Fix min/max logging default value (#11310) 2022-01-05 11:42:03 +01:00
Adrian Wälchli a8bd7ac73f
Fix lr scheduler state not being dumped to checkpoint in deepspeed strategy (#11307) 2022-01-05 08:38:08 +00:00
Rohit Gupta 7eab379da2
Raise a warning if evaluation is triggered with best ckpt in case of multiple checkpoint callbacks (#11274)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-04 17:22:32 +00:00
Kaushik B 650c710efa
Rename training plugin test files & names to strategy (#11303) 2022-01-04 14:32:45 +01:00
Carlos Mocholí e9009d6058
Reset the total fit-validation batch progress on epoch (#11244) 2022-01-04 12:04:20 +01:00
Danielle Pintz 7fa1aebcc9
Remove `profile("training_step_and_backward")` (#11222) 2022-01-04 11:50:11 +01:00
Rohit Gupta 997da52f73
Update logic to make sure logged_metrics always contain tensors (#11270) 2022-01-04 10:32:44 +00:00
Rohit Gupta 98ea79b8b0
Add `opt_idx` to scheduler config if not assigned by user (#11247)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-04 14:57:15 +09:00
ananthsub 05ed9a201c
Group metrics generated by `DeviceStatsMonitor` for better visualization (#11254)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-03 13:26:17 +00:00
Adrian Wälchli 4eede7c30b
Add deprecation path for renamed training type plugins (#11227)
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-01-03 13:41:05 +01:00
jjenniferdai 4b5761539e
Remove `hpc_save` (#11101) 2022-01-03 12:23:13 +00:00
Aki Nitta 7637550ab5
Revert "[CI] Comment flaky tests (#10084)" (#10580)
* Revert "[CI] Comment flaky tests (#10084)"

This reverts commit ed9802643c.
2022-01-03 12:45:41 +01:00
Adam Viola 1fc046cde2
Fix `_should_reload_dl_epoch` causing inconsistent validation dataloader reloading (#11036)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-28 02:20:57 +01:00
Danielle Pintz ba6a8ddcad
refactor _configure_schedulers (#11245) 2021-12-23 10:03:28 -08:00
Carlos Mocholí 30236c837f
Reset the progress tracking state after sanity checking (#11218) 2021-12-23 16:36:03 +00:00
Kaushik B 0adcd6a048
Rename training_type_plugin file to strategy (#11239)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-23 14:01:23 +00:00
Adrian Wälchli c210e338ef
Update strategy import statements (#11231) 2021-12-23 08:26:28 +01:00
Danielle Pintz a6a28e08d2
Deprecate `TrainerOptimizersMixin` and move functionality to `core/optimizer.py` (#11155) 2021-12-22 17:56:37 -08:00
four4fish 81301dbba7
Rename `AcceleratorConnector.training_type_plugin` to `AcceleratorConnector.strategy` (#11212)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-23 01:36:23 +00:00
twsl 0b9034baef
Return only unique names/versions for LoggerCollection (#10976)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-23 00:35:38 +00:00
Kaushik B 576a5d62a0
Introduce strategies directory for Training Strategies (#11226)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 20:23:30 +00:00
Carlos Mocholí 85304d4672
Update pre-commit hook versions (#11202) 2021-12-22 17:09:27 +00:00
Carlos Mocholí eb5b350f9a
Remove explicit isinstance checks in strategies for checkpoint io (#11177)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 04:41:45 +00:00
Adrian Wälchli ba8e7cd787
Fix BF16 teardown for TPU precision plugin (#10990)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-12-22 03:47:14 +00:00
four4fish cf5ef32f7b
Deprecate Trainer.training_type_plugin in favor of trainer.strategy (#11141)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 02:11:43 +00:00
Adrian Wälchli 17ad1a4c00
Rename `ParallelPlugin` to `ParallelStrategy` (#11123) 2021-12-22 01:09:17 +00:00
four4fish 4bfe5bda0f
Rename the DDPSpawnShardedPlugin to DDPSpawnShardedStrategy (#11210)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 00:27:36 +00:00
Aki Nitta 28ce9105e4
Rename `SingleDevicePlugin` to `SingleDeviceStrategy` (#11181)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 23:56:14 +00:00
four4fish f98cd78e9e
Renamed the `DDPSpawnPlugin` to `DDPSpawnStrategy` (#11145) 2021-12-21 23:06:14 +00:00
four4fish 0c69c757d4
Rename the `DataParallelPlugin` to `DataParallelStrategy` (#11183) 2021-12-21 22:00:24 +00:00
Aki Nitta c3cd4d050f
Rename `SingleTPUPlugin` to `SingleTPUStrategy` (#11182) 2021-12-21 20:09:30 +00:00
four4fish 1c5a5c3dfe
Renamed the DDP2Plugin to DDP2Strategy (#11185)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 19:21:00 +00:00
Danielle Pintz ac8dc2c2f3
Deprecate `TrainerCallbackHookMixin` (#11148) 2021-12-21 09:47:08 -08:00
four4fish caab69aabb
Renamed DDPShardPlugin to DDPShardStrategy (#11187)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 17:18:25 +00:00
Carlos Mocholí f696326060
Remove `should_rank_save_checkpoint` property from TTP (#11070) 2021-12-21 18:11:20 +01:00
Carlos Mocholí 3692eba807
Drop Python 3.6 support (#11117) 2021-12-21 17:06:15 +00:00
Aki Nitta 9da78a94bd
Rename `TPUSpawnPlugin` to `TPUSpawnStrategy` (#11190)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 16:36:16 +00:00
Kaushik B 2e947a88e0
Rename IPUPlugin to IPUStrategy (#11193) 2021-12-21 15:55:41 +00:00
Kaushik B 283bdece0a
Rename DeepSpeedPlugin to DeepSpeedStrategy (#11194) 2021-12-21 15:18:01 +00:00
four4fish b64dea9dc3
Rename `DDPPlugin` to `DDPStrategy` (#11142)
* Rename DDPPlugin to DDPStrategy

* Change ddp_plugin to ddp_strategy

* update changelog

* rename occurrences in docs

* rename more occurrences

* fix line too long

* more fixes

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 08:55:51 +00:00
jjenniferdai 31f39c9578
Move `CheckpointConnector.fault_tolerant_auto_save_path` out of `CheckpointConnector.hpc_resume_path` (#11092) 2021-12-21 02:24:01 +01:00
Carlos Mocholí 9826de2162
Delete legacy multinode tests (#11175) 2021-12-20 20:01:57 +01:00
Adrian Wälchli 08e661ff72
Rename `restore_checkpoint_after_pre_dispatch` to `restore_checkpoint_after_setup` (#11166)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:16:52 +00:00
Carlos Mocholí e8169bbd46
Fix setter usage for checkpoint io and precision in TTP (#11071)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-20 17:45:32 +01:00
Adrian Wälchli f5c2881b68
3/n Simplify spawn plugins: Merge `pre_dispatch` and `setup` logic (#11137) 2021-12-20 17:41:22 +01:00
four4fish 0ee78e96ef
Rename `DDPFullyShardedPlugin` to `DDPFullyShardedStrategy` (#11143)
* Rename DDPFullyShardedPlugin to DDPFullyShardedStrategy

* update fsdp_plugin to fsdp_strategy

* update changelog

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:11:20 +01:00
ORippler 86a3c5e2a3
Add required states for resumed ModelCheckpoint GC (#10995)
* Add required states for resumed ModelCheckpoint GC

* Add backwards compatibility with legacy ckpts

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Add test to check if attrs are written to ckpt

Note that we do not yet check for proper loading/reinstantiation of
ModelCheckpoint based on the ckpt written to disk

* Test if attributes are restored properly from ckpt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix broken `test_callbacks_state_fit_ckpt_path`

`ModelCheckpoint` is configured to save after every epoch,
but `trainer.fit` is called with `max_steps = 1`

Note there may be a better way of doing this, where `ModelCheckpoint`
is called after `training_step`

* Update test_restore.py

* Update test_restore.py

* Check that all attributes are restored properly

* revert changes, use fix on master

* Convert to proper unit test

* Refactor `test_mode_checkpoint_saveload_ckpt`

* First save, then load ckpt.
* Instantiate ModelCheckpoint twice.

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-20 17:05:15 +01:00
Danielle Pintz b1baf460d9
Include hook's object name when profiling (#11026) 2021-12-20 15:18:24 +01:00
Adrian Wälchli 29eb9cccf2
Rename the `TrainingTypePlugin` base to `Strategy` (#11120)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
2021-12-20 12:50:11 +00:00
Carlos Mocholí 7ed3dbf191
Fix evaluation logging on epoch end with multiple dataloaders (#11132) 2021-12-19 15:51:01 +01:00
Rohit Gupta 61eb6230c2
Prune EvalModelTemplate (#11153) 2021-12-19 13:08:43 +00:00
Adrian Wälchli a3e2ef2be0
Refactor plugin tests whose assertions don't need to run in `on_fit_start` hook (#11149) 2021-12-18 23:38:40 +01:00
Rohit Gupta 3461af0ddb
Add support for returning callback from `LightningModule.configure_callbacks` (#11060) 2021-12-18 10:46:35 +00:00
Kaushik B 2a5d05b562
Fix tpu spawn plugin test (#11131) 2021-12-18 02:53:37 +00:00
Rafał Jankowski 3cc69f992b
Fixed NeptuneLogger when using DDP (#11030)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-18 01:40:13 +00:00
Carlos Mocholí 62f1e82e03
Fix CVE-2020-1747 and CVE-2020-14343 (#11099) 2021-12-17 20:27:15 +00:00