Carlos Mocholí
5ad5ba54c0
Refactor fetching function ( #11516 )
2022-01-20 20:06:58 +01:00
Carlos Mocholí
075b8801c9
Fix checkpoint values when saving and resetting the tuner state ( #11518 )
2022-01-20 18:54:40 +00:00
Aki Nitta
3c3ba39a06
Tests: Fail on FutureWarning ( #11541 )
2022-01-20 12:52:34 +00:00
Carlos Mocholí
7295457a7b
[CLI] Save only the configuration used ( #11532 )
2022-01-20 12:35:43 +00:00
Rafał Jankowski
e78d658c8d
Remove access to `_short_id` in NeptuneLogger ( #11517 )
2022-01-20 12:07:42 +00:00
Maaz Karim
16a04b29eb
Mark SignalConnector as protected ( #11513 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-01-20 08:39:59 +01:00
ananthsub
1bd6fc979e
Remove `Strategy.on_tpu` property ( #11536 )
2022-01-20 08:25:26 +01:00
ananthsub
f41d1e5e5e
Remove `Strategy.on_gpu` ( #11537 )
2022-01-19 21:27:12 +00:00
Rohit Gupta
f7f835fa0e
improve simple profiler output ( #11414 )
2022-01-18 19:58:34 +00:00
Carlos Mocholí
62818dbace
Use a dataclass as the scheduler config ( #11443 )
2022-01-18 20:23:32 +01:00
Mauricio Villegas
6397ac4ffd
[CLI] Add unit test with a model that has a parameter with `lazy_instance` default. ( #11509 )
2022-01-18 16:36:54 +01:00
Jv Kyle Eclarin
c85946531d
Update `tests/callbacks/*.py` to use `devices` instead of `gpus` or `ipus` ( #11387 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-01-18 19:13:01 +05:30
Carlos Mocholí
344ab1e0a5
Move the `lightning_optimizers` ownership to the `Strategy` ( #11444 )
2022-01-18 12:58:56 +01:00
Rohit Gupta
033dba1494
Disable attaching samplers when using `IterableDataset` ( #11507 )
2022-01-17 23:33:57 +01:00
Carlos Mocholí
9cf9ded73b
Simplify data fetching ( #11466 )
2022-01-17 14:46:55 +00:00
Jv Kyle Eclarin
2012816645
Update `tests/core/*.py` to use `devices` instead of `gpus` or `ipus` ( #11433 )
...
* update tests for v2
* Update
* Pass devices to kwargs
* add accelerator to kwargs
* Fix testing with cpu on GPU env
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-01-17 18:15:02 +09:00
Jv Kyle Eclarin
3dee2759ee
update tests for v2 ( #11487 )
2022-01-16 21:43:37 +05:30
Jv Kyle Eclarin
5dc8002d46
update tests for v2 ( #11486 )
2022-01-16 21:43:07 +05:30
Jv Kyle Eclarin
f97359a8c2
update tests for v2 ( #11485 )
2022-01-16 21:42:18 +05:30
Carlos Mocholí
18bbb39eef
Set `Loop.restarting` recursively ( #11442 )
...
* Set `Loop.restarting` recursively
* Docs
* CHANGELOG
* Update pytorch_lightning/loops/epoch/training_epoch_loop.py
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-01-14 19:25:23 +09:00
Jv Kyle Eclarin
36d42aaa88
Update `tests/utilities/*.py` to use `devices` instead of `gpus` or `ipus` ( #11458 )
2022-01-13 14:52:54 +01:00
Carlos Mocholí
5914fb748f
Add typing to accelerators/gpu.py ( #11333 )
2022-01-12 19:44:51 +00:00
Carlos Mocholí
f5bbc2cf17
Avoid in-place ops during logging result updates ( #11401 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-01-12 09:09:36 +01:00
Jv Kyle Eclarin
d2d284fd6e
Update `tests/checkpointing/*.py` to use `devices` instead of `gpus` or `ipus` ( #11408 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-12 05:47:01 +00:00
Aki Nitta
8dc36c3745
Fix inconsistent exceptions raised with no `rich` installed ( #11360 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:55:51 +00:00
Rohit Gupta
82c8875f33
Add `LightningModule.lr_scheduler_step` ( #10249 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:53:49 +00:00
Aki Nitta
ba7193721a
Fix `torch.distributed._*` import statements in tests ( #11416 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:53:14 +00:00
Carlos Mocholí
9771040621
Add typing to `TQDMProgressBar` ( #11369 )
2022-01-12 01:07:30 +00:00
edward-io
6107ce8e0d
Add DETAIL logs for batch use cases ( #11008 )
2022-01-12 01:22:48 +01:00
Jv Kyle Eclarin
3e0569fccc
Update test_pruning.py to use `devices` instead of `gpus` or `ipus` ( #11341 )
2022-01-09 17:15:29 +09:00
Jv Kyle Eclarin
b56d8677ad
Update test_pruning.py to use `devices` instead of `gpus` or `ipus` ( #11339 )
2022-01-08 17:24:29 +01:00
Jv Kyle Eclarin
4710a8128b
Update test_gpu_stats_monitor.py to use `devices` instead of `gpus` or `ipus` ( #11340 )
2022-01-08 01:38:25 +00:00
Carlos Mocholí
59a7ba7605
Move `epoch_{start,end}` hooks from `TrainingEpochLoop` to `FitLoop` ( #11201 )
2022-01-06 15:13:18 +00:00
Carlos Mocholí
8a549a550c
Integrate progress tracking into the progress bar ( #11213 )
2022-01-06 14:29:48 +01:00
Adrian Wälchli
9c8f52ccd1
Fix restoring lr scheduler states with deepspeed strategy ( #11322 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-01-06 12:34:16 +00:00
Carlos Mocholí
5693a94c32
Extend the deprecation of `Trainer(resume_from_checkpoint)` ( #11334 )
2022-01-06 13:18:37 +01:00
Kaushik B
e15579a4f3
Rename `_distrib_type` to `_strategy_type` ( #11328 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-06 06:32:50 +00:00
Abhishek Saroha
43c140c8e5
Fix frozen dataclass instance error in `apply_to_collection` ( #10927 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-05 23:29:03 +01:00
Danielle Pintz
5b59c951e2
Deprecate `TrainerDataLoadingMixin` and move logic to `DataConnector` ( #11282 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-05 21:23:57 +01:00
Carlos Mocholí
dcffca73d4
Parametrize deepspeed hook test ( #11308 )
2022-01-05 19:38:25 +00:00
Kaushik B
70c975a9f3
Fix exception message for FSDP running on CPU ( #11325 )
2022-01-05 18:02:31 +01:00
Andrew Tritt
dbf1acd5a5
Modify LSFEnvironment to use more reliable environment variable ( #10825 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-05 12:45:25 +00:00
Kaushik B
93223ff5ce
Introduce StrategyRegistry ( #11233 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-01-05 17:14:18 +05:30
Carlos Mocholí
5ac129e95a
Rename ttp -> strategy ( #11312 )
2022-01-05 12:12:25 +01:00
Carlos Mocholí
33c3490685
Fix min/max logging default value ( #11310 )
2022-01-05 11:42:03 +01:00
Adrian Wälchli
a8bd7ac73f
Fix lr scheduler state not being dumped to checkpoint in deepspeed strategy ( #11307 )
2022-01-05 08:38:08 +00:00
Rohit Gupta
7eab379da2
Raise a warning if evaluation is triggered with best ckpt in case of multiple checkpoint callbacks ( #11274 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-04 17:22:32 +00:00
Kaushik B
650c710efa
Rename training plugin test files & names to strategy ( #11303 )
2022-01-04 14:32:45 +01:00
Carlos Mocholí
e9009d6058
Reset the total fit-validation batch progress on epoch ( #11244 )
2022-01-04 12:04:20 +01:00
Danielle Pintz
7fa1aebcc9
Remove `profile("training_step_and_backward")` ( #11222 )
2022-01-04 11:50:11 +01:00
Rohit Gupta
997da52f73
Update logic to make sure logged_metrics always contain tensors ( #11270 )
2022-01-04 10:32:44 +00:00
Rohit Gupta
98ea79b8b0
Add `opt_idx` to scheduler config if not assigned by user ( #11247 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-04 14:57:15 +09:00
ananthsub
05ed9a201c
Group metrics generated by `DeviceStatsMonitor` for better visualization ( #11254 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-03 13:26:17 +00:00
Adrian Wälchli
4eede7c30b
Add deprecation path for renamed training type plugins ( #11227 )
...
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-01-03 13:41:05 +01:00
jjenniferdai
4b5761539e
Remove `hpc_save` ( #11101 )
2022-01-03 12:23:13 +00:00
Aki Nitta
7637550ab5
Revert "[CI] Comment flaky tests ( #10084 )" ( #10580 )
...
* Revert "[CI] Comment flaky tests (#10084 )"
This reverts commit ed9802643c.
2022-01-03 12:45:41 +01:00
Adam Viola
1fc046cde2
Fix `_should_reload_dl_epoch` causing inconsistent validation dataloader reloading ( #11036 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-28 02:20:57 +01:00
Danielle Pintz
ba6a8ddcad
refactor _configure_schedulers ( #11245 )
2021-12-23 10:03:28 -08:00
Carlos Mocholí
30236c837f
Reset the progress tracking state after sanity checking ( #11218 )
2021-12-23 16:36:03 +00:00
Kaushik B
0adcd6a048
Rename training_type_plugin file to strategy ( #11239 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-23 14:01:23 +00:00
Adrian Wälchli
c210e338ef
Update strategy import statements ( #11231 )
2021-12-23 08:26:28 +01:00
Danielle Pintz
a6a28e08d2
Deprecate `TrainerOptimizersMixin` and move functionality to `core/optimizer.py` ( #11155 )
2021-12-22 17:56:37 -08:00
four4fish
81301dbba7
Rename `AcceleratorConnector.training_type_plugin` to `AcceleratorConnector.strategy` ( #11212 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-23 01:36:23 +00:00
twsl
0b9034baef
Return only unique names/versions for LoggerCollection ( #10976 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-23 00:35:38 +00:00
Kaushik B
576a5d62a0
Introduce strategies directory for Training Strategies ( #11226 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 20:23:30 +00:00
Carlos Mocholí
85304d4672
Update pre-commit hook versions ( #11202 )
2021-12-22 17:09:27 +00:00
Carlos Mocholí
eb5b350f9a
Remove explicit isinstance checks in strategies for checkpoint io ( #11177 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 04:41:45 +00:00
Adrian Wälchli
ba8e7cd787
Fix BF16 teardown for TPU precision plugin ( #10990 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-12-22 03:47:14 +00:00
four4fish
cf5ef32f7b
Deprecate Trainer.training_type_plugin in favor of trainer.strategy ( #11141 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 02:11:43 +00:00
Adrian Wälchli
17ad1a4c00
Rename `ParallelPlugin` to `ParallelStrategy` ( #11123 )
2021-12-22 01:09:17 +00:00
four4fish
4bfe5bda0f
Rename the DDPSpawnShardedPlugin to DDPSpawnShardedStrategy ( #11210 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 00:27:36 +00:00
Aki Nitta
28ce9105e4
Rename `SingleDevicePlugin` to `SingleDeviceStrategy` ( #11181 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 23:56:14 +00:00
four4fish
f98cd78e9e
Renamed the `DDPSpawnPlugin` to `DDPSpawnStrategy` ( #11145 )
2021-12-21 23:06:14 +00:00
four4fish
0c69c757d4
Rename the `DataParallelPlugin` to `DataParallelStrategy` ( #11183 )
2021-12-21 22:00:24 +00:00
Aki Nitta
c3cd4d050f
Rename `SingleTPUPlugin` to `SingleTPUStrategy` ( #11182 )
2021-12-21 20:09:30 +00:00
four4fish
1c5a5c3dfe
Renamed the DDP2Plugin to DDP2Strategy ( #11185 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 19:21:00 +00:00
Danielle Pintz
ac8dc2c2f3
Deprecate `TrainerCallbackHookMixin` ( #11148 )
2021-12-21 09:47:08 -08:00
four4fish
caab69aabb
Renamed DDPShardPlugin to DDPShardStrategy ( #11187 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 17:18:25 +00:00
Carlos Mocholí
f696326060
Remove `should_rank_save_checkpoint` property from TTP ( #11070 )
2021-12-21 18:11:20 +01:00
Carlos Mocholí
3692eba807
Drop Python 3.6 support ( #11117 )
2021-12-21 17:06:15 +00:00
Aki Nitta
9da78a94bd
Rename `TPUSpawnPlugin` to `TPUSpawnStrategy` ( #11190 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 16:36:16 +00:00
Kaushik B
2e947a88e0
Rename IPUPlugin to IPUStrategy ( #11193 )
2021-12-21 15:55:41 +00:00
Kaushik B
283bdece0a
Rename DeepSpeedPlugin to DeepSpeedStrategy ( #11194 )
2021-12-21 15:18:01 +00:00
four4fish
b64dea9dc3
Rename `DDPPlugin` to `DDPStrategy` ( #11142 )
...
* Rename DDPPlugin to DDPStrategy
* Change ddp_plugin to ddp_strategy
* update changelog
* rename occurrences in docs
* rename more occurrences
* fix line too long
* more fixes
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 08:55:51 +00:00
jjenniferdai
31f39c9578
Move `CheckpointConnector.fault_tolerant_auto_save_path` out of `CheckpointConnector.hpc_resume_path` ( #11092 )
2021-12-21 02:24:01 +01:00
Carlos Mocholí
9826de2162
Delete legacy multinode tests ( #11175 )
2021-12-20 20:01:57 +01:00
Adrian Wälchli
08e661ff72
Rename `restore_checkpoint_after_pre_dispatch` to `restore_checkpoint_after_setup` ( #11166 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:16:52 +00:00
Carlos Mocholí
e8169bbd46
Fix setter usage for checkpoint io and precision in TTP ( #11071 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-20 17:45:32 +01:00
Adrian Wälchli
f5c2881b68
3/n Simplify spawn plugins: Merge `pre_dispatch` and `setup` logic ( #11137 )
2021-12-20 17:41:22 +01:00
four4fish
0ee78e96ef
Rename `DDPFullyShardedPlugin` to `DDPFullyShardedStrategy` ( #11143 )
...
* Rename DDPFullyShardedPlugin to DDPFullyShardedStrategy
* update fsdp_plugin to fsdp_strategy
* update changelog
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:11:20 +01:00
ORippler
86a3c5e2a3
Add required states for resumed ModelCheckpoint GC ( #10995 )
...
* Add required states for resumed ModelCheckpoint GC
* Add backwards compatibility with legacy ckpts
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* Add test to check if attrs are written to ckpt
Note that we do not yet check for proper loading/reinstantiation of
ModelCheckpoint based on the ckpt written to disk
* Test if attributes are restored properly from ckpt
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix broken `test_callbacks_state_fit_ckpt_path`
`ModelCheckpoint` is configured to save after every epoch,
but `trainer.fit` is called with `max_steps = 1`
Note there may be a better way of doing this, where `ModelCheckpoint`
is called after `training_step`
* Update test_restore.py
* Update test_restore.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Check that all attributes are restored properly
* revert changes, use fix on master
* Convert to proper unit test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor `test_mode_checkpoint_saveload_ckpt`
* First save, then load ckpt.
* Instantiate ModelCheckpoint twice.
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-20 17:05:15 +01:00
Danielle Pintz
b1baf460d9
Include hook's object name when profiling ( #11026 )
2021-12-20 15:18:24 +01:00
Adrian Wälchli
29eb9cccf2
Rename the `TrainingTypePlugin` base to `Strategy` ( #11120 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
2021-12-20 12:50:11 +00:00
Carlos Mocholí
7ed3dbf191
Fix evaluation logging on epoch end with multiple dataloaders ( #11132 )
2021-12-19 15:51:01 +01:00
Rohit Gupta
61eb6230c2
Prune EvalModelTemplate ( #11153 )
2021-12-19 13:08:43 +00:00
Adrian Wälchli
a3e2ef2be0
Refactor plugin tests whose assertions don't need to run in `on_fit_start` hook ( #11149 )
2021-12-18 23:38:40 +01:00
Rohit Gupta
3461af0ddb
Add support for returning callback from `LightningModule.configure_callbacks` ( #11060 )
2021-12-18 10:46:35 +00:00
Kaushik B
2a5d05b562
Fix tpu spawn plugin test ( #11131 )
2021-12-18 02:53:37 +00:00
Rafał Jankowski
3cc69f992b
Fixed NeptuneLogger when using DDP ( #11030 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-18 01:40:13 +00:00
Carlos Mocholí
62f1e82e03
Fix CVE-2020-1747 and CVE-2020-14343 ( #11099 )
2021-12-17 20:27:15 +00:00