Boris Dayma
1e36cffbca
feat(wandb): support distributed modes ( #11650 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-02-09 19:53:21 +01:00
Carlos Mocholí
8394770d4a
Move data fetcher ownership to the loops ( #11621 )
2022-02-09 20:04:24 +05:30
Biho-Kim
24de29974c
bug fix #10872 ( #10965 )
...
Co-authored-by: louie.kim <louie.kim@kakaocorp.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-02-09 14:15:49 +00:00
Carlos Mocholí
8822117200
Return the output of the optimizer step ( #11711 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-02-09 09:37:13 +00:00
Danielle Pintz
9e63281a4c
remove todos ( #11804 )
2022-02-09 08:30:27 +00:00
ananthsub
9d4de3a863
Faster callback configuration validator checks ( #11785 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-02-09 08:24:14 +00:00
Rohit Gupta
182c18d319
Configure native deepspeed schedulers with interval='step' ( #11788 )
2022-02-09 08:20:50 +00:00
jjenniferdai
1203094a20
Introduce `Stateful` DataModule ( #11637 )
2022-02-07 21:13:24 +01:00
circlecrystal
43a89eb132
bug fix: restore_optimizers correctly handles non-mapping values in optimizer.state.values() ( #11757 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-02-07 14:55:06 +00:00
Rohit Gupta
9ed44dee0d
Fix to avoid moving batch to device for DataParallel ( #11780 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2022-02-07 14:26:18 +00:00
Rohit Gupta
581bf7f2f2
Deprecate `on_epoch_start/on_epoch_end` hook ( #11578 )
2022-02-07 14:15:27 +00:00
ananthsub
bbf27ed09a
Use fsspec in checkpoint connector for fault-tolerant training ( #11776 )
2022-02-07 13:29:41 +01:00
ananthsub
0ba25d3cac
Update DDPStrategy to use optimizers property from within class ( #11777 )
2022-02-07 13:28:37 +01:00
Rohit Gupta
7ec1e66e17
reduce only loss with dp ( #11594 )
...
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-02-07 17:00:29 +05:30
Krishna Kalyan
f509e40ae3
Deprecate `on_before_accelerator_backend_setup` callback hook ( #11655 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-02-07 11:07:21 +00:00
ananthsub
a64438c897
Centralize rank_zero_only utilities into their own module ( #11747 )
...
* Centralize rank_zero_only utilities into their own module
Fixes #11746
* PossibleUserWarning
* Update test_warnings.py
* update imports
* more imports
* Update CHANGELOG.md
* Update mlflow.py
* Update cli.py
* Update api_references.rst
* Update meta.py
* add deprecation tests
* debug standalone
* fix standalone tests
* Update CHANGELOG.md
2022-02-07 08:09:55 +00:00
Danielle Pintz
34c454c756
Small improvements to TB and CSV loggers ( #11764 )
...
* small improvements to TB and CSV loggers
* addr comments
* remove redundant lines and update tests
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-02-07 14:59:39 +09:00
ananthsub
7900aabe62
Keep `is_global_zero` definitions in sync across strategy and trainer ( #11761 )
2022-02-07 01:33:32 +05:30
ananthsub
dfda970572
Update TPU Spawn to use root_device instead of LightningModule's device ( #11750 )
2022-02-06 06:26:38 +00:00
Dan Dale
9d8faecdb2
Allow Horovod `teardown()` to complete gracefully if exception thrown in callback setup ( #11752 )
2022-02-05 11:13:21 -08:00
ananthsub
819a747031
Use `root_device` in XLAStatsMonitor callback ( #11749 )
2022-02-05 10:09:08 -08:00
ananthsub
7d9454a3e9
Use `root_device` in DeviceStatsMonitor callback ( #11748 )
...
* Use trainer.strategy.root_device in favor of LightningModule.device in DeviceStatsMonitor
Minor refactor to use the strategy's own `root_device` instead of the LightningModule's device property.
Attempts at manual model parallelization by extending this plugin will face difficulties with the assumption that the LightningModule has all of its parameters on the same device.
For those use cases, it is critical to remove the assumption that the module has a device property (a module-level device in general goes against PyTorch module design principles):
- https://github.com/pytorch/pytorch/issues/7460
- https://github.com/PyTorchLightning/pytorch-lightning/pull/1790#discussion_r423459412
2022-02-05 11:20:15 +01:00
ananthsub
241c97e6eb
Update HorovodStrategy to use optimizers property from within class ( #11728 )
2022-02-05 10:04:55 +01:00
Adrian Wälchli
cc43d07db1
Remove legacy dead code in DDP script launch ( #11678 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-02-05 11:40:16 +05:30
Dan Dale
3bc2407239
Allow access to ckpt_path within context of fit() ( #11696 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-05 05:23:16 +01:00
Carlos Mocholí
7da931d1ca
Support no pre-fetching ( #11606 )
2022-02-05 03:59:46 +00:00
Danielle Pintz
c71a1d7ea2
Remove `self._log_dir` from `BaseProfiler` ( #11740 )
2022-02-05 04:45:48 +01:00
ananthsub
72db64d294
Use the strategy's `root_device` instead of the LightningModule's device property ( #11734 )
2022-02-05 04:33:25 +01:00
Andres Algaba
58324b5197
Improve the result printing at the end of evaluation ( #11332 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-02-05 03:03:22 +01:00
NathanGodey
8a1b1eeef8
WandbLogger's log_image can use step argument ( #11716 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-05 01:02:41 +00:00
wangraying
8c07d8bf90
Add `Trainer(strategy="bagua")` ( #11146 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-02-04 17:02:09 +00:00
ananthsub
2eca957b29
Minor refactors to `init_dist_connection` ( #11733 )
2022-02-04 13:33:49 +01:00
Rohit Gupta
4d72110b51
Deprecate `on_batch_start/on_batch_end` callback hooks ( #11577 )
2022-02-03 19:51:56 +00:00
Rohit Gupta
400201712f
added warning for distributedsampler in case of evaluation ( #11479 )
2022-02-03 18:42:13 +00:00
Rohit Gupta
01abe72278
Fix to avoid val progress bar disappear after validate ( #11700 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 13:35:38 +00:00
Rohit Gupta
e9065e9d42
Fix rich with uneven refresh rate tracking ( #11668 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 10:27:05 +00:00
Rohit Gupta
7948ed703d
Avoid enforcing `shuffle=False` for eval dataloaders ( #11575 )
2022-02-03 09:35:31 +00:00
Danielle Pintz
9ebd7df22a
Move progress bar disabling out of the Trainer ( #11377 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-02-03 06:29:32 +00:00
Rohit Gupta
0cb64fb8ba
Fix mid-epoch warning call while resuming ( #11556 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-02-03 05:42:31 +00:00
four4fish
d43fd0d4d6
Lazy initialize Strategy.parallel_devices ( #11572 )
...
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 04:25:16 +00:00
Rohit Gupta
eceefdc602
Fix rich progress bar render only on main pbar ( #11690 )
2022-02-03 04:18:07 +00:00
Krishna Kalyan
6291af5c19
Replace occurrences of `on_before_accelerator_backend_setup_called` with `setup` ( #11568 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-02-03 04:14:33 +00:00
Peter Franek
ed8a5dadce
Improving instructions in finetuning docstring ( #10484 )
2022-02-03 04:13:06 +00:00
Anton Schwaighofer
f935319622
Allow a `CombinedLoader` as the training data in DDP ( #11648 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-02-03 04:01:20 +00:00
Sebastian Raschka
0e17f16438
Clarify what the default values for log are based on hooks ( #11611 )
2022-02-03 03:55:42 +00:00
Jirka Borovec
c5de105276
fix available modules ( #11526 )
2022-02-03 03:38:16 +00:00
Sebastian Raschka
9934569373
Fix typo in `TensorBoardLogger.log_metrics` error message ( #11595 )
2022-02-03 03:18:54 +00:00
Carlos Mocholí
3d3172d3da
[CLI] Support shorthand for loggers ( #11533 )
2022-02-03 02:58:14 +00:00
Bhadresh Savani
0ea48416cd
Removed subsection in `LightningDataModule` ( #11675 )
2022-02-03 02:53:43 +00:00
DuYicong515
0816a1997e
Add typing for utilities/memory.py ( #11545 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-02-03 02:34:05 +00:00
Piyush Hirapara
72f0e5bfae
Deprecate `on_configure_sharded_model` callback hook for v1.6 ( #11627 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-02-03 02:29:26 +00:00
Krishna Kalyan
6586dd23b7
Mark `CheckpointConnector` as protected ( #11550 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 02:26:08 +00:00
DuYicong515
06e2635c71
Refactor get_filesystem to use native fsspec API ( #11708 )
2022-02-03 01:55:24 +00:00
Akash Kwatra
d5aa7717aa
Remove experiment property from abstract class ( #11603 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 01:51:34 +00:00
Rohit Gupta
ee049e123d
Fix rich progress bar metric render on epoch end ( #11689 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-02-03 01:43:48 +00:00
jjenniferdai
ec1379da2c
Rename `_SupportsStateDict` --> `_Stateful` Protocol ( #11469 )
2022-02-02 23:45:59 +01:00
Carlos Mocholí
b8e360dafa
[CLI] Fix bug that forces overriding `configure_optimizers` ( #11672 )
2022-02-02 22:44:00 +00:00
Akash Kwatra
115a5d08e8
Decouple utilities from `LightningLoggerBase` ( #11484 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-02-02 23:29:01 +01:00
Aki Nitta
fbc1f9f1d9
Rename `Strategy.lr_schedulers` to `Strategy.lr_scheduler_configs` ( #11549 )
2022-02-02 22:10:01 +00:00
Nithin Rao
b8d2c65a37
Set the state before saving "last" or "none" checkpoints ( #11481 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-02-02 23:07:05 +01:00
Carlos Mocholí
d7944a13cd
Teardown all internal components on exception ( #11620 )
2022-02-02 21:10:19 +00:00
Rohit Gupta
3eee8f18cf
Sort simple profiler summary based on mean duration ( #11671 )
2022-02-02 20:44:42 +00:00
Rohit Gupta
76175217e4
Fix val_loop run on restart ( #11552 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-02 20:19:34 +00:00
Carlos Mocholí
a44881cd90
Changes in preparation to #8578 ( #11562 )
2022-02-02 19:57:08 +00:00
Carlos Mocholí
79a3ff690b
Add typing to data fetching ( #11515 )
2022-02-02 20:53:50 +01:00
Chunyang Wen
fe34bf2a65
Remove useless pass and abc ( #11522 )
2022-01-24 08:19:57 +00:00
Chunyang Wen
350c88e621
Let Accelerator inherit from ABC to make sure abstractmethod takes effect ( #11521 )
2022-01-23 20:47:43 +01:00
Carlos Mocholí
623dc974f5
Construct the hook kwargs inside each loop ( #11511 )
2022-01-22 15:57:12 +00:00
Carlos Mocholí
5ad5ba54c0
Refactor fetching function ( #11516 )
2022-01-20 20:06:58 +01:00
Carlos Mocholí
075b8801c9
Fix checkpoint values when saving and resetting the tuner state ( #11518 )
2022-01-20 18:54:40 +00:00
Carlos Mocholí
7295457a7b
[CLI] Save only the configuration used ( #11532 )
2022-01-20 12:35:43 +00:00
Rafał Jankowski
e78d658c8d
Remove access to `_short_id` in NeptuneLogger ( #11517 )
2022-01-20 12:07:42 +00:00
Maaz Karim
16a04b29eb
Mark SignalConnector as protected ( #11513 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-01-20 08:39:59 +01:00
ananthsub
1bd6fc979e
Remove `Strategy.on_tpu` property ( #11536 )
2022-01-20 08:25:26 +01:00
ananthsub
f41d1e5e5e
Remove `Strategy.on_gpu` ( #11537 )
2022-01-19 21:27:12 +00:00
Rohit Gupta
f7f835fa0e
improve simple profiler output ( #11414 )
2022-01-18 19:58:34 +00:00
Carlos Mocholí
62818dbace
Use a dataclass as the scheduler config ( #11443 )
2022-01-18 20:23:32 +01:00
Carlos Mocholí
344ab1e0a5
Move the `lightning_optimizers` ownership to the `Strategy` ( #11444 )
2022-01-18 12:58:56 +01:00
Rohit Gupta
033dba1494
Disable attaching samplers when using `IterableDataset` ( #11507 )
2022-01-17 23:33:57 +01:00
Gautam R Gare
ef4677ae7b
Change the default `prog_bar=False` to `True` in `LightningModule.log_grad_norm` ( #11472 )
...
* Reset on_step flag to True in log_grad_norm
* updated change log
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-01-18 02:34:50 +09:00
Carlos Mocholí
9cf9ded73b
Simplify data fetching ( #11466 )
2022-01-17 14:46:55 +00:00
Rohit Gupta
cad604211b
update load_from_checkpoint docstrings ( #11467 )
2022-01-16 20:48:27 +00:00
Carlos Mocholí
18bbb39eef
Set `Loop.restarting` recursively ( #11442 )
...
* Set `Loop.restarting` recursively
* Docs
* CHANGELOG
* Update pytorch_lightning/loops/epoch/training_epoch_loop.py
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-01-14 19:25:23 +09:00
Rohit Gupta
9771e7dff6
Update introduction docs ( #11140 )
2022-01-13 21:11:43 +00:00
Carlos Mocholí
a80da35d5d
Fix compatibility with old checkpoints and fault-tolerance enabled ( #11439 )
2022-01-13 14:53:17 +01:00
Rohit Gupta
96a53382ac
Update utilities API references ( #11450 )
2022-01-13 13:22:58 +00:00
Carlos Mocholí
5914fb748f
Add typing to accelerators/gpu.py ( #11333 )
2022-01-12 19:44:51 +00:00
Rohit Gupta
00d1758bac
Update training tricks docs ( #11169 )
2022-01-12 16:26:03 +00:00
Carlos Mocholí
f5bbc2cf17
Avoid in-place ops during logging result updates ( #11401 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-01-12 09:09:36 +01:00
Rohit Gupta
221091afc4
move profiler docs ( #11431 )
2022-01-12 05:56:16 +00:00
Aki Nitta
8dc36c3745
Fix inconsistent exceptions raised with no `rich` installed ( #11360 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:55:51 +00:00
Rohit Gupta
82c8875f33
Add `LightningModule.lr_scheduler_step` ( #10249 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:53:49 +00:00
Carlos Mocholí
9771040621
Add typing to `TQDMProgressBar` ( #11369 )
2022-01-12 01:07:30 +00:00
edward-io
6107ce8e0d
Add DETAIL logs for batch use cases ( #11008 )
2022-01-12 01:22:48 +01:00
Rohit Gupta
06b8f82b8a
Update API references in doc ( #11357 )
2022-01-07 15:56:17 +01:00
Carlos Mocholí
59a7ba7605
Move `epoch_{start,end}` hooks from `TrainingEpochLoop` to `FitLoop` ( #11201 )
2022-01-06 15:13:18 +00:00
Danielle Pintz
57567edeab
Move newly added Trainer methods to be with other methods ( #11335 )
2022-01-06 14:10:21 +00:00
Kaushik B
42a1c72660
Add Accelerators section to Lightning docs ( #10755 )
2022-01-06 19:12:44 +05:30
Carlos Mocholí
8a549a550c
Integrate progress tracking into the progress bar ( #11213 )
2022-01-06 14:29:48 +01:00
Adrian Wälchli
3a2df4f75d
Fix typing in `pl.callbacks.xla_stats_monitor` ( #11219 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-06 12:51:02 +00:00
NathanGodey
9b873dcfcc
Changed hook docstring ( #11345 )
2022-01-06 12:37:11 +00:00
Adrian Wälchli
9c8f52ccd1
Fix restoring lr scheduler states with deepspeed strategy ( #11322 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-01-06 12:34:16 +00:00
Carlos Mocholí
5693a94c32
Extend the deprecation of `Trainer(resume_from_checkpoint)` ( #11334 )
2022-01-06 13:18:37 +01:00
Kaushik B
e15579a4f3
Rename `_distrib_type` to `_strategy_type` ( #11328 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-06 06:32:50 +00:00
Abhishek Saroha
43c140c8e5
Fix frozen dataclass instance error in `apply_to_collection` ( #10927 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-05 23:29:03 +01:00
Danielle Pintz
5b59c951e2
Deprecate `TrainerDataLoadingMixin` and move logic to `DataConnector` ( #11282 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-05 21:23:57 +01:00
Carlos Mocholí
c0726bacbe
Update `LightningCLI(trainer_defaults=...)` doc ( #11309 )
...
Co-authored-by: Mauricio Villegas <mauricio_ville@yahoo.com>
2022-01-05 19:43:35 +00:00
Adrian Wälchli
9906a1a54d
Update optimizer configuration info message in `DeepSpeedStrategy` ( #11327 )
2022-01-05 18:20:06 +00:00
Carlos Mocholí
1b6f851880
Add typing to some utility files ( #11316 )
2022-01-05 17:14:22 +00:00
Kaushik B
70c975a9f3
Fix exception message for FSDP running on CPU ( #11325 )
2022-01-05 18:02:31 +01:00
Rohit Gupta
8955081aaf
Update precision docs ( #11010 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2022-01-05 14:58:04 +00:00
Andrew Tritt
dbf1acd5a5
Modify LSFEnvironment to use more reliable environment variable ( #10825 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-05 12:45:25 +00:00
Kaushik B
93223ff5ce
Introduce StrategyRegistry ( #11233 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-01-05 17:14:18 +05:30
Carlos Mocholí
5ac129e95a
Rename ttp -> strategy ( #11312 )
2022-01-05 12:12:25 +01:00
Carlos Mocholí
33c3490685
Fix min/max logging default value ( #11310 )
2022-01-05 11:42:03 +01:00
Adrian Wälchli
a8bd7ac73f
Fix lr scheduler state not being dumped to checkpoint in deepspeed strategy ( #11307 )
2022-01-05 08:38:08 +00:00
Rohit Gupta
7eab379da2
Raise a warning if evaluation is triggered with best ckpt in case of multiple checkpoint callbacks ( #11274 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-04 17:22:32 +00:00
Carlos Mocholí
a610e043d7
Add typing for utilities/enums.py ( #11298 )
2022-01-04 13:30:56 +01:00
Carlos Mocholí
e9009d6058
Reset the total fit-validation batch progress on epoch ( #11244 )
2022-01-04 12:04:20 +01:00
Danielle Pintz
7fa1aebcc9
Remove `profile("training_step_and_backward")` ( #11222 )
2022-01-04 11:50:11 +01:00
Rohit Gupta
997da52f73
Update logic to make sure logged_metrics always contain tensors ( #11270 )
2022-01-04 10:32:44 +00:00
Rohit Gupta
98ea79b8b0
Add `opt_idx` to scheduler config if not assigned by user ( #11247 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-01-04 14:57:15 +09:00
Ed Pizzi
cf32127e7e
Avoid non-blocking GPU->CPU copies. ( #11288 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-01-03 22:17:50 +00:00
Kaushik B
5a4df4ec7d
Update strategy import statements ( #11238 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-01-03 15:54:46 +00:00
ananthsub
05ed9a201c
Group metrics generated by `DeviceStatsMonitor` for better visualization ( #11254 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-03 13:26:17 +00:00
Adrian Wälchli
17cb3c70f7
Fix data fetcher selection ( #11294 )
2022-01-03 13:49:17 +01:00
Danielle Pintz
b082715103
Remove `Strategy.optimizer_zero_grad` ( #11246 )
2022-01-03 13:46:57 +01:00
Adrian Wälchli
4eede7c30b
Add deprecation path for renamed training type plugins ( #11227 )
...
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-01-03 13:41:05 +01:00
jjenniferdai
4b5761539e
Remove `hpc_save` ( #11101 )
2022-01-03 12:23:13 +00:00
Adam Viola
1fc046cde2
Fix `_should_reload_dl_epoch` causing inconsistent validation dataloader reloading ( #11036 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-28 02:20:57 +01:00
Danielle Pintz
ca9b25db80
Remove `Strategy.init_optimizers` ( #11236 )
2021-12-23 18:48:21 +00:00
Danielle Pintz
ba6a8ddcad
refactor _configure_schedulers ( #11245 )
2021-12-23 10:03:28 -08:00
Carlos Mocholí
f44b209e72
Fix CLI race condition saving the config ( #11199 )
2021-12-23 16:45:06 +00:00
Carlos Mocholí
30236c837f
Reset the progress tracking state after sanity checking ( #11218 )
2021-12-23 16:36:03 +00:00
Kaushik B
0adcd6a048
Rename training_type_plugin file to strategy ( #11239 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-23 14:01:23 +00:00
Adrian Wälchli
c210e338ef
Update strategy import statements ( #11231 )
2021-12-23 08:26:28 +01:00
Danielle Pintz
a6a28e08d2
Deprecate `TrainerOptimizersMixin` and move functionality to `core/optimizer.py` ( #11155 )
2021-12-22 17:56:37 -08:00
four4fish
81301dbba7
Rename `AcceleratorConnector.training_type_plugin` to `AcceleratorConnector.strategy` ( #11212 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-23 01:36:23 +00:00
twsl
0b9034baef
Return only unique names/versions for LoggerCollection ( #10976 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-23 00:35:38 +00:00
Kaushik B
576a5d62a0
Introduce strategies directory for Training Strategies ( #11226 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 20:23:30 +00:00
Carlos Mocholí
eb5b350f9a
Remove explicit isinstance checks in strategies for checkpoint io ( #11177 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 04:41:45 +00:00
Adrian Wälchli
b6dd1a3878
Fix typing in `pl.callbacks.lr_monitor` ( #10802 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-22 03:50:00 +00:00
Adrian Wälchli
ba8e7cd787
Fix BF16 teardown for TPU precision plugin ( #10990 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-12-22 03:47:14 +00:00
four4fish
cf5ef32f7b
Deprecate Trainer.training_type_plugin in favor of trainer.strategy ( #11141 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 02:11:43 +00:00
Adrian Wälchli
17ad1a4c00
Rename `ParallelPlugin` to `ParallelStrategy` ( #11123 )
2021-12-22 01:09:17 +00:00
four4fish
4bfe5bda0f
Rename the DDPSpawnShardedPlugin to DDPSpawnShardedStrategy ( #11210 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 00:27:36 +00:00
Aki Nitta
28ce9105e4
Rename `SingleDevicePlugin` to `SingleDeviceStrategy` ( #11181 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 23:56:14 +00:00
four4fish
f98cd78e9e
Renamed the `DDPSpawnPlugin` to `DDPSpawnStrategy` ( #11145 )
2021-12-21 23:06:14 +00:00
four4fish
0c69c757d4
Rename the `DataParallelPlugin` to `DataParallelStrategy` ( #11183 )
2021-12-21 22:00:24 +00:00
Aki Nitta
c3cd4d050f
Rename `SingleTPUPlugin` to `SingleTPUStrategy` ( #11182 )
2021-12-21 20:09:30 +00:00
four4fish
1c5a5c3dfe
Renamed the DDP2Plugin to DDP2Strategy ( #11185 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 19:21:00 +00:00
Carlos Mocholí
b2c3d01b3e
Fix master import conflict ( #11203 )
2021-12-21 18:47:56 +00:00
Danielle Pintz
ac8dc2c2f3
Deprecate `TrainerCallbackHookMixin` ( #11148 )
2021-12-21 09:47:08 -08:00
four4fish
caab69aabb
Renamed DDPShardPlugin to DDPShardStrategy ( #11187 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 17:18:25 +00:00
Carlos Mocholí
f696326060
Remove `should_rank_save_checkpoint` property from TTP ( #11070 )
2021-12-21 18:11:20 +01:00
Carlos Mocholí
3692eba807
Drop Python 3.6 support ( #11117 )
2021-12-21 17:06:15 +00:00
Aki Nitta
9da78a94bd
Rename `TPUSpawnPlugin` to `TPUSpawnStrategy` ( #11190 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 16:36:16 +00:00
Danielle Pintz
1177389d5a
Move `TrainerCallbackHookMixin.on_save/load_checkpoint` to `Trainer` and rename for clarity ( #11179 )
2021-12-21 17:30:01 +01:00
Kaushik B
2e947a88e0
Rename IPUPlugin to IPUStrategy ( #11193 )
2021-12-21 15:55:41 +00:00
Kaushik B
283bdece0a
Rename DeepSpeedPlugin to DeepSpeedStrategy ( #11194 )
2021-12-21 15:18:01 +00:00
Oliver Borchert
17aceafa80
Suppress Warning in `PredictionEpochLoop` ( #11189 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-21 14:40:41 +00:00
Kaushik B
ba0c901395
Rename HorovodPlugin to HorovodStrategy ( #11195 )
2021-12-21 14:31:41 +01:00
Rohit Gupta
93ce2d7cc9
Avoid torch amp cuda warning with bf16 on cpu ( #11161 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 18:24:26 +05:30
four4fish
b64dea9dc3
Rename `DDPPlugin` to `DDPStrategy` ( #11142 )
...
* Rename DDPPlugin to DDPStrategy
* Change ddp_plugin to ddp_strategy
* update changelog
* rename occurrences in docs
* rename more occurrences
* fix line too long
* more fixes
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 08:55:51 +00:00
jjenniferdai
31f39c9578
Move `CheckpointConnector.fault_tolerant_auto_save_path` out of `CheckpointConnector.hpc_resume_path` ( #11092 )
2021-12-21 02:24:01 +01:00
Rohit Gupta
787f41eff6
update optimizer_step example in docs ( #10420 )
2021-12-21 08:19:40 +09:00
Adrian Wälchli
08e661ff72
Rename `restore_checkpoint_after_pre_dispatch` to `restore_checkpoint_after_setup` ( #11166 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:16:52 +00:00
Carlos Mocholí
e8169bbd46
Fix setter usage for checkpoint io and precision in TTP ( #11071 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-20 17:45:32 +01:00
Adrian Wälchli
f5c2881b68
3/n Simplify spawn plugins: Merge `pre_dispatch` and `setup` logic ( #11137 )
2021-12-20 17:41:22 +01:00
Adrian Wälchli
2e47e2f4ae
Set spawn_method on initialization ( #11162 )
2021-12-20 17:39:54 +01:00
four4fish
0ee78e96ef
Rename `DDPFullyShardedPlugin` to `DDPFullyShardedStrategy` ( #11143 )
...
* Rename DDPFullyShardedPlugin to DDPFullyShardedStrategy
* update fsdp_plugin to fsdp_strategy
* update changelog
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:11:20 +01:00
ORippler
86a3c5e2a3
Add required states for resumed ModelCheckpoint GC ( #10995 )
...
* Add required states for resumed ModelCheckpoint GC
* Add backwards compatibility with legacy ckpts
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* Add test to check if attrs are written to ckpt
Note that we do not yet check for proper loading/reinstantiation of
ModelCheckpoint based on the ckpt written to disk
* Test if attributes are restored properly from ckpt
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix broken `test_callbacks_state_fit_ckpt_path`
`ModelCheckpoint` is configured to save after every epoch,
but `trainer.fit` is called with `max_steps = 1`
Note there may be a better way of doing this, where `ModelCheckpoint`
is called after `training_step`
* Update test_restore.py
* Update test_restore.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Check that all attributes are restored properly
* revert changes, use fix on master
* Convert to proper unit test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor `test_mode_checkpoint_saveload_ckpt`
* First save, then load ckpt.
* Instantiate ModelCheckpoint twice.
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-20 17:05:15 +01:00
Danielle Pintz
b1baf460d9
Include hook's object name when profiling ( #11026 )
2021-12-20 15:18:24 +01:00
Adrian Wälchli
29eb9cccf2
Rename the `TrainingTypePlugin` base to `Strategy` ( #11120 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
2021-12-20 12:50:11 +00:00
guyang3532
cc4a978bf6
Safely disable profiler ( #11167 )
2021-12-20 11:51:46 +00:00
Carlos Mocholí
7ed3dbf191
Fix evaluation logging on epoch end with multiple dataloaders ( #11132 )
2021-12-19 15:51:01 +01:00
Danielle Pintz
f95976d602
rename _call_ttp_hook to _call_strategy_hook ( #11150 )
2021-12-18 17:53:03 -08:00
Rohit Gupta
3461af0ddb
Add support for returning callback from `LightningModule.configure_callbacks` ( #11060 )
2021-12-18 10:46:35 +00:00
Rafał Jankowski
3cc69f992b
Fixed NeptuneLogger when using DDP ( #11030 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-18 01:40:13 +00:00
Carlos Mocholí
62f1e82e03
Fix CVE-2020-1747 and CVE-2020-14343 ( #11099 )
2021-12-17 20:27:15 +00:00
Carlos Mocholí
8508cce37d
Mark all result classes as protected ( #11130 )
2021-12-17 19:35:17 +00:00
Rohit Gupta
860959fb3f
Enable logging hparams only if there are any ( #11105 )
2021-12-17 19:40:56 +01:00
Carlos Mocholí
dbb7f56b35
Deprecate `Trainer.verbose_evaluate` ( #10931 )
2021-12-17 19:26:32 +01:00
Carlos Mocholí
75d96d9897
Reset the current progress tracking state during double evaluation ( #11119 )
2021-12-17 19:20:11 +01:00
Adrian Wälchli
978f5e6ad6
Fix AttributeError when using CombinedLoader in prediction ( #11111 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-17 18:02:25 +00:00
quancs
179b4dd415
remove redundant methods in RichProgressBar ( #11100 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-17 17:40:31 +00:00
Carlos Mocholí
7e10f6d41f
Save the loop progress state by default ( #10784 )
2021-12-17 16:00:27 +00:00
Carlos Mocholí
fa6d17c96f
Fix typing for utilities.warnings ( #11115 )
2021-12-17 15:07:27 +01:00
Adrian Wälchli
6582249a0c
Fix signal teardown outside main thread ( #11124 )
2021-12-17 14:12:02 +01:00
Carlos Mocholí
5956a0716b
Track the evaluation loop outputs in the loop ( #10928 )
2021-12-17 14:00:47 +01:00
Adrian Wälchli
210ff845c1
Mark `Trainer.run_stage` as protected ( #11000 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-12-17 13:46:03 +01:00
Sean Naren
c66cd12445
Remove partitioning of model in ZeRO 3 ( #10655 )
2021-12-17 12:36:53 +00:00
Carlos Mocholí
4415677994
Add typing for `trainer.logger` ( #11114 )
2021-12-17 13:34:18 +01:00
Carlos Mocholí
5932f52b2f
Avoid the deprecated `onnx.export(example_outputs=...)` in torch 1.10 ( #11116 )
2021-12-17 10:11:11 +01:00
Adrian Wälchli
e19d93f69e
Initialize ModelCheckpoint state as early as possible ( #11108 )
2021-12-17 00:18:29 +01:00
Adrian Wälchli
262aefc8df
Remove obsolete `pre_dispatch` in `DDPSpawnShardedPlugin` ( #10988 )
2021-12-16 21:43:15 +01:00
Adrian Wälchli
2b0075a47e
Teardown sync-batchnorm after training ( #11078 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-16 18:58:44 +00:00
Carlos Mocholí
46d6fbf11b
Add `Loop.replace` ( #10324 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-16 17:41:38 +00:00
Adrian Wälchli
c335a7891d
Remove redundant special case for disabling the progress bar on TPU ( #11061 )
2021-12-16 18:02:50 +01:00
Carlos Mocholí
f37bd4677d
Update mypy ( #11096 )
2021-12-16 17:53:12 +01:00