Commit Graph

434 Commits

Author SHA1 Message Date
Krishna Kalyan 6586dd23b7
Mark `CheckpointConnector` as protected (#11550)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 02:26:08 +00:00
Carlos Mocholí a44881cd90
Changes in preparation to #8578 (#11562) 2022-02-02 19:57:08 +00:00
Carlos Mocholí 62818dbace
Use a dataclass as the scheduler config (#11443) 2022-01-18 20:23:32 +01:00
Rohit Gupta 82c8875f33
Add `LightningModule.lr_scheduler_step` (#10249)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-01-12 03:53:49 +00:00
Carlos Mocholí dcffca73d4
Parametrize deepspeed hook test (#11308) 2022-01-05 19:38:25 +00:00
jjenniferdai 4b5761539e
Remove `hpc_save` (#11101) 2022-01-03 12:23:13 +00:00
Adam Viola 1fc046cde2
Fix `_should_reload_dl_epoch` causing inconsistent validation dataloader reloading (#11036)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-28 02:20:57 +01:00
Kaushik B 0adcd6a048
Rename training_type_plugin file to strategy (#11239)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-23 14:01:23 +00:00
Adrian Wälchli c210e338ef
Update strategy import statements (#11231) 2021-12-23 08:26:28 +01:00
Kaushik B 576a5d62a0
Introduce strategies directory for Training Strategies (#11226)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 20:23:30 +00:00
Carlos Mocholí 85304d4672
Update pre-commit hook versions (#11202) 2021-12-22 17:09:27 +00:00
Adrian Wälchli ba8e7cd787
Fix BF16 teardown for TPU precision plugin (#10990)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-12-22 03:47:14 +00:00
four4fish cf5ef32f7b
Deprecate Trainer.training_type_plugin in favor of trainer.strategy (#11141)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 02:11:43 +00:00
four4fish f98cd78e9e
Renamed the `DDPSpawnPlugin` to `DDPSpawnStrategy` (#11145) 2021-12-21 23:06:14 +00:00
Aki Nitta 9da78a94bd
Rename `TPUSpawnPlugin` to `TPUSpawnStrategy` (#11190)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 16:36:16 +00:00
Adrian Wälchli f5c2881b68
3/n Simplify spawn plugins: Merge `pre_dispatch` and `setup` logic (#11137) 2021-12-20 17:41:22 +01:00
ORippler 86a3c5e2a3
Add required states for resumed ModelCheckpoint GC (#10995)
* Add required states for resumed ModelCheckpoint GC

* Add backwards compatibility with legacy ckpts

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Add test to check if attrs are written to ckpt

Note that we do not yet check for proper loading/reinstantiation of
ModelCheckpoint based on the ckpt written to disk

* Test if attributes are restored properly from ckpt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix broken `test_callbacks_state_fit_ckpt_path`

`ModelCheckpoint` is configured to save after every epoch,
but `trainer.fit` is called with `max_steps = 1`

Note there may be a better way of doing this, where `ModelCheckpoint`
is called after `training_step`

* Update test_restore.py

* Update test_restore.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Check that all attributes are restored properly

* revert changes, use fix on master

* Convert to proper unit test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor `test_mode_checkpoint_saveload_ckpt`

* First save, then load ckpt.
* Instantiate ModelCheckpoint twice.

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-20 17:05:15 +01:00
Adrian Wälchli 29eb9cccf2
Rename the `TrainingTypePlugin` base to `Strategy` (#11120)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
2021-12-20 12:50:11 +00:00
Rohit Gupta 61eb6230c2
Prune EvalModelTemplate (#11153) 2021-12-19 13:08:43 +00:00
Rohit Gupta 860959fb3f
Enable logging hparams only if there are any (#11105) 2021-12-17 19:40:56 +01:00
Carlos Mocholí 7e10f6d41f
Save the loop progress state by default (#10784) 2021-12-17 16:00:27 +00:00
Carlos Mocholí 5932f52b2f
Avoid the deprecated `onnx.export(example_outputs=...)` in torch 1.10 (#11116) 2021-12-17 10:11:11 +01:00
Adrian Wälchli e19d93f69e
Initialize ModelCheckpoint state as early as possible (#11108) 2021-12-17 00:18:29 +01:00
Adrian Wälchli 2b0075a47e
Teardown sync-batchnorm after training (#11078)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-16 18:58:44 +00:00
Rohit Gupta 61a744f5c6
Fix support for logging within callbacks returned from `LightningModule` (#10991)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-14 19:41:29 +01:00
Aka.Fido 72cc8b7ca9
Disable validation completely when `overfit_batches>0` (#9709)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-12-01 13:57:57 +00:00
Abhinav Arora f63222d966
Remove references to torchtext.legacy from PyTorch Lightning (#10724)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 19:32:07 +00:00
Carlos Mocholí 38ed26ec5a
Do not require omegaconf to run tests (#10832) 2021-11-30 14:48:03 +00:00
Carlos Mocholí 1b43e43e9f
Minor changes in preparation for saving the loops state (#10783) 2021-11-30 19:37:04 +05:30
four4fish 8bf7f9cce7
1/n Move Accelerator into strategy - move batch_to_device to strategy (#10649)
* 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy

* add changelog

* add model is not none check

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

* Update test_datamodules.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_hooks.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dp.py

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 12:11:21 -08:00
Carlos Mocholí 152eb57def
Rename special to standalone (#10779) 2021-11-26 17:13:14 +00:00
Kaushik B e0b4bb2ea3
Deprecate `DeviceType` in favor of `_AcceleratorType` (#10503)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-25 16:41:03 +01:00
Rohit Gupta 823bfa6f8a
Update `LightningModule` docs (#10637) 2021-11-23 01:02:04 +05:30
Carlos Mocholí 0de8ab4f2e
Fix failing master due to an interaction between PRs (#10627) 2021-11-19 02:04:53 +00:00
Carlos Mocholí 35f6cbe09f
Use `update_wrapper` in test_hooks.py (#10578) 2021-11-19 01:52:55 +01:00
Adrian Wälchli 1ff35ed0f5
Improve code quality in `AcceleratorConnector._configure_slurm_ddp` (#10102) 2021-11-17 23:10:47 +00:00
Carlos Mocholí 0fa07da987
Fail the test when a `DeprecationWarning` is raised (#9940) 2021-11-17 23:41:50 +01:00
Carlos Mocholí ba036fdeea
Support special test parametrizations (#10569) 2021-11-17 15:46:14 +00:00
Rohit Gupta de7ef41fea
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` (#10481) 2021-11-16 06:47:43 +00:00
Carlos Mocholí 6dfcb6afc5
Skip strategy=ddp_spawn, accelerator=cpu, python>=3.9 tests (#10550) 2021-11-16 10:06:47 +05:30
a-gardner1 ce149f6451
Fix support for dataclasses with ClassVar/InitVar in `apply_to_collection` (#9702)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-10 04:42:27 +00:00
Adrian Wälchli a270a79ed9
Rename "master" methods to "main" in ClusterEnvironment plugins (#10103)
* rename occurrences of master port, master address, master node, master process

* rename properties

* add property decorators

* occurrences in docs

* update changelog

* update changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lost method

* create deprecation

* add changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo (but it was already there!!!)

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* add todo

* update more occurrences

* add types

* add missing import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-08 12:32:58 +00:00
puhuk 412f0a4d24
Remove deprecated dataloader arguments in Trainer methods (#10325)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-04 11:03:39 +01:00
Carlos Mocholí ba23d91320
Update recommendation on `dataloader_idx` (#10318) 2021-11-04 01:39:55 +01:00
victorjoos cc0e9f96a8
Add support for empty `gpus` list to run on CPU (#10246)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-11-01 18:37:38 +00:00
Carlos Mocholí 81d15c5986
Implement double optimizer closure for hook structure consistency (#10167) 2021-10-29 13:03:04 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí 5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` (#10055)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Carlos Mocholí dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10157) 2021-10-27 13:38:39 +01:00
Rohit Gupta 34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods (#10016) 2021-10-25 23:12:08 -07:00