Aki Nitta
9da78a94bd
Rename `TPUSpawnPlugin` to `TPUSpawnStrategy` ( #11190 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 16:36:16 +00:00
Adrian Wälchli
f5c2881b68
3/n Simplify spawn plugins: Merge `pre_dispatch` and `setup` logic ( #11137 )
2021-12-20 17:41:22 +01:00
ORippler
86a3c5e2a3
Add required states for resumed ModelCheckpoint GC ( #10995 )
...
* Add required states for resumed ModelCheckpoint GC
* Add backwards compatibility with legacy ckpts
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* Add test to check if attrs are written to ckpt
Note that we do not yet check for proper loading/reinstantiation of
ModelCheckpoint based on the ckpt written to disk
* Test if attributes are restored properly from ckpt
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix broken `test_callbacks_state_fit_ckpt_path`
`ModelCheckpoint` is configured to save after every epoch,
but `trainer.fit` is called with `max_steps = 1`
Note there may be a better way of doing this, where `ModelCheckpoint`
is called after `training_step`
* Update test_restore.py
* Update test_restore.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Check that all attributes are restored properly
* revert changes, use fix on master
* Convert to proper unit test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor `test_mode_checkpoint_saveload_ckpt`
* First save, then load ckpt.
* Instantiate ModelCheckpoint twice.
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-20 17:05:15 +01:00
Adrian Wälchli
29eb9cccf2
Rename the `TrainingTypePlugin` base to `Strategy` ( #11120 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
2021-12-20 12:50:11 +00:00
Rohit Gupta
61eb6230c2
Prune EvalModelTemplate ( #11153 )
2021-12-19 13:08:43 +00:00
Rohit Gupta
860959fb3f
Enable logging hparams only if there are any ( #11105 )
2021-12-17 19:40:56 +01:00
Carlos Mocholí
7e10f6d41f
Save the loop progress state by default ( #10784 )
2021-12-17 16:00:27 +00:00
Carlos Mocholí
5932f52b2f
Avoid the deprecated `onnx.export(example_outputs=...)` in torch 1.10 ( #11116 )
2021-12-17 10:11:11 +01:00
Adrian Wälchli
e19d93f69e
Initialize ModelCheckpoint state as early as possible ( #11108 )
2021-12-17 00:18:29 +01:00
Adrian Wälchli
2b0075a47e
Teardown sync-batchnorm after training ( #11078 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-16 18:58:44 +00:00
Rohit Gupta
61a744f5c6
Fix support for logging within callbacks returned from `LightningModule` ( #10991 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-14 19:41:29 +01:00
Aka.Fido
72cc8b7ca9
Disable validation completely when `overfit_batches>0` ( #9709 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-12-01 13:57:57 +00:00
Abhinav Arora
f63222d966
Remove references to torchtext.legacy from PyTorch Lightning ( #10724 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 19:32:07 +00:00
Carlos Mocholí
38ed26ec5a
Do not require omegaconf to run tests ( #10832 )
2021-11-30 14:48:03 +00:00
Carlos Mocholí
1b43e43e9f
Minor changes in preparation for saving the loops state ( #10783 )
2021-11-30 19:37:04 +05:30
four4fish
8bf7f9cce7
1/n Move Accelerator into strategy - move batch_to_device to strategy ( #10649 )
...
* 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy
* add changelog
* add model is not none check
* Apply suggestions from code review
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update CHANGELOG.md
* Update test_datamodules.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update test_hooks.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update dp.py
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 12:11:21 -08:00
Carlos Mocholí
152eb57def
Rename special to standalone ( #10779 )
2021-11-26 17:13:14 +00:00
Kaushik B
e0b4bb2ea3
Deprecate `DeviceType` in favor of `_AcceleratorType` ( #10503 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-25 16:41:03 +01:00
Rohit Gupta
823bfa6f8a
Update `LightningModule` docs ( #10637 )
2021-11-23 01:02:04 +05:30
Carlos Mocholí
0de8ab4f2e
Fix failing master due to an interaction between PRs ( #10627 )
2021-11-19 02:04:53 +00:00
Carlos Mocholí
35f6cbe09f
Use `update_wrapper` in test_hooks.py ( #10578 )
2021-11-19 01:52:55 +01:00
Adrian Wälchli
1ff35ed0f5
Improve code quality in `AcceleratorConnector._configure_slurm_ddp` ( #10102 )
2021-11-17 23:10:47 +00:00
Carlos Mocholí
0fa07da987
Fail the test when a `DeprecationWarning` is raised ( #9940 )
2021-11-17 23:41:50 +01:00
Carlos Mocholí
ba036fdeea
Support special test parametrizations ( #10569 )
2021-11-17 15:46:14 +00:00
Rohit Gupta
de7ef41fea
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` ( #10481 )
2021-11-16 06:47:43 +00:00
Carlos Mocholí
6dfcb6afc5
Skip strategy=ddp_spawn, accelerator=cpu, python>=3.9 tests ( #10550 )
2021-11-16 10:06:47 +05:30
a-gardner1
ce149f6451
Fix support for dataclasses with ClassVar/InitVar in `apply_to_collection` ( #9702 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-10 04:42:27 +00:00
Adrian Wälchli
a270a79ed9
Rename "master" methods to "main" in ClusterEnvironment plugins ( #10103 )
...
* rename occurrences of master port, master address, master node, master process
* rename properties
* add property decorators
* occurrences in docs
* update changelog
* update changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add lost method
* create deprecation
* add changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo (but it was already there!!!)
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* add todo
* update more occurrences
* add types
* add missing import
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-08 12:32:58 +00:00
puhuk
412f0a4d24
Remove deprecated dataloader arguments in Trainer methods ( #10325 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-04 11:03:39 +01:00
Carlos Mocholí
ba23d91320
Update recommendation on `dataloader_idx` ( #10318 )
2021-11-04 01:39:55 +01:00
victorjoos
cc0e9f96a8
Add support for empty `gpus` list to run on CPU ( #10246 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-11-01 18:37:38 +00:00
Carlos Mocholí
81d15c5986
Implement double optimizer closure for hook structure consistency ( #10167 )
2021-10-29 13:03:04 +00:00
Carlos Mocholí
03f01fb5ec
Fix gradient norm tracking and gradient clipping ( #9287 )
...
* WIP
* Progress
* Undo test change
* Fix plugin closure execution order
* Update CHANGELOG
* Fix manual optimization on AMP and skipping backward
* Fix for deepspeed
* Typo
* Hook test for manual closure
* Add skipping test with AMP
* You are hideous, apex
* Add deepspeed test
* Update CHANGELOG
* Fix for broken master
* Add RunIf
* FIXMEs
* Rename
* Fix grad norm
* add a simple test
* update test
* update test
* update test
* fix merge conflicts
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Sea of changes
* Undo change
* Introduce TPUPrecisionPlugin
* Undo changes
* Undo changes
* Resolve FIXME
* Undo change
* Undo change
* Undo change
* Fix FIXMEs
* Fix FIXME
* Correct value
* Bad merge
* Fix circular imports
* WIP
* Fixing clipping
* Fixes
* Bad merge
* Move optimizer step and clipping into the `PrecisionPlugin`
* Fix AMP
* Update CHANGELOG
* Fix tests
* Underscore
* Progress
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove pre_optimizer_step
* Missed one
* Progress
* Progress
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update FIXMEs
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix test
* DeepSpeed warning. mypy
* Rename
* Finish tests
* Update CHANGELOG
* Dumb fixes
* accelerator=auto
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update on comments
* Use ClassifModule
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí
5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` ( #10055 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Carlos Mocholí
dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` ( #10157 )
2021-10-27 13:38:39 +01:00
Rohit Gupta
34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods ( #10016 )
2021-10-25 23:12:08 -07:00
Rajat Goel
47e7a2860f
Fix Enums parsing in generated hparams yaml ( #9170 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Danielle Pintz
1f7bd6650c
Mark accelerator connector as protected ( #10032 )
2021-10-25 19:24:54 +00:00
jjenniferdai
6d79184ec5
Unify checkpoint load paths [redo #9693 ] ( #10061 )
2021-10-25 19:05:31 +00:00
Adrian Wälchli
76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected ( #10101 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí
b376799430
Minor fixes related to clipping ( #10130 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli
7eb2edf421
rename set_random_master_port ( #10104 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Kaushik B
56bc55db71
Update strategy flag in docs ( #10000 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:02:53 +05:30
Carlos Mocholí
f0b3e0f4de
Default to `precision=bf16` on CPU when `precision=16` is passed ( #10033 )
2021-10-20 13:25:13 +00:00
Rohit Gupta
0aa220b46b
Remove deprecated `distributed_backend` from `Trainer` ( #10017 )
...
* rm distributed_backend from Trainer
* unused
* chlog
* internal distributed_backend
* Docstring
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-19 13:54:37 +00:00
Kaushik B
5e8829b97d
(1/n) tests: Use strategy flag instead of accelerator for training strategies ( #9931 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-16 20:40:25 +05:30
Carlos Mocholí
e973bcb76a
Use non-deprecated options in tests ( #9949 )
2021-10-15 16:58:07 -07:00
Rohit Gupta
23e8b59ae7
Add `configure_gradient_clipping` hook in `LightningModule` ( #9584 )
...
* init hook
* docs
* dep train args
* update tests
* doc
* doc
* .gitignore
* not dep
* add trainer args
* add & update tests
* fix tests
* pre-commit
* docs
* add docs
* add exception
* code review
* deepspeed
* update tests
* not
* try fix
* Apply suggestions from code review
* update deepspeed
* disable some tests
* disable some tests
* enable all tests
2021-10-13 20:15:13 +05:30
Kaushik B
05b15e63f0
Add `strategy` argument to Trainer ( #8597 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-13 12:34:06 +00:00
ananthsub
28fc8d2016
Add `enable_model_summary` flag and deprecate `weights_summary` ( #9699 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-13 17:20:54 +05:30