Commit Graph

1454 Commits

Author SHA1 Message Date
Adrian Wälchli a99b7440b5
Add unit tests for `pl.utilities.grads` (#9765)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-18 18:58:51 +05:30
Rohit Gupta 4dc32ad7db
Fix logic to check for spawn in worker_check (#9902)
* fix

* update tests

* chlog

* skip windows
2021-10-18 13:02:46 +00:00
Carlos Mocholí e0470cc244
Update `resume_from_checkpoint` docs (#9952)
2021-10-18 17:40:47 +05:30
Carlos Mocholí c69a79c86f
Fix `self.log(on_epoch=True)` on_batch_start (#9780)
2021-10-18 14:02:16 +02:00
Carlos Mocholí 01b304ec57
Update accelerator connector messages after the addition of strategy (#9937)
2021-10-18 01:10:48 +00:00
Carlos Mocholí e5dfdf34f9
Avoid deprecation warning after #9901 (#9951)
2021-10-16 17:36:25 +01:00
Carlos Mocholí db4e770004
Validate the precision input earlier (#9763)
2021-10-15 17:30:00 +00:00
Danielle Pintz 16213b1635
Deprecate `log_gpu_memory`, `gpu_metrics`, and util funcs in favor of `DeviceStatsMonitor` callback (#9921)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-14 22:45:44 +02:00
Oliver Borchert afbf703684
Single-process multi-node CPU training (#9603)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-14 22:21:41 +02:00
four4fish a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator (#9901)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Rohit Gupta 23e8b59ae7
Add `configure_gradient_clipping` hook in `LightningModule` (#9584)
* init hook

* docs

* dep train args

* update tests

* doc

* doc

* .gitignore

* not dep

* add trainer args

* add & update tests

* fix tests

* pre-commit

* docs

* add docs

* add exception

* code review

* deepspeed

* update tests

* not

* try fix

* Apply suggestions from code review

* update deepspeed

* disable some tests

* disable some tests

* enable all tests
2021-10-13 20:15:13 +05:30
Kaushik B 05b15e63f0
Add `strategy` argument to Trainer (#8597)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-13 12:34:06 +00:00
ananthsub 28fc8d2016
Add `enable_model_summary` flag and deprecate `weights_summary` (#9699)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-13 17:20:54 +05:30
Kaushik B b1e215d036
Remove `should_rank_save_checkpoint` property from Trainer (#9433)
2021-10-13 11:36:24 +00:00
Rohit Gupta 0f8fd20443
Remove epoch from `trainer.logged_metrics` (#9904)
2021-10-13 11:30:27 +02:00
ananthsub 4610fddb19
Mark `Trainer.terminate_on_nan` protected and deprecate public property (#9849)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 20:23:22 +00:00
Danielle Pintz dd6d797e0e
Remove type error handling in _configure_checkpoint_callbacks (#9823)
* remove type error handling in _configure_checkpoint_callbacks

* rm test

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-12 20:13:02 +00:00
Rohit Gupta f2b0db60f1
Raise a `MisconfigurationException` when trainer functions are called with `ckpt_path="best"` but `checkpoint_callback` isn't configured (#9841)
* add check

* chlog

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-12 15:35:55 +05:30
Adrian Wälchli 64d1c46623
Update error message for interactive incompatible plugins (#9896)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-10-12 15:10:49 +05:30
ananthsub f16bfe9bdd
Mark `trainer.config_validator` as protected (#9779)
2021-10-12 09:29:05 +01:00
Rohit Gupta db322f4bbb
Deprecate `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` (#9754)
* enable_chekpointing

* update codebase

* chlog

* update tests

* fix warning

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-12 07:55:07 +00:00
yopknopixx 173f4c8466
Deprecate `terminate_on_nan` Trainer argument in favor of `detect_anomaly` (#9175)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-11 17:17:43 +00:00
Rohit Gupta 46fa703853
disable_logger (#9837)
2021-10-11 16:36:59 +05:30
Rohit Gupta d71501d97f
Reset `val_dataloader` in `tuner/batch_size_scaling` (#9857)
* reset val

* chlog
2021-10-11 09:13:33 +01:00
kingyiusuen 8740c801bb
Fix typo in _validate_scheduler_optimizer() (#9886)
2021-10-11 09:16:17 +02:00
ananthsub 5206e52786
Add support for `torch.set_detect_anomaly` (#9848)
* Add support for `detect_anomaly`

* Update CHANGELOG.md
2021-10-07 16:03:56 +00:00
Rohit Gupta 4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816)
* deprecate hooks

* dep todo

* explicit

* Apply suggestions from code review

* Apply suggestions from code review

* code review

* base
2021-10-07 10:18:11 +00:00
Rohit Gupta b303b4f895
Fix restoring training state during `trainer.fit` only (#9413)
* reload state on fit

* trainer.state

* add test

* chlog

* revert

* review

* review

* rev and ammend

* fix test and logic

* update

* code review

* Apply suggestions from code review

* better assertions

* better assertions

* Apply suggestions from code review

* add loop test

* Apply suggestions from code review

* Split for typing

* review comments

* review comments

* use if_else

* code review

* code review

* code review

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Remove unnecessary pieces from the test

* move test

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-06 14:57:40 +00:00
Kaushik B f94faa9cd3
Enable auto parameters tying for TPUs (#9525)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-06 10:16:44 +02:00
kingyiusuen 6d530373c0
Add warnings regarding unsupported keys in optim config and OneCycleLR (#9666)
* Add warnings regarding unsupported keys in optim config and OneCycleLR

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix docstring

* Update CHANGELOG.md

* Split  into two parts

* Use difference operator to find extra keys

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-04 08:25:05 +00:00
thomas chaton 5841ca9782
[Feat] Add auto_restart for fault tolerant training (#9722)
2021-10-01 16:37:17 +00:00
Rohit Gupta 617e798f3b
Raise an exception if using `amp_level` with native `amp_backend` (#9755)
* add exception

* chlog

* code review

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-01 14:27:05 +02:00
ananthsub 0d3325ea20
Add support for `torch.use_deterministic_algorithms` (#9121)
* re-add changes

* Update test_data_parallel.py

* Update CHANGELOG.md

* Update test_legacy_checkpoints.py

* Update test_horovod.py

* Update test_horovod.py

* Update accelerator_connector.py

* update tests
2021-09-30 04:40:09 +00:00
Carlos Mocholí 19008ce98f
IPU hotfix for #9721 (#9759)
2021-09-29 15:36:39 +02:00
Carlos Mocholí 0ddd6a8c19
Remove `_NATIVE_AMP_AVAILABLE` checks (#9747)
2021-09-29 15:34:26 +02:00
thomas chaton fa44dbcd9e
[Refactor] Simplify data loading logic around replacing sampler to prevent confusion (#9721)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-28 17:04:02 +00:00
Danielle Pintz 43896a7666
Removed deprecated property is_using_torchelastic from AcceleratorConnector (#9729)
2021-09-28 14:57:03 +02:00
thomas chaton 64bbebc869
[bugfix] Resolve metrics not being properly resetted on validation epoch end (#9717)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-27 16:16:45 +00:00
Adrian Wälchli f74eb58493
remove `InternalDebugger` (#9680)
* wip

* reset _notebooks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reset _notebooks

* testing with mock

* update test with mock

* update test

* update tests

* update test

* remove track_load_dataloader_calls

* update last test

* remove unused imports

* remove InternalDebugger

* update changelog

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-27 09:33:45 -04:00
four4fish 15cd6ad45b
Call TrainingTypePlugin collective functions directly instead of going through the Accelerator (#9677)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-27 14:52:57 +02:00
ananthsub 36b9ff2423
Deprecate `stochastic_weight_avg` from the `Trainer` constructor (#8989)
* Deprecate `stochastic_weight_avg` from the `Trainer` constructor

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-26 16:19:15 +00:00
Rohit Gupta a4bc0acb02
Update warnings in `TrainingTricksConnector` (#9595)
* update warnings

* add tests

* comments

* Apply suggestions from code review

* Apply suggestions from code review
2021-09-25 16:02:26 +00:00
Danielle Pintz b3a5c7f442
Add `enable_progress_bar` to Trainer constructor (#9664)
2021-09-24 22:53:31 -07:00
Carlos Mocholí d02fc2b728
Rename `reset_on_epoch` to `reset_on_run` (#9658)
2021-09-25 04:27:54 +02:00
Rohit Gupta 8fcdcb598b
Fix `accumulate_grad_batches` on init (#9652)
* fix accumuate_grad_batches on init

* chlog

* update error

* move to callback connector

* add test with callback

* fix tests

* Update pytorch_lightning/trainer/connectors/callback_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update ipu logic

* rev

* rev

* rev

* pls work

* code review

Co-authored-by: Rohit Gupta <goku@rmac.local>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-24 18:51:54 +00:00
thomas chaton 9148a13de0
Enable DataLoader state restoration for the evaluation loop (#9563)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-24 16:21:00 +00:00
Carlos Mocholí ce00053002
Support skipping to validation (#9681)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-24 14:10:25 +00:00
Adrian Wälchli d67aff7494
remove `InternalDebugger.track_load_dataloader_call` (#9675)
* wip

* reset _notebooks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reset _notebooks

* testing with mock

* update test with mock

* update test

* update tests

* update test

* remove track_load_dataloader_calls

* update last test

* remove unused imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-24 15:37:36 +02:00
ananthsub 41e3be197f
Remove `call_configure_sharded_model` lifecycle property (#9612)
2021-09-24 03:57:53 +02:00
Carlos Mocholí 8dcba38e0e
Add `is_last_batch` to progress tracking (#9657)
2021-09-23 12:54:41 +00:00