Adrian Wälchli
a99b7440b5
Add unit tests for `pl.utilities.grads` ( #9765 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-18 18:58:51 +05:30
Rohit Gupta
4dc32ad7db
Fix logic to check for spawn in worker_check ( #9902 )
...
* fix
* update tests
* chlog
* skip windows
2021-10-18 13:02:46 +00:00
Carlos Mocholí
e0470cc244
Update `resume_from_checkpoint` docs ( #9952 )
2021-10-18 17:40:47 +05:30
Carlos Mocholí
c69a79c86f
Fix `self.log(on_epoch=True)` on_batch_start ( #9780 )
2021-10-18 14:02:16 +02:00
Carlos Mocholí
01b304ec57
Update accelerator connector messages after the addition of strategy ( #9937 )
2021-10-18 01:10:48 +00:00
Carlos Mocholí
e5dfdf34f9
Avoid deprecation warning after #9901 ( #9951 )
2021-10-16 17:36:25 +01:00
Carlos Mocholí
db4e770004
Validate the precision input earlier ( #9763 )
2021-10-15 17:30:00 +00:00
Danielle Pintz
16213b1635
Deprecate `log_gpu_memory`, `gpu_metrics`, and util funcs in favor of `DeviceStatsMonitor` callback ( #9921 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-14 22:45:44 +02:00
Oliver Borchert
afbf703684
Single-process multi-node CPU training ( #9603 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-14 22:21:41 +02:00
four4fish
a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator ( #9901 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Rohit Gupta
23e8b59ae7
Add `configure_gradient_clipping` hook in `LightningModule` ( #9584 )
...
* init hook
* docs
* dep train args
* update tests
* doc
* doc
* .gitignore
* not dep
* add trainer args
* add & update tests
* fix tests
* pre-commit
* docs
* add docs
* add exception
* code review
* deepspeed
* update tests
* not
* try fix
* Apply suggestions from code review
* update deepspeed
* disable some tests
* disable some tests
* enable all tests
2021-10-13 20:15:13 +05:30
Kaushik B
05b15e63f0
Add `strategy` argument to Trainer ( #8597 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-13 12:34:06 +00:00
ananthsub
28fc8d2016
Add `enable_model_summary` flag and deprecate `weights_summary` ( #9699 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-13 17:20:54 +05:30
Kaushik B
b1e215d036
Remove `should_rank_save_checkpoint` property from Trainer ( #9433 )
2021-10-13 11:36:24 +00:00
Rohit Gupta
0f8fd20443
Remove epoch from `trainer.logged_metrics` ( #9904 )
2021-10-13 11:30:27 +02:00
ananthsub
4610fddb19
Mark `Trainer.terminate_on_nan` protected and deprecate public property ( #9849 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 20:23:22 +00:00
Danielle Pintz
dd6d797e0e
Remove type error handling in _configure_checkpoint_callbacks ( #9823 )
...
* remove type error handling in _configure_checkpoint_callbacks
* rm test
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-12 20:13:02 +00:00
Rohit Gupta
f2b0db60f1
Raise a `MisconfigurationException` when trainer functions are called with `ckpt_path="best"` but `checkpoint_callback` isn't configured ( #9841 )
...
* add check
* chlog
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Apply suggestions from code review
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-12 15:35:55 +05:30
Adrian Wälchli
64d1c46623
Update error message for interactive incompatible plugins ( #9896 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-10-12 15:10:49 +05:30
ananthsub
f16bfe9bdd
Mark `trainer.config_validator` as protected ( #9779 )
2021-10-12 09:29:05 +01:00
Rohit Gupta
db322f4bbb
Deprecate `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` ( #9754 )
...
* enable_checkpointing
* update codebase
* chlog
* update tests
* fix warning
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-12 07:55:07 +00:00
yopknopixx
173f4c8466
Deprecate `terminate_on_nan` Trainer argument in favor of `detect_anomaly` ( #9175 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-11 17:17:43 +00:00
Rohit Gupta
46fa703853
disable_logger ( #9837 )
2021-10-11 16:36:59 +05:30
Rohit Gupta
d71501d97f
Reset `val_dataloader` in `tuner/batch_size_scaling` ( #9857 )
...
* reset val
* chlog
2021-10-11 09:13:33 +01:00
kingyiusuen
8740c801bb
Fix typo in _validate_scheduler_optimizer() ( #9886 )
2021-10-11 09:16:17 +02:00
ananthsub
5206e52786
Add support for `torch.set_detect_anomaly` ( #9848 )
...
* Add support for `detect_anomaly`
* Update CHANGELOG.md
2021-10-07 16:03:56 +00:00
Rohit Gupta
4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` ( #9816 )
...
* deprecate hooks
* dep todo
* explicit
* Apply suggestions from code review
* Apply suggestions from code review
* code review
* base
2021-10-07 10:18:11 +00:00
Rohit Gupta
b303b4f895
Fix restoring training state during `trainer.fit` only ( #9413 )
...
* reload state on fit
* trainer.state
* add test
* chlog
* revert
* review
* review
* rev and amend
* fix test and logic
* update
* code review
* Apply suggestions from code review
* better assertions
* better assertions
* Apply suggestions from code review
* add loop test
* Apply suggestions from code review
* Split for typing
* review comments
* review comments
* use if_else
* code review
* code review
* code review
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Remove unnecessary pieces from the test
* move test
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-06 14:57:40 +00:00
Kaushik B
f94faa9cd3
Enable auto parameters tying for TPUs ( #9525 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-06 10:16:44 +02:00
kingyiusuen
6d530373c0
Add warnings regarding unsupported keys in optim config and OneCycleLR ( #9666 )
...
* Add warnings regarding unsupported keys in optim config and OneCycleLR
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix docstring
* Update CHANGELOG.md
* Split into two parts
* Use difference operator to find extra keys
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-04 08:25:05 +00:00
thomas chaton
5841ca9782
[Feat] Add auto_restart for fault tolerant training ( #9722 )
2021-10-01 16:37:17 +00:00
Rohit Gupta
617e798f3b
Raise an exception if using `amp_level` with native `amp_backend` ( #9755 )
...
* add exception
* chlog
* code review
* Apply suggestions from code review
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-01 14:27:05 +02:00
ananthsub
0d3325ea20
Add support for `torch.use_deterministic_algorithms` ( #9121 )
...
* re-add changes
* Update test_data_parallel.py
* Update CHANGELOG.md
* Update test_legacy_checkpoints.py
* Update test_horovod.py
* Update test_horovod.py
* Update accelerator_connector.py
* update tests
2021-09-30 04:40:09 +00:00
Carlos Mocholí
19008ce98f
IPU hotfix for #9721 ( #9759 )
2021-09-29 15:36:39 +02:00
Carlos Mocholí
0ddd6a8c19
Remove `_NATIVE_AMP_AVAILABLE` checks ( #9747 )
2021-09-29 15:34:26 +02:00
thomas chaton
fa44dbcd9e
[Refactor] Simplify data loading logic around replacing sampler to prevent confusion ( #9721 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-28 17:04:02 +00:00
Danielle Pintz
43896a7666
Removed deprecated property is_using_torchelastic from AcceleratorConnector ( #9729 )
2021-09-28 14:57:03 +02:00
thomas chaton
64bbebc869
[bugfix] Resolve metrics not being properly reset on validation epoch end ( #9717 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-27 16:16:45 +00:00
Adrian Wälchli
f74eb58493
remove `InternalDebugger` ( #9680 )
...
* wip
* reset _notebooks
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* reset _notebooks
* testing with mock
* update test with mock
* update test
* update tests
* update test
* remove track_load_dataloader_calls
* update last test
* remove unused imports
* remove InternalDebugger
* update changelog
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-27 09:33:45 -04:00
four4fish
15cd6ad45b
Call TrainingTypePlugin collective functions directly instead of going through the Accelerator ( #9677 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-27 14:52:57 +02:00
ananthsub
36b9ff2423
Deprecate `stochastic_weight_avg` from the `Trainer` constructor ( #8989 )
...
* Deprecate `stochastic_weight_avg` from the `Trainer` constructor
* Update CHANGELOG.md
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-26 16:19:15 +00:00
Rohit Gupta
a4bc0acb02
Update warnings in `TrainingTricksConnector` ( #9595 )
...
* update warnings
* add tests
* comments
* Apply suggestions from code review
* Apply suggestions from code review
2021-09-25 16:02:26 +00:00
Danielle Pintz
b3a5c7f442
Add `enable_progress_bar` to Trainer constructor ( #9664 )
2021-09-24 22:53:31 -07:00
Carlos Mocholí
d02fc2b728
Rename `reset_on_epoch` to `reset_on_run` ( #9658 )
2021-09-25 04:27:54 +02:00
Rohit Gupta
8fcdcb598b
Fix `accumulate_grad_batches` on init ( #9652 )
...
* fix accumulate_grad_batches on init
* chlog
* update error
* move to callback connector
* add test with callback
* fix tests
* Update pytorch_lightning/trainer/connectors/callback_connector.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update ipu logic
* rev
* rev
* rev
* pls work
* code review
Co-authored-by: Rohit Gupta <goku@rmac.local>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-24 18:51:54 +00:00
thomas chaton
9148a13de0
Enable DataLoader state restoration for the evaluation loop ( #9563 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-24 16:21:00 +00:00
Carlos Mocholí
ce00053002
Support skipping to validation ( #9681 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-24 14:10:25 +00:00
Adrian Wälchli
d67aff7494
remove `InternalDebugger.track_load_dataloader_call` ( #9675 )
...
* wip
* reset _notebooks
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* reset _notebooks
* testing with mock
* update test with mock
* update test
* update tests
* update test
* remove track_load_dataloader_calls
* update last test
* remove unused imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-24 15:37:36 +02:00
ananthsub
41e3be197f
Remove `call_configure_sharded_model` lifecycle property ( #9612 )
2021-09-24 03:57:53 +02:00
Carlos Mocholí
8dcba38e0e
Add `is_last_batch` to progress tracking ( #9657 )
2021-09-23 12:54:41 +00:00