Commit Graph

153 Commits

Author SHA1 Message Date
Danielle Pintz 06c5903600
Simplify several profile calls (#11031) 2021-12-14 19:49:19 +00:00
Danielle Pintz 3fcfd0214c
Remove `_call_accelerator_hook` Trainer method (#10999) 2021-12-09 02:27:13 +01:00
Carlos Mocholí 99adc45af1
Follow-up changes to #10575 (#10957)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-07 15:27:52 +01:00
Rajath Bharadwaj 7914e5c157
Added UserWarning if `max_epochs` is not set in the Trainer class (#10700) 2021-12-06 09:44:25 +00:00
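A minimal sketch of the behavior referenced in #10700, assuming the 1.5-era defaults (the fallback of 1000 epochs when no limit is given is an assumption here):

    import pytorch_lightning as pl

    # Assumption: with neither `max_epochs` nor `max_steps` set, the Trainer
    # falls back to a default epoch limit (1000) and now emits a UserWarning.
    trainer = pl.Trainer()  # warns that `max_epochs` was not set

    # Passing an explicit bound avoids the warning.
    trainer = pl.Trainer(max_epochs=10)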
Danielle Pintz 6043179931
Re-design `call_hook` interface (#10575) 2021-12-04 16:39:55 -05:00
Carlos Mocholí a28b4cd0c0
Sort out the dataloader idx logic for evaluation (#10923)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-03 20:01:46 +00:00
four4fish 6fe3211573
Unroll dict input before calling Accelerator X_steps (#10908)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-03 17:00:52 +00:00
Adrian Wälchli c55bc433ce
Fix retrieval of batch indices when dataloader num_workers > 0 (#10870)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-02 10:36:10 +00:00
Rohit Gupta 5b9995da04
Fix schedule reset logic in pytorch profiler (#10837) 2021-12-02 14:22:49 +05:30
Carlos Mocholí 0061619e0a
Improve typing for loops (#10780) 2021-11-30 20:28:55 +00:00
Carlos Mocholí 1b43e43e9f
Minor changes in preparation for saving the loops state (#10783) 2021-11-30 19:37:04 +05:30
four4fish 1d2878523a
2/n Move Precision Plugin into strategy - move optimizer-related logic (#10596)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 08:31:23 +00:00
four4fish 8bf7f9cce7
1/n Move Accelerator into strategy - move batch_to_device to strategy (#10649)
* 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy

* add changelog

* add model is not none check

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

* Update test_datamodules.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_hooks.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dp.py

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 12:11:21 -08:00
Carlos Mocholí 724a92b065
Mark outputs as protected in the evaluation loops (#10781)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-28 20:09:30 +00:00
Carlos Mocholí 3089dc3829
Improve typing for loops (#10749)
* Improve typing for loops

* Free memory
2021-11-26 18:39:09 +00:00
Carlos Mocholí 31bb6e69ca
Avoid optional instances in Loops (#10735)
* Avoid optional instances in Loops

* More cleanup
2021-11-26 18:00:18 +00:00
Carlos Mocholí ae53562c97
Remove dead code in `TrainingEpochLoop` (#10750) 2021-11-26 17:49:00 +00:00
thomas chaton 3d6262b7a9
Fault Tolerant Manual: Add support for DDP (#10638) 2021-11-25 18:31:53 +01:00
Kaushik B e0b4bb2ea3
Deprecate `DeviceType` in favor of `_AcceleratorType` (#10503)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-25 16:41:03 +01:00
thomas chaton b28ab34ff5
Fault Tolerant Manual: Add loading to reload the states (#10699)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 17:18:36 +00:00
Carlos Mocholí a6dedcf492
Fix `move_metrics_to_cpu` with evaluation (#10631) 2021-11-22 15:58:21 +00:00
Rohit Gupta ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` (#10408)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
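For #10408 above, a hedged sketch of passing the batch size explicitly to `LightningModule.log`, assuming the 1.5-era signature where `batch_size` is an optional keyword argument used for epoch-level aggregation:

    import torch
    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(8, 1)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = torch.nn.functional.mse_loss(self.layer(x), y)
            # Assumption: an explicit `batch_size` overrides the size that would
            # otherwise be extracted from `batch` when reducing across the epoch.
            self.log("train_loss", loss, on_epoch=True, batch_size=x.size(0))
            return loss

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.01)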
Carlos Mocholí 069ec1005a
Do not autodetach extras (#10424)
* Do not autodetach extras

* Update CHANGELOG

* Use foo
2021-11-09 16:07:16 +00:00
Gili Tzabari a967b6eba0
Delete the iterator in `on_run_end()` (#9915) 2021-10-29 16:29:44 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
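A short illustration of the two Trainer knobs touched by #9287 above, assuming the 1.5-era argument names (`track_grad_norm`, `gradient_clip_val`, `gradient_clip_algorithm`):

    import pytorch_lightning as pl

    # Assumption: track_grad_norm=2 logs the 2-norm of the gradients each step,
    # while the clipping arguments configure the behavior this commit moves
    # into the PrecisionPlugin.
    trainer = pl.Trainer(
        track_grad_norm=2,
        gradient_clip_val=0.5,
        gradient_clip_algorithm="norm",
    )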
Danielle Pintz 38090e47d7
Small code simplification in `training_epoch_loop.py` (#10146)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-26 13:22:36 +02:00
Danielle Pintz 13d6d7bad1
Remove `optimizer_connector.py` (#10120) 2021-10-26 00:52:43 +00:00
Eric Wiener 0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
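A before/after sketch for #9460, assuming `-1` now plays the "no step limit" role that `None` did previously:

    import pytorch_lightning as pl

    # Before this change, max_steps=None meant "no limit on optimizer steps";
    # the sentinel is now -1, and it is also the default.
    trainer = pl.Trainer(max_steps=-1)     # unlimited steps (default)
    trainer = pl.Trainer(max_steps=1000)   # stop after 1000 optimizer steps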
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Danielle Pintz e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Alessio Bonfiglio 2a2fa5a56a
Group all the logged gradients under the same sub-folder (#7756) 2021-10-20 15:48:36 +00:00
Carlos Mocholí e44921ee21
Fix `self.log(on_epoch=True, reduce_fx=sum)` in `on_batch_start` (#9791) 2021-10-20 01:56:37 +02:00
Ning 0b68f2abf8
Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop (#9671)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-10-19 21:45:52 +02:00
Carlos Mocholí e95f9b71c1
Set the optimization output result class as a class attribute (#9977) 2021-10-19 16:33:08 +01:00
Carlos Mocholí bb2dc68792
Simplify track grad norm condition (#9992) 2021-10-19 15:00:16 +02:00
Adrian Wälchli 65150cdb42
Update docs for base Loop class with examples (#9993)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-18 15:37:23 +00:00
Carlos Mocholí c69a79c86f
Fix `self.log(on_epoch=True)` in `on_batch_start` (#9780) 2021-10-18 14:02:16 +02:00
Adrian Wälchli 7a9151637c
Loop customization docs (#9609)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2021-10-18 09:43:11 +00:00
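Tied to the two loop-documentation commits above (#9993, #9609), a hedged sketch of the base `Loop` interface from that era: subclasses implement `done`, `reset`, and `advance`, and `run()` drives the iteration (the standalone usage below is an assumption, not taken from those docs):

    from pytorch_lightning.loops import Loop

    class CountingLoop(Loop):
        """Toy loop: advance a counter until a limit is reached."""

        def __init__(self, limit: int):
            super().__init__()
            self.limit = limit
            self.count = 0

        @property
        def done(self) -> bool:
            return self.count >= self.limit

        def reset(self) -> None:
            self.count = 0

        def advance(self) -> None:
            # one unit of work per iteration of run()
            self.count += 1

    loop = CountingLoop(limit=3)
    loop.run()  # assumption: run() calls reset(), then advance() until done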
four4fish a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator (#9901)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Rohit Gupta 23e8b59ae7
Add `configure_gradient_clipping` hook in `LightningModule` (#9584)
* init hook

* docs

* dep train args

* update tests

* doc

* doc

* .gitignore

* not dep

* add trainer args

* add & update tests

* fix tests

* pre-commit

* docs

* add docs

* add exception

* code review

* deepspeed

* update tests

* not

* try fix

* Apply suggestions from code review

* update deepspeed

* disable some tests

* disable some tests

* enable all tests
2021-10-13 20:15:13 +05:30
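A hedged sketch of the `configure_gradient_clipping` hook added in #9584 above, assuming the 1.5-era signature and the `self.clip_gradients` helper:

    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):
        def configure_gradient_clipping(
            self, optimizer, optimizer_idx, gradient_clip_val=None, gradient_clip_algorithm=None
        ):
            # Assumption: the default implementation just calls clip_gradients();
            # overriding lets you clip conditionally, e.g. only for optimizer 0.
            if optimizer_idx == 0:
                self.clip_gradients(
                    optimizer,
                    gradient_clip_val=gradient_clip_val,
                    gradient_clip_algorithm=gradient_clip_algorithm,
                )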
ananthsub 4610fddb19
Mark `Trainer.terminate_on_nan` protected and deprecate public property (#9849)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 20:23:22 +00:00
Adrian Wälchli 6a0c47a014
Remove redundant accumulation normalization in manual optimization (#9769) 2021-10-11 15:26:12 +00:00
Rohit Gupta 4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816)
* deprecate hooks

* dep todo

* explicit

* Apply suggestions from code review

* Apply suggestions from code review

* code review

* base
2021-10-07 10:18:11 +00:00
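For #9816 above, a sketch of the updated hook signatures, assuming the deprecation removes the trailing `dataloader_idx` parameter from the training-batch hooks:

    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):
        # deprecated form: on_train_batch_start(self, batch, batch_idx, dataloader_idx)
        def on_train_batch_start(self, batch, batch_idx):
            ...

        # deprecated form: on_train_batch_end(self, outputs, batch, batch_idx, dataloader_idx)
        def on_train_batch_end(self, outputs, batch, batch_idx):
            ...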
thomas chaton 5841ca9782
[Feat] Add auto_restart for fault tolerant training (#9722) 2021-10-01 16:37:17 +00:00
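A hedged note on #9722: fault-tolerant (auto-restart) training in this series was experimental; the environment-variable opt-in shown below is an assumption about how it is enabled:

    import os
    import pytorch_lightning as pl

    # Assumption: the experimental feature is toggled through an environment
    # variable rather than a Trainer flag.
    os.environ["PL_FAULT_TOLERANT_TRAINING"] = "1"

    trainer = pl.Trainer(max_epochs=5)
    # trainer.fit(model)  # after a failure, rerunning can resume mid-epoch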
Carlos Mocholí 6ef4e5ac76
Remove return value from the backward closure (#9770) 2021-10-01 16:53:00 +02:00
Carlos Mocholí 44aed17aff
Remove duplicated native AMP + LBFGS check (#9748) 2021-09-29 13:14:03 +00:00
thomas chaton fa44dbcd9e
[Refactor] Simplify data loading logic around replacing sampler to prevent confusion (#9721)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-28 17:04:02 +00:00
Carlos Mocholí 198aa852ef
Remove `training_epoch_end` outputs check (#9719)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-28 14:21:46 +00:00
Carlos Mocholí bc50591d49
Reduce loop structure leakage into the `TrainingEpochLoop` (#9490)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-28 13:22:22 +00:00
thomas chaton 64bbebc869
[bugfix] Resolve metrics not being properly reset on validation epoch end (#9717)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-27 16:16:45 +00:00