Danielle Pintz
06c5903600
Simplify several profile calls ( #11031 )
2021-12-14 19:49:19 +00:00
Danielle Pintz
3fcfd0214c
Remove `_call_accelerator_hook` Trainer method ( #10999 )
2021-12-09 02:27:13 +01:00
Carlos Mocholí
99adc45af1
Follow-up changes to #10575 ( #10957 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-07 15:27:52 +01:00
Rajath Bharadwaj
7914e5c157
Added UserWarning if `max_epochs` is not set in the Trainer class ( #10700 )
2021-12-06 09:44:25 +00:00
Danielle Pintz
6043179931
Re-design `call_hook` interface ( #10575 )
2021-12-04 16:39:55 -05:00
Carlos Mocholí
a28b4cd0c0
Sort out the dataloader idx logic for evaluation ( #10923 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-03 20:01:46 +00:00
four4fish
6fe3211573
Unroll dict input before calling Accelerator X_steps ( #10908 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-03 17:00:52 +00:00
Adrian Wälchli
c55bc433ce
Fix retrieval of batch indices when dataloader num_workers > 0 ( #10870 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-02 10:36:10 +00:00
Rohit Gupta
5b9995da04
Fix schedule reset logic in pytorch profiler ( #10837 )
2021-12-02 14:22:49 +05:30
Carlos Mocholí
0061619e0a
Improve typing for loops ( #10780 )
2021-11-30 20:28:55 +00:00
Carlos Mocholí
1b43e43e9f
Minor changes in preparation for saving the loops state ( #10783 )
2021-11-30 19:37:04 +05:30
four4fish
1d2878523a
2/n Move Precision Plugin into strategy - move optimizer-related logic ( #10596 )
...
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 08:31:23 +00:00
four4fish
8bf7f9cce7
1/n Move Accelerator into strategy - move batch_to_device to strategy ( #10649 )
...
* 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy
* add changelog
* add model is not none check
* Apply suggestions from code review
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update CHANGELOG.md
* Update test_datamodules.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update test_hooks.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update dp.py
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 12:11:21 -08:00
Carlos Mocholí
724a92b065
Mark outputs as protected in the evaluation loops ( #10781 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-28 20:09:30 +00:00
Carlos Mocholí
3089dc3829
Improve typing for loops ( #10749 )
...
* Improve typing for loops
* Free memory
2021-11-26 18:39:09 +00:00
Carlos Mocholí
31bb6e69ca
Avoid optional instances in Loops ( #10735 )
...
* Avoid optional instances in Loops
* More cleanup
2021-11-26 18:00:18 +00:00
Carlos Mocholí
ae53562c97
Remove dead code in `TrainingEpochLoop` ( #10750 )
2021-11-26 17:49:00 +00:00
thomas chaton
3d6262b7a9
Fault Tolerant Manual: Add support for DDP ( #10638 )
2021-11-25 18:31:53 +01:00
Kaushik B
e0b4bb2ea3
Deprecate `DeviceType` in favor of `_AcceleratorType` ( #10503 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-25 16:41:03 +01:00
thomas chaton
b28ab34ff5
Fault Tolerant Manual: Add loading logic to reload the states ( #10699 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 17:18:36 +00:00
Carlos Mocholí
a6dedcf492
Fix `move_metrics_to_cpu` with evaluation ( #10631 )
2021-11-22 15:58:21 +00:00
Rohit Gupta
ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` ( #10408 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
Carlos Mocholí
069ec1005a
Do not autodetach extras ( #10424 )
...
* Do not autodetach extras
* Update CHANGELOG
* Use foo
2021-11-09 16:07:16 +00:00
Gili Tzabari
a967b6eba0
Delete iterator in `on_run_end()` ( #9915 )
2021-10-29 16:29:44 +00:00
Carlos Mocholí
03f01fb5ec
Fix gradient norm tracking and gradient clipping ( #9287 )
...
* WIP
* Progress
* Undo test change
* Fix plugin closure execution order
* Update CHANGELOG
* Fix manual optimization on AMP and skipping backward
* Fix for deepspeed
* Typo
* Hook test for manual closure
* Add skipping test with AMP
* You are hideous, apex
* Add deepspeed test
* Update CHANGELOG
* Fix for broken master
* Add RunIf
* FIXMEs
* Rename
* Fix grad norm
* add a simple test
* update test
* update test
* update test
* fix merge conflicts
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Sea of changes
* Undo change
* Introduce TPUPrecisionPlugin
* Undo changes
* Undo changes
* Resolve FIXME
* Undo change
* Undo change
* Undo change
* Fix FIXMEs
* Fix FIXME
* Correct value
* Bad merge
* Fix circular imports
* WIP
* Fixing clipping
* Fixes
* Bad merge
* Move optimizer step and clipping into the `PrecisionPlugin`
* Fix AMP
* Update CHANGELOG
* Fix tests
* Underscore
* Progress
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove pre_optimizer_step
* Missed one
* Progress
* Progress
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update FIXMEs
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix test
* DeepSpeed warning. mypy
* Rename
* Finish tests
* Update CHANGELOG
* Dumb fixes
* accelerator=auto
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update on comments
* Use ClassifModule
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Danielle Pintz
38090e47d7
Small code simplification in `training_epoch_loop.py` ( #10146 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-26 13:22:36 +02:00
Danielle Pintz
13d6d7bad1
Remove `optimizer_connector.py` ( #10120 )
2021-10-26 00:52:43 +00:00
Eric Wiener
0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` ( #9460 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
Carlos Mocholí
b376799430
Minor fixes related to clipping ( #10130 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Danielle Pintz
e94dcf6936
Mark `trainer.data_connector` as protected ( #10031 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Alessio Bonfiglio
2a2fa5a56a
Group all the logged gradients under the same sub-folder ( #7756 )
2021-10-20 15:48:36 +00:00
Carlos Mocholí
e44921ee21
Fix `self.log(on_epoch=True, reduce_fx=sum)` on_batch_start ( #9791 )
2021-10-20 01:56:37 +02:00
Ning
0b68f2abf8
Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop ( #9671 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-10-19 21:45:52 +02:00
Carlos Mocholí
e95f9b71c1
Set the optimization output result class as a class attribute ( #9977 )
2021-10-19 16:33:08 +01:00
Carlos Mocholí
bb2dc68792
Simplify track grad norm condition ( #9992 )
2021-10-19 15:00:16 +02:00
Adrian Wälchli
65150cdb42
Update docs for base Loop class with examples ( #9993 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-18 15:37:23 +00:00
Carlos Mocholí
c69a79c86f
Fix `self.log(on_epoch=True)` on_batch_start ( #9780 )
2021-10-18 14:02:16 +02:00
Adrian Wälchli
7a9151637c
Loop customization docs ( #9609 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2021-10-18 09:43:11 +00:00
four4fish
a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator ( #9901 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Rohit Gupta
23e8b59ae7
Add `configure_gradient_clipping` hook in `LightningModule` ( #9584 )
...
* init hook
* docs
* dep train args
* update tests
* doc
* doc
* .gitignore
* not dep
* add trainer args
* add & update tests
* fix tests
* pre-commit
* docs
* add docs
* add exception
* code review
* deepspeed
* update tests
* not
* try fix
* Apply suggestions from code review
* update deepspeed
* disable some tests
* disable some tests
* enable all tests
2021-10-13 20:15:13 +05:30
ananthsub
4610fddb19
Mark `Trainer.terminate_on_nan` protected and deprecate public property ( #9849 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 20:23:22 +00:00
Adrian Wälchli
6a0c47a014
Remove redundant accumulation normalization in manual optimization ( #9769 )
2021-10-11 15:26:12 +00:00
Rohit Gupta
4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` ( #9816 )
...
* deprecate hooks
* dep todo
* explicit
* Apply suggestions from code review
* Apply suggestions from code review
* code review
* base
2021-10-07 10:18:11 +00:00
thomas chaton
5841ca9782
[Feat] Add auto_restart for fault tolerant training ( #9722 )
2021-10-01 16:37:17 +00:00
Carlos Mocholí
6ef4e5ac76
Remove return value from the backward closure ( #9770 )
2021-10-01 16:53:00 +02:00
Carlos Mocholí
44aed17aff
Remove duplicated native AMP + LBFGS check ( #9748 )
2021-09-29 13:14:03 +00:00
thomas chaton
fa44dbcd9e
[Refactor] Simplify data loading logic around replacing the sampler to prevent confusion ( #9721 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-28 17:04:02 +00:00
Carlos Mocholí
198aa852ef
Remove `training_epoch_end` outputs check ( #9719 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-28 14:21:46 +00:00
Carlos Mocholí
bc50591d49
Reduce loop structure leakage into the `TrainingEpochLoop` ( #9490 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-28 13:22:22 +00:00
thomas chaton
64bbebc869
[bugfix] Resolve metrics not being properly reset on validation epoch end ( #9717 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-27 16:16:45 +00:00