
1903 Commits

Author SHA1 Message Date
Carlos Mocholí e0f2e041b9
Share the training step output data via `ClosureResult` (#9349) 2021-09-10 11:40:20 +00:00
Kaushik B d028e36946
Add remove_checkpoint to CheckpointIO plugin to simplify ModelCheckpo… (#9373)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-10 11:55:04 +01:00
Danielle Pintz 160e7e1289
Deprecate LightningModule.get_progress_bar_dict (#8985)
* Move get_progress_bar_dict from lightning module to progress bar callback
2021-09-09 20:53:47 +00:00
Adrian Wälchli 25af4b137e
rewrite and improve tests for truncated back-propagation (#9369)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-09-08 20:32:59 +00:00
Carlos Mocholí 8407238d66
Keep hidden state in the optimization loops (#9368) 2021-09-08 13:43:40 +00:00
Carlos Mocholí f239b96320
Fix `replace_sampler` missing the batch size under specific conditions (#9367) 2021-09-08 12:27:59 +02:00
Carlos Mocholí 15d943089d
Enforce that the optimizer closure is executed when `optimizer_step` is overridden (#9360) 2021-09-08 12:24:57 +02:00
Adrian Wälchli 91ce0d0a99
Remove checkpoint tracking from internal debugger (#9326)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-08 00:42:31 +00:00
Adrian Wälchli ca679cd78f
Add `ManualOptimization` loop (#9266)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-09-08 02:26:39 +02:00
Sean Naren a79c351a6a
Add a warning to deepspeed when inferring batch size (#9221) 2021-09-07 16:24:00 +00:00
Carlos Mocholí 6892d533ea
Run plugin closure before `on_before_optimizer_step` [1/2] (#9288)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-07 11:52:20 +00:00
Sean Naren d49709e29c
Remove todo, ensure we only check rank 0 for deepspeed warning (#9311) 2021-09-07 11:20:29 +00:00
Carlos Mocholí 392c577825
Add test assertion (#9309) 2021-09-06 16:06:26 +00:00
Marten Lienen 98e2f56db0
Clear reference to training loss at the end of train step (#9336)
Without clearing this reference, the loss tensor stays alive through the next training
step. This can be a problem for memory-intensive models that produce very deep backward
graphs, such as neural ODEs. For these models, keeping the backward graph of the previous
loss in memory can lead to OOM errors in the next training step, even though the step might
have succeeded if we had cleared (and thus GC'd) the previous backward graph.

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-06 13:37:27 +00:00
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
Sean Naren 72bb0186fb
Update requirements, update test (#9345) 2021-09-06 12:58:54 +01:00
Carlos Mocholí 05ff1b2085
Remove unnecessary `TrainingEpochLoop` return (#9298) 2021-09-06 13:54:33 +02:00
Adrian Wälchli 9a14f04322
Fix mypy typing errors in optimizer loop (#9317) 2021-09-06 13:54:07 +02:00
thomas chaton 9149b64908
[bugfix] Resolve PyTorch Profiling for Manual Optimization (#9316)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-06 10:45:34 +00:00
Roger Shieh 904dde7573
Fix inspection of unspecified args for container hparams (#9125)
* Update parsing.py

* add todo (for single arg)

* unblock non container single arg

* init test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update CHANGELOG.md

* pep8 line length

* Update pytorch_lightning/utilities/parsing.py

* remove dict namespace conversion

* add omegaconf support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add dict test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add omegaconf test

* Update CHANGELOG.md

* Update pytorch_lightning/utilities/parsing.py

* Update pytorch_lightning/utilities/parsing.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-09-06 09:48:11 +00:00
Carlos Mocholí 73fca23bed
Add typing for `ResultCollection` [3/3] (#9271) 2021-09-06 09:34:40 +00:00
Adrian Wälchli 50198d7483
fix progress bar restart with fault-tolerant training enabled (#9310)
* reset progress updates
* update docs
* add test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 10:43:59 +02:00
Adrian Wälchli f9132e8db6
remove early stopping tracking from internal debugger (#9327)
* replace dev debugger in early stopping

* remove unused imports
2021-09-06 10:43:03 +02:00
Kaushik B dc3391beae
Remove deprecation warnings being called for `on_{task}_dataloader` (#9279)
* Avoid deprecation warnings being called when hooks are not implemented
* Update tests & changelog
* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-06 10:03:30 +02:00
Danielle Pintz 912fd31131
Deprecate on_keyboard_interrupt callback hook (#9260)
* add on_exception callback hook

* deprecate on_keyboard_interrupt

* Apply suggestions from code review

* raise keyboard interrupt

* Delete cluster

* update changelog

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-06 09:57:00 +02:00
Carlos Mocholí 49c0485d50
Avoid optional `Tracker` attributes and enable mypy (#9320) 2021-09-06 00:20:44 +00:00
Eric Wiener cf1a589956
Support infinite training (#8877)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-04 23:33:43 +00:00
John St. John c30d9b9fae
Update call to `amp.autocast` from `fast_dtype` to `dtype` (#9211)
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-04 02:59:11 +00:00
Gili Tzabari 908e60dc85
Renamed `lr_dict` to `lr_scheduler_config` (#9313) 2021-09-04 00:47:43 +00:00
thomas chaton f6d40871bd
Prevent the loss from being moved to the CPU before the backward call (#9308) 2021-09-03 16:26:26 +00:00
Carlos Mocholí d5ee8d8e3f
Disable `{save,check}_on_train_epoch_end` with `check_val_every_n_epoch>1` (#9156) 2021-09-03 14:27:44 +00:00
Carlos Mocholí 171d242a89
Add typing for `_FxValidator` [1/3] (#9269) 2021-09-03 13:41:05 +00:00
Carlos Mocholí f745aa9ce1
Move tracking epoch end outputs logic to the `EvaluationEpochLoop` (#9261) 2021-09-03 15:02:34 +02:00
Adrian Wälchli b91747ef75
remove backward from training batch loop (#9265) 2021-09-03 00:15:40 +00:00
Carlos Mocholí 1e08b044ec
Allow easy CLI trainer re-instantiation (#9241)
* Allow easy CLI trainer re-instantiation

* Update CHANGELOG

* Allow passing any trainer argument

* Do not modify the previous config
2021-09-03 00:56:30 +02:00
B. Kerim Tshimanga f0788b3bbc
scheduled removal of auto_move_data decorator (#9231)
* scheduled removal of auto_move_data decorator

* update CHANGELOG.md

* remove unused import

* remove test_decorators.py

* fix missed merge conflict

Co-authored-by: thomas chaton <thomas@grid.ai>
2021-09-03 00:54:36 +02:00
Himanshu Dutta 5fbf04a145
DataModule compatibility with Python dataclass (#9039)
* added support and checks required for use of datamodule as python dataclass
* made changes required for dataclass support for LightningDataModule and required tests
* made the code compliant with future releases
* edited tests - removed training call. left dataclass decorator to defaults.
* added tests to check for multilevel inheritance and make sure init isn't called on the parent of defined class
* modified __new__ to ensure calling of init on LightningDataModule implicitly
* added relevant tests for multilevel inheritance cases
* removed default values from tests

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-03 00:43:38 +02:00
Adrian Wälchli a5e2f2b432
fix state extraction from batch when fault-tolerant training (#9281) 2021-09-02 11:57:40 -07:00
Adrian Wälchli e802f519ea
Tighten the checks for `Trainer.terminate_on_nan` (#9190) 2021-09-02 18:35:22 +02:00
Adrian Wälchli 75350938ca
extract optimizer loop (#9191) 2021-09-02 12:40:05 +01:00
four4fish a451997c4d
Avoid wrapping LightningModule in DDP plugins when not fitting (#9096)
* Avoid wrapping LightningModule in DDP plugins when not fitting
2021-09-02 02:23:59 +00:00
Pavel Grunt e2ecb8f859
Allow exporting to onnx when input is tuple (#8800)
Fixes #8799
2021-09-02 03:36:20 +02:00
B. Kerim Tshimanga 35876bb75f
remove lightning module datamodule property (#9233)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-09-02 00:43:47 +02:00
B. Kerim Tshimanga 65b3dc4495
scheduled removal of DeepSpeedPlugin.cpu_offload* parameters (#9244) 2021-09-01 12:02:30 +02:00
Danielle Pintz b046bd0670
Add on_exception callback hook (#9183) 2021-09-01 10:49:00 +02:00
Kaushik B f21f1bedf2
Deprecate `process_position` from the Trainer constructor (#9222) 2021-08-31 15:14:23 +00:00
B. Kerim Tshimanga f6614b370c
scheduled removal of BaseProfiler.output_filename in favor of dirpath… (#9214) 2021-08-31 09:30:43 +00:00
Soham Tiwari 861f8afeea
[bugfix] Changed CometLogger to stop modifying metrics in place (#9150)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-31 08:21:16 +00:00
B. Kerim Tshimanga 07ee8fc9a0
Remove deprecated property `ModelCheckpoint.period` in favor of `ModelCheckpoint.every_n_epochs` (#9213) 2021-08-31 10:04:29 +02:00
B. Kerim Tshimanga 34053ef85e
Remove deprecated `Trainer.running_sanity_check` (#9209) 2021-08-31 01:44:33 +02:00