Commit Graph

47 Commits

Author SHA1 Message Date
Carlos Mocholí c69a79c86f
Fix `self.log(on_epoch=True)` on_batch_start (#9780) 2021-10-18 14:02:16 +02:00
Rohit Gupta 4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816)
* deprecate hooks

* dep todo

* explicit

* Apply suggestions from code review

* Apply suggestions from code review

* code review

* base
2021-10-07 10:18:11 +00:00
thomas chaton 5841ca9782
[Feat] Add auto_restart for fault tolerant training (#9722) 2021-10-01 16:37:17 +00:00
thomas chaton fa44dbcd9e
[Refactor] Simplify data loading logic around replacing sampler to prevent confusion (#9721)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-28 17:04:02 +00:00
Carlos Mocholí 198aa852ef
Remove `training_epoch_end` outputs check (#9719)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-28 14:21:46 +00:00
Carlos Mocholí bc50591d49
reduce loop structure leakage into the `TrainingEpochLoop` (#9490)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-28 13:22:22 +00:00
Adrian Wälchli 5395cebc51
move get_active_optimizers to utilities (#9581) 2021-09-25 13:17:47 +02:00
Carlos Mocholí d02fc2b728
Rename `reset_on_epoch` to `reset_on_run` (#9658) 2021-09-25 04:27:54 +02:00
Carlos Mocholí ce00053002
Support skipping to validation (#9681)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-24 14:10:25 +00:00
Carlos Mocholí 8dcba38e0e
Add `is_last_batch` to progress tracking (#9657) 2021-09-23 12:54:41 +00:00
thomas chaton 89ab2470c1
[Refactor] 1/2 Move reset_on_restart within the loop reset (#9561)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-17 16:11:32 +00:00
Adrian Wälchli 5a846d48ce
mark several methods in evaluation loops as protected (#9516)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-15 14:12:27 +00:00
Carlos Mocholí 23450e2905
Add custom logic to each `OutputResult` subclass [2/2] (#9424) 2021-09-15 12:18:19 +00:00
Carlos Mocholí b1ed1db089
Keep global step update in the loop (#8856) 2021-09-14 19:21:39 +05:30
Carlos Mocholí 48d3a10c9b
Add `OutputResult` [1/2] (#9437)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-14 15:48:27 +02:00
Adrian Wälchli 6ff43cbff7
fix resuming from checkpoint for fault-tolerant in case of no failure (#9371)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-09-10 17:25:46 +00:00
Carlos Mocholí 9eccb3148e
Loop and test restructuring (#9383) 2021-09-10 13:18:24 +00:00
Carlos Mocholí e0f2e041b9
Share the training step output data via `ClosureResult` (#9349) 2021-09-10 11:40:20 +00:00
ananthsub c963bf6568
[loops] Reset reference to dataloader iterator on run end (#9386)
* [loops] Reset reference to dataloader iterator on run end
2021-09-10 04:18:58 +00:00
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
Carlos Mocholí 05ff1b2085
Remove unnecessary `TrainingEpochLoop` return (#9298) 2021-09-06 13:54:33 +02:00
Adrian Wälchli 9a14f04322
Fix mypy typing errors in optimizer loop (#9317) 2021-09-06 13:54:07 +02:00
Eric Wiener cf1a589956
Support infinite training (#8877)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-04 23:33:43 +00:00
Adrian Wälchli 75350938ca
extract optimizer loop (#9191) 2021-09-02 12:40:05 +01:00
thomas chaton f959b13ab9
3/n inter batch parallelism (#9052)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-24 18:45:54 +00:00
thomas chaton 92c7eec966
2/n inter batch parallelism (#9047)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-23 19:30:44 +00:00
thomas chaton e9ce598f2b
1/n inter batch parallelism (#9020)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-23 13:12:25 +00:00
Yifu Wang 938a191406
Add a flavor of training_step that takes dataloader_iter as an argument (#8807)
* Add a flavor of training_step that takes dataloader_iter as an argument
2021-08-16 19:01:09 +00:00
Carlos Mocholí d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 (#8911) 2021-08-16 13:34:42 +02:00
Carlos Mocholí 0aa5cc7b77
Integrate `total_batch_idx` with progress tracking (#8598) 2021-08-14 14:08:34 +02:00
Adrian Wälchli 4b6aaeeae3
fix plateau scheduler stepping on incomplete epoch (#8861) 2021-08-13 01:35:52 +00:00
Carlos Mocholí 5789e9f5e4
Fix reference issues during epoch end result collection (#8621)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-30 12:16:47 +00:00
Carlos Mocholí c2199fbbee
Fix `trainer.fit_loop.split_idx` reference (#8601)
* Fix split idx reference

* Update CHANGELOG

* Add comment
2021-07-29 08:00:04 +00:00
Carlos Mocholí 47c47faeae
Remove `outputs` in `on_train_epoch_end` hooks (#8587) 2021-07-28 18:27:54 +02:00
Carlos Mocholí 7914e494dd
Replace `iteration_count` and other index attributes in the loops with progress dataclasses (#8477)
* Delete `iteration_count` and `batches_seen`

* Update CHANGELOG

* Protect should accumulate

* Update pytorch_lightning/loops/epoch/training_epoch_loop.py
2021-07-27 18:36:20 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Adrian Wälchli 7d93d70110
Loop specialization (#8226)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-19 15:08:53 +02:00
thomas chaton 7bb810f143
Add progress tracking on Loops - 2/n (#8362)
* resolve issues

* update

* update

* update

* add more exceptions

* resolve bug

* update

* update

* update changelog

* resolve bug

* resolve comments

* update

* update

* update changelog

* update

* update

* remove space

* update

* add progress tracking to loops

* validate json

* update

* convert to dict for better readability

* validate reload

* update

* update

* update on comments

* remove deadcode

* clean changelog

* clean changelog

* update

* update on comments

* CHANGELOG

* CHANGELOG

* Update pytorch_lightning/loops/base.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* whitespace suggestions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make fault_tolerant_enabled protected

* whitespace fixes around Args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* typo it's -> its

* fix copy-paste typo in progress docstring

* Delete classes

* Minor change

* docs

* protected get_loops_state

* merge restore_loops with restore_progress

* Fix tests after removals

* explicit save with trainer.save_checkpoint()

* handle optimization restart based on optimizer_idx

* update increments

* update val batch progress and remove iteration count

* update progress tracking for dataloader loops

* remove self.dataloader_idx from eval_epoch_loop

* add batch progress to predict loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate progress tracking for current_epoch

* Fix test

* Actually remove it

* Remove unused TrainingEpochProgress

* Fix optimization progress - missing scheduler

* Restarting changes

* Scheduler progress

* Unused property, reset on epoch

* Resolve FIXME

* Remove FIXME

* fix test_progress (wip)

* fix batch_progress.current.reset

* Hold off on split progress. Out of scope of this PR

* Unnecessary if

* fix structure in test_progress

* structure

* clean up unused variables in test_progress

* refactor naming and organization in test_progress

* Unnecessary variable

* Remove unnecessary diff

* Improve comment

* Undo typing change to avoid polluting everything with mypy fixes

* Fix and improve test_loops.py

* Fix and organize `test_loop_state_dict`

* Remove unnecessary checks in test

* Update test after disallowing updates on None attributes

* Typing

* Minor test cleanup

* Fix and move loop test

* Move test from progress to loops

* Reset the scheduler progress

* SchedulerProgress fix

* Consistent whitespace

* Fix final test

* Minor test changes

* One test to rule them all

* Formatting

* Rename and clean variables

* Shorter names

* Shorter scheduler name

* Fix optimizer step calculation for stop_batch=2

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove empty connects

* Update CHANGELOG

* Holy shit finally got the formula right

* Fix final thing!!!

* Do not check state dicts

* parametrize multiple_dataloader progress test

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
2021-07-19 08:31:45 +00:00
Carlos Mocholí 7d1f4ce718
Move plateau schedulers epoch update to the training epoch loop (#8424) 2021-07-15 19:49:27 +02:00
Carlos Mocholí 0cd406d4f1
Delete `checkpoint_connector.has_trained` (#8292) 2021-07-07 17:47:35 +01:00
Carlos Mocholí 3379477242
Connect progress tracking dataclasses to loops (#8244)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-05 13:33:12 +02:00
Adrian Wälchli ea5cfd2005
move batch to device before sending it to hooks (#7378)
* update train step

* test

* x

* limits

* val

* typeo

* x

* x

* step

* min gpus

* run all loops

* x

* limit test

* profiler

* clean up accelerator code

* move files

* rename

* move tests

* changelog

* reorder callbacks and model hooks

* add test description

* replace unnecessary method

* fix chlog

* adjust batch_to_device for DP Plugin

* update tests for dataloader idx

* unused imports

* hook change

* switch None

* clear memory

* change to None

* None

* None

* memory savings

* remove redundant todo

* hack

* cheat

* Revert "cheat"

This reverts commit a8433bd0b4.

* Revert "hack"

This reverts commit 43a6d1edeb.

* update new epoch loop

* remove from old loop code

* update chlog

* update hook test

* changelog

* teardown

* integrate changes in new eval loop

* fix hook calls

* add prediction step

* bad merge

* Revert "bad merge"

This reverts commit 488080863c.

* fix train batch hook test

* rm -rf _notebooks

* update chlog

* release memory

* fix type

* notebooks mess

* debug

* Revert "debug"

This reverts commit eec4ee2f77.

* teardown

* fix teardown bug

* debug

* x

* debug

* Revert "debug"

This reverts commit a6e6101946.

Revert "debug"

This reverts commit 5ddeaec069.

debug


debug


Revert "debug"

This reverts commit 605be746f7daedf265b2c05a1c153ce543394435.

Revert "Revert "debug""

This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8.

debug


x


x


x


s


tol


x


tol

* Fix changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-05 09:31:39 +01:00
Carlos Mocholí 0e19d16ca6
Move result teardown to loops (#8245)
* Move result teardown to loops

* Update CHANGELOG

* Remove teardown from run

* Move previous teardown to on_run_end

* Add comment

* Merge 8250

* Remove stage set to None where it shouldn't
2021-07-02 14:36:14 +01:00
Carlos Mocholí a2e41045d2
Mark some loop attributes as protected (#8250) 2021-07-02 11:51:51 +01:00
thomas chaton d51b0ae7fc
Add `state_dict` to loops (#8197)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-01 15:54:37 +00:00
Adrian Wälchli 6db0fe3659
training loop refactor - move val loop (#8120)
* EvaluationDataLoaderLoop -> EvaluationLoop

* proposed rename files

* imports

* bad merge

* update init files

* glue imports together

* rename fit_loop.validation_loop to fit_loop.val_loop

* move loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Group imports

* Resolve circular import

* Comment

* fix test

* try to resolve circ import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-29 09:06:44 +00:00
Adrian Wälchli 55a90af7fc
`pytorch_lightning.loops` file structure: group by dataloader, epoch, and batch loop (#8077) 2021-06-24 23:40:46 +02:00