* add test
* resolve bug
* update test
* wrongly copy / paste
* update test
* resolve a second bug
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
* Fix info message when EarlyStopping 'mode' not provided
* fixup! Fix info message when EarlyStopping 'mode' not provided
* Apply suggestions from code review
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
* tpu device check
* replaced with xmp spawn
* Revert "replaced with xmp spawn"
This reverts commit 6835380f
* replaced all instances of XLA_AVAILABLE
* moved inner_f to global scope
* made refactors
* added changelog
* added TPU_AVAILABLE variable
* fix codefactor issues
* removed from trainer and early stopping
* add TORCHXLA_AVAILABLE check
* added tests
* refactoring
* Update pytorch_lightning/utilities/xla_device_utils.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* updated function names
* fixed bug
* updated CHANGELOG.md
* added todo
* added type hints
* isort and black
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* ref: fix metric err
* ref: merge
* ref: decoupled ddp2
* ref: clean up ddp before final fix
* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)
* force crash when max_epochs < epochs in a checkpoint
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
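A minimal sketch of the `max_epochs` guard mentioned above, assuming the checkpoint is a plain dict with an `epoch` entry. The helper name `validate_resume_epoch` and the use of `ValueError` are illustrative assumptions, not the helper or exception type used in the code base.

```python
def validate_resume_epoch(checkpoint: dict, max_epochs: int) -> int:
    """Return the epoch to resume from, or fail loudly (hypothetical helper)."""
    # Assumption: the checkpoint stores its epoch counter under "epoch".
    checkpoint_epoch = checkpoint["epoch"]
    if max_epochs < checkpoint_epoch:
        # Crash instead of silently resuming a run that is already past its limit.
        raise ValueError(
            f"max_epochs={max_epochs} is smaller than the checkpoint epoch "
            f"({checkpoint_epoch}); refusing to resume."
        )
    return checkpoint_epoch
```

Raising early keeps the misconfiguration visible at resume time rather than letting the run end without training.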
* Fix ModelCheckpoint's name formatting
* Fix failing tests
* Add dot to CHECKPOINT_SUFFIX
* Set variables to their default values at the end of tests
* Fix logic for filepath='' and filename=None. Add test
* Fix Windows tests
* Fix typo. Remove leading line break and zeroes
* Remove CHECKPOINT_SUFFIX
* Fix typos. Use appropriate f-string format
* Apply suggestions from code review
* Fix broken tests after #3320
* Finish changes suggested by Borda
* Use explicit test var names
* Apply suggestions
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Apply suggestions
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update CHANGELOG
* Apply suggestions from code review
* for
* prepend whitespace in warn msg
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Fixes the test for early stopping without a val step.
The expression that checked whether early stopping was triggered had an off-by-one error and was therefore true even when early stopping had not been triggered.
Also sets patience to 0 and max epochs to 10, so the loss has enough time to flatten.
* Fixes early stopping without a val step.
Previously only the `early_stop_on` key was checked, not an arbitrary monitor key.
* Fixes the branch that checks whether early stopping is done during validation.
Previously only `val_early_stop_on` was checked. Since arbitrary keys can be used, the set of possible validation keys cannot be enumerated exhaustively. This change therefore disables early stopping in `on_train_epoch_end` via an instance attribute if early stopping was already executed in `on_validation_epoch_end`.
Also adds a test that ensures arbitrary keys work.
* Improve the check for whether eval results are used.
Only disable the early stopping check with train results if eval results are actually used. Previously it was always disabled in ``on_validation_epoch_end``.
Rename and document the instance variable to make this clearer.
* Remove incorrect documentation on the behaviour of early stopping with the train result dict.
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
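The monitor-key and eval-results changes described above can be sketched roughly as follows. This is a simplified illustration in plain Python, not the callback's actual code: the class `EarlyStoppingSketch`, the attribute `based_on_eval_results`, and the convention of passing metrics as plain dicts are all assumptions made for the example.

```python
class EarlyStoppingSketch:
    """Hypothetical, simplified early stopping on an arbitrary monitor key."""

    def __init__(self, monitor="early_stop_on", patience=0, mode="min"):
        self.monitor = monitor
        self.patience = patience
        self.mode = mode
        self.best = float("inf") if mode == "min" else float("-inf")
        self.wait_count = 0
        self.should_stop = False
        # True once validation metrics drove the check in the current epoch,
        # so the train-epoch check is skipped instead of running a second time.
        self.based_on_eval_results = False

    def _improved(self, current):
        return current < self.best if self.mode == "min" else current > self.best

    def _run_check(self, metrics):
        current = metrics[self.monitor]
        if self._improved(current):
            self.best = current
            self.wait_count = 0
        else:
            self.wait_count += 1
            if self.wait_count >= self.patience:
                self.should_stop = True

    def on_validation_epoch_end(self, metrics):
        # Only claim the epoch if the monitored key is really in the eval results.
        if self.monitor in metrics:
            self.based_on_eval_results = True
            self._run_check(metrics)

    def on_train_epoch_end(self, metrics):
        if self.based_on_eval_results:
            self.based_on_eval_results = False  # reset for the next epoch
            return
        if self.monitor in metrics:
            self._run_check(metrics)
```

Tracking a flag instead of a fixed list of validation keys is what lets any monitor name work: the callback records where the key was found rather than guessing from its name.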
* change t() to transpose() as xla devices do not support .t() on 1-dim tensor
* detach tensor before copying
* Revert "detach tensor before copying"
This reverts commit 37cc7bbe
* changed dims
* added test_result_obj_on_tpu
* detach before copying
* replace torch.cat with sum
* tests to ensure correct dataloading interval and sequence
* Fix typo
* ref: group prepare data hook (6) (#3212)
* group prepare data hook
* Fix typo
Co-authored-by: William Falcon <waf2107@columbia.edu>
* added warning when changing monitor and using results obj
* r
* patched optimizer closure with sr
* added train step structured result
* added autoreduce for train step
* added auto reduce on train
* added hooks
* finished tests for structured results on train epoch
* cache
* finished tests for structured results on train epoch
* Update pytorch_lightning/callbacks/early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
* Update pytorch_lightning/core/step_result.py
* finished tests for structured results on train epoch
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* simple
* finished tests for structured results on train epoch
* simple
* simple
* revert
* finished tests for structured results on train epoch
* Update tests/base/deterministic_model.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* finished tests for structured results on train epoch
* docstring typos
* finished tests for structured results on train epoch
* Update pytorch_lightning/core/step_result.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/overrides/data_parallel.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* added early stop tpu test
* Fixes #2455
* added early stop tpu test
* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* cannot pass an int as default_save_path
* refactor log message
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
* fix test with new epoch indexing
* fix progress bar totals
* fix off-by-one error (see #2289); epoch starts at 0 now
* added missing imports
* fix hpc_save folderpath
* fix formatting
* fix tests
* small fixes from a rebase
* fix
* tmpdir
* wandb
* fix merge conflict
* add back evaluation after training
* test_resume_early_stopping_from_checkpoint TODO
* undo the horovod check
* update changelog
* remove a duplicate test from merge error
* try fix dp_resume test
* add the logger fix from master
* try remove default_root_dir
* try mocking numpy
* try import numpy in docs test
* fix wandb test
* pep 8 fix
* skip if no amp
* dont mock when doctesting
* install extra
* fix the resume ES test
* undo conf.py changes
* revert remove comet pickle from test
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update weights_loading.rst
* renamed flag
* renamed flag
* revert the None check in logger experiment name/version
* add the old comments
* _experiment
* test checkpointing on DDP
* skip the ddp test on windows
* cloudpickle
* renamed flag
* renamed flag
* parentheses for clarity
* apply suggestion max epochs
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
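The `state_dict` addition for early stopping can be pictured as follows. This is a sketch under assumptions: the class name `EarlyStoppingState`, its attribute names, and the dictionary keys are chosen for illustration and may differ from the ones the callback really uses.

```python
class EarlyStoppingState:
    """Hypothetical holder for the early stopping counters that must survive a resume."""

    def __init__(self, patience=3):
        self.patience = patience
        self.wait = 0            # epochs since the last improvement
        self.stopped_epoch = 0   # epoch at which training was stopped, if any
        self.best = float("inf")

    def state_dict(self):
        # Everything needed to continue counting after loading a checkpoint.
        return {
            "wait": self.wait,
            "stopped_epoch": self.stopped_epoch,
            "best": self.best,
            "patience": self.patience,
        }

    def load_state_dict(self, state):
        self.wait = state["wait"]
        self.stopped_epoch = state["stopped_epoch"]
        self.best = state["best"]
        self.patience = state["patience"]


# Usage: save the counters with the checkpoint, restore them on resume.
before = EarlyStoppingState(patience=5)
before.wait, before.best = 2, 0.31
after = EarlyStoppingState()
after.load_state_dict(before.state_dict())
```

Without restoring these counters, a resumed run would start its patience window from zero and could train longer than intended.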
* Patch for issue 1815, which will allow EarlyStopping to work on precision=16
* Added whitespace to the end of the line so CI/CD can rerun. There was no reason for the latest macOS test to have been cancelled.
* Format.
* Fixes PyTorchLightning/pytorch-lightning#490
`EarlyStopping` should check the metric of interest `on_validation_end` rather than `on_epoch_end`.
In a normal scenario, this does not cause a problem, but in combination with `check_val_every_n_epoch>1` in the `Trainer` it results in a warning or in a `RuntimeError` depending on `strict`.
* Highlighted that ES callback runs on val epochs in docstring
* Updated EarlyStopping in rst doc
* Update early_stopping.py
* Update early_stopping.rst
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docs/source/early_stopping.rst
* fix doctest indentation warning
* Train loop calls early_stop.on_validation_end
* chlog
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
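To illustrate the `check_val_every_n_epoch>1` problem described above: if the check runs at every epoch end, the monitored metric is missing on epochs without validation. The following is a small simulation under assumptions, not Lightning code; `check_metric`, `run_epochs`, the `val_loss` key, and the message text are hypothetical.

```python
import warnings


def check_metric(metrics, monitor="val_loss", strict=True):
    """Return True if the monitored metric is available for this check."""
    if monitor in metrics:
        return True
    message = f"Early stopping conditioned on metric `{monitor}`, which is not available."
    if strict:
        raise RuntimeError(message)
    warnings.warn(message, RuntimeWarning)
    return False


def run_epochs(max_epochs, check_val_every_n_epoch, hook, strict=True):
    """Simulate a training loop and return the epochs at which the check succeeded.

    hook is either "on_epoch_end" or "on_validation_end".
    """
    checked = []
    for epoch in range(max_epochs):
        metrics = {}
        ran_validation = (epoch + 1) % check_val_every_n_epoch == 0
        if ran_validation:
            metrics["val_loss"] = 1.0 / (epoch + 1)
        if hook == "on_validation_end" and not ran_validation:
            continue  # the hook is simply not called on this epoch
        if check_metric(metrics, strict=strict):
            checked.append(epoch)
    return checked
```

With `check_val_every_n_epoch=2`, the `on_validation_end` variant checks only on epochs 1 and 3, while the `on_epoch_end` variant raises (or warns, with `strict=False`) on epoch 0 because no validation metric exists yet.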