* cleaning up stale logger tests
* Get experiment_id from MLflow only once instead of on each training loop (see the sketch after this block).
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* add test that asserts mlflow client is called to retrieve experiment id only once
* make pep8 happy
* logs
Co-authored-by: Patrick Orlando <patrick.orlando@rea-group.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
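For context, the gist of the experiment_id change is to query the MLflow client once and cache the result, rather than looking it up in every training loop. A minimal sketch, assuming the standard MlflowClient API; the class and attribute names here are illustrative, not the logger's actual ones:

```python
from mlflow.tracking import MlflowClient


class MLFlowLoggerSketch:
    """Illustrative only: cache the MLflow experiment id on first access."""

    def __init__(self, experiment_name, tracking_uri=None):
        self._experiment_name = experiment_name
        self._mlflow_client = MlflowClient(tracking_uri)
        self._experiment_id = None  # resolved lazily, then reused

    @property
    def experiment_id(self):
        # Hit the MLflow client only once; later calls return the cached id.
        if self._experiment_id is None:
            expt = self._mlflow_client.get_experiment_by_name(self._experiment_name)
            if expt is not None:
                self._experiment_id = expt.experiment_id
            else:
                self._experiment_id = self._mlflow_client.create_experiment(self._experiment_name)
        return self._experiment_id
```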
* Add support to Tensorboard logger for OmegaConf hparams
Address https://github.com/PyTorchLightning/pytorch-lightning/issues/2844
We check whether omegaconf can be imported and whether the hparams are OmegaConf instances. If so, we use OmegaConf.merge to preserve the typing, so that saving the hparams to YAML actually goes through the OmegaConf branch (sketched below).
* available
* chlog
* test
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
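The OmegaConf branch mentioned above roughly works like this; a sketch only, with simplified logger internals and illustrative names (the hparams.yaml path and the existing_hparams argument are assumptions):

```python
try:
    from omegaconf import DictConfig, OmegaConf
    OMEGACONF_AVAILABLE = True
except ImportError:
    OMEGACONF_AVAILABLE = False


def save_hparams_sketch(hparams, existing_hparams=None, path="hparams.yaml"):
    # If hparams are an OmegaConf container, merge with OmegaConf.merge so the
    # result stays a DictConfig and keeps its typing; saving to YAML then goes
    # through OmegaConf.save instead of a plain dict dump.
    if OMEGACONF_AVAILABLE and isinstance(hparams, DictConfig):
        if existing_hparams is not None:
            hparams = OmegaConf.merge(existing_hparams, hparams)
        OmegaConf.save(hparams, path)
    else:
        ...  # regular dict / Namespace handling
```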
* fix weights_save path and drop ckpt_path
* add tests
* unused import
* update docs
* changelog
* pep8
* fix horovod test
* make backward compatible
* perform same test for all loggers
* fix for when logger=False and weights_save_path is set
* update changelog
* update docs
* update tests
* do not set save dir dynamically
* remove duplicate test
* remove duplicated tests
* update tests
* update tests
* remove remaining ckpt_path references
* move defaults to init as suggested by @Borda
* test deprecation
* mlflow rework
* logger save_dir
* folder
* mlflow
* simplify
* fix test
* add a test for file dir contents
* new line
* changelog
* docs
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* test for comet logger
* improve mlflow checkpoint test
* prevent comet logger error on pytest exit
* test tensorboard save dir structure
* wandb save dir test
* skip test on windows
* add mlflow to pickle tests
* wandb
* code factor
* remove unused imports
* remove unused setter
* wandb mock
* wip mock
* wip mock
* wandb tests with mocking
* clean up
* clean up
* comments
* include wandblogger in test
* clean up
* missing argument
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* added base tests for tpu
* fix deprecation warnings
* added base tests for tpu
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* added base tests for tpu
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* fix and test for ddp block logging rank > 0
* rename
* use the dummy logger
* dummy logger test
* set the logger in model
* decorator for rank zero experiment (see the sketch at the end of this block)
* simplify check
* simplify
* fix problem with None in checkpoint path
* revert configure logger
* unused import
* offline
* try rank 0 decorator in checkpoint
* try fix test
* imgs
* add asserts to make sure rank zero only saves checkpoints
* fix tpu tests
* fix tpu tests
Co-authored-by: William Falcon <waf2107@columbia.edu>
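The rank-zero experiment decorator mentioned in this block follows a simple pattern: build the real experiment object only on global rank 0 and hand every other rank a no-op stand-in, so only rank 0 writes logs and checkpoints. A hedged sketch; the rank lookup and class names are simplified assumptions:

```python
import os
from functools import wraps


class DummyExperiment:
    """Accepts any attribute access or call and does nothing (used on rank > 0)."""

    def _nop(self, *args, **kwargs):
        return None

    def __getattr__(self, _name):
        return self._nop


def rank_zero_experiment(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        # Simplified: read the rank from the environment; rank 0 gets the real experiment.
        if int(os.environ.get("LOCAL_RANK", 0)) == 0:
            return fn(*args, **kwargs)
        return DummyExperiment()

    return wrapper
```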
* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* cannot pass an int as default_save_path
* refactor log message
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
* fix test with new epoch indexing
* fix progress bar totals
* fix off-by-one error (see #2289): epoch starts at 0 now
* added missing imports
* fix hpc_save folderpath
* fix formatting
* fix tests
* small fixes from a rebase
* fix
* tmpdir
* wandb
* fix merge conflict
* add back evaluation after training
* test_resume_early_stopping_from_checkpoint TODO
* undo the horovod check
* update changelog
* remove a duplicate test from merge error
* try fix dp_resume test
* add the logger fix from master
* try remove default_root_dir
* try mocking numpy
* try import numpy in docs test
* fix wandb test
* pep 8 fix
* skip if no amp
* dont mock when doctesting
* install extra
* fix the resume ES test
* undo conf.py changes
* revert remove comet pickle from test
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update weights_loading.rst
* renamed flag
* renamed flag
* revert the None check in logger experiment name/version
* add the old comments
* _experiment
* test checkpointing on DDP
* skip the ddp test on windows
* cloudpickle
* renamed flag
* renamed flag
* parentheses for clarity
* apply suggestion max epochs
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* no cov
* no cov
* ReduceOp
* group
* reduce_op.sum
* Update sklearns.py
* formatting
* horovod
* Apply suggestions from code review
* horovod
* ci
* print
* ci
* timeout
* timeout
* time
* fix
* distributed cpu
* pipes
* time
* cpu
* spawn
* tp
* separate
* os
* os
* npm
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix meta tags creating empty lines
* pyright
* node
* fix httpserver address
* drop tutils.default_trainer_options
* imports
* Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294); see the sketch after this block
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* drop duplicate
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: airium <airium@outlook.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>
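The essence of the Windows fix above is telling apart a genuine URL and an absolute local path: urlparse("C:/path/last.ckpt") reports a one-letter scheme ("c"), so treating only longer schemes as URLs keeps drive letters on the local-file path. A hedged sketch; the helper name is illustrative:

```python
from urllib.parse import urlparse

import torch


def load_checkpoint_sketch(checkpoint_path, map_location=None):
    parsed = urlparse(checkpoint_path)
    # A single-character scheme is almost certainly a Windows drive letter,
    # not a protocol, so only longer schemes are downloaded as URLs.
    if parsed.scheme and len(parsed.scheme) > 1:
        return torch.hub.load_state_dict_from_url(checkpoint_path, map_location=map_location)
    return torch.load(checkpoint_path, map_location=map_location)
```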
* drop train_percent_check (replacement sketched after this block)
* chlog
* deprecated
* tests
* tests
* Apply suggestions from code review
* tests
* hydra support
* tests
* hydra support
* tests
* typo
* typo
* Update test_dataloaders.py
* docs
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
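For reference, the replacement for the dropped *_percent_check flags looks roughly like this; a sketch based on the limit_*_batches arguments that superseded them, and exact defaults may differ:

```python
from pytorch_lightning import Trainer

# before (deprecated): Trainer(train_percent_check=0.25, val_percent_check=0.5)
# after: limit_*_batches takes a float fraction or an int number of batches
trainer = Trainer(limit_train_batches=0.25, limit_val_batches=0.5)
```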
* fixed percent check for val/test
* overfit_pct now uses train loaders for val and test and does not shuffle (usage sketched after this block)
* add on_fit_start and on_fit_end hooks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
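A short usage note on the overfit_pct behaviour described above: the point is to reuse the same unshuffled slice of the training set for train, val and test, so you can check that the model memorizes it. Sketch only:

```python
from pytorch_lightning import Trainer

# Reuse 1% of the training data (unshuffled) for train, val and test,
# to sanity-check that the model can overfit a small fixed subset.
trainer = Trainer(overfit_pct=0.01)
# trainer.fit(model) would then run train/val/test on that same slice
```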
* fix(wandb): use same logger on multiple training loops
New training loops reset the step to 0, which would previously try to overwrite existing logs (see the sketch below).
Fixes #2015
* docs(changelog.md): add reference to PR 2055
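The wandb issue above comes from step numbering: a second call to fit() restarts the trainer's global step at 0, and forwarding that raw step asks wandb to write at steps it has already passed. A hedged sketch of the offset idea; the class and attribute names are illustrative:

```python
import wandb


class WandbStepOffsetSketch:
    """Illustrative: never send wandb a step lower than one it has already seen."""

    def __init__(self):
        self._step_offset = 0

    def log_metrics(self, metrics, step):
        # Shift steps from a new training loop past the previously logged range
        # instead of overwriting earlier history.
        wandb.log(metrics, step=step + self._step_offset)

    def finalize(self, last_step):
        # Call when a training loop ends; the next loop continues after this step.
        self._step_offset += last_step + 1
```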
* replace ddp spawn with subprocess
* hot fix
* fix(wandb): allow use of sweeps
Overwrite run config parameters when they only differ due to float precision (see the sketch below).
Fixes #1290
* docs(wandb): update changelog
* test(wandb): update config test
Co-authored-by: William Falcon <waf2107@columbia.edu>
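The sweep fix above centres on wandb's value-change guard: the sweep agent has already populated the run config, and re-applying the same hyperparameters (round-tripped through the Trainer with slightly different float precision) would normally raise. wandb.config.update(..., allow_val_change=True) is the documented escape hatch; a small usage sketch with example values:

```python
import wandb

run = wandb.init(project="sweep-demo")  # project name is just an example

# Re-apply hyperparameters on top of what the sweep agent already set,
# explicitly allowing values to change (e.g. 0.1 vs 0.10000000000000001).
hparams = {"learning_rate": 0.001, "batch_size": 64}
run.config.update(hparams, allow_val_change=True)
```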
* The epoch was being logged to metrics (which isn't read) rather than to current_metrics.
* Updated the tests to account for the epoch arriving at the logger.