* Skip test due to 'Python bus error'
* Debug NCCL
* Remove NCCL_DEBUG statement
* Revert "Skip test due to 'Python bus error'"
This reverts commit e0a3e8785d.
* fix
* add test
* changelog
* yapf
* patch os environ
* make a special test
* destroy pg
* debug
* revert
* revert
* problematic test
* skip
* try the fixture
* test
* update sensitive test
* update changelog
* remove comment
* update wrong test
* update test name
* parameterization
* Revert "parameterization"
This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.
* remove conftest
* ignore test
* teardown
* fix merge
* deep speed parameterization
* uncomment test
* update chlog
* update changelog
* split tests
* update test
update test
update test
update test
* update test comments
* unroll test
* unroll test
* unroll test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* increase shm
* sudo
* unroll ipu
* Revert "sudo"
This reverts commit 6cc68c1478.
* Revert "increase shm"
This reverts commit 8c27163483.
* x
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* find guilty test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* POPTORCH_WAIT_FOR_IPU=1
* move test
* redo parameterize for ipu
* de-comment test
* move chlog
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* add extra test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add computation
* Update docs/source/common/trainer.rst
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update docs/source/common/trainer.rst
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update tests/trainer/test_dataloaders.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* use tmpdir
* update on comments
* update
* Update tests/callbacks/test_progress_bar.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Check progress bar existence before printing
* Add tests for predict_progres_bar
* Add tests for progress_bar printing without training
* Update changelog
* Update code
Co-authored-by: EliaCereda
* More property updates
* Move properties. Introduce trainer._fitting
* Use trainer.fitting
* Fix reset dataloaders
* Unused code
* RunningStage.SANITY_CHECKING
* Use setters
* Fix bugs
* Fix bugs
* TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}
* Fix bugs
* Fix bugs
* Fix tests
* Update CHANGELOG. Add deprecation warning. Fix tests
* Unused imports
* Optional trainer
* More deprecation. More refactoring
* Correct version
* Use properties
* Address comments
* flake8
* Missed renamings
* Typo
* is -> ==
It is recommended to use for Enums since they are singletons, however, since the LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str
* Also for tests
* Typo
* Address @tchaton's comments
* PEP8
* Correct property
* Update CHANGELOG
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Remove called sanity check
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Use high refresh rate on Google Colab (#3786)
Automatically override progress_bar_refresh_rate when on Google
Colab. Also added a constant IS_COLAB in utilities to check
whether it is being run in colab or not.
(#3786)
* Show a warning instead of overriding when rate is low on colab
* Change warning to suggestion and move it
Moved warning to configure_progress_bar instead of on_trainer_init
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* add a mock test
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Add dirpath and filename parameter in ModelCheckpoint
* remove old function
* chlog
* codefactor
* update tests
* docs
* fix doctest and added tests
* pathlib dirpath
* dep version and docs
* try fix doctest
* pep
* suggestions
Co-authored-by: carmocca <carlossmocholi@gmail.com>
* suggestions
* fix test
* pep
* trigger tests
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* suggestions
* try fix windows test
* add and update some tests
* trigger tests
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Fix val_progress_bar total with num_sanity_val_steps
* chlog
* Fix val_progress_bar total with num_sanity_val_steps
* move test
* replaced with sanity flag and suggestions
* Fix fast_dev_run to run for all val_dataloaders
* fast_dev_run check
* changelog
* explicit
* limit_batches with fast_dev_run in init
* add test
* whitespace and comment fix
* comment and assertion
* added tests
* Fix fast_dev_run to run for all val_dataloaders
* fast_dev_run check
* changelog
* explicit
* limit_batches with fast_dev_run in init
* add test
* whitespace and comment fix
* comment and assertion
* added tests
* added tests
* added tests
* added tests
* update rtol
* Revert "update rtol"
This reverts commit 4320329540.
* added tests
Co-authored-by: William Falcon <waf2107@columbia.edu>
* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* cannot pass an int as default_save_path
* refactor log message
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
* fix test with new epoch indexing
* fix progress bar totals
* fix off by one error (see #2289) epoch starts at 0 now
* added missing imports
* fix hpc_save folderpath
* fix formatting
* fix tests
* small fixes from a rebase
* fix
* tmpdir
* tmpdir
* tmpdir
* wandb
* fix merge conflict
* add back evaluation after training
* test_resume_early_stopping_from_checkpoint TODO
* undo the horovod check
* update changelog
* remove a duplicate test from merge error
* try fix dp_resume test
* add the logger fix from master
* try remove default_root_dir
* try mocking numpy
* try import numpy in docs test
* fix wandb test
* pep 8 fix
* skip if no amp
* dont mock when doctesting
* install extra
* fix the resume ES test
* undo conf.py changes
* revert remove comet pickle from test
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update weights_loading.rst
* Update weights_loading.rst
* Update weights_loading.rst
* renamed flag
* renamed flag
* revert the None check in logger experiment name/version
* add the old comments
* _experiment
* test chckpointing on DDP
* skip the ddp test on windows
* cloudpickle
* renamed flag
* renamed flag
* parentheses for clarity
* apply suggestion max epochs
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* drop train_percent_check
* chlog
* deprecated
* deprecated
* deprecated
* tests
* tests
* Apply suggestions from code review
* tests
* hydra support
* tests
* hydra support
* hydra support
* hydra support
* tests
* typo
* typo
* Update test_dataloaders.py
* docs
* docs
* docs
* docs
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fixed percent check for val/test
* fixed percent check for val/test
* fixed percent check for val/test
* fixed percent check for val/test
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* overfit_pct now uses train loaders for val and test and does not shuffle
* add on fit_start on fit_end hooks
* add on fit_start on fit_end hooks
* add on fit_start on fit_end hooks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* squash and rebase
sanity check hooks
sanity check callback hook finish
moved core progress bar functionality into callback
wip
remove duplicate merge
clean up
imports
docs
sanity check progress bar main
sanity
move callback calls
init progrss bar callback
configuration and docs
changelog
rate decorator
pass process_position
disable on rank > 0
position index
is_enabled
remove decorator
refactor init tqdm bars
callback method ordering
cannot reset when disabled
sequence -> list
default values
fix has no attr _time()
move on_val_end to proper place
fix the pickle issue
update warning
properties
check for None
remove old comment
switch order
pull out non-tqdm functionality into base class
documentation for the base class
docs
fix refresh rate issue in validation
restrict type hint of trainer arg
more docs
update trainer docs
rst docs
fix lines too long
fix test
add missing type hints
fix typo
move docstring to __init__ solves doctest failures
remove doctest :(( can't fix the pickle error
fix example
simplify by saving trainer reference
fix docs errors
move docstring
initial value
multiple val checks per epoch
simpler handling of inf dataset sizes
update inf docs
renamed training_tqdm_dict
rename get_tqdm_dict
rename occurences of tqdm
update changelog
fix doctest
fix formatting errors
added callback tests
progress bar on off test
more tests for progress bar
weird test fix?
add ignored property
disable default progress bar in LR finder
change enable/disable behavior
trying doctest in CI again
undo doctest pickle error
undo doctest pickle error :((
remove progress_bar_callback Trainer arg and fix tests
restore progress bar after auto lr find
update docs
fix rebase
fix wrong negation
* fix fast dev run total
* more thorough testing
* remove old args
* fix merge
* fix merge
* separate tests
* type hint total batches
* reduce if
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* is_disabled
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* is_enabled
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* rename enabled/disabled
* move deprecated api
* remove duplicated test from merge
* fix rename is_disabled
* newline
* test also testprogress for fast dev run
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>