* cleaning up stale logger tests
* tests to ensure correct dataloading interval and sequence
* Fix shuffle for distributed sampler
* add test
* test
* chlog
* update test
* assertions via callback
* define callback outside for pickling
* skip ddp test on windows
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Fix fast_dev_run to run for all val_dataloaders
* fast_dev_run check
* changelog
* explicit
* limit_batches with fast_dev_run in init
* add test
* whitespace and comment fix
* comment and assertion
* added tests
* update rtol
* Revert "update rtol"
This reverts commit 4320329540.
* added tests
Co-authored-by: William Falcon <waf2107@columbia.edu>
* add tests for single scalar return from training
* fixing val step only
* added base tests for tpu
* fix deprecation warnings
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* cannot pass an int as default_save_path
* refactor log message
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
* fix test with new epoch indexing
* fix progress bar totals
* fix off-by-one error (see #2289): epoch starts at 0 now
* added missing imports
* fix hpc_save folderpath
* fix formatting
* fix tests
* small fixes from a rebase
* fix
* tmpdir
* wandb
* fix merge conflict
* add back evaluation after training
* test_resume_early_stopping_from_checkpoint TODO
* undo the horovod check
* update changelog
* remove a duplicate test from merge error
* try fix dp_resume test
* add the logger fix from master
* try remove default_root_dir
* try mocking numpy
* try import numpy in docs test
* fix wandb test
* pep 8 fix
* skip if no amp
* don't mock when doctesting
* install extra
* fix the resume ES test
* undo conf.py changes
* revert remove comet pickle from test
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update weights_loading.rst
* renamed flag
* revert the None check in logger experiment name/version
* add the old comments
* _experiment
* test checkpointing on DDP
* skip the ddp test on windows
* cloudpickle
* renamed flag
* parentheses for clarity
* apply suggestion max epochs
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* no cov
* ReduceOp
* group
* reduce_op.sum
* Update sklearns.py
* formatting
* horovod
* Apply suggestions from code review
* horovod
* ci
* print
* ci
* timeout
* time
* fix
* distributed cpu
* pipes
* time
* cpu
* spawn
* tp
* separate
* os
* npm
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix meta tags creating empty lines
* pyright
* node
* fix httpserver address
* drop tutils.default_trainer_options
* imports
* Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* drop duplicate
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: airium <airium@outlook.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>
* deal with NotImplementedError raised by torchtext
* Added tests for dataloaders which raise NotImplementedError in __len__()
* Fixed some typos
* enabled tests for dataloaders raising NotImplementedError in __len__ and corrected the match string for the raised exception
* deleted empty line for style compliance
* refactored CustomNotImplementedErrorDataloader to derive from CustomInfDataloader
* enabled a reduced number of not_implemented_error dataloader tests to reduce runtime for continuous integration
* further reduced the number of not_implemented_error dataloader tests to cut test time
* reduced the number of not_implemented_error dataloader tests to one to cut test time
* disabled all not_implemented_error dataloader tests to see if the suite passes in time
* added __next__ that serves a reduced number (5) of elements, after which CustomNotImplementedErrorDataloader stops, to speed up the test
* re-enabled all not_implemented_error dataloader tests
* added a brief description of the change and its relation to torchtext
* reduced the number of batches served by CustomNotImplementedErrorDataloader to 2
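For context, the bullets above describe a test loader that hides its length (as torchtext iterators do) and serves only a couple of batches to keep CI fast. Below is a minimal sketch of such a loader; the class is hypothetical and only mirrors the behaviour described, it is not the CustomNotImplementedErrorDataloader from this PR.
```python
# Hypothetical sketch of a length-less loader for testing; not the exact
# CustomNotImplementedErrorDataloader from this PR.
import torch
from torch.utils.data import DataLoader, TensorDataset


class NotImplementedLenDataloader:
    """Wraps a DataLoader, hides its length, and serves only a few batches."""

    def __init__(self, dataloader: DataLoader, max_batches: int = 2):
        self.dataloader = dataloader
        self.max_batches = max_batches  # keep CI runtime low
        self._count = 0
        self._iter = None

    def __iter__(self):
        self._count = 0
        self._iter = iter(self.dataloader)
        return self

    def __next__(self):
        if self._count >= self.max_batches:
            raise StopIteration
        self._count += 1
        return next(self._iter)

    def __len__(self):
        # torchtext-style iterators raise instead of returning a length
        raise NotImplementedError


dataset = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))
loader = NotImplementedLenDataloader(DataLoader(dataset, batch_size=4), max_batches=2)
```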
* Update CHANGELOG.md
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Disable parallelism in dataloader
Suspected that it might cause pytest to hang more frequently
* added max_steps=None to Trainer in not_implemented_error dataloader tests
* rearranged the not_implemented_error tests in the file to group them together
* disabled parallel data loading
Reason: testing if that stops the test framework from hanging.
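For reference, "disabling parallelism" in a PyTorch DataLoader usually means setting num_workers=0 so batches are produced in the main process; whether that is exactly the change made here is an assumption.
```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))

# num_workers=0 loads batches in the main process (no worker subprocesses),
# avoiding the multiprocessing machinery suspected of hanging pytest.
loader = DataLoader(dataset, batch_size=4, num_workers=0)
```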
* Apply suggestions from code review
Co-authored-by: Thomas Schaaf <tschaaf@cs.cmu.edu>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Init fix num_batches
* Fix num_batches in case of multiple dataloaders
* Apply suggestions from code review
* Changes based on suggestions
* Flake8
* Add test to check num_batches
* generalize dataloader percent check test
* fix formatting
* remove hparams
* tests
* CHANGELOG
* Update CHANGELOG.md
* max_batches can be int
* conflict and rebase
* add back the test
fix
fix message
0.0 works
Revert "fix message"
This reverts commit 839cacf8b8610f4e697e654ef6f3d2501bf23984.
* update changelog
* Update CHANGELOG.md
* Fix num batches in case of multiple dataloaders and percent_check (#1920)
* git conflict
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* missing union
* doc update suggestion by @rohitgr7
* extend test
* changelog
* docs add note about multiple loaders
* update changelog
* remove unused variable
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Thomas Schaaf <tschaaf@cs.cmu.edu>
* drop train_percent_check
* chlog
* deprecated
* tests
* Apply suggestions from code review
* tests
* hydra support
* tests
* tests
* typo
* Update test_dataloaders.py
* docs
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fixed percent check for val/test
* overfit_pct now uses train loaders for val and test and does not shuffle
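As a usage note for the behaviour above, a minimal sketch assuming the overfit_pct Trainer flag from this release line (later replaced by overfit_batches):
```python
import pytorch_lightning as pl

# With overfit_pct set, validation and test reuse a fixed, unshuffled
# fraction of the training data, so a model can be sanity-checked by
# overfitting a small subset.
trainer = pl.Trainer(overfit_pct=0.01, max_epochs=5)
# trainer.fit(model)  # model: a LightningModule (not defined in this sketch)
```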
* add on fit_start on fit_end hooks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Add ckpt_path option to LightningModule.test()
If ckpt_path is "best" (default), it loads the best weights saved by ModelCheckpoint for the test loop.
If ckpt_path is a path to a checkpoint file, it loads the weights from the file for the test loop.
If ckpt_path is None, it uses the weights from the end of training for the test loop.
If the model parameter is set, ckpt_path is ignored.
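A hedged usage sketch of the options listed above; it uses the trainer-side call trainer.test(ckpt_path=...), which is how the option is typically exercised, and argument names and defaults may differ between Lightning versions.
```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class TinyModel(pl.LightningModule):
    """Minimal module used only to illustrate the ckpt_path options."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def test_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

    def _loader(self):
        ds = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
        return DataLoader(ds, batch_size=8)

    def train_dataloader(self):
        return self._loader()

    def test_dataloader(self):
        return self._loader()


model = TinyModel()
trainer = pl.Trainer(max_epochs=1)
trainer.fit(model)

trainer.test(ckpt_path="best")                  # best ModelCheckpoint weights (the default)
# trainer.test(ckpt_path="path/to/file.ckpt")   # weights from a specific checkpoint file
# trainer.test(ckpt_path=None)                  # weights from the end of training
# trainer.test(model)                           # model passed explicitly: ckpt_path is ignored
```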
* Update test_set.rst
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Raise an error when lightning replaces an existing sampler
Currently, Trainer replaces the existing sampler with DistributedSampler
if running distributed training and `replace_sampler_ddp=True` (default
behaviour). If a user has configured an existing sampler, this would
lead to widely different results when running a distributed vs. a
non-distributed training run.
This PR fixes this by raising an error if the user has configured a sampler
and uses `replace_sampler_ddp=True`. The recommended behavior from now
on is to either remove the sampler or set `replace_sampler_ddp=False`.
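A sketch of the recommended configuration, assuming a user-defined sampler; replace_sampler_ddp is the Trainer flag name from this era (newer releases renamed it use_distributed_sampler):
```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset
import pytorch_lightning as pl

dataset = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))

# A user-configured sampler; under DDP with replace_sampler_ddp=True (the
# default) Lightning would swap it out for a DistributedSampler.
sampler = RandomSampler(dataset, replacement=True, num_samples=32)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

# Recommended: either drop the custom sampler, or keep it and opt out of the
# automatic replacement so distributed and non-distributed runs behave alike.
trainer = pl.Trainer(replace_sampler_ddp=False, max_epochs=1)
# trainer.fit(model, loader)  # model: a LightningModule (not defined in this sketch)
```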
* Fix tests
* Simpler fix
* Fix tests
* Make inner method protected
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* replace ddp spawn with subprocess
* hot fix