* code rule
* Apply suggestions from code review
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* chlog
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* fixed undesired behaviour due to dict.fromkeys
* a test for log length consistency
* runtime-warn if no schedulers are configured
* chlog
* move
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* filter valid args
* error on unknown manual args
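A rough sketch of these two behaviors, assuming a hypothetical `trainer_from_argparse_args` helper (the library's actual entry point and signature may differ):
```python
import inspect

from pytorch_lightning import Trainer

def trainer_from_argparse_args(args, **manual_kwargs):
    valid = inspect.signature(Trainer.__init__).parameters
    unknown = set(manual_kwargs) - set(valid)
    if unknown:
        # manually passed arguments should never be silently dropped
        raise ValueError(f"Unknown Trainer argument(s): {sorted(unknown)}")
    # the argparse Namespace may carry extra entries; keep only the valid ones
    params = {k: v for k, v in vars(args).items() if k in valid}
    params.update(manual_kwargs)
    return Trainer(**params)
```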
* added test
* changelog
* update docs and doctest
* simplify
* doctest
* doctest
* doctest
* better test with mock check for init call
* fstring
* extend test
* skip test on Python 3.6 (not working there)
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Fixes PyTorchLightning/pytorch-lightning#490
`EarlyStopping` should check the metric of interest `on_validation_end` rather than `on_epoch_end`.
In a normal scenario this does not cause a problem, but in combination with `check_val_every_n_epoch>1` in the `Trainer` it results in a warning or a `RuntimeError`, depending on `strict`.
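A minimal sketch of the setup that triggered the issue (model and data omitted; `early_stop_callback` reflects the Trainer API of that era and is an assumption here):
```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# With check_val_every_n_epoch=3, `val_loss` is produced only every 3rd epoch.
# A callback that reads it in on_epoch_end finds it missing in between:
# a warning with strict=False, a RuntimeError with strict=True.
# Checking in on_validation_end runs only when the metric actually exists.
early_stop = EarlyStopping(monitor="val_loss", strict=True)
trainer = Trainer(
    early_stop_callback=early_stop,
    check_val_every_n_epoch=3,
)
```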
* Highlighted that ES callback runs on val epochs in docstring
* Updated EarlyStopping in rst doc
* Update early_stopping.py
* Update early_stopping.rst
* Update early_stopping.rst
* Update early_stopping.rst
* Update early_stopping.rst
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docs/source/early_stopping.rst
* fix doctest indentation warning
* Train loop calls early_stop.on_validation_end
* chlog
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* saves model every epoch
* implement test for save_last
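A minimal usage sketch of the new flag (assuming the `save_last` keyword introduced by this change):
```python
from pytorch_lightning.callbacks import ModelCheckpoint

# besides the usual "best" checkpoints, also overwrite a copy of the most
# recent epoch so training can always be resumed from the latest state
checkpoint_callback = ModelCheckpoint(save_last=True)
```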
* Update CHANGELOG.md
* Update CHANGELOG.md
* changes test description
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* Allow dataloaders without sampler field present
Sometimes we have a custom dataloader that has no `sampler` attribute; it is better to check that the field exists before reading it.
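A sketch of the defensive check described here (the function name is illustrative):
```python
from torch.utils.data import DistributedSampler

def needs_distributed_sampler(dataloader) -> bool:
    # custom loaders may not define `sampler` at all, so read it with
    # getattr instead of assuming the attribute exists
    sampler = getattr(dataloader, "sampler", None)
    return not isinstance(sampler, DistributedSampler)
```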
* chlog
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* Add flag to `dump_checkpoint` for only including weights
`ModelCheckpoint` then passes `self.save_weights_only` to the save function.
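A usage sketch of the flag from the user's side:
```python
from pytorch_lightning.callbacks import ModelCheckpoint

# checkpoints will contain only the model weights, not the optimizer or
# trainer state, so they cannot be used to restore a full training session
checkpoint_callback = ModelCheckpoint(save_weights_only=True)
```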
* Fix tests and add changelog entry
* Add check and descriptive message when training state is restored from a weights only checkpoint
Also add a test for making sure `ModelCheckpoint.save_weights_only` works as expected.
* Fix weights-only test to properly match expected exception
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
The changes are quite local and limited in nature -- viz., checking for
some indicator environment variables. We check for SLURM_LOCALID,
NODE_RANK, and GROUP_RANK, in that order; if more than one is set, a
warning is logged.
This patch also fixes a minor bug in the comparison of the `WORLD_SIZE`
environment variable, which is a string and must be cast before comparing.
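An illustrative sketch of the lookup order and the cast (everything except the environment-variable names is an assumption):
```python
import logging
import os

log = logging.getLogger(__name__)

RANK_KEYS = ("SLURM_LOCALID", "NODE_RANK", "GROUP_RANK")

def determine_node_rank() -> int:
    found = {k: os.environ[k] for k in RANK_KEYS if k in os.environ}
    if len(found) > 1:
        log.warning("Multiple rank environment variables set: %s", sorted(found))
    for key in RANK_KEYS:  # first match in priority order wins
        if key in found:
            return int(found[key])
    return 0

# the WORLD_SIZE bug: environment variables are strings, so cast before comparing
is_distributed = int(os.environ.get("WORLD_SIZE", "1")) > 1
```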
* Fixed the typing annotation by adding a boolean type so that the profiler flag gets added to argparse.
* Updated CHANGELOG.md
* Updated get_init_arguments_and_types() to pass doctests.
* Added doctest example to add_argparse_parser()
* Option to provide seed to random generators to ensure reproducibility
I added a small function in utilities that imports torch, numpy, and Python's
random module and sets the seed for all of them, to ensure reproducibility
of results.
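A sketch of such a helper; the function eventually landed as `pytorch_lightning.seed_everything`, but the details here are illustrative:
```python
import os
import random

import numpy as np
import torch

def seed_everything(seed: int) -> int:
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # numpy (requires 0 <= seed < 2**32)
    torch.manual_seed(seed)  # torch; on current PyTorch this also seeds CUDA
    os.environ["PL_GLOBAL_SEED"] = str(seed)
    return seed
```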
* Apply recommendations from core contributors on seeding
1. Moved the seeding code to another file
2. Made `deterministic` a parameter of the `Trainer` class
3. Added assertions for seeding numpy
4. Added warnings
5. `torch.manual_seed` should be enough for seeding torch
* Revert "Apply recommendations from core contributors on seeding"
This reverts commit a213c8e6882eec8a9e7408b9418926d2db7c5461.
* Revert "Revert "Apply recommendations from core contributors on seeding""
This reverts commit 59b2da53c62878de7aab0aa3feb3115e105eea06.
* Change in test, for correct seeding
* Allow seed equal to 0
* Allow seed to be uint32.max
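The two commits above amount to a bounds check like the following (illustrative; numpy accepts seeds in [0, 2**32 - 1], so both endpoints must be allowed):
```python
import numpy as np

def _check_seed(seed: int) -> int:
    min_seed, max_seed = 0, np.iinfo(np.uint32).max
    if not (min_seed <= seed <= max_seed):
        raise ValueError(f"seed must be in [{min_seed}, {max_seed}], got {seed}")
    return seed
```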
* Added deterministic to benchmarks
* Cuda manual seed as in benchmark seeding
* Seeding should be done before model initialization
* cuda manual_seed is not necessary
* Fixing seed test_cpu_lbfgs
With some seeds it seems that LBFGS does not converge,
so I fixed the seed during testing.
* rebasing issue with old reproducibility.py
* Improved documentation and the ability to seed before initializing the Trainer class
* Change in docs
* Removed seed from trainer, update for documentation
* Typo in the docs
* Added seed_everything to `__all__`
* Fixing old changes
* Model initialization should happen earlier than Trainer initialization
* Update pytorch_lightning/trainer/__init__.py
From Example to testcode
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Fixes according to the contributors' suggestions
* Moving horovod deterministic to Trainer class
* Docs update: the deterministic flag affects Horovod
* Improved static typing
* Added deterministic to the Horovod test runners
Tests fail on some versions, not very predictably
* static seeds for horovod tests
* Change for reset_seed function in tests
* Seeding horovod using reset_seed from tutils
* Update pytorch_lightning/trainer/__init__.py
* chlog
* Update trainer.py
* change "testcode" to "Example" in trainer init documentation
* Update pytorch_lightning/trainer/seed.py, first line in comment
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* missing
* RC
* tol
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update progress bar metrics on train epoch end
* changelog
* wip test
* more thorough testing
* comments
* update docs
* move test
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
* Fix Horovod backend to disable progress bar on all ranks except 0
* Add join barriers
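A sketch of both pieces; the `trainer` argument stands in for the running Trainer instance and is an assumption of this example:
```python
import horovod.torch as hvd  # hvd.init() must have been called

def disable_progress_bar_on_nonzero_ranks(trainer):
    # only rank 0 keeps a visible progress bar
    if hvd.rank() != 0 and trainer.progress_bar_callback is not None:
        trainer.progress_bar_callback.disable()

def finish_training():
    # barrier so that all ranks leave the training loop together
    hvd.join()
```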
* Added changelog
* Make protected and add verbosity
* Refactor to disable progress bar callback in train
* Removed verbose setting
* Add cache check for Horovod
* Test run again
* Updated comment
* Always skip cache for Horovod
* Only reinstall when necessary
* Added separate step
* Fixed spacing
* Skip Python 3.8
* params
* drop acc
* Fix Horovod distributed backend to set the root_gpu
* Fixed test
* Fixed tests
* Fixed lint
* Set root_gpu during initialization
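A sketch of the fix (the `trainer` object is assumed): under Horovod each process drives the GPU indexed by its local rank on the node.
```python
import torch
import horovod.torch as hvd  # hvd.init() must have been called

def set_horovod_root_gpu(trainer):
    if trainer.on_gpu:
        trainer.root_gpu = hvd.local_rank()
        torch.cuda.set_device(trainer.root_gpu)
```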
* chlog
Co-authored-by: Jirka <jirka.borovec@seznam.cz>