* allow loading checkpoints from urls
* tmpdir_server fixture
* test cases for loading checkpoints from url
* dir => root_dir
* default map_location to None
* test case for resume_from_checkpoint
* changelog
* doc update
* monkeypatch TORCH_HOME to avoid caching
* Use a threading server with random ports so that it is easier to clean up
* test fixes
* pep8 fix
* ThreadingHTTPServer support in 3.6
* pep8 fix
* fix changelog
* separate tests for urls
* typo
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
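The threading test server on a random port described above can be sketched with the standard library alone. `serve_directory` is a hypothetical helper, not the project's actual `tmpdir_server` fixture. It assumes Python 3.7+, where `http.server.ThreadingHTTPServer` and the `directory=` handler argument exist. On 3.6 an equivalent server class can be assembled from `socketserver.ThreadingMixIn` and `http.server.HTTPServer`.

```python
import functools
import http.server
import threading
from contextlib import contextmanager


@contextmanager
def serve_directory(root_dir):
    """Serve files from `root_dir` over HTTP on a random free port (hypothetical helper)."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root_dir)
    # Port 0 lets the OS pick any free port, so concurrent test runs cannot collide.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield server.server_address  # (host, port) actually bound
    finally:
        server.shutdown()      # stop the serve_forever() loop
        server.server_close()  # release the listening socket
        thread.join()
```

A test can then point the checkpoint loader at `f"http://{host}:{port}/<file>"`. Binding to port 0 and shutting the server down in `finally` is what makes cleanup straightforward.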
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* do not include local vars in auto collection
* add test
* add test for model with "self" renamed to "obj"
* skip decorator
* changelog
* changelog
* update docs
* remove obsolete child collection
* generalize **args, **kwargs names
* docs
* also update varargs passed in
* Revert "also update varargs passed in"
This reverts commit 3d7a30dbee07a513ee13e1cc3e08ca5ccdb85734.
* update test
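The auto-collection rules above (skip local variables, drop the first parameter whatever it is named, accept any name for `**kwargs`, leave `*args` out as in the revert) can be illustrated with `inspect`. This is a minimal sketch, not the library's implementation; `collect_init_args` and `Model` are hypothetical names.

```python
import inspect


def collect_init_args(frame):
    """Gather only the parameters of the function executing in `frame`.

    Locals created in the function body are ignored, the first parameter is
    dropped whatever it is named (``self``, ``obj``, ...), and the ``**kwargs``
    dict is merged in whatever it is named. ``*args`` is not collected.
    """
    arg_info = inspect.getargvalues(frame)
    param_names = arg_info.args[1:]  # skip the instance parameter
    init_args = {name: arg_info.locals[name] for name in param_names}
    if arg_info.keywords is not None:  # name of the **kwargs parameter, if any
        init_args.update(arg_info.locals[arg_info.keywords])
    return init_args


class Model:
    def __init__(obj, lr, layers=2, *args, **extra):
        scratch = "a local variable"  # must not end up in the collection
        obj.hparams = collect_init_args(inspect.currentframe())
```

For example, `Model(0.1, 3, "ignored", dropout=0.5).hparams` contains `lr`, `layers` and `dropout`, but neither `scratch` nor the positional extra.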
* black
Added through black.toml; other options are hard to set up so far
No caching for black github action
Moved from black.toml to pyproject.toml
Exclude not only yml but also yaml
Update pyproject.toml
Co-authored-by: Thomas Johansen <thomasjo@gmail.com>
Update .github/workflows/code-formatting-check.yml
mergify
Remove formatting check
Ignore E231 error because of black formatting
Updated CONTRIBUTING to the master
* Update .github/workflows/code-formatting-check.yml
* Bump black to 19.10b0 version
* resolved incorrect merge of CONTRIBUTING
Black skipping string normalization
* Minor fixes in CONTRIBUTING, two typos
* Update setup.cfg
* chlog
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* refactor and added hook
variant a
variant b
add test
revert rename
add changelog
docs
* resolve merge duplication
* overridden typo
* fix test
* tpu id
* raise if TPU not available
* re-use apply_to_collection function for parsing collections
* comment
* make utility function available to user
* documentation
* move changelog entry to top
* fix tpu transfer call
* fix call
* remove hardcoded string
* improve test
* call model hook by default
* Apply suggestions from code review
* rename utility function
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
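Re-using a generic "apply to collection" utility for batch transfer means walking nested containers and transforming only leaves of a given type. The sketch below shows that recursion with plain Python objects (integers stand in for tensors, and the function passed in stands in for a device transfer). It is an illustration of the idea, not the library's `apply_to_collection` source.

```python
from collections.abc import Mapping, Sequence


def apply_to_collection(data, dtype, function):
    """Recursively apply `function` to every element of type `dtype` (sketch).

    Mappings, lists and tuples (including namedtuples) are rebuilt with their
    original type; strings, bytes and any other objects are returned unchanged.
    """
    if isinstance(data, dtype):
        return function(data)
    if isinstance(data, Mapping):
        return type(data)({k: apply_to_collection(v, dtype, function) for k, v in data.items()})
    if isinstance(data, tuple) and hasattr(data, "_fields"):  # namedtuple
        return type(data)(*(apply_to_collection(d, dtype, function) for d in data))
    if isinstance(data, Sequence) and not isinstance(data, (str, bytes)):
        return type(data)(apply_to_collection(d, dtype, function) for d in data)
    return data
```

Rebuilding each container with `type(data)` keeps the batch structure (dict keys, namedtuple fields) intact, so user code receives the same shape it produced.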
* use parallel loader
* Revert "use parallel loader"
This reverts commit ed6e7583
* select tpu id for pl
* condition if tpu_id is None
* added info to changelog
* Revert "condition if tpu_id is None"
This reverts commit 1fb6e586
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix(wandb): use same logger on multiple training loops
New training loops reset step to 0, which would previously try to overwrite logs
fix #2015
* docs(changelog.md): add reference to PR 2055
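One way to keep a single logger usable across training loops that each restart at step 0 is to shift every new loop past the last step already logged. This is a hedged sketch of that idea; `OffsetStepLogger` and the `log_fn(metrics, step)` callback are hypothetical placeholders, not the W&B API or the actual `WandbLogger` change.

```python
class OffsetStepLogger:
    """Keep logged steps monotonic across several training loops (sketch)."""

    def __init__(self, log_fn):
        self._log_fn = log_fn  # hypothetical stand-in for the experiment's log call
        self._step_offset = 0
        self._last_step = -1

    def start_new_loop(self):
        # The next loop counts from 0 again, so start it after everything logged so far.
        self._step_offset = self._last_step + 1

    def log_metrics(self, metrics, step):
        global_step = step + self._step_offset
        self._last_step = max(self._last_step, global_step)
        self._log_fn(metrics, global_step)
```

With this scheme, a second loop logging steps 0 and 1 after a first loop that reached step 2 ends up at steps 3 and 4 instead of overwriting earlier entries.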
* Add an additional attribute to ModelCheckpoint to keep track of the best model's path
Currently, only the best metric value is directly tracked. This new attribute will help in use cases where the trained model needs to be used or tracked right after training.
* Add small description and usage example to docs
* Fix PEP8 issues
* Fix doctest example
* Fix expected output in doctest
* Apply suggestions from code review
* Show example as code block instead of doctest
* Apply suggestions from code review
* Update CHANGELOG.md
* Rename `ModelCheckpoint.best` to `ModelCheckpoint.best_model_score`
Also rename `ModelCheckpoint.best_model` (added in this PR) to `ModelCheckpoint.best_model_path`, for consistency, and `kth_best_model` to `kth_best_model_path`.
* Update pytorch_lightning/trainer/training_io.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
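Tracking the path alongside the score means both must be updated together whenever a checkpoint improves on the best so far. The attribute names below mirror `best_model_score` and `best_model_path` from the rename above, but `BestCheckpointTracker` is a hypothetical illustration, not `ModelCheckpoint` itself.

```python
import math


class BestCheckpointTracker:
    """Track the best monitored score and the file it was saved to (sketch)."""

    def __init__(self, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
        self.mode = mode
        self.best_model_score = math.inf if mode == "min" else -math.inf
        self.best_model_path = ""

    def update(self, score, path):
        """Record `path` as best if `score` improves; return whether it did."""
        if self.mode == "min":
            improved = score < self.best_model_score
        else:
            improved = score > self.best_model_score
        if improved:
            self.best_model_score = score
            self.best_model_path = path
        return improved
```

After training, `best_model_path` can be handed straight to whatever loads the model, which is the use case described in the commit.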
* Add warning when loading checkpoint from an old version
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix chlog
* test for #1729
* hist
* update
* Document use case of passing test dataloaders to Trainer.test() (#1992)
* Issue 1990 Doc patch.
* Codeblock directive.
* Update to reflect current state of pytorch-lightning
* Final grammar cleaning. I hope these commits are squashed.
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: authman <uapatira@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* code rule
* Apply suggestions from code review
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* chlog
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* fixed undesired behaviour due to dict.fromkeys
* a test for log length consistency
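The commit does not show the affected code, but the classic `dict.fromkeys` pitfall is that a mutable default value is shared by every key. The key names below are hypothetical.

```python
keys = ["train_loss", "val_loss"]

# Pitfall: every key shares ONE list object, so appending via one key "grows" all.
shared = dict.fromkeys(keys, [])
shared["train_loss"].append(0.3)
print(shared)  # {'train_loss': [0.3], 'val_loss': [0.3]}

# Fix: a dict comprehension creates a fresh list per key.
separate = {key: [] for key in keys}
separate["train_loss"].append(0.3)
print(separate)  # {'train_loss': [0.3], 'val_loss': []}
```

Shared lists like this lead to logs of inconsistent length, which is what a log-length consistency test would catch.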
* runtime-warn if no schedulers are configured
* chlog
* move
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* filter valid args
* error on unknown manual args
* added test
* changelog
* update docs and doctest
* simplify
* doctest
* doctest
* doctest
* better test with mock check for init call
* fstring
* extend test
* skip test on Python 3.6 (not working)
Co-authored-by: William Falcon <waf2107@columbia.edu>
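Filtering parsed arguments against the constructor signature, while still rejecting unknown keyword arguments passed by hand, can be sketched with `inspect.signature`. `ToyTrainer` and `from_args` are hypothetical names used only to show the technique.

```python
import inspect


class ToyTrainer:
    """Hypothetical stand-in for a class built from parsed CLI arguments."""

    def __init__(self, max_epochs=1, gpus=0):
        self.max_epochs = max_epochs
        self.gpus = gpus

    @classmethod
    def from_args(cls, args, **manual_kwargs):
        valid = set(inspect.signature(cls.__init__).parameters) - {"self"}
        # Parsed args may hold options meant for other components: silently drop them.
        params = {k: v for k, v in vars(args).items() if k in valid}
        # Manually passed kwargs are explicit, so an unknown name is treated as an error.
        unknown = set(manual_kwargs) - valid
        if unknown:
            raise TypeError(f"Unknown arguments: {sorted(unknown)}")
        params.update(manual_kwargs)
        return cls(**params)
```

The asymmetry is deliberate: an argparse namespace commonly mixes model and trainer options, whereas a misspelled keyword typed by the user should fail loudly.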
* Fixes PyTorchLightning/pytorch-lightning#490
`EarlyStopping` should check the metric of interest `on_validation_end` rather than `on_epoch_end`.
In a normal scenario, this does not cause a problem, but in combination with `check_val_every_n_epoch>1` in the `Trainer` it results in a warning or in a `RuntimeError` depending on `strict`.
* Highlighted that ES callback runs on val epochs in docstring
* Updated EarlyStopping in rst doc
* Update early_stopping.py
* Update early_stopping.rst
* Update early_stopping.rst
* Update early_stopping.rst
* Update early_stopping.rst
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docs/source/early_stopping.rst
* fix doctest indentation warning
* Train loop calls early_stop.on_validation_end
* chlog
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
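Driving early stopping from validation results means the monitored metric is always present when the callback runs, even with `check_val_every_n_epoch > 1`. The toy loop below is a hypothetical illustration (`ToyEarlyStopping` and `fit` are not library code) of patience counting only on validation epochs.

```python
class ToyEarlyStopping:
    """Patience-based early stopping driven by validation results only (sketch)."""

    def __init__(self, monitor="val_loss", patience=2):
        self.monitor = monitor
        self.patience = patience
        self.best = float("inf")
        self.wait = 0
        self.should_stop = False

    def on_validation_end(self, metrics):
        # Only called after a validation run, so `monitor` is always in `metrics`.
        current = metrics[self.monitor]
        if current < self.best:
            self.best, self.wait = current, 0
        else:
            self.wait += 1
            self.should_stop = self.wait >= self.patience


def fit(val_losses, check_val_every_n_epoch=2, max_epochs=20):
    """Toy training loop; returns the epoch index at which training stopped."""
    stopper = ToyEarlyStopping(patience=2)
    val_iter = iter(val_losses)
    for epoch in range(max_epochs):
        if (epoch + 1) % check_val_every_n_epoch == 0:
            stopper.on_validation_end({"val_loss": next(val_iter)})
            if stopper.should_stop:
                return epoch
    return max_epochs - 1
```

Had the check run at every epoch end instead, the odd epochs here would have no `val_loss` to read, which is the warning or `RuntimeError` scenario described above.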
* saves model every epoch
* implement test for save_last
* Update CHANGELOG.md
* Update CHANGELOG.md
* changes test description
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* Allow dataloaders without sampler field present
Sometimes we have a custom dataloader that doesn't have a sampler; it is better to check that the field is there before reading it.
* chlog
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
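Checking for the field before reading it amounts to a `getattr` with a default. The loader classes and `get_sampler` below are hypothetical, used only to show both cases.

```python
class PlainLoader:
    """Hypothetical custom loader that has no `sampler` attribute at all."""

    def __iter__(self):
        return iter([[1, 2], [3, 4]])


class SampledLoader:
    """Hypothetical loader that exposes the sampler it draws indices from."""

    def __init__(self, sampler):
        self.sampler = sampler

    def __iter__(self):
        return iter(self.sampler)


def get_sampler(dataloader):
    """Return the loader's sampler, or None when the field is absent."""
    # Reading `dataloader.sampler` directly would raise AttributeError for PlainLoader.
    return getattr(dataloader, "sampler", None)
```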
* Add flag to `dump_checkpoint` for only including weights
`ModelCheckpoint` then passes `self.save_weights_only` to the save function.
* Fix tests and add changelog entry
* Add check and descriptive message when training state is restored from a weights only checkpoint
Also add a test for making sure `ModelCheckpoint.save_weights_only` works as expected.
* Fix weights-only test to properly match expected exception
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
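A weights-only flag can simply omit the training state when the checkpoint dict is built, and restoring can then check for that state and fail with a clear message. The key names (`epoch`, `state_dict`, `optimizer_states`) and both function names are assumptions for this sketch, not the exact checkpoint format.

```python
def dump_checkpoint(model_state, optimizer_state, epoch, weights_only=False):
    """Build a checkpoint dict, optionally containing only the weights (sketch)."""
    checkpoint = {"epoch": epoch, "state_dict": model_state}
    if not weights_only:
        checkpoint["optimizer_states"] = [optimizer_state]
    return checkpoint


def restore_training_state(checkpoint):
    """Return (epoch, optimizer states), explaining why if they are missing."""
    if "optimizer_states" not in checkpoint:
        raise KeyError(
            "Cannot restore training state: this checkpoint holds only model weights,"
            " probably because it was saved with save_weights_only=True."
        )
    return checkpoint["epoch"], checkpoint["optimizer_states"]
```

Raising with a descriptive message replaces what would otherwise be a bare `KeyError: 'optimizer_states'` deep inside the restore logic.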
The changes are local and limited in nature, viz., checking for
some indicator environment variables. We check for (SLURM_LOCALID,
NODE_RANK, GROUP_RANK) in order. If more than one is set, a warning is
logged.
This patch also fixes a minor bug when comparing the `WORLD_SIZE`
environment variable, whose value is read as a string.
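The ordered lookup, the warning when several indicators are set, and the string-valued `WORLD_SIZE` can be sketched as below. `rank_from_env` and `world_size_from_env` are hypothetical helper names, and the default of 0 when no variable is set is an assumption.

```python
import logging
import os

log = logging.getLogger(__name__)

RANK_ENV_VARS = ("SLURM_LOCALID", "NODE_RANK", "GROUP_RANK")  # checked in this order


def rank_from_env(environ=os.environ):
    """Return the rank from the first indicator variable that is set (sketch)."""
    found = {name: environ[name] for name in RANK_ENV_VARS if name in environ}
    if len(found) > 1:
        log.warning("Multiple rank variables set: %s; using %s", list(found), next(iter(found)))
    if not found:
        return 0  # assumed default when no indicator is set
    return int(next(iter(found.values())))  # dicts keep insertion (= priority) order


def world_size_from_env(environ=os.environ):
    # Environment values are strings: `environ["WORLD_SIZE"] == 1` would never be True.
    return int(environ.get("WORLD_SIZE", "1"))
```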
* Fixed typing annotation by adding boolean type, so that the Profiler flag is now added to argparse.
* Updated CHANGELOG.md
* Updated get_init_arguments_and_types() to pass doctests.
* Added doctest example to add_argparse_parser()
* Option to provide seed to random generators to ensure reproducibility
I added a small function in utilities which imports torch, numpy and Python
random and sets the seed for all of these libraries to ensure reproducibility
of results.
* Apply recommendations from core contributors on seeding
1. Moved the seeding code to another file
2. Make deterministic as a parameter for trainer class
3. Add assertions for seeding numpy
4. Added warnings
5. torch.manual_seed should be enough for seeding torch
* Revert "Apply recommendations from core contributors on seeding"
This reverts commit a213c8e6882eec8a9e7408b9418926d2db7c5461.
* Revert "Revert "Apply recommendations from core contributors on seeding""
This reverts commit 59b2da53c62878de7aab0aa3feb3115e105eea06.
* Change in test, for correct seeding
* Allow seed equal to 0
* Allow seed to be uint32.max
* Added deterministic to benchmarks
* Cuda manual seed as in benchmark seeding
* Seeding should be done before model initialization
* cuda manual_seed is not necessary
* Fixing seed test_cpu_lbfgs
With some seeds, LBFGS does not seem to converge,
so I fixed the seed during testing.
* rebasing issue with old reproducibility.py
* Improved documentation and ability to seed before initializing the Trainer class
* Change in docs
* Removed seed from trainer, update for documentation
* Typo in the docs
* Added seed_everything to `__all__`
* Fixing old changes
* Model initialization should be earlier than Trainer
* Update pytorch_lightning/trainer/__init__.py
From Example to testcode
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Fixing according to the contributors suggestions
* Moving horovod deterministic to Trainer class
* deterministic flag affects horovod docs update
* Improved static typing
* Added deterministic to test runners of horovod
It fails on some versions, not very predictably
* static seeds for horovod tests
* Change for reset_seed function in tests
* Seeding horovod using reset_seed from tutils
* Update pytorch_lightning/trainer/__init__.py
* chlog
* Update trainer.py
* change "testcode" to "Example" in trainer init documentation
* Update pytorch_lightning/trainer/seed.py, first line in comment
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: William Falcon <waf2107@columbia.edu>
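The seeding utility described in this commit can be sketched as follows: validate the seed (0 and `2**32 - 1` both allowed, the range NumPy's legacy `np.random.seed` accepts), then seed Python's `random`, NumPy and PyTorch, the last two only if installed. `seed_everything_sketch` is a hypothetical name; this is not the library's `seed_everything` implementation.

```python
import os
import random


def seed_everything_sketch(seed):
    """Seed Python, NumPy and PyTorch RNGs (the last two only if installed)."""
    max_seed = 2**32 - 1
    if not 0 <= seed <= max_seed:
        raise ValueError(f"seed must be in [0, {max_seed}], got {seed}")
    os.environ["PYTHONHASHSEED"] = str(seed)  # affects child processes, not this one
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds all devices, so no separate CUDA call is needed
    except ImportError:
        pass
    return seed
```

As the commits note, seeding has to happen before the model is initialized, since weight initialization already consumes random numbers.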
* missing
* RC
* tol
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update prog. bar metrics on train epoch end
* changelog
* wip test
* more thorough testing
* comments
* update docs
* move test
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
* Fix Horovod backend to disable progress bar on all ranks except 0
* Add join barriers
* Added changelog
* Make protected and add verbosity
* Refactor to disable progress bar callback in train
* Removed verbose setting
* Add cache check for Horovod
* Test run again
* Updated comment
* Always skip cache for Horovod
* Only reinstall when necessary
* Added separate step
* Fixed spacing
* Skip Python 3.8