* remove grad scaling tpu
* added base tests for tpu
* fix deprecation warnings
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* fix tpu hang
* no cov
* ReduceOp
* group
* reduce_op.sum
* Update sklearns.py
* formatting
* horovod
* Apply suggestions from code review
* horovod
* ci
* print
* ci
* timeout
* time
* fix
* distributed cpu
* pipes
* time
* cpu
* spawn
* tp
* separate
* os
* npm
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix meta tags creating empty lines
* pyright
* node
* fix httpserver address
* drop tutils.default_trainer_options
* imports
* Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294)
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* drop duplicate
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: airium <airium@outlook.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>
* move backward
* refactor backward to remove 16 bit from user override
* Update pytorch_lightning/core/hooks.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* First attempt at auto-moving data for inference
* Correct my copypaste errors
* Correct for if device is CPU
* Get rid of the WIP code I accidentally added
* Add tests
* Make tests more foolproof
* Make sure we stick with pep8 formatting
* Clarify docs a little
* Apply suggestions from code review
* Get everything working again hopefully
* refactor and added hook
variant a
variant b
add test
revert rename
add changelog
docs
* move changelog entry to top
* Move data transfer to utilities
* Add back in warnings for autotransfer
* Get rid of the test code I ended up accidentally committing again
* Add docs and changelog
* Correct PR number in Changelog
* Correct changelog
* Update data.py
* Update test_cpu.py
* make a decorator
* type hint
* changelog
* remove old function
* import
* test for decorator
* fix test
* remove old test
* doctest
* apply decorator directly
* convert doctest to code block
* prevent side effects in tests
* fix merge
* update forward docs
* update docs
* added docs in section "deployment / prediction"
* update changelog
Co-authored-by: Hengjian Jia <henryjia18@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* past checkpoints
* omegaConf save
* enforce type
* resolve=True
Co-authored-by: Omry Yadan <omry@fb.com>
* test omegaconf
* tests
* test past
Co-authored-by: Omry Yadan <omry@fb.com>
* allow loading checkpoints from urls
* tmpdir_server fixture
* test cases for loading checkpoints from url
* dir => root_dir
* default map_location to None
* test case for resume_from_checkpoint
* changelog
* doc update
* monkeypatch TORCH_HOME to avoid caching
* Use a threading server with random ports so that it is easier to clean up
* test fixes
* pep8 fix
* ThreadingHTTPServer support in 3.6
* pep8 fix
* fix changelog
* separate tests for urls
* typo
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* do not include local vars in auto collection
* add test
* add test for model with "self" renamed to "obj"
* skip decorator
* changelog
* update docs
* remove obsolete child collection
* generalize **args, **kwargs names
* docs
* also update varargs passed in
* Revert "also update varargs passed in"
This reverts commit 3d7a30dbee07a513ee13e1cc3e08ca5ccdb85734.
* update test
* refactor and added hook
variant a
variant b
add test
revert rename
add changelog
docs
* resolve merge duplication
* overridden typo
* fix test
* tpu id
* raise if TPU not available
* re-use apply_to_collection function for parsing collections
* comment
* make utility function available to user
* documentation
* move changelog entry to top
* fix tpu transfer call
* fix call
* remove hardcoded string
* improve test
* call model hook by default
* Apply suggestions from code review
* rename utility function
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix grad norm formula
* grad-norm tracker test
* fixed seed and explicit rtol in grad norm tracking test
* a docstring for grad-norms and forced cast to float of norm_type
* support for inf-norm
* renamed the grad norm test
* docs
* fixed language in docstring
* Apply suggestions from code review
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* replace ddp spawn with subprocess
* hot fix
The changes are local and limited in scope: they check for a few indicator
environment variables. We check SLURM_LOCALID, NODE_RANK, and GROUP_RANK, in
that order. If more than one is set, a warning is logged.
This patch also fixes a minor bug in comparing the `WORLD_SIZE` environment
variable, whose value can be a string.
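
The lookup order described above can be sketched roughly as follows. This is
a minimal illustration, not the code from this patch: the helper names
`resolve_node_rank` and `world_size_matches`, the default rank of 0, and the
wording of the warning are all assumptions made for the example.

```python
import logging
import os

log = logging.getLogger(__name__)

# Variables are checked in this order, as stated in the description.
RANK_ENV_VARS = ("SLURM_LOCALID", "NODE_RANK", "GROUP_RANK")


def resolve_node_rank(env=None, default=0):
    """Hypothetical helper: use the first rank variable that is set."""
    env = os.environ if env is None else env
    found = [(name, env[name]) for name in RANK_ENV_VARS if name in env]
    if not found:
        return default  # assumed fallback, not taken from the patch
    if len(found) > 1:
        # More than one indicator is set: warn and use the first in order.
        log.warning(
            "Multiple rank variables set (%s); using %s.",
            ", ".join(name for name, _ in found),
            found[0][0],
        )
    return int(found[0][1])


def world_size_matches(expected, env=None):
    """Hypothetical helper: compare WORLD_SIZE as an int, not as a string."""
    env = os.environ if env is None else env
    raw = env.get("WORLD_SIZE")
    # Environment values are strings, so "4" == 4 would be False without int().
    return raw is not None and int(raw) == expected
```

Converting the value with `int()` before comparing is what avoids the
string-versus-integer mismatch mentioned for `WORLD_SIZE`.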