* add ClusterEnvironment for LSF systems
* update init file
* add available cluster environments
* clean up LSFEnvironment
* add ddp_hpc as a distributed backend
* clean up SLURMEnvironment
* remove extra blank line
* init device for DDPHPCAccelerator
This prevents multiple ranks from sending the model to the same device
* committing current state
* add additional methods to ClusterEnvironments
* add NVIDIA mixin for setting up CUDA environment variables
* remove troubleshooting prints
* cleanup SLURMEnvironment
* fix docstring
* cleanup TorchElasticEnvironment and add documentation
* PEP8 puts a cork in it
* add set_ranks_to_trainer
* remove unused import
* move to new location
* update LSF environment
* remove mixin
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* changelog
* reset slurm env
* add tests
* add licence
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* test node_rank
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add lsf env to docs
* add auto detection for lsf environment
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix is_using_lsf() and test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add kubeflow cluster environment
* Add KubeflowEnvironment to docs
* Add KubeflowEnvironment to the changelog
* break up a long line
* Add method to detect kubeflow environment
* Select Kubeflow environment when available
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Run pre-commit
* task_idx == 0
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>