lightning/tests/plugins
Andrew Tritt 3102922647
Add LSF support (#5102)
* add ClusterEnvironment for LSF systems

* update init file

* add available cluster environments

* clean up LSFEnvironment

* add ddp_hpc as a distributed backend

* clean up SLURMEnvironment

* remove extra blank line

* init device for DDPHPCAccelerator

We need to do this so we don't send the model to the same device from multiple ranks

* committing current state

* add additional methods to ClusterEnvironments

* add NVIDIA mixin for setting up CUDA envars

* remove troubleshooting prints

* cleanup SLURMEnvironment

* fix docstring

* cleanup TorchElasticEnvironment and add documentation

* PEP8 puts a cork in it

* add set_ranks_to_trainer

* remove unused import

* move to new location

* update LSF environment

* remove mixin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changelog

* reset slurm env

* add tests

* add licence

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test node_rank

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lsf env to docs

* add auto detection for lsf environment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix is_using_lsf() and test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-09 16:14:26 +02:00
environments Add LSF support (#5102) 2021-07-09 16:14:26 +02:00
__init__.py enable ddp as a plugin (#4285) 2020-10-22 05:15:51 -04:00
test_amp_plugins.py Add the `on_before_optimizer_step` hook (#8048) 2021-07-09 13:30:52 +02:00
test_cluster_integration.py Nuke RPC (#8101) 2021-06-23 18:31:13 +00:00
test_custom_plugin.py Add typings for evaluation_loop.py and remove some dead code (#7015) 2021-04-15 07:36:04 +00:00
test_ddp_fully_sharded_with_full_state_dict.py FSDP with full state dict (#7487) 2021-05-24 08:11:45 +01:00
test_ddp_plugin.py fix NCCL error with non-consecutive trainer gpus (#8165) 2021-06-28 22:08:10 +02:00
test_ddp_plugin_with_comm_hook.py `TrainerState` refactor [5/5] (#7173) 2021-05-04 12:50:56 +02:00
test_ddp_spawn_plugin.py Add `add_to_queue`/`get_from_queue` for DDP spawn(#7916) 2021-06-23 03:19:37 +02:00
test_deepspeed_plugin.py Add the `on_before_backward` hook (#7865) 2021-07-09 06:15:57 +00:00
test_double_plugin.py Fix double precision casting complex buffers (#8208) 2021-06-30 10:57:42 +01:00
test_plugins_registry.py Add `tpu_spawn_debug` to plugin registry (#7933) 2021-06-15 22:32:51 +00:00
test_sharded_plugin.py Fix Special Tests (#7841) 2021-06-16 19:39:03 +02:00
test_single_device_plugin.py Merge pull request #7872 from PyTorchLightning/refactor/logger-poc-changes 2021-06-08 09:04:16 -04:00
test_tpu_spawn.py Standardize positional datamodule and argument names (#7431) 2021-06-15 11:50:13 +00:00