lightning

Andrew Tritt 3102922647 Add LSF support (#5102)	2021-07-09 16:14:26 +02:00

* add ClusterEnvironment for LSF systems
* update init file
* add available cluster environments
* clean up LSFEnvironment
* add ddp_hpc as a distributed backend
* clean up SLURMEnvironment
* remove extra blank line
* init device for DDPHPCAccelerator
  We need to do this so we don't send the model to the same device from multiple ranks
* committing current state
* add additional methods to ClusterEnvironments
* add NVIDIA mixin for setting up CUDA envars
* remove troubleshooting prints
* cleanup SLURMEnvironment
* fix docstring
* cleanup TorchElasticEnvironment and add documentation
* PEP8 puts a cork in it
* add set_ranks_to_trainer
* remove unused import
* move to new location
* update LSF environment
* remove mixin
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* changelog
* reset slurm env
* add tests
* add licence
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* test node_rank
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add lsf env to docs
* add auto detection for lsf environment
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix is_using_lsf() and test
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
..
environments	Add LSF support (#5102)	2021-07-09 16:14:26 +02:00
__init__.py	enable ddp as a plugin (#4285)	2020-10-22 05:15:51 -04:00
test_amp_plugins.py	Add the `on_before_optimizer_step` hook (#8048)	2021-07-09 13:30:52 +02:00
test_cluster_integration.py	Nuke RPC (#8101)	2021-06-23 18:31:13 +00:00
test_custom_plugin.py	Add typings for evaluation_loop.py and remove some dead code (#7015)	2021-04-15 07:36:04 +00:00
test_ddp_fully_sharded_with_full_state_dict.py	FSDP with full state dict (#7487)	2021-05-24 08:11:45 +01:00
test_ddp_plugin.py	fix NCCL error with non-consecutive trainer gpus (#8165)	2021-06-28 22:08:10 +02:00
test_ddp_plugin_with_comm_hook.py	`TrainerState` refactor [5/5] (#7173)	2021-05-04 12:50:56 +02:00
test_ddp_spawn_plugin.py	Add `add_to_queue`/`get_from_queue` for DDP spawn(#7916)	2021-06-23 03:19:37 +02:00
test_deepspeed_plugin.py	Add the `on_before_backward` hook (#7865)	2021-07-09 06:15:57 +00:00
test_double_plugin.py	Fix double precision casting complex buffers (#8208)	2021-06-30 10:57:42 +01:00
test_plugins_registry.py	Add `tpu_spawn_debug` to plugin registry (#7933)	2021-06-15 22:32:51 +00:00
test_sharded_plugin.py	Fix Special Tests (#7841)	2021-06-16 19:39:03 +02:00
test_single_device_plugin.py	Merge pull request #7872 from PyTorchLightning/refactor/logger-poc-changes	2021-06-08 09:04:16 -04:00
test_tpu_spawn.py	Standardize positional datamodule and argument names (#7431)	2021-06-15 11:50:13 +00:00