lightning


shuyingsunshine21	299f2c481b	FSDP with full state dict (#7487)	2021-05-24 08:11:45 +01:00

* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
  This reverts commit `9d4a2b891d`.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
  This reverts commit `c5053da789`, reversing changes made to `0d23d75bc9`.
* Revert "Update test_all_gather_grad.py"
  This reverts commit `0d23d75bc9`.
* Revert "Update utils.py"
  This reverts commit `70fe5da9c6`.
* Revert "Update utils.py"
  This reverts commit `a9aae99f6e`.
* Revert "Update test_results.py"
  This reverts commit `ea74906878`.
* Revert "Update test_metric_result_integration.py"
  This reverts commit `bf70e431b3`.
* Revert "Update ddp_spawn.py"
  This reverts commit `f17210183b`.
* Revert "checkpoint consolidation"
  This reverts commit `536c1323b0`.
* Revert "Revert "checkpoint consolidation""
  This reverts commit `3a9fde915a`.
* Revert "Revert "Revert "checkpoint consolidation"""
  This reverts commit `7a369f47e1`.
* Revert "Revert "Update ddp_spawn.py""
  This reverts commit `8222dc98ea`.
* Revert "Revert "Update test_metric_result_integration.py""
  This reverts commit `6c095b2370`.
* Revert "Revert "Update test_results.py""
  This reverts commit `250d0aaaa2`.
* Revert "Revert "Update utils.py""
  This reverts commit `8651d54d79`.
* Revert "Revert "Update test_all_gather_grad.py""
  This reverts commit `dcdcd29731`.
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unitest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  For more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  For more information, see https://pre-commit.ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
environments	Add kubeflow cluster environment (#7300)	2021-05-17 09:05:24 +01:00
__init__.py	…
test_amp_plugins.py	[bugfix] Apex never instantiated. (#7274)	2021-04-30 13:16:28 -04:00
test_cluster_integration.py	Set `num_nodes` and `sync_batchnorm` From Trainer for Manually Passed Training Type Plugin (#7026)	2021-05-08 11:25:51 +00:00
test_custom_plugin.py	Add typings for evaluation_loop.py and remove some dead code (#7015)	2021-04-15 07:36:04 +00:00
test_ddp_fully_sharded_with_full_state_dict.py	FSDP with full state dict (#7487)	2021-05-24 08:11:45 +01:00
test_ddp_plugin.py	refactor accelerator teardown -> training type plugin teardown (#7579)	2021-05-22 13:19:24 -07:00
test_ddp_plugin_with_comm_hook.py	`TrainerState` refactor [5/5] (#7173)	2021-05-04 12:50:56 +02:00
test_ddp_spawn_plugin.py	refactor accelerator teardown -> training type plugin teardown (#7579)	2021-05-22 13:19:24 -07:00
test_deepspeed_plugin.py	refactor accelerator teardown -> training type plugin teardown (#7579)	2021-05-22 13:19:24 -07:00
test_double_plugin.py	[bugfix] Add set_default_tensor_type to torch.DoubleTensor with precision=64 (#7108)	2021-04-20 15:25:37 +00:00
test_plugins_registry.py	Add ddp_find_unused_parameters_false to Registry (#7224)	2021-05-04 22:40:00 +00:00
test_rpc_plugin.py	Clean up environment access in plugins (#6941)	2021-04-13 20:07:40 +02:00
test_rpc_sequential_plugin.py	Remove legacy support for the magic `log`/`progress_bar` keys in dict returns (#6734)	2021-03-31 00:28:04 +02:00
test_sharded_plugin.py	Fix ShardedDataParallel has no attribute require_backward_grad_sync (#6915)	2021-04-10 16:14:37 +00:00
test_single_device_plugin.py	refactor accelerator teardown -> training type plugin teardown (#7579)	2021-05-22 13:19:24 -07:00
test_tpu_spawn.py	refactor accelerator teardown -> training type plugin teardown (#7579)	2021-05-22 13:19:24 -07:00