* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* add DDP communication hook
* remove test-related settings
* remove more test-related settings
* fix ddp comm hook util import issue
* comments
* one more fix for test_custom_plugin
* fix ddp spawn
* fix sgd
* address comments and add tests
* add GPU check, modify test a bit, formatting
* formatting nit
* fix conda 3.7 / 1.7 issue: no torch.distributed.algorithms module
* need at least torch 1.8.0
* minor fix
* modify changelog
* changelog should link to PR number instead of issue number
* refine docs for register_ddp_comm_hook: explain ddp_comm_wrapper and add PowerSGD state hyperparameters to the example usage
* move single-device check before the call to register_ddp_comm_hook
* formatting
* comments
* typo
* pre-commit formatting