* Skip test due to 'Python bus error'
* Debug NCCL
* Remove NCCL_DEBUG statement
* Revert "Skip test due to 'Python bus error'"
This reverts commit e0a3e8785d.
* fix
* add test
* changelog
* yapf
* patch os environ
* make a special test
* destroy pg
* debug
* revert
* revert
* problematic test
* skip
* try the fixture
* test
* update sensitive test
* update changelog
* remove comment
* update wrong test
* update test name
* parameterization
* Revert "parameterization"
This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.
* remove conftest
* ignore test
* teardown
* fix merge
* DeepSpeed parameterization
* uncomment test
* update chlog
* update changelog
* split tests
* update test
* update test comments
* unroll test
* unroll test
* unroll test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* increase shm
* sudo
* unroll ipu
* Revert "sudo"
This reverts commit 6cc68c1478.
* Revert "increase shm"
This reverts commit 8c27163483.
* x
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* find guilty test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* POPTORCH_WAIT_FOR_IPU=1
* move test
* redo parameterize for ipu
* de-comment test
* move chlog
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Move connection setup into the setup function. Call setup hook after we set up the accelerator
* Added CHANGELOG.md
* fix setup order in callback test
* fix input arguments in test
* Mock distributed function, remove protection to turn into training type hook
* Remove import
* Add missing mock, ensure custom plugin does not create child processes
* Skip test on Windows
* Update deepspeed to init connection in setup
* Do not initialize distributed module
* Move DeepSpeed tests to special tests since dist communication is being set up
* Mark the test as special to see if this fixes CI
* Delete accelerator connector test to see if it's causing build to fail
* Delete deepspeed test
* Revert "Delete accelerator connector test to see if it's causing build to fail"
This reverts commit edde60b8
* Revert "Delete deepspeed test"
This reverts commit 9d317429
* Reverse hook
* Reverse setup hooks to debug again
* Add TODO so I know where I left off
* For single device, move in pre_dispatch after the setup function
* Add additional model-to-device hook if any additional parameters have been set
* See if we can enable deepspeed tests
* Revert "See if we can enable deepspeed tests"
This reverts commit b5450def
* See if this hook approach works
* Introduce new granular hooks
* Remove import, fix TPU spawn by moving the function to setup
* Added missing special test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>