Allow Horovod `teardown()` to complete gracefully if exception thrown in callback setup (#11752)

Dan Dale 2022-02-05 11:13:21 -08:00 committed by GitHub
parent 819a747031
commit 9d8faecdb2
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 6 additions and 2 deletions


@@ -472,6 +472,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 ### Fixed
+- Fixed an issue where `HorovodStrategy.teardown()` did not complete gracefully if an exception was thrown during callback setup [#11752](https://github.com/PyTorchLightning/pytorch-lightning/pull/11752)
 - Fixed security vulnerabilities CVE-2020-1747 and CVE-2020-14343 caused by the `PyYAML` dependency ([#11099](https://github.com/PyTorchLightning/pytorch-lightning/pull/11099))


@@ -197,8 +197,10 @@ class HorovodStrategy(ParallelStrategy):
     def teardown(self) -> None:
         super().teardown()
-        self._exit_stack.__exit__(None, None, None)
-        self._exit_stack = None
+        # teardown may be called before `_exit_stack` is set
+        if self._exit_stack:
+            self._exit_stack.__exit__(None, None, None)
+            self._exit_stack = None
         # Make sure all workers have finished training before returning to the user
         self.join()
         if self.root_device.type == "cuda":