Deprecate the `HorovodStrategy` (#16141)

Carlos Mocholí 2022-12-20 18:38:28 +01:00 committed by GitHub
parent d0b620fe5a
commit bf8e568845
19 changed files with 80 additions and 101 deletions

View File

@ -20,7 +20,7 @@ Let's say you have a batch size of 7 in your dataloader.
def train_dataloader(self):
return Dataset(..., batch_size=7)
In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size will be 7 * devices * num_nodes.
In DDP, DDP_SPAWN, DeepSpeed, or DDP_SHARDED, your effective batch size will be 7 * devices * num_nodes.
.. code-block:: python
@ -28,13 +28,11 @@ In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size
Trainer(accelerator="gpu", devices=8, strategy="ddp")
Trainer(accelerator="gpu", devices=8, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, strategy="horovod")
# effective batch size = 7 * 8 * 10
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="horovod")
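The batch-size arithmetic in this hunk can be checked with a few lines of plain Python. This is a minimal sketch, not part of the commit: the helper name `effective_batch_size` is hypothetical, and it assumes every process loads its own batch of the configured size.

```python
def effective_batch_size(per_device_batch_size: int, devices: int, num_nodes: int = 1) -> int:
    """Samples consumed per optimizer step when each process loads its own batch.

    Hypothetical helper for illustration; it is not a Lightning API.
    """
    return per_device_batch_size * devices * num_nodes


# Single node with 8 devices: 7 * 8
print(effective_batch_size(7, devices=8))  # 56

# 10 nodes with 8 devices each: 7 * 8 * 10
print(effective_batch_size(7, devices=8, num_nodes=10))  # 560
```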
.. note:: Huge batch sizes are actually really bad for convergence. Check out:

View File

@ -25,7 +25,6 @@ Lightning supports multiple ways of doing distributed training.
- Regular (``strategy='ddp'``)
- Spawn (``strategy='ddp_spawn'``)
- Notebook/Fork (``strategy='ddp_notebook'``)
- Horovod (``strategy='horovod'``) (multi-machine, multi-gpu, configured at runtime)
- Bagua (``strategy='bagua'``) (multiple-gpus across many machines with advanced training algorithms)
.. note::
@ -236,44 +235,6 @@ Comparison of DDP variants and tradeoffs
- Fast
Horovod
^^^^^^^
`Horovod <http://horovod.ai>`_ allows the same training script to be used for single-GPU,
multi-GPU, and multi-node training.
Like Distributed Data Parallel, every process in Horovod operates on a single GPU with a fixed
subset of the data. Gradients are averaged across all GPUs in parallel during the backward pass,
then synchronously applied before beginning the next step.
The number of worker processes is configured by a driver application (`horovodrun` or `mpirun`). In
the training script, Horovod will detect the number of workers from the environment, and automatically
scale the learning rate to compensate for the increased total batch size.
Horovod can be configured in the training script to run with any number of GPUs / processes as follows:
.. code-block:: python
# train Horovod on GPU (number of GPUs / machines provided on command-line)
trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1)
# train Horovod on CPU (number of processes / machines provided on command-line)
trainer = Trainer(strategy="horovod")
When starting the training job, the driver application will then be used to specify the total
number of worker processes:
.. code-block:: bash
# run training with 4 GPUs on a single machine
horovodrun -np 4 python train.py
# run training with 8 GPUs on two machines (4 GPUs each)
horovodrun -np 8 -H hostname1:4,hostname2:4 python train.py
See the official `Horovod documentation <https://horovod.readthedocs.io/en/stable>`_ for details
on installation and performance tuning.
Bagua
^^^^^
`Bagua <https://github.com/BaguaSys/bagua>`_ is a deep learning training acceleration framework which supports
@ -284,7 +245,7 @@ multiple advanced distributed training algorithms including:
- `ByteGrad <https://tutorials.baguasys.com/algorithms/bytegrad>`_ and `QAdam <https://tutorials.baguasys.com/algorithms/q-adam>`_ for low precision communication, where data is compressed into low precision before communication.
- `Asynchronous Model Average <https://tutorials.baguasys.com/algorithms/async-model-average>`_ for asynchronous communication, where workers are not required to be synchronized in the same iteration in a lock-step style.
By default, Bagua uses *Gradient AllReduce* algorithm, which is also the algorithm implemented in Distributed Data Parallel and Horovod,
By default, Bagua uses the *Gradient AllReduce* algorithm, which is also the algorithm implemented in DDP,
but Bagua can usually produce a higher training throughput due to its backend written in Rust.
.. code-block:: python

View File

@ -295,7 +295,6 @@ strategies
DataParallelStrategy
DeepSpeedStrategy
HivemindStrategy
HorovodStrategy
HPUParallelStrategy
IPUStrategy
ParallelStrategy

View File

@ -424,7 +424,6 @@ deterministic
This flag sets the ``torch.backends.cudnn.deterministic`` flag.
Might make your system slower, but ensures reproducibility.
Also sets ``$HOROVOD_FUSION_THRESHOLD=0``.
For more info check `PyTorch docs <https://pytorch.org/docs/stable/notes/randomness.html>`_.

View File

@ -102,9 +102,6 @@ The below table lists all relevant strategies available in Lightning with their
* - deepspeed
- :class:`~pytorch_lightning.strategies.DeepSpeedStrategy`
- Provides capabilities to run training using the DeepSpeed library, with training optimizations for large billion parameter models. :ref:`Learn more. <advanced/model_parallel:deepspeed>`
* - horovod
- :class:`~pytorch_lightning.strategies.HorovodStrategy`
- Strategy for Horovod distributed training integration. :ref:`Learn more. <accelerators/gpu_intermediate:Horovod>`
* - hpu_parallel
- :class:`~pytorch_lightning.strategies.HPUParallelStrategy`
- Strategy for distributed training on multiple HPU devices. :doc:`Learn more. <../accelerators/hpu>`

View File

@ -276,7 +276,7 @@ Additionally, you can pass in your custom strategy by configuring additional par
lite = Lite(strategy=DeepSpeedStrategy(stage=2), accelerator="gpu", devices=2)
Support for Horovod and Fully Sharded training strategies are coming soon.
Support for Fully Sharded training strategies is coming soon.
devices

View File

@ -86,6 +86,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
* Deprecates the `pytorch_lightning.utilities.enums.AMPType` enum
* Deprecates the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments
- `horovod` deprecation ([#16141](https://github.com/PyTorchLightning/pytorch-lightning/pull/16141))
* Deprecated `Trainer(strategy="horovod")`
* Deprecated the `HorovodStrategy` class
### Removed

View File

@ -16,6 +16,7 @@ from typing import Any, Dict, List, Optional, Tuple, Union
import torch
import torch.nn as nn
from lightning_utilities.core.imports import module_available
from torch import Tensor
from torch.optim import Optimizer
@ -29,9 +30,9 @@ from pytorch_lightning.plugins.precision import PrecisionPlugin
from pytorch_lightning.strategies.parallel import ParallelStrategy
from pytorch_lightning.strategies.strategy import TBroadcast
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.imports import _HOROVOD_AVAILABLE
from pytorch_lightning.utilities.rank_zero import rank_zero_only
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_only
_HOROVOD_AVAILABLE = module_available("horovod.torch")
if _HOROVOD_AVAILABLE:
import horovod.torch as hvd
@ -48,6 +49,15 @@ class HorovodStrategy(ParallelStrategy):
checkpoint_io: Optional[CheckpointIO] = None,
precision_plugin: Optional[PrecisionPlugin] = None,
):
rank_zero_deprecation(
"The `HorovodStrategy`: `Trainer(strategy='horovod')` has been deprecated in v1.9.0 and will be removed"
" in v1.10.0. You can try using the `Trainer(strategy='ddp')` instead."
)
if not _HOROVOD_AVAILABLE:
raise MisconfigurationException(
'Requested `strategy="horovod"`, but Horovod is not installed.'
" Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
)
super().__init__(
accelerator=accelerator,
parallel_devices=parallel_devices,
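The constructor change above follows a warn-first, validate-second order: the deprecation message is emitted before the dependency check, so users see it even when the backend is missing. A minimal sketch of that ordering with the standard `warnings` module follows. The names `LegacyStrategy` and `_BACKEND_AVAILABLE` are hypothetical, and `RuntimeError` stands in for `MisconfigurationException`.

```python
import warnings

_BACKEND_AVAILABLE = False  # hypothetical stand-in for a flag such as `_HOROVOD_AVAILABLE`


class LegacyStrategy:
    """Hypothetical strategy that warns on construction, then validates its dependency."""

    def __init__(self) -> None:
        # 1. Warn first, so the message is emitted even if the check below fails.
        warnings.warn(
            "`LegacyStrategy` has been deprecated in v1.9.0 and will be removed in v1.10.0.",
            category=DeprecationWarning,
            stacklevel=2,
        )
        # 2. Only then check that the optional backend is installed.
        if not _BACKEND_AVAILABLE:
            raise RuntimeError("Requested `LegacyStrategy`, but its backend is not installed.")


# Usage: record warnings to show that the deprecation fires before the error.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        LegacyStrategy()
    except RuntimeError as err:
        print(err)

print([str(w.message) for w in caught])
```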

View File

@ -78,9 +78,10 @@ from pytorch_lightning.strategies import (
TPUSpawnStrategy,
)
from pytorch_lightning.strategies.ddp_spawn import _DDP_FORK_ALIASES
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
from pytorch_lightning.tuner.auto_gpu_select import pick_multiple_gpus
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.imports import _HOROVOD_AVAILABLE, _IPU_AVAILABLE
from pytorch_lightning.utilities.imports import _IPU_AVAILABLE
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_info, rank_zero_warn
log = logging.getLogger(__name__)
@ -653,7 +654,7 @@ class AcceleratorConnector:
if not _HOROVOD_AVAILABLE:
raise MisconfigurationException(
'Requested `strategy="horovod"`, but Horovod is not installed.'
"Install with \n $HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]"
" Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
)
hvd.init()

View File

@ -23,7 +23,6 @@ from pytorch_lightning.utilities.enums import GradClipAlgorithmType # noqa: F40
from pytorch_lightning.utilities.grads import grad_norm # noqa: F401
from pytorch_lightning.utilities.imports import ( # noqa: F401
_HIVEMIND_AVAILABLE,
_HOROVOD_AVAILABLE,
_HPU_AVAILABLE,
_IPU_AVAILABLE,
_OMEGACONF_AVAILABLE,

View File

@ -27,7 +27,6 @@ _TORCH_GREATER_EQUAL_1_13 = compare_version("torch", operator.ge, "1.13.0")
_DALI_AVAILABLE = module_available("nvidia.dali")
_HABANA_FRAMEWORK_AVAILABLE = package_available("habana_frameworks")
_HIVEMIND_AVAILABLE = package_available("hivemind")
_HOROVOD_AVAILABLE = module_available("horovod.torch")
_KINETO_AVAILABLE = torch.profiler.kineto_available()
_OMEGACONF_AVAILABLE = package_available("omegaconf")
_POPTORCH_AVAILABLE = package_available("poptorch")

View File

@ -57,7 +57,6 @@ To test models that require GPU make sure to run the above command on a GPU mach
The GPU machine must have at least 2 GPUs to run distributed tests.
Note that this setup will not run tests that require specific packages installed
such as Horovod, FairScale, NVIDIA/apex, NVIDIA/DALI, etc.
You can rely on our CI to make sure all these tests pass.
### Standalone Tests
@ -72,7 +71,7 @@ There are certain standalone tests, which you can run using:
## Running Coverage
Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.
Make sure to run coverage on a GPU machine with at least 2 GPUs.
```bash
cd pytorch-lightning

View File

@ -51,7 +51,6 @@ def restore_env_variables():
"MASTER_PORT",
"PL_GLOBAL_SEED",
"PL_SEED_WORKERS",
"HOROVOD_FUSION_THRESHOLD",
"RANK", # set by DeepSpeed
"POPLAR_ENGINE_OPTIONS", # set by IPUStrategy
"CUDA_MODULE_LOADING", # leaked since PyTorch 1.13

View File

@ -69,7 +69,7 @@ def restore_env_variables():
"WANDB_MODE",
"WANDB_REQUIRE_SERVICE",
"WANDB_SERVICE",
"HOROVOD_FUSION_THRESHOLD",
"HOROVOD_FUSION_THRESHOLD", # set by HorovodStrategy # TODO: remove in v1.10.0
"RANK", # set by DeepSpeed
"POPLAR_ENGINE_OPTIONS", # set by IPUStrategy
"CUDA_MODULE_LOADING", # leaked since PyTorch 1.13

View File

@ -403,3 +403,9 @@ def test_apex_deprecation_warnings():
trainer = Trainer()
with pytest.deprecated_call(match="amp_backend` will not be supported"):
trainer.amp_backend
@RunIf(horovod=True)
def test_horovod_deprecation_warnings(*_):
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
Trainer(strategy="horovod")

View File

@ -29,9 +29,9 @@ from pytorch_lightning.plugins.precision.apex_amp import _APEX_AVAILABLE
from pytorch_lightning.strategies.bagua import _BAGUA_AVAILABLE
from pytorch_lightning.strategies.colossalai import _COLOSSALAI_AVAILABLE
from pytorch_lightning.strategies.deepspeed import _DEEPSPEED_AVAILABLE
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
from pytorch_lightning.utilities.imports import (
_HIVEMIND_AVAILABLE,
_HOROVOD_AVAILABLE,
_HPU_AVAILABLE,
_IPU_AVAILABLE,
_OMEGACONF_AVAILABLE,
@ -42,12 +42,12 @@ from tests_pytorch.helpers.datamodules import _SKLEARN_AVAILABLE
_HOROVOD_NCCL_AVAILABLE = False
if _HOROVOD_AVAILABLE:
import horovod
import horovod.torch as hvd
try:
# `nccl_built` returns an integer
_HOROVOD_NCCL_AVAILABLE = bool(horovod.torch.nccl_built())
_HOROVOD_NCCL_AVAILABLE = bool(hvd.nccl_built())
except AttributeError:
# AttributeError can be raised if MPI is not available:
# https://github.com/horovod/horovod/blob/v0.23.0/horovod/torch/__init__.py#L33-L34
@ -77,8 +77,8 @@ class RunIf:
ipu: bool = False,
hpu: bool = False,
mps: Optional[bool] = None,
horovod: bool = False,
horovod_nccl: bool = False,
horovod: bool = False, # TODO: remove in v1.10.0
horovod_nccl: bool = False, # TODO: remove in v1.10.0
skip_windows: bool = False,
standalone: bool = False,
fairscale: bool = False,

View File

@ -29,7 +29,7 @@ if ":" in PYTHONPATH:
from pytorch_lightning import Trainer # noqa: E402
from pytorch_lightning.callbacks import ModelCheckpoint # noqa: E402
from pytorch_lightning.utilities import _HOROVOD_AVAILABLE # noqa: E402
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE # noqa: E402
if _HOROVOD_AVAILABLE:
import horovod.torch as hvd

View File

@ -28,7 +28,7 @@ import tests_pytorch.helpers.pipelines as tpipes
from pytorch_lightning import Trainer
from pytorch_lightning.accelerators import CPUAccelerator
from pytorch_lightning.demos.boring_classes import BoringModel
from pytorch_lightning.utilities import _HOROVOD_AVAILABLE
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from tests_pytorch.helpers.advanced_models import BasicGAN
from tests_pytorch.helpers.runif import RunIf
@ -165,19 +165,20 @@ def test_horovod_multi_gpu_accumulate_grad_batches(tmpdir):
_run_horovod(trainer_options)
@RunIf(horovod=True, skip_windows=True, min_cuda_gpus=2)
@RunIf(horovod=True, skip_windows=True, min_cuda_gpus=1)
def test_horovod_raises_unsupported_accumulate_grad_batches(tmpdir):
"""Ensure MisconfigurationException for different `accumulate_grad_batches` at different epochs for Horovod
Strategy on multi-gpus."""
model = BoringModel()
trainer = Trainer(
default_root_dir=tmpdir,
enable_progress_bar=False,
accumulate_grad_batches={0: 4, 2: 2},
accelerator="auto",
devices=2,
strategy="horovod",
)
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
trainer = Trainer(
default_root_dir=tmpdir,
enable_progress_bar=False,
accumulate_grad_batches={0: 4, 2: 2},
accelerator="auto",
devices=1,
strategy="horovod",
)
with pytest.raises(MisconfigurationException, match="Horovod.*does not support.*accumulate_grad_batches"):
trainer.fit(model)
@ -259,7 +260,8 @@ def test_horovod_transfer_batch_to_gpu(tmpdir):
devices=2,
strategy="horovod",
)
tpipes.run_model_test_without_loggers(trainer_options, model)
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
tpipes.run_model_test_without_loggers(trainer_options, model)
@RunIf(horovod=True, skip_windows=True)
@ -267,14 +269,15 @@ def test_horovod_multi_optimizer(tmpdir):
model = BasicGAN()
# fit model
trainer = Trainer(
default_root_dir=str(tmpdir),
enable_progress_bar=False,
max_epochs=1,
limit_train_batches=0.4,
limit_val_batches=0.2,
strategy="horovod",
)
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
trainer = Trainer(
default_root_dir=str(tmpdir),
enable_progress_bar=False,
max_epochs=1,
limit_train_batches=0.4,
limit_val_batches=0.2,
strategy="horovod",
)
trainer.fit(model)
assert trainer.state.finished, f"Training failed with {trainer.state}"
@ -293,7 +296,6 @@ def test_horovod_multi_optimizer(tmpdir):
assert get_model_params(model.discriminator) == get_optimizer_params(trainer.optimizers[1])
# todo: need to be fixed :]
@pytest.mark.skip(reason="TODO: CI agent.jobstatus=Succeeded: Permission denied")
@RunIf(horovod=True, skip_windows=True)
def test_result_reduce_horovod(tmpdir):
@ -327,15 +329,16 @@ def test_result_reduce_horovod(tmpdir):
model = TestModel()
model.val_dataloader = None
trainer = Trainer(
default_root_dir=tmpdir,
limit_train_batches=2,
limit_val_batches=2,
max_epochs=1,
log_every_n_steps=1,
enable_model_summary=False,
logger=False,
)
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
trainer = Trainer(
default_root_dir=tmpdir,
limit_train_batches=2,
limit_val_batches=2,
max_epochs=1,
log_every_n_steps=1,
enable_model_summary=False,
logger=False,
)
trainer.fit(model)
@ -361,7 +364,8 @@ def test_accuracy_metric_horovod():
target = torch.randint(high=2, size=(num_batches, batch_size))
def _compute_batch():
trainer = Trainer(fast_dev_run=True, strategy="horovod", logger=False)
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
trainer = Trainer(fast_dev_run=True, strategy="horovod", logger=False)
assert isinstance(trainer.accelerator, CPUAccelerator)
# TODO: test that we selected the correct strategy based on horovod flags
@ -414,11 +418,14 @@ def test_horovod_multi_optimizer_with_scheduling_stepping(tmpdir):
init_lr = 0.1 * num_workers
with patch("horovod.torch.size", return_value=8):
# fit model
trainer = Trainer(
default_root_dir=tmpdir, max_epochs=1, limit_val_batches=0.5, limit_train_batches=0.2, strategy="horovod"
)
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
trainer = Trainer(
default_root_dir=tmpdir,
max_epochs=1,
limit_val_batches=0.5,
limit_train_batches=0.2,
strategy="horovod",
)
trainer.fit(model)
adjusted_lr1 = [pg["lr"] for pg in trainer.optimizers[0].param_groups][0]

View File

@ -25,7 +25,8 @@ from torch.distributed import is_available
from pytorch_lightning.plugins.precision.apex_amp import _APEX_AVAILABLE
from pytorch_lightning.strategies.bagua import _BAGUA_AVAILABLE
from pytorch_lightning.utilities import _HOROVOD_AVAILABLE, _OMEGACONF_AVAILABLE, _POPTORCH_AVAILABLE
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
from pytorch_lightning.utilities import _OMEGACONF_AVAILABLE, _POPTORCH_AVAILABLE
from tests_pytorch.helpers.runif import RunIf