Deprecate the `HorovodStrategy` (#16141)
commit bf8e568845
parent d0b620fe5a
@@ -20,7 +20,7 @@ Let's say you have a batch size of 7 in your dataloader.

     def train_dataloader(self):
         return Dataset(..., batch_size=7)

-In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size will be 7 * devices * num_nodes.
+In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED your effective batch size will be 7 * devices * num_nodes.

 .. code-block:: python


@@ -28,13 +28,11 @@ In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size
     Trainer(accelerator="gpu", devices=8, strategy="ddp")
     Trainer(accelerator="gpu", devices=8, strategy="ddp_spawn")
     Trainer(accelerator="gpu", devices=8, strategy="ddp_sharded")
-    Trainer(accelerator="gpu", devices=8, strategy="horovod")

     # effective batch size = 7 * 8 * 10
     Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp")
     Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_spawn")
     Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_sharded")
-    Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="horovod")


 .. note:: Huge batch sizes are actually really bad for convergence. Check out:
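The hunk above documents the multiplicative effect of data-parallel strategies on the batch size. A minimal sketch (not part of the diff; the numbers simply mirror the documentation example) of that arithmetic:

```python
# Each DDP-style process loads its own batch, so the global batch grows with
# the total number of processes (devices per node * number of nodes).
per_device_batch_size = 7
devices = 8
num_nodes = 10

effective_batch_size = per_device_batch_size * devices * num_nodes
print(effective_batch_size)  # 7 * 8 * 10 = 560
```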
@@ -25,7 +25,6 @@ Lightning supports multiple ways of doing distributed training.

 - Regular (``strategy='ddp'``)
 - Spawn (``strategy='ddp_spawn'``)
 - Notebook/Fork (``strategy='ddp_notebook'``)
-- Horovod (``strategy='horovod'``) (multi-machine, multi-gpu, configured at runtime)
 - Bagua (``strategy='bagua'``) (multiple-gpus across many machines with advanced training algorithms)

 .. note::
@@ -236,44 +235,6 @@ Comparison of DDP variants and tradeoffs
 - Fast


-Horovod
-^^^^^^^
-`Horovod <http://horovod.ai>`_ allows the same training script to be used for single-GPU,
-multi-GPU, and multi-node training.
-
-Like Distributed Data Parallel, every process in Horovod operates on a single GPU with a fixed
-subset of the data. Gradients are averaged across all GPUs in parallel during the backward pass,
-then synchronously applied before beginning the next step.
-
-The number of worker processes is configured by a driver application (`horovodrun` or `mpirun`). In
-the training script, Horovod will detect the number of workers from the environment, and automatically
-scale the learning rate to compensate for the increased total batch size.
-
-Horovod can be configured in the training script to run with any number of GPUs / processes as follows:
-
-.. code-block:: python
-
-    # train Horovod on GPU (number of GPUs / machines provided on command-line)
-    trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1)
-
-    # train Horovod on CPU (number of processes / machines provided on command-line)
-    trainer = Trainer(strategy="horovod")
-
-When starting the training job, the driver application will then be used to specify the total
-number of worker processes:
-
-.. code-block:: bash
-
-    # run training with 4 GPUs on a single machine
-    horovodrun -np 4 python train.py
-
-    # run training with 8 GPUs on two machines (4 GPUs each)
-    horovodrun -np 8 -H hostname1:4,hostname2:4 python train.py
-
-See the official `Horovod documentation <https://horovod.readthedocs.io/en/stable>`_ for details
-on installation and performance tuning.
-
-
 Bagua
 ^^^^^
 `Bagua <https://github.com/BaguaSys/bagua>`_ is a deep learning training acceleration framework which supports

@@ -284,7 +245,7 @@ multiple advanced distributed training algorithms including:
 - `ByteGrad <https://tutorials.baguasys.com/algorithms/bytegrad>`_ and `QAdam <https://tutorials.baguasys.com/algorithms/q-adam>`_ for low precision communication, where data is compressed into low precision before communication.
 - `Asynchronous Model Average <https://tutorials.baguasys.com/algorithms/async-model-average>`_ for asynchronous communication, where workers are not required to be synchronized in the same iteration in a lock-step style.

-By default, Bagua uses *Gradient AllReduce* algorithm, which is also the algorithm implemented in Distributed Data Parallel and Horovod,
+By default, Bagua uses *Gradient AllReduce* algorithm, which is also the algorithm implemented in DDP,
 but Bagua can usually produce a higher training throughput due to its backend written in Rust.

 .. code-block:: python
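The removed section above covered launching Horovod workers with `horovodrun`. A hedged sketch of the replacement this deprecation points to (the deprecation message later in this diff recommends `Trainer(strategy='ddp')`; the device and node counts here are illustrative):

```python
from pytorch_lightning import Trainer

# roughly the DDP equivalent of `horovodrun -np 4 python train.py` on one machine
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=4)

# roughly the DDP equivalent of `horovodrun -np 8 -H hostname1:4,hostname2:4 python train.py`
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=4, num_nodes=2)
```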
@@ -295,7 +295,6 @@ strategies
     DataParallelStrategy
     DeepSpeedStrategy
     HivemindStrategy
-    HorovodStrategy
     HPUParallelStrategy
     IPUStrategy
     ParallelStrategy
@@ -424,7 +424,6 @@ deterministic

 This flag sets the ``torch.backends.cudnn.deterministic`` flag.
 Might make your system slower, but ensures reproducibility.
-Also sets ``$HOROVOD_FUSION_THRESHOLD=0``.

 For more info check `PyTorch docs <https://pytorch.org/docs/stable/notes/randomness.html>`_.

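For context, a minimal illustration of the `deterministic` flag documented in the hunk above (standard Trainer usage, nothing specific to this commit):

```python
from pytorch_lightning import Trainer

# Sets torch.backends.cudnn.deterministic; may slow training down, but makes
# cuDNN-backed operations reproducible across runs.
trainer = Trainer(deterministic=True)
```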
@@ -102,9 +102,6 @@ The below table lists all relevant strategies available in Lightning with their
   * - deepspeed
     - :class:`~pytorch_lightning.strategies.DeepSpeedStrategy`
     - Provides capabilities to run training using the DeepSpeed library, with training optimizations for large billion parameter models. :ref:`Learn more. <advanced/model_parallel:deepspeed>`
-  * - horovod
-    - :class:`~pytorch_lightning.strategies.HorovodStrategy`
-    - Strategy for Horovod distributed training integration. :ref:`Learn more. <accelerators/gpu_intermediate:Horovod>`
   * - hpu_parallel
     - :class:`~pytorch_lightning.strategies.HPUParallelStrategy`
     - Strategy for distributed training on multiple HPU devices. :doc:`Learn more. <../accelerators/hpu>`
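The rows removed above correspond to the documented registry entry for `horovod`. A hedged sketch of how registered strategy names can be inspected at runtime (this assumes the public `StrategyRegistry` export; since the strategy is only deprecated here, the entry is still expected to be present until v1.10):

```python
from pytorch_lightning.strategies import StrategyRegistry

# List every registered strategy name and check for the deprecated one.
names = StrategyRegistry.available_strategies()
print("horovod" in names)
```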
@@ -276,7 +276,7 @@ Additionally, you can pass in your custom strategy by configuring additional par
     lite = Lite(strategy=DeepSpeedStrategy(stage=2), accelerator="gpu", devices=2)


-Support for Horovod and Fully Sharded training strategies are coming soon.
+Support for Fully Sharded training strategies is coming soon.


 devices

@@ -86,6 +86,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
   * Deprecated the `pytorch_lightning.utilities.enums.AMPType` enum
   * Deprecated the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments

+- `horovod` deprecation ([#16141](https://github.com/PyTorchLightning/pytorch-lightning/pull/16141))
+  * Deprecated `Trainer(strategy="horovod")`
+  * Deprecated the `HorovodStrategy` class
+

 ### Removed

@@ -16,6 +16,7 @@ from typing import Any, Dict, List, Optional, Tuple, Union

 import torch
 import torch.nn as nn
+from lightning_utilities.core.imports import module_available
 from torch import Tensor
 from torch.optim import Optimizer


@@ -29,9 +30,9 @@ from pytorch_lightning.plugins.precision import PrecisionPlugin
 from pytorch_lightning.strategies.parallel import ParallelStrategy
 from pytorch_lightning.strategies.strategy import TBroadcast
 from pytorch_lightning.utilities.exceptions import MisconfigurationException
-from pytorch_lightning.utilities.imports import _HOROVOD_AVAILABLE
-from pytorch_lightning.utilities.rank_zero import rank_zero_only
+from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_only

+_HOROVOD_AVAILABLE = module_available("horovod.torch")
 if _HOROVOD_AVAILABLE:
     import horovod.torch as hvd


@@ -48,6 +49,15 @@ class HorovodStrategy(ParallelStrategy):
         checkpoint_io: Optional[CheckpointIO] = None,
         precision_plugin: Optional[PrecisionPlugin] = None,
     ):
+        rank_zero_deprecation(
+            "The `HorovodStrategy`: `Trainer(strategy='horovod')` has been deprecated in v1.9.0 and will be removed"
+            " in v1.10.0. You can try using `Trainer(strategy='ddp')` instead."
+        )
+        if not _HOROVOD_AVAILABLE:
+            raise MisconfigurationException(
+                'Requested `strategy="horovod"`, but Horovod is not installed.'
+                " Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
+            )
         super().__init__(
             accelerator=accelerator,
             parallel_devices=parallel_devices,
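A minimal sketch of the pattern the hunks above introduce: guard the optional import with `module_available` and emit a rank-zero deprecation warning when the strategy is constructed. The helper function below is hypothetical and for illustration only; the import paths and message otherwise mirror the diff:

```python
from lightning_utilities.core.imports import module_available
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation

# The availability flag now lives next to the strategy instead of utilities.imports.
_HOROVOD_AVAILABLE = module_available("horovod.torch")


def _warn_horovod_deprecated() -> None:
    # Hypothetical helper; in the diff the real call sits in HorovodStrategy.__init__.
    rank_zero_deprecation(
        "The `HorovodStrategy`: `Trainer(strategy='horovod')` has been deprecated in v1.9.0"
        " and will be removed in v1.10.0. You can try using `Trainer(strategy='ddp')` instead."
    )
```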
@@ -78,9 +78,10 @@ from pytorch_lightning.strategies import (
     TPUSpawnStrategy,
 )
 from pytorch_lightning.strategies.ddp_spawn import _DDP_FORK_ALIASES
+from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
 from pytorch_lightning.tuner.auto_gpu_select import pick_multiple_gpus
 from pytorch_lightning.utilities.exceptions import MisconfigurationException
-from pytorch_lightning.utilities.imports import _HOROVOD_AVAILABLE, _IPU_AVAILABLE
+from pytorch_lightning.utilities.imports import _IPU_AVAILABLE
 from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_info, rank_zero_warn

 log = logging.getLogger(__name__)

@@ -653,7 +654,7 @@ class AcceleratorConnector:
         if not _HOROVOD_AVAILABLE:
             raise MisconfigurationException(
                 'Requested `strategy="horovod"`, but Horovod is not installed.'
-                "Install with \n $HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]"
+                " Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
             )

         hvd.init()
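The second hunk above only reflows the error message. A self-contained sketch of the same fail-fast guard (it mirrors the diff rather than introducing a new API):

```python
from lightning_utilities.core.imports import module_available
from pytorch_lightning.utilities.exceptions import MisconfigurationException

# Fail fast with an actionable install hint when the optional dependency is missing.
if not module_available("horovod.torch"):
    raise MisconfigurationException(
        'Requested `strategy="horovod"`, but Horovod is not installed.'
        " Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
    )
```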
@@ -23,7 +23,6 @@ from pytorch_lightning.utilities.enums import GradClipAlgorithmType  # noqa: F401
 from pytorch_lightning.utilities.grads import grad_norm  # noqa: F401
 from pytorch_lightning.utilities.imports import (  # noqa: F401
     _HIVEMIND_AVAILABLE,
-    _HOROVOD_AVAILABLE,
     _HPU_AVAILABLE,
     _IPU_AVAILABLE,
     _OMEGACONF_AVAILABLE,
@@ -27,7 +27,6 @@ _TORCH_GREATER_EQUAL_1_13 = compare_version("torch", operator.ge, "1.13.0")
 _DALI_AVAILABLE = module_available("nvidia.dali")
 _HABANA_FRAMEWORK_AVAILABLE = package_available("habana_frameworks")
 _HIVEMIND_AVAILABLE = package_available("hivemind")
-_HOROVOD_AVAILABLE = module_available("horovod.torch")
 _KINETO_AVAILABLE = torch.profiler.kineto_available()
 _OMEGACONF_AVAILABLE = package_available("omegaconf")
 _POPTORCH_AVAILABLE = package_available("poptorch")
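For reference, the two availability helpers used in this file come from `lightning_utilities`; a hedged illustration of the difference between them (my reading of the helpers, not part of the diff):

```python
from lightning_utilities.core.imports import module_available, package_available

# package_available checks that the top-level package can be found;
# module_available also walks dotted submodules such as "horovod.torch".
print(package_available("hivemind"))
print(module_available("horovod.torch"))
```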
@@ -57,7 +57,6 @@ To test models that require GPU make sure to run the above command on a GPU machine.
 The GPU machine must have at least 2 GPUs to run distributed tests.

 Note that this setup will not run tests that require specific packages installed
-such as Horovod, FairScale, NVIDIA/apex, NVIDIA/DALI, etc.
 You can rely on our CI to make sure all these tests pass.

 ### Standalone Tests

@@ -72,7 +71,7 @@ There are certain standalone tests, which you can run using:

 ## Running Coverage

-Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.
+Make sure to run coverage on a GPU machine with at least 2 GPUs.

 ```bash
 cd pytorch-lightning
@@ -51,7 +51,6 @@ def restore_env_variables():
         "MASTER_PORT",
         "PL_GLOBAL_SEED",
         "PL_SEED_WORKERS",
-        "HOROVOD_FUSION_THRESHOLD",
         "RANK",  # set by DeepSpeed
         "POPLAR_ENGINE_OPTIONS",  # set by IPUStrategy
         "CUDA_MODULE_LOADING",  # leaked since PyTorch 1.13

@@ -69,7 +69,7 @@ def restore_env_variables():
         "WANDB_MODE",
         "WANDB_REQUIRE_SERVICE",
         "WANDB_SERVICE",
-        "HOROVOD_FUSION_THRESHOLD",
+        "HOROVOD_FUSION_THRESHOLD",  # set by HorovodStrategy  # TODO: remove in v1.10.0
         "RANK",  # set by DeepSpeed
         "POPLAR_ENGINE_OPTIONS",  # set by IPUStrategy
         "CUDA_MODULE_LOADING",  # leaked since PyTorch 1.13
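A simplified sketch of the idea behind the `restore_env_variables` fixture touched above (an assumption about its intent, not the project's actual implementation): snapshot `os.environ` around each test and flag unexpected leaks, ignoring a known allowlist such as `HOROVOD_FUSION_THRESHOLD`:

```python
import os

import pytest

ALLOWED_LEAKS = {"HOROVOD_FUSION_THRESHOLD", "PL_GLOBAL_SEED"}


@pytest.fixture(autouse=True)
def restore_env_variables():
    before = dict(os.environ)
    yield
    leaked = set(os.environ) - set(before) - ALLOWED_LEAKS
    # Restore the original environment regardless of the test outcome.
    os.environ.clear()
    os.environ.update(before)
    assert not leaked, f"tests leaked environment variables: {leaked}"
```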
@@ -403,3 +403,9 @@ def test_apex_deprecation_warnings():
     trainer = Trainer()
     with pytest.deprecated_call(match="amp_backend` will not be supported"):
         trainer.amp_backend
+
+
+@RunIf(horovod=True)
+def test_horovod_deprecation_warnings(*_):
+    with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
+        Trainer(strategy="horovod")
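The new test relies on `pytest.deprecated_call`, which passes only if the wrapped block emits a deprecation warning whose message matches the given regex. A self-contained illustration with a stand-in warning (not the real `Trainer` call):

```python
import warnings

import pytest


def emit_deprecation() -> None:
    # Stand-in for `Trainer(strategy="horovod")`.
    warnings.warn("`Trainer(strategy='horovod')` has been deprecated in v1.9.0", DeprecationWarning)


def test_deprecation_is_raised() -> None:
    with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
        emit_deprecation()
```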
@@ -29,9 +29,9 @@ from pytorch_lightning.plugins.precision.apex_amp import _APEX_AVAILABLE
 from pytorch_lightning.strategies.bagua import _BAGUA_AVAILABLE
 from pytorch_lightning.strategies.colossalai import _COLOSSALAI_AVAILABLE
 from pytorch_lightning.strategies.deepspeed import _DEEPSPEED_AVAILABLE
+from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
 from pytorch_lightning.utilities.imports import (
     _HIVEMIND_AVAILABLE,
-    _HOROVOD_AVAILABLE,
     _HPU_AVAILABLE,
     _IPU_AVAILABLE,
     _OMEGACONF_AVAILABLE,

@@ -42,12 +42,12 @@ from tests_pytorch.helpers.datamodules import _SKLEARN_AVAILABLE

 _HOROVOD_NCCL_AVAILABLE = False
 if _HOROVOD_AVAILABLE:
-    import horovod
+    import horovod.torch as hvd

     try:
         # `nccl_built` returns an integer
-        _HOROVOD_NCCL_AVAILABLE = bool(horovod.torch.nccl_built())
+        _HOROVOD_NCCL_AVAILABLE = bool(hvd.nccl_built())
     except AttributeError:
         # AttributeError can be raised if MPI is not available:
         # https://github.com/horovod/horovod/blob/v0.23.0/horovod/torch/__init__.py#L33-L34

@@ -77,8 +77,8 @@ class RunIf:
         ipu: bool = False,
         hpu: bool = False,
         mps: Optional[bool] = None,
-        horovod: bool = False,
-        horovod_nccl: bool = False,
+        horovod: bool = False,  # TODO: remove in v1.10.0
+        horovod_nccl: bool = False,  # TODO: remove in v1.10.0
         skip_windows: bool = False,
         standalone: bool = False,
         fairscale: bool = False,
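The `horovod` and `horovod_nccl` keyword arguments kept (and marked for removal) above are consumed as pytest skip conditions. A short usage sketch copied from the style of the tests in this diff (the test body is a placeholder and only runs inside the repository):

```python
from tests_pytorch.helpers.runif import RunIf


@RunIf(horovod=True, skip_windows=True)  # skipped unless Horovod is importable
def test_something_with_horovod(tmpdir):
    ...
```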
@@ -29,7 +29,7 @@ if ":" in PYTHONPATH:

 from pytorch_lightning import Trainer  # noqa: E402
 from pytorch_lightning.callbacks import ModelCheckpoint  # noqa: E402
-from pytorch_lightning.utilities import _HOROVOD_AVAILABLE  # noqa: E402
+from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE  # noqa: E402

 if _HOROVOD_AVAILABLE:
     import horovod.torch as hvd
@@ -28,7 +28,7 @@ import tests_pytorch.helpers.pipelines as tpipes
 from pytorch_lightning import Trainer
 from pytorch_lightning.accelerators import CPUAccelerator
 from pytorch_lightning.demos.boring_classes import BoringModel
-from pytorch_lightning.utilities import _HOROVOD_AVAILABLE
+from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
 from pytorch_lightning.utilities.exceptions import MisconfigurationException
 from tests_pytorch.helpers.advanced_models import BasicGAN
 from tests_pytorch.helpers.runif import RunIf

@@ -165,19 +165,20 @@ def test_horovod_multi_gpu_accumulate_grad_batches(tmpdir):
     _run_horovod(trainer_options)


-@RunIf(horovod=True, skip_windows=True, min_cuda_gpus=2)
+@RunIf(horovod=True, skip_windows=True, min_cuda_gpus=1)
 def test_horovod_raises_unsupported_accumulate_grad_batches(tmpdir):
     """Ensure MisConfigurationException for different `accumulate_grad_batches` at different epochs for Horovod
     Strategy on multi-gpus."""
     model = BoringModel()
-    trainer = Trainer(
-        default_root_dir=tmpdir,
-        enable_progress_bar=False,
-        accumulate_grad_batches={0: 4, 2: 2},
-        accelerator="auto",
-        devices=2,
-        strategy="horovod",
-    )
+    with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
+        trainer = Trainer(
+            default_root_dir=tmpdir,
+            enable_progress_bar=False,
+            accumulate_grad_batches={0: 4, 2: 2},
+            accelerator="auto",
+            devices=1,
+            strategy="horovod",
+        )
     with pytest.raises(MisconfigurationException, match="Horovod.*does not support.*accumulate_grad_batches"):
         trainer.fit(model)


@@ -259,7 +260,8 @@ def test_horovod_transfer_batch_to_gpu(tmpdir):
         devices=2,
         strategy="horovod",
     )
-    tpipes.run_model_test_without_loggers(trainer_options, model)
+    with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
+        tpipes.run_model_test_without_loggers(trainer_options, model)


 @RunIf(horovod=True, skip_windows=True)

@@ -267,14 +269,15 @@ def test_horovod_multi_optimizer(tmpdir):
     model = BasicGAN()

     # fit model
-    trainer = Trainer(
-        default_root_dir=str(tmpdir),
-        enable_progress_bar=False,
-        max_epochs=1,
-        limit_train_batches=0.4,
-        limit_val_batches=0.2,
-        strategy="horovod",
-    )
+    with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
+        trainer = Trainer(
+            default_root_dir=str(tmpdir),
+            enable_progress_bar=False,
+            max_epochs=1,
+            limit_train_batches=0.4,
+            limit_val_batches=0.2,
+            strategy="horovod",
+        )
     trainer.fit(model)
     assert trainer.state.finished, f"Training failed with {trainer.state}"

@@ -293,7 +296,6 @@ def test_horovod_multi_optimizer(tmpdir):
     assert get_model_params(model.discriminator) == get_optimizer_params(trainer.optimizers[1])


-# todo: need to be fixed :]
 @pytest.mark.skip(reason="TODO: CI agent.jobstatus=Succeeded: Permission denied")
 @RunIf(horovod=True, skip_windows=True)
 def test_result_reduce_horovod(tmpdir):

@@ -327,15 +329,16 @@ def test_result_reduce_horovod(tmpdir):
     model = TestModel()
     model.val_dataloader = None

-    trainer = Trainer(
-        default_root_dir=tmpdir,
-        limit_train_batches=2,
-        limit_val_batches=2,
-        max_epochs=1,
-        log_every_n_steps=1,
-        enable_model_summary=False,
-        logger=False,
-    )
+    with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
+        trainer = Trainer(
+            default_root_dir=tmpdir,
+            limit_train_batches=2,
+            limit_val_batches=2,
+            max_epochs=1,
+            log_every_n_steps=1,
+            enable_model_summary=False,
+            logger=False,
+        )

     trainer.fit(model)


@@ -361,7 +364,8 @@ def test_accuracy_metric_horovod():
     target = torch.randint(high=2, size=(num_batches, batch_size))

     def _compute_batch():
-        trainer = Trainer(fast_dev_run=True, strategy="horovod", logger=False)
+        with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
+            trainer = Trainer(fast_dev_run=True, strategy="horovod", logger=False)

         assert isinstance(trainer.accelerator, CPUAccelerator)
         # TODO: test that we selected the correct strategy based on horovod flags

@@ -414,11 +418,14 @@ def test_horovod_multi_optimizer_with_scheduling_stepping(tmpdir):
     init_lr = 0.1 * num_workers

     with patch("horovod.torch.size", return_value=8):
-
-        # fit model
-        trainer = Trainer(
-            default_root_dir=tmpdir, max_epochs=1, limit_val_batches=0.5, limit_train_batches=0.2, strategy="horovod"
-        )
+        with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
+            trainer = Trainer(
+                default_root_dir=tmpdir,
+                max_epochs=1,
+                limit_val_batches=0.5,
+                limit_train_batches=0.2,
+                strategy="horovod",
+            )
         trainer.fit(model)

         adjusted_lr1 = [pg["lr"] for pg in trainer.optimizers[0].param_groups][0]


@@ -25,7 +25,8 @@ from torch.distributed import is_available

 from pytorch_lightning.plugins.precision.apex_amp import _APEX_AVAILABLE
 from pytorch_lightning.strategies.bagua import _BAGUA_AVAILABLE
-from pytorch_lightning.utilities import _HOROVOD_AVAILABLE, _OMEGACONF_AVAILABLE, _POPTORCH_AVAILABLE
+from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
+from pytorch_lightning.utilities import _OMEGACONF_AVAILABLE, _POPTORCH_AVAILABLE
 from tests_pytorch.helpers.runif import RunIf
