.. _plugins:

#######
Plugins
#######

.. include:: ../links.rst

Plugins allow custom integrations to the internals of the Trainer, such as custom precision or
distributed implementations.

Under the hood, the Lightning Trainer uses plugins in the training routine and adds them automatically
depending on the provided Trainer arguments. For example:

.. code-block:: python

    # accelerator: GPUAccelerator
    # training type: DDPPlugin
    # precision: NativeMixedPrecisionPlugin
    trainer = Trainer(gpus=4, precision=16)


We expose Accelerators and Plugins mainly for expert users that want to extend Lightning for:

- New hardware (like TPU plugin)
- Distributed backends (e.g. a backend not yet supported by
  `PyTorch <https://pytorch.org/docs/stable/distributed.html#backends>`_ itself)
- Clusters (e.g. customized access to the cluster's environment interface)


There are two types of Plugins in Lightning with different responsibilities:

TrainingTypePlugin
------------------

- Launching and teardown of training processes (if applicable)
- Setup communication between processes (NCCL, GLOO, MPI, ...)
- Provide a unified communication interface for reduction, broadcast, etc.
- Provide access to the wrapped LightningModule

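
For example, a built-in training type plugin can be instantiated and configured manually and then handed to the
Trainer instead of the string shorthand. In the illustrative snippet below, ``find_unused_parameters`` is simply
forwarded to PyTorch's ``DistributedDataParallel``:

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DDPPlugin

    # pass a configured training type plugin instead of strategy="ddp";
    # extra keyword arguments are forwarded to torch.nn.parallel.DistributedDataParallel
    trainer = Trainer(gpus=2, strategy=DDPPlugin(find_unused_parameters=False))
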

PrecisionPlugin
---------------

- Perform pre- and post-backward/optimizer step operations such as scaling gradients
- Provide context managers for forward, training_step, etc.
- Gradient clipping

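
As a rough sketch of where these hooks live, the example below subclasses
:class:`~pytorch_lightning.plugins.precision.PrecisionPlugin` and taps into the pre-backward hook. The hook name
and signature are taken from the base class at the time of writing and may differ in other versions, so treat
this as an illustration rather than a template:

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins.precision import PrecisionPlugin


    class VerbosePrecisionPlugin(PrecisionPlugin):
        """Hypothetical plugin: regular 32-bit precision that logs every backward call."""

        def pre_backward(self, model, closure_loss):
            # called right before the backward pass; the base implementation returns the loss unchanged
            print("running backward on loss:", float(closure_loss))
            return super().pre_backward(model, closure_loss)


    trainer = Trainer(plugins=[VerbosePrecisionPlugin()])
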

Furthermore, for multi-node training Lightning provides cluster environment plugins that allow the advanced user
to configure Lightning to integrate with a :ref:`custom-cluster`.

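
For illustration, a custom cluster environment is typically a thin class that tells Lightning how to read rank and
address information from the scheduler. The sketch below is an assumption-laden outline: the environment variable
names are placeholders for whatever your scheduler exports, and the exact set of required hooks is defined by
:class:`~pytorch_lightning.plugins.environments.ClusterEnvironment` in your installed version (see
:ref:`custom-cluster` for the authoritative example):

.. code-block:: python

    import os

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins.environments import ClusterEnvironment


    class MyClusterEnvironment(ClusterEnvironment):
        @property
        def creates_processes_externally(self) -> bool:
            # the scheduler already launches one process per device for us
            return True

        def master_address(self) -> str:
            return os.environ["MASTER_ADDR"]  # placeholder variable name

        def master_port(self) -> int:
            return int(os.environ["MASTER_PORT"])  # placeholder variable name

        def world_size(self) -> int:
            return int(os.environ["WORLD_SIZE"])

        def set_world_size(self, size: int) -> None:
            pass  # fully determined by the scheduler

        def global_rank(self) -> int:
            return int(os.environ["RANK"])

        def set_global_rank(self, rank: int) -> None:
            pass  # fully determined by the scheduler

        def local_rank(self) -> int:
            return int(os.environ["LOCAL_RANK"])

        def node_rank(self) -> int:
            return int(os.environ["NODE_RANK"])


    trainer = Trainer(plugins=[MyClusterEnvironment()])
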

.. image:: ../_static/images/accelerator/overview.svg


**********************
Create a custom plugin
**********************

Expert users may choose to extend an existing plugin by overriding its methods ...

.. code-block:: python

    from pytorch_lightning.plugins import DDPPlugin


    class CustomDDPPlugin(DDPPlugin):
        def configure_ddp(self):
            self._model = MyCustomDistributedDataParallel(
                self.model,
                device_ids=...,
            )


or by subclassing the base classes :class:`~pytorch_lightning.plugins.training_type.TrainingTypePlugin` or
:class:`~pytorch_lightning.plugins.precision.PrecisionPlugin` to create new ones. These custom plugins
can then be passed into the Trainer directly or via a (custom) accelerator:

.. code-block:: python

    # custom plugins
    trainer = Trainer(strategy=CustomDDPPlugin(), plugins=[CustomPrecisionPlugin()])

    # fully custom accelerator and plugins
    accelerator = MyAccelerator()
    precision_plugin = MyPrecisionPlugin()
    training_type_plugin = CustomDDPPlugin(accelerator=accelerator, precision_plugin=precision_plugin)
    trainer = Trainer(strategy=training_type_plugin)

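
The ``CustomPrecisionPlugin``, ``MyPrecisionPlugin`` and ``MyAccelerator`` names above are placeholders for
user-defined subclasses. A minimal, purely illustrative sketch of the precision plugins is shown below;
``MyAccelerator`` would similarly subclass one of the built-in accelerators and is omitted here:

.. code-block:: python

    from pytorch_lightning.plugins.precision import PrecisionPlugin


    class CustomPrecisionPlugin(PrecisionPlugin):
        """Placeholder: identical to the default full-precision plugin."""


    class MyPrecisionPlugin(PrecisionPlugin):
        """Placeholder: customize hooks such as gradient clipping as needed."""

        def clip_gradients(self, *args, **kwargs):
            # add custom behaviour before/after the default gradient clipping
            return super().clip_gradients(*args, **kwargs)
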

The full list of built-in plugins is listed below.

.. warning:: The Plugin API is in beta and subject to change.
    For help setting up custom plugins/accelerators, please reach out to us at **support@pytorchlightning.ai**

----------


Training Type Plugins
---------------------

.. currentmodule:: pytorch_lightning.plugins.training_type

.. autosummary::
    :nosignatures:
    :template: classtemplate.rst

    TrainingTypePlugin
    SingleDevicePlugin
    ParallelPlugin
    DataParallelPlugin
    DDPPlugin
    DDP2Plugin
    DDPShardedPlugin
    DDPSpawnShardedPlugin
    DDPSpawnPlugin
    DeepSpeedPlugin
    HorovodPlugin
    SingleTPUPlugin
    TPUSpawnPlugin


Precision Plugins
-----------------

.. currentmodule:: pytorch_lightning.plugins.precision

.. autosummary::
    :nosignatures:
    :template: classtemplate.rst

    PrecisionPlugin
    MixedPrecisionPlugin
    NativeMixedPrecisionPlugin
    ShardedNativeMixedPrecisionPlugin
    ApexMixedPrecisionPlugin
    DeepSpeedPrecisionPlugin
    TPUPrecisionPlugin
    TPUBf16PrecisionPlugin
    DoublePrecisionPlugin
    FullyShardedNativeMixedPrecisionPlugin
    IPUPrecisionPlugin


Cluster Environments
--------------------

.. currentmodule:: pytorch_lightning.plugins.environments

.. autosummary::
    :nosignatures:
    :template: classtemplate.rst

    ClusterEnvironment
    LightningEnvironment
    LSFEnvironment
    TorchElasticEnvironment
    KubeflowEnvironment
    SLURMEnvironment