:orphan:

.. _ipu_advanced:

Accelerator: IPU training
=========================

**Audience:** Users looking to customize IPU training for massive models.

----

Advanced IPU options
--------------------

IPUs provide further optimizations to speed up training. By using the ``IPUStrategy`` we can set the ``device_iterations``, which controls the number of iterations run directly on the IPU devices before returning to the host. Increasing the number of on-device iterations will improve throughput, as less device-to-host communication is required.

.. note::

    When using model parallelism, it is a hard requirement to increase the number of device iterations to ensure we fully saturate the devices via micro-batching. See :ref:`ipu-model-parallelism` for more information.

.. code-block:: python

    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy

    model = MyLightningModule()
    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=32))
    trainer.fit(model)

Note that by default the loss from the last device iteration is returned. You can override this by passing in your own ``poptorch.Options`` and setting the AnchorMode as described in the `PopTorch documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html#poptorch.Options.anchorMode>`__.

.. code-block:: python

    import poptorch
    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy

    model = MyLightningModule()

    inference_opts = poptorch.Options()
    inference_opts.deviceIterations(32)

    training_opts = poptorch.Options()
    training_opts.anchorMode(poptorch.AnchorMode.All)
    training_opts.deviceIterations(32)

    trainer = pl.Trainer(
        accelerator="ipu", devices=8, strategy=IPUStrategy(inference_opts=inference_opts, training_opts=training_opts)
    )
    trainer.fit(model)

You can also override all options by passing your own ``poptorch.Options`` to the strategy. See the `PopTorch options documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/batching.html>`__ for more information.
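
For example, a single ``poptorch.Options`` object can carry several of these settings at once. Below is a minimal sketch, assuming the PopTorch option setters ``deviceIterations``, ``replicationFactor``, and ``Training.gradientAccumulation``; consult the PopTorch documentation for the options available in your SDK version:

.. code-block:: python

    import poptorch
    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy

    opts = poptorch.Options()
    opts.deviceIterations(32)  # iterations run on-device per host step
    opts.replicationFactor(2)  # replicate the model across 2 sets of IPUs
    opts.Training.gradientAccumulation(8)  # accumulate gradients over 8 micro-batches

    model = MyLightningModule()
    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(training_opts=opts))
    trainer.fit(model)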

----

.. _ipu-model-parallelism:

Model parallelism
-----------------

Due to the IPU architecture, larger models should be parallelized across IPUs by design. PopTorch provides this capability via annotations, as described in `parallel execution strategies <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/overview.html#execution-strategies>`__.

Below is an example using the block annotation in a LightningModule.

.. note::

    Currently, when using model parallelism, we do not infer the number of IPUs required for you; this is determined by the annotations themselves. If you specify 4 different IDs when defining blocks, your model will be split onto 4 different IPUs.

    The Trainer's ``devices`` flag is independent of this and acts as a replication factor. In other words, if your model is split onto 2 IPUs and you set ``Trainer(accelerator="ipu", devices=4)``, 8 IPUs are required in total, since data parallelism is used to replicate the two-IPU model 4 times.

    When pipelining the model you must also increase ``device_iterations`` to ensure the devices stay saturated, i.e. while one device in the pipeline processes a batch of data, the other devices can start on the next batch. For example, if the model is split onto 4 IPUs, ``device_iterations`` should be at least 4.

.. code-block:: python

    import torch

    import poptorch
    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy


    class MyLightningModule(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # This will place layer1, layer2+layer3, layer4, softmax on different IPUs at runtime.
            # BeginBlock will start a new id for all layers within this block
            self.layer1 = poptorch.BeginBlock(torch.nn.Linear(5, 10), ipu_id=0)

            # This layer starts a new block,
            # adding subsequent layers to this current block at runtime
            # till the next block has been declared
            self.layer2 = poptorch.BeginBlock(torch.nn.Linear(10, 5), ipu_id=1)
            self.layer3 = torch.nn.Linear(5, 5)

            # Create new blocks
            self.layer4 = poptorch.BeginBlock(torch.nn.Linear(5, 5), ipu_id=2)
            self.softmax = poptorch.BeginBlock(torch.nn.Softmax(dim=1), ipu_id=3)

        ...


    model = MyLightningModule()
    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=20))
    trainer.fit(model)

You can also use the block context manager within the forward function, or any of the step functions.

.. code-block:: python

    import torch

    import poptorch
    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy


    class MyLightningModule(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer1 = torch.nn.Linear(5, 10)
            self.layer2 = torch.nn.Linear(10, 5)
            self.layer3 = torch.nn.Linear(5, 5)
            self.layer4 = torch.nn.Linear(5, 5)

            self.act = torch.nn.ReLU()
            self.softmax = torch.nn.Softmax(dim=1)

        def forward(self, x):
            with poptorch.Block(ipu_id=0):
                x = self.act(self.layer1(x))

            with poptorch.Block(ipu_id=1):
                x = self.act(self.layer2(x))

            with poptorch.Block(ipu_id=2):
                x = self.act(self.layer3(x))
                x = self.act(self.layer4(x))

            with poptorch.Block(ipu_id=3):
                x = self.softmax(x)
            return x

        ...


    model = MyLightningModule()
    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=20))
    trainer.fit(model)
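
As noted above, the block context manager also works inside step functions. Below is a minimal sketch that annotates ``training_step`` instead of ``forward``, reusing the layers and imports from the example above; the ``batch`` unpacking and the loss computation are hypothetical, added purely for illustration:

.. code-block:: python

    class MyStepAnnotatedModule(MyLightningModule):
        def training_step(self, batch, batch_idx):
            x, y = batch  # hypothetical (inputs, integer targets) batch layout

            with poptorch.Block(ipu_id=0):
                x = self.act(self.layer1(x))

            with poptorch.Block(ipu_id=1):
                probs = self.softmax(self.layer2(x))

            # hypothetical classification loss for illustration only
            return torch.nn.functional.nll_loss(torch.log(probs), y)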