:orphan:

.. _ipu_advanced:

Accelerator: IPU training
=========================

**Audience:** Users looking to customize IPU training for massive models.

----

Advanced IPU options
--------------------

IPUs provide further optimizations to speed up training. With the ``IPUStrategy``, you can set ``device_iterations``, which controls the number of iterations run directly on the IPU devices before returning to the host. Increasing the number of on-device iterations improves throughput, since less device-to-host communication is required.

.. note::

    When using model parallelism, it is a hard requirement to increase the number of device iterations to ensure we fully saturate the devices via micro-batching. See :ref:`ipu-model-parallelism` for more information.

.. code-block:: python

    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy

    model = MyLightningModule()
    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=32))
    trainer.fit(model)

Note that by default we return the loss from the last device iteration. You can override this by passing in your own ``poptorch.Options`` and setting the AnchorMode as described in the `PopTorch documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html#poptorch.Options.anchorMode>`__.

.. code-block:: python

    import poptorch
    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy

    model = MyLightningModule()
    inference_opts = poptorch.Options()
    inference_opts.deviceIterations(32)

    training_opts = poptorch.Options()
    training_opts.anchorMode(poptorch.AnchorMode.All)
    training_opts.deviceIterations(32)

    trainer = pl.Trainer(
        accelerator="ipu", devices=8, strategy=IPUStrategy(inference_opts=inference_opts, training_opts=training_opts)
    )
    trainer.fit(model)

You can also override all options by passing your own ``poptorch.Options`` to the strategy. See the `PopTorch options documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/batching.html>`__ for more information.
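
As a minimal sketch, here is one way to hand a fully customized set of training options to the strategy. ``Training.gradientAccumulation`` is a standard ``poptorch.Options`` setting, but availability and interaction with the Trainer's own accumulation flag may depend on your PopTorch and Lightning versions:

.. code-block:: python

    import poptorch
    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy

    model = MyLightningModule()

    # Build a fully customized set of options for training.
    training_opts = poptorch.Options()
    training_opts.deviceIterations(32)
    # Accumulate gradients over 8 micro-batches before each weight update.
    training_opts.Training.gradientAccumulation(8)

    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(training_opts=training_opts))
    trainer.fit(model)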

----

.. _ipu-model-parallelism:

Model parallelism
-----------------

Due to the IPU architecture, larger models should be parallelized across IPUs by design. PopTorch currently provides this capability via annotations, as described in `parallel execution strategies <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/overview.html#execution-strategies>`__.

Below is an example using the block annotation in a LightningModule.

.. note::

    Currently, when using model parallelism, the number of IPUs required is not inferred for you; it is determined by the annotations themselves. If you specify 4 different IDs when defining blocks, your model will be split onto 4 different IPUs.

    The Trainer's ``devices`` flag multiplies this: if your model is split onto 2 IPUs and you set ``Trainer(accelerator="ipu", devices=4)``, 8 IPUs are required in total, since data parallelism is used to replicate the two-IPU model 4 times.

    When pipelining the model you must also increase ``device_iterations`` to ensure the devices are fully saturated with data, i.e. whilst one device in the pipeline processes a batch of data, the other devices can start on the next batches. For example, if the model is split onto 4 IPUs, ``device_iterations`` must be at least 4.

.. code-block:: python

    import poptorch
    import pytorch_lightning as pl
    import torch
    from pytorch_lightning.strategies import IPUStrategy


    class MyLightningModule(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # This will place layer1, layer2+layer3, layer4 and softmax on different IPUs at runtime.
            # BeginBlock starts a new block: all layers that follow belong to it
            self.layer1 = poptorch.BeginBlock(torch.nn.Linear(5, 10), ipu_id=0)

            # This layer starts a new block; subsequent layers are added to it
            # at runtime, until the next block is declared
            self.layer2 = poptorch.BeginBlock(torch.nn.Linear(10, 5), ipu_id=1)
            self.layer3 = torch.nn.Linear(5, 5)

            # Create new blocks
            self.layer4 = poptorch.BeginBlock(torch.nn.Linear(5, 5), ipu_id=2)
            self.softmax = poptorch.BeginBlock(torch.nn.Softmax(dim=1), ipu_id=3)

        ...


    model = MyLightningModule()
    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=20))
    trainer.fit(model)

You can also use the block context manager within the ``forward`` method, or any of the step methods.

.. code-block:: python

    import poptorch
    import pytorch_lightning as pl
    import torch
    from pytorch_lightning.strategies import IPUStrategy


    class MyLightningModule(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer1 = torch.nn.Linear(5, 10)
            self.layer2 = torch.nn.Linear(10, 5)
            self.layer3 = torch.nn.Linear(5, 5)
            self.layer4 = torch.nn.Linear(5, 5)

            self.act = torch.nn.ReLU()
            self.softmax = torch.nn.Softmax(dim=1)

        def forward(self, x):
            # Each block places the wrapped computation on the given IPU at runtime.
            with poptorch.Block(ipu_id=0):
                x = self.act(self.layer1(x))

            with poptorch.Block(ipu_id=1):
                x = self.act(self.layer2(x))

            with poptorch.Block(ipu_id=2):
                x = self.act(self.layer3(x))
                x = self.act(self.layer4(x))

            with poptorch.Block(ipu_id=3):
                x = self.softmax(x)
            return x

        ...


    model = MyLightningModule()
    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=20))
    trainer.fit(model)
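
The block context manager can likewise be used in a step method, since the step runs as part of the PopTorch-compiled graph. A minimal sketch, where the ``encoder`` and ``head`` submodules and the batch layout are hypothetical, purely for illustration:

.. code-block:: python

    import poptorch
    import pytorch_lightning as pl
    import torch


    class MyPipelinedModule(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # Hypothetical submodules, for illustration only.
            self.encoder = torch.nn.Linear(5, 10)
            self.head = torch.nn.Linear(10, 2)

        def training_step(self, batch, batch_idx):
            x, y = batch
            # Place the encoder on the first IPU ...
            with poptorch.Block(ipu_id=0):
                h = self.encoder(x)
            # ... and the classification head plus the loss on the second.
            with poptorch.Block(ipu_id=1):
                loss = torch.nn.functional.cross_entropy(self.head(h), y)
            return loss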