:orphan:

.. _gpu_basic:

GPU training (Basic)
====================

**Audience:** Users looking to save money and run large models faster using single or multiple GPUs

----

What is a GPU?
--------------

A Graphics Processing Unit (GPU) is a specialized hardware accelerator designed to speed up the mathematical computations used in gaming and deep learning.

----

.. _multi_gpu:

Train on GPUs
-------------

The Trainer will run on all available GPUs by default. Make sure you're running on a machine with at least one GPU.
There's no need to specify any NVIDIA flags as Lightning will do it for you.

.. code-block:: python

    # run on as many GPUs as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    # run on one GPU
    trainer = Trainer(accelerator="gpu", devices=1)
    # run on multiple GPUs
    trainer = Trainer(accelerator="gpu", devices=8)
    # choose the number of devices automatically
    trainer = Trainer(accelerator="gpu", devices="auto")

.. note::
    Setting ``accelerator="gpu"`` will also automatically choose the ``"mps"`` device on Apple silicon GPUs.
    If you want to avoid this, you can set ``accelerator="cuda"`` instead.
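
For example, a minimal sketch of pinning the accelerator explicitly instead of relying on ``"gpu"`` to resolve it:

.. code-block:: python

    # "gpu" may resolve to Apple's MPS backend on Apple silicon machines
    trainer = Trainer(accelerator="gpu", devices=1)

    # restrict the selection to CUDA (NVIDIA) devices explicitly
    trainer = Trainer(accelerator="cuda", devices=1)

    # or target Apple silicon explicitly
    trainer = Trainer(accelerator="mps", devices=1)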

Choosing GPU devices
^^^^^^^^^^^^^^^^^^^^

You can select the GPU devices using ranges, a list of indices, or a string containing
a comma-separated list of GPU IDs:

.. testsetup::

    k = 1

.. testcode::
    :skipif: torch.cuda.device_count() < 2

    # DEFAULT (int) specifies how many GPUs to use per node
    Trainer(accelerator="gpu", devices=k)

    # Above is equivalent to
    Trainer(accelerator="gpu", devices=list(range(k)))

    # Specify which GPUs to use (don't use when running on cluster)
    Trainer(accelerator="gpu", devices=[0, 1])

    # Equivalent using a string
    Trainer(accelerator="gpu", devices="0, 1")

    # To use all available GPUs put -1 or '-1'
    # equivalent to `list(range(torch.cuda.device_count()))` and `"auto"`
    Trainer(accelerator="gpu", devices=-1)

The table below lists examples of possible input formats and how they are interpreted by Lightning.

+------------------+-----------+---------------------+---------------------------------+
| ``devices``      | Type      | Parsed              | Meaning                         |
+==================+===========+=====================+=================================+
| 3                | int       | [0, 1, 2]           | first 3 GPUs                    |
+------------------+-----------+---------------------+---------------------------------+
| -1               | int       | [0, 1, 2, ...]      | all available GPUs              |
+------------------+-----------+---------------------+---------------------------------+
| [0]              | list      | [0]                 | GPU 0                           |
+------------------+-----------+---------------------+---------------------------------+
| [1, 3]           | list      | [1, 3]              | GPU index 1 and 3 (0-based)     |
+------------------+-----------+---------------------+---------------------------------+
| "3"              | str       | [0, 1, 2]           | first 3 GPUs                    |
+------------------+-----------+---------------------+---------------------------------+
| "1, 3"           | str       | [1, 3]              | GPU index 1 and 3 (0-based)     |
+------------------+-----------+---------------------+---------------------------------+
| "-1"             | str       | [0, 1, 2, ...]      | all available GPUs              |
+------------------+-----------+---------------------+---------------------------------+
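
If you want to double-check how a particular ``devices`` value was interpreted, recent Lightning releases expose the parsed selection on the ``Trainer``; a minimal sketch, assuming a machine with at least four GPUs:

.. code-block:: python

    trainer = Trainer(accelerator="gpu", devices="1, 3")
    print(trainer.device_ids)   # [1, 3]
    print(trainer.num_devices)  # 2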

Find usable CUDA devices
^^^^^^^^^^^^^^^^^^^^^^^^

If you want to run several experiments at the same time on your machine, for example for a hyperparameter sweep, then you can
use the following utility function to pick GPU indices that are "accessible", without having to change your code every time.

.. code-block:: python

    from lightning.pytorch.accelerators import find_usable_cuda_devices

    # Find two GPUs on the system that are not already occupied
    trainer = Trainer(accelerator="cuda", devices=find_usable_cuda_devices(2))

    from lightning.fabric.accelerators import find_usable_cuda_devices

    # Works with Fabric too
    fabric = Fabric(accelerator="cuda", devices=find_usable_cuda_devices(2))

This is especially useful when GPUs are configured to be in "exclusive compute mode", such that only one process at a time is allowed access to the device.
This special mode is often enabled on server GPUs or systems shared among multiple users.
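
Conceptually, a utility like this only needs to attempt a small allocation on each visible device and keep the indices where the probe succeeds. The sketch below illustrates the idea; it is not the actual implementation, and it assumes an occupied device in exclusive compute mode rejects the allocation with a ``RuntimeError``:

.. code-block:: python

    import torch


    def find_usable_devices_sketch(num_devices: int) -> list[int]:
        """Conceptual sketch: probe each visible CUDA device with a tiny
        allocation and collect the indices where the probe succeeds."""
        usable = []
        for index in range(torch.cuda.device_count()):
            try:
                # An occupied device in exclusive compute mode is assumed
                # to reject this new allocation
                torch.empty(1, device=f"cuda:{index}")
                usable.append(index)
            except RuntimeError:
                continue
            if len(usable) == num_devices:
                break
        return usable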