.. testsetup:: *

    from pytorch_lightning.trainer.trainer import Trainer

.. _amp:

16-bit training
===============
Lightning offers 16-bit training for CPUs, GPUs, and TPUs.

.. raw:: html

    <video width="50%" max-width="400px" controls
    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/yt_thumbs/thumb_precision.png"
    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/yt/Trainer+flags+9+-+precision_1.mp4"></video>

----------

GPU 16-bit
----------
16-bit precision can cut your memory footprint by half.
If you are using Volta architecture GPUs, it can also give a dramatic training speed-up.
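
Whether you see that speed-up depends on the GPU having Tensor Cores, which require compute capability 7.0 (Volta) or newer. A quick way to check, sketched with plain PyTorch rather than a Lightning API:

.. code-block:: python

    import torch

    if torch.cuda.is_available():
        # (major, minor) compute capability of the current device
        major, minor = torch.cuda.get_device_capability()
        print(f"compute capability {major}.{minor}, Tensor Cores: {major >= 7}")
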

.. note:: PyTorch 1.6+ is recommended for 16-bit

Native torch
^^^^^^^^^^^^
When using PyTorch 1.6+, Lightning uses the native AMP implementation to support 16-bit.

.. testcode::
    :skipif: not APEX_AVAILABLE and not NATIVE_AMP_AVALAIBLE

    # turn on 16-bit
    trainer = Trainer(precision=16)
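
With ``precision=16`` Lightning handles the casting and loss scaling for you, so your ``LightningModule`` code does not change. For reference, here is a rough sketch of what native AMP does in a hand-written PyTorch 1.6+ loop (``model``, ``optimizer`` and ``dataloader`` are placeholders, not Lightning objects):

.. code-block:: python

    import torch

    scaler = torch.cuda.amp.GradScaler()

    for batch in dataloader:
        optimizer.zero_grad()
        # run the forward pass in mixed (16-bit) precision
        with torch.cuda.amp.autocast():
            loss = model(batch)
        # scale the loss to avoid gradient underflow in fp16
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
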

Apex 16-bit
^^^^^^^^^^^
If you are using an earlier version of PyTorch, Lightning uses Apex to support 16-bit.

Follow the instructions below to install Apex.
To use 16-bit precision, do two things:

1. Install Apex
2. Set the ``precision`` trainer flag to 16.

.. code-block:: bash

    $ git clone https://github.com/NVIDIA/apex
    $ cd apex

    # ------------------------
    # OPTIONAL: on your cluster you might need to load CUDA 10 or 9
    # depending on how you installed PyTorch

    # see available modules
    module avail

    # load the correct CUDA before install
    module load cuda-10.0
    # ------------------------

    # make sure you've loaded a gcc version > 4.0 and < 7.0
    module load gcc-6.1.0

    $ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

.. warning:: NVIDIA Apex and DDP have instability problems. We recommend using the native 16-bit support in PyTorch 1.6+.
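
If the build succeeded, Apex can be imported; a minimal sanity check (not part of the official install steps):

.. code-block:: python

    # raises ImportError if Apex did not install correctly
    from apex import amp
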

Enable 16-bit
^^^^^^^^^^^^^

.. testcode::
    :skipif: not APEX_AVAILABLE and not NATIVE_AMP_AVALAIBLE

    # turn on 16-bit
    trainer = Trainer(amp_level='O2', precision=16)

If you need to configure the Apex initialization for your particular use case, or want to use a different way of doing
16-bit training, override :meth:`pytorch_lightning.core.LightningModule.configure_apex`.
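
For example, a minimal sketch of such an override might look like the following (the ``loss_scale`` argument is only an illustration of a custom option, not a required setting):

.. code-block:: python

    from pytorch_lightning import LightningModule


    class LitModel(LightningModule):

        def configure_apex(self, amp, model, optimizers, amp_level):
            # customize the Apex initialization, e.g. force dynamic loss scaling
            model, optimizers = amp.initialize(
                model, optimizers, opt_level=amp_level, loss_scale="dynamic"
            )
            return model, optimizers
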

----------

TPU 16-bit
----------
16-bit on TPUs is much simpler. To use 16-bit with TPUs, set precision to 16 when using the TPU flag.

.. testcode::
    :skipif: not XLA_AVAILABLE

    # DEFAULT
    trainer = Trainer(tpu_cores=8, precision=32)

    # turn on 16-bit
    trainer = Trainer(tpu_cores=8, precision=16)
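
Note that on TPUs, 16-bit typically means ``bfloat16`` rather than ``float16``. In PyTorch/XLA this is controlled by the ``XLA_USE_BF16`` environment variable; the sketch below shows the equivalent manual setting (an assumption about what the flag does under the hood; you should not need to set it yourself when using the Trainer flag):

.. code-block:: python

    import os

    # bfloat16 is enabled through an environment variable read by torch_xla
    os.environ["XLA_USE_BF16"] = "1"
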