################################
Save memory with mixed precision
################################

.. video:: https://pl-public-data.s3.amazonaws.com/assets_lightning/fabric/animations/precision.mp4
    :width: 800
    :autoplay:
    :loop:
    :muted:
    :nocontrols:


************************
What is Mixed Precision?
************************

Like most deep learning frameworks, PyTorch runs on 32-bit floating-point (FP32) arithmetic by default.
However, many deep learning models do not need FP32 everywhere to reach full accuracy during training.
Mixed precision training combines FP32 with lower-bit floating-point formats (such as FP16) to reduce the memory footprint and increase performance during model training and evaluation.
It does this by identifying the steps that require full precision and using 32-bit floating point for those steps only, while using 16-bit floating point for the rest.
Switching to mixed precision has resulted in considerable training speedups since the introduction of Tensor Cores in the Volta and Turing architectures.
Compared to full-precision training, mixed precision training delivers these benefits while preserving task-specific accuracy `[1] <https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html>`_.
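
To make the need for FP32 in "crucial areas" concrete, here is a minimal sketch that uses only the Python standard library (not Fabric or PyTorch).
It shows a small weight update that is rounded away in FP16 but survives in FP32.
The helper ``round_to`` and the example values are hypothetical and chosen purely for illustration.

.. code-block:: python

    import struct


    def round_to(fmt: str, x: float) -> float:
        """Round x to the nearest value representable in the given struct format."""
        return struct.unpack(fmt, struct.pack(fmt, x))[0]


    weight, update = 1.0, 1e-4

    # FP16 ("e", 2 bytes) has a 10-bit mantissa: values near 1.0 are spaced 2**-10 apart,
    # so an update of 1e-4 is rounded away.
    fp16_result = round_to("<e", weight + update)

    # FP32 ("f", 4 bytes) has a 23-bit mantissa and keeps the update.
    fp32_result = round_to("<f", weight + update)

    print("FP16 lost the update:", fp16_result == weight)
    print("FP32 kept the update:", fp32_result > weight)

This is one reason mixed precision keeps the weights in FP32 and only runs selected operations in 16-bit.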

This is how you select the precision in Fabric:

.. code-block:: python

    from lightning.fabric import Fabric

    # This is the default
    fabric = Fabric(precision="32-true")

    # Also FP32 (legacy)
    fabric = Fabric(precision=32)

    # FP32 as well (legacy)
    fabric = Fabric(precision="32")

    # Float16 mixed precision
    fabric = Fabric(precision="16-mixed")

    # Float16 true half precision
    fabric = Fabric(precision="16-true")

    # BFloat16 mixed precision (Ampere GPUs and later)
    fabric = Fabric(precision="bf16-mixed")

    # BFloat16 true half precision (Ampere GPUs and later)
    fabric = Fabric(precision="bf16-true")

    # 8-bit mixed precision via TransformerEngine (Hopper GPUs and later)
    fabric = Fabric(precision="transformer-engine")

    # Double precision
    fabric = Fabric(precision="64-true")

    # Or (legacy)
    fabric = Fabric(precision="64")

    # Or (legacy)
    fabric = Fabric(precision=64)


The same values can also be set through the :doc:`command line interface <launch>`:

.. code-block:: bash

    lightning run model train.py --precision=bf16-mixed


.. note::

    In some cases, it is essential to remain in FP32 for numerical stability, so keep this in mind when using mixed precision.
    For example, when running scatter operations during the forward pass (as in torchpoint3d), the computation must remain in FP32.


----


********************
FP16 Mixed Precision
********************

In most cases, mixed precision uses FP16.
Supported `PyTorch operations <https://pytorch.org/docs/stable/amp.html#op-specific-behavior>`_ automatically run in FP16, saving memory and improving throughput on supported accelerators.
Since computation happens in FP16, which has a very limited "dynamic range", there is a chance of numerical instability during training.
This is handled internally by a dynamic gradient scaler, which skips steps with invalid (inf/NaN) gradients and adjusts the scale factor so that subsequent steps stay within a finite range.
For more information, see the `gradient scaling docs <https://pytorch.org/docs/stable/amp.html#gradient-scaling>`_.
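
The idea behind dynamic gradient scaling can be sketched in plain Python.
This is a toy simulation, not PyTorch's actual implementation: ``to_fp16`` and ``ToyGradScaler`` are hypothetical names, and a single float stands in for a gradient tensor.

.. code-block:: python

    import math
    import struct
    from typing import Optional


    def to_fp16(x: float) -> float:
        """Round x to the nearest FP16 value; values that are too large become inf."""
        try:
            return struct.unpack("<e", struct.pack("<e", x))[0]
        except OverflowError:
            return math.copysign(math.inf, x)


    class ToyGradScaler:
        """Toy dynamic scaler: back off on overflow, grow after a run of good steps."""

        def __init__(self, scale: float = 2.0**16, growth_interval: int = 2000) -> None:
            self.scale = scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def step(self, grad: float) -> Optional[float]:
            # The scaled gradient is what would be stored in FP16.
            scaled = to_fp16(grad * self.scale)
            if not math.isfinite(scaled):
                # Overflow: skip this step and shrink the scale.
                self.scale /= 2
                self._good_steps = 0
                return None
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= 2
                self._good_steps = 0
                return scaled / (self.scale / 2)
            # Unscale in higher precision before the optimizer would use it.
            return scaled / self.scale


    print("1e-8 stored directly in FP16:", to_fp16(1e-8))  # underflows to zero

    scaler = ToyGradScaler()
    for grad in [1e-8, 10.0, 1e-8]:
        print(f"grad={grad}: unscaled={scaler.step(grad)}, next scale={scaler.scale}")

Without scaling, the small gradient underflows to zero in FP16. With scaling it survives, and the overly large gradient causes one skipped step and a smaller scale.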

This is how you enable FP16 in Fabric:

.. code-block:: python

    # Select FP16 mixed precision
    fabric = Fabric(precision="16-mixed")

.. note::

    When using TPUs, setting ``precision="16-mixed"`` will enable bfloat16 based mixed precision, the only supported half-precision type on TPUs.


----


************************
BFloat16 Mixed Precision
************************

BFloat16 mixed precision is similar to FP16 mixed precision. However, it maintains more of the "dynamic range" that FP32 offers.
This means it can be more numerically stable than FP16 mixed precision.
For more information, see `this TPU performance blog post <https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus>`_.

.. code-block:: python

    # Select BF16 precision
    fabric = Fabric(precision="bf16-mixed")


Under the hood, we use `torch.autocast <https://pytorch.org/docs/stable/amp.html>`__ with the dtype set to ``bfloat16``, with no gradient scaling.
It is also possible to use BFloat16 mixed precision on the CPU, relying on MKLDNN.
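
The trade-off between dynamic range and precision can be illustrated with the standard library alone.
In this sketch, ``to_bf16`` is a hypothetical helper that emulates bfloat16 by truncating the FP32 encoding to its top 16 bits. Real hardware typically rounds to nearest instead, so treat the exact values as approximate.

.. code-block:: python

    import math
    import struct


    def to_fp16(x: float) -> float:
        """Round x to the nearest FP16 value; values that are too large become inf."""
        try:
            return struct.unpack("<e", struct.pack("<e", x))[0]
        except OverflowError:
            return math.copysign(math.inf, x)


    def to_bf16(x: float) -> float:
        """Emulate bfloat16 by keeping only the top 16 bits of the FP32 encoding."""
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


    # Dynamic range: bfloat16 shares FP32's 8 exponent bits, FP16 has only 5.
    large = 100_000.0
    fp16_large = to_fp16(large)  # overflows
    bf16_large = to_bf16(large)  # finite, but coarse

    # Precision: FP16 has a 10-bit mantissa, bfloat16 only 7 bits.
    fine = 1.001
    fp16_fine = to_fp16(fine)
    bf16_fine = to_bf16(fine)

    print("large:", fp16_large, bf16_large)
    print("fine: ", fp16_fine, bf16_fine)

The large value overflows in FP16 but stays finite in bfloat16, which is why bfloat16 works without gradient scaling. The price is that bfloat16 resolves values near 1.0 less finely than FP16.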

.. note::

    BFloat16 may not provide significant speedups or memory improvements over FP16; its main benefit is better numerical stability.
    For GPUs, the most significant benefits require `Ampere <https://en.wikipedia.org/wiki/Ampere_(microarchitecture)>`_ based GPUs or newer, such as A100s or 3090s.


----


*****************************************************
Float8 Mixed Precision via Nvidia's TransformerEngine
*****************************************************

`Transformer Engine <https://github.com/NVIDIA/TransformerEngine>`__ (TE) is a library for accelerating models on
NVIDIA Hopper GPUs and newer using 8-bit floating point (FP8) precision, providing better performance with lower
memory utilization in both training and inference. It offers improved performance over half precision with no degradation in accuracy.

Using TE requires replacing some of the layers in your model. Fabric automatically replaces the :class:`torch.nn.Linear`
and :class:`torch.nn.LayerNorm` layers in your model with their TE alternatives. TE also offers
`fused layers <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/pytorch.html>`__
that can squeeze out additional performance. If Fabric detects that any layer has already been replaced, it skips the
automatic replacement.

This plugin is a mix of "mixed" and "true" precision. The computation is downcast to FP8 precision on the fly, but
the model and inputs can be kept in true full or half precision.

.. code-block:: python

    import torch
    from lightning.fabric import Fabric

    # Select 8-bit mixed precision via TransformerEngine
    fabric = Fabric(precision="transformer-engine")

    # Customize the FP8 recipe or set a different base precision:
    from lightning.fabric.plugins.precision import TransformerEnginePrecision

    recipe = {"fp8_format": "HYBRID", "amax_history_len": 16, "amax_compute_algo": "max"}
    precision = TransformerEnginePrecision(dtype=torch.bfloat16, recipe=recipe)
    fabric = Fabric(plugins=precision)


Under the hood, we use `transformer_engine.pytorch.fp8_autocast <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/pytorch.html#transformer_engine.pytorch.fp8_autocast>`__ with the default fp8 recipe.

.. note::

    This requires `Hopper <https://en.wikipedia.org/wiki/Hopper_(microarchitecture)>`_ based GPUs or newer, such as the H100.


----


*******************
True Half Precision
*******************

As mentioned before, mixed precision keeps the model weights in full float32 precision for numerical stability, while casting only supported operations to lower bit precision.
However, in some cases it is possible to train entirely in half precision. Similarly, for inference, the model weights can often be cast to half precision without a loss in accuracy (even when trained with mixed precision).

.. code-block:: python

    # Select FP16 precision
    fabric = Fabric(precision="16-true")
    model = MyModel()
    model = fabric.setup(model)  # model gets cast to torch.float16

    # Select BF16 precision
    fabric = Fabric(precision="bf16-true")
    model = MyModel()
    model = fabric.setup(model)  # model gets cast to torch.bfloat16
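
As a back-of-the-envelope illustration of what casting the weights buys you, the sketch below estimates the memory taken by the weights alone.
The function ``weight_memory_gib`` and the 1-billion-parameter model are hypothetical, and the estimate ignores gradients, optimizer state, and activations.

.. code-block:: python

    import struct

    # Bytes per parameter for each precision setting.
    BYTES_PER_PARAM = {
        "32-true": struct.calcsize("<f"),  # FP32: 4 bytes
        "16-true": struct.calcsize("<e"),  # FP16: 2 bytes
        "bf16-true": 2,  # bfloat16 is also 16 bits wide
    }


    def weight_memory_gib(num_params: int, precision: str) -> float:
        """Estimate the memory used by the model weights alone, in GiB."""
        return num_params * BYTES_PER_PARAM[precision] / 1024**3


    num_params = 1_000_000_000  # hypothetical 1B-parameter model
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gib(num_params, precision):.2f} GiB")

Under these assumptions, true half precision halves the weight memory compared to ``32-true``.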

Tip: For faster initialization, you can create model parameters with the desired dtype directly on the device:

.. code-block:: python

    fabric = Fabric(precision="bf16-true")

    # init the model directly on the device and with parameters in half-precision
    with fabric.init_module():
        model = MyModel()

    model = fabric.setup(model)


----


************************************
Control where precision gets applied
************************************

Fabric automatically casts the data type and operations in the ``forward`` of your model:

.. code-block:: python

    fabric = Fabric(precision="bf16-mixed")

    model = ...
    optimizer = ...

    # Here, Fabric sets up the `model.forward` for precision auto-casting
    model, optimizer = fabric.setup(model, optimizer)

    # Precision casting gets handled in your forward, no code changes required
    output = model.forward(input)

    # Precision does NOT get applied here (only in forward)
    loss = loss_function(output, target)

If you want to enable operations in lower bit-precision **outside** your model's ``forward()``, you can use the :meth:`~lightning.fabric.fabric.Fabric.autocast` context manager:

.. code-block:: python

    # Precision is now also applied in this part of the code:
    with fabric.autocast():
        loss = loss_function(output, target)