From d1f8b0f7669aa808df5fcc2934455be3c83b4bae Mon Sep 17 00:00:00 2001
From: Carlos Mocholí
Date: Sat, 30 Sep 2023 16:19:11 +0200
Subject: [PATCH] Bitsandbytes docs improvements (#18681)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Adrian Wälchli
---
 docs/source-fabric/fundamentals/precision.rst | 16 ++++++++--------
 .../common/precision_intermediate.rst         | 15 ++++++++-------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/docs/source-fabric/fundamentals/precision.rst b/docs/source-fabric/fundamentals/precision.rst
index 2ec2ff5d73..5c57eee835 100644
--- a/docs/source-fabric/fundamentals/precision.rst
+++ b/docs/source-fabric/fundamentals/precision.rst
@@ -218,20 +218,20 @@ Quantization via Bitsandbytes
 Both 4-bit (`paper reference `__) and 8-bit (`paper reference `__) quantization is supported.
 Specifically, we support the following modes:
 
-* nf4: Uses the normalized float 4-bit data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
-* nf4-dq: "dq" stands for "Double Quantization" which reduces the average memory footprint by quantizing the quantization constants. In average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
-* fp4: Uses regular float 4-bit data type.
-* fp4-dq: "dq" stands for "Double Quantization" which reduces the average memory footprint by quantizing the quantization constants. In average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
-* int8: Uses unsigned int8 data type.
-* int8-training: Meant for int8 activations with fp16 precision weights.
+
+* **nf4**: Uses the normalized float 4-bit data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
+* **nf4-dq**: "dq" stands for "Double Quantization" which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
+* **fp4**: Uses regular float 4-bit data type.
+* **fp4-dq**: "dq" stands for "Double Quantization" which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
+* **int8**: Uses unsigned int8 data type.
+* **int8-training**: Meant for int8 activations with fp16 precision weights.
 
 While these techniques store weights in 4 or 8 bit, the computation still happens in 16 or 32-bit (float16, bfloat16, float32). This is configurable via the dtype argument in the plugin.
 
 Quantizing the model will dramatically reduce the weight's memory requirements but may have a negative impact on the model's performance or runtime.
 
-Fabric automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
-
+The :class:`~lightning.fabric.plugins.precision.bitsandbytes.BitsandbytesPrecision` automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
 
 .. code-block:: python
 
     from lightning.fabric.plugins import BitsandbytesPrecision
diff --git a/docs/source-pytorch/common/precision_intermediate.rst b/docs/source-pytorch/common/precision_intermediate.rst
index 3c190f1b04..bfa957b498 100644
--- a/docs/source-pytorch/common/precision_intermediate.rst
+++ b/docs/source-pytorch/common/precision_intermediate.rst
@@ -169,19 +169,20 @@ Quantization via Bitsandbytes
 Both 4-bit (`paper reference `__) and 8-bit (`paper reference `__) quantization is supported.
 Specifically, we support the following modes:
 
-* nf4: Uses the normalized float 4-bit data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
-* nf4-dq: "dq" stands for "Double Quantization" which reduces the average memory footprint by quantizing the quantization constants. In average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
-* fp4: Uses regular float 4-bit data type.
-* fp4-dq: "dq" stands for "Double Quantization" which reduces the average memory footprint by quantizing the quantization constants. In average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
-* int8: Uses unsigned int8 data type.
-* int8-training: Meant for int8 activations with fp16 precision weights.
+
+* **nf4**: Uses the normalized float 4-bit data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
+* **nf4-dq**: "dq" stands for "Double Quantization" which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
+* **fp4**: Uses regular float 4-bit data type.
+* **fp4-dq**: "dq" stands for "Double Quantization" which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
+* **int8**: Uses unsigned int8 data type.
+* **int8-training**: Meant for int8 activations with fp16 precision weights.
 
 While these techniques store weights in 4 or 8 bit, the computation still happens in 16 or 32-bit (float16, bfloat16, float32). This is configurable via the dtype argument in the plugin.
 
 Quantizing the model will dramatically reduce the weight's memory requirements but may have a negative impact on the model's performance or runtime.
 
-The Trainer automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
+The :class:`~lightning.pytorch.plugins.precision.bitsandbytes.BitsandbytesPrecisionPlugin` automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
 
 .. code-block:: python
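
For reference, here is a minimal usage sketch for the Fabric plugin documented in the first hunk. It assumes the constructor accepts the quantization mode strings listed above via a ``mode`` keyword and the compute precision via the ``dtype`` argument mentioned in the docs; the exact signature should be checked against the rendered API reference.

.. code-block:: python

    import torch

    from lightning.fabric import Fabric
    from lightning.fabric.plugins import BitsandbytesPrecision

    # Store nn.Linear weights as NF4 with double quantization ("nf4-dq");
    # computation runs in bfloat16. Keyword names `mode` and `dtype` are
    # assumptions based on the modes and dtype argument described above.
    precision = BitsandbytesPrecision(mode="nf4-dq", dtype=torch.bfloat16)

    # Fabric picks up the precision plugin through its `plugins` argument.
    fabric = Fabric(plugins=precision)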
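A corresponding sketch for the Trainer-side plugin referenced in the second hunk, under the same assumptions about the ``mode`` and ``dtype`` keywords; the import path follows the class reference given in the diff.

.. code-block:: python

    import torch

    from lightning.pytorch import Trainer
    from lightning.pytorch.plugins.precision.bitsandbytes import BitsandbytesPrecisionPlugin

    # Same nf4-dq quantization, wired into the Trainer via its `plugins` argument.
    precision = BitsandbytesPrecisionPlugin(mode="nf4-dq", dtype=torch.bfloat16)
    trainer = Trainer(plugins=precision)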