Bitsandbytes docs improvements (#18681)

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Carlos Mocholí 2023-09-30 16:19:11 +02:00 committed by GitHub
parent df959aeb4f
commit d1f8b0f766
2 changed files with 16 additions and 15 deletions


@@ -218,20 +218,20 @@ Quantization via Bitsandbytes
Both 4-bit (`paper reference <https://arxiv.org/abs/2305.14314v1>`__) and 8-bit (`paper reference <https://arxiv.org/abs/2110.02861>`__) quantization is supported.
Specifically, we support the following modes:
* nf4: Uses the normalized float 4-bit data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
* nf4-dq: "dq" stands for "Double Quantization", which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
* fp4: Uses regular float 4-bit data type.
* fp4-dq: "dq" stands for "Double Quantization", which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
* int8: Uses unsigned int8 data type.
* int8-training: Meant for int8 activations with fp16 precision weights.
* **nf4**: Uses the normalized float 4-bit data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
* **nf4-dq**: "dq" stands for "Double Quantization", which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
* **fp4**: Uses regular float 4-bit data type.
* **fp4-dq**: "dq" stands for "Double Quantization", which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
* **int8**: Uses unsigned int8 data type.
* **int8-training**: Meant for int8 activations with fp16 precision weights.
While these techniques store weights in 4 or 8 bits, the computation still happens in 16 or 32 bits (float16, bfloat16, float32).
This is configurable via the dtype argument in the plugin.
Quantizing the model will dramatically reduce the weights' memory requirements but may have a negative impact on the model's performance or runtime.
Fabric automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
The :class:`~lightning.fabric.plugins.precision.bitsandbytes.BitsandbytesPrecision` automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
.. code-block:: python
from lightning.fabric.plugins import BitsandbytesPrecision
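
# --- Illustrative usage sketch; not part of the original commit. ---
# Assumes BitsandbytesPrecision takes a quantization mode string (one of the
# modes listed above) plus an optional compute dtype, and that the resulting
# plugin is passed to Fabric through the ``plugins`` argument.
import torch

from lightning.fabric import Fabric

# 4-bit NormalFloat with double quantization; the plugin picks a compatible compute dtype
precision = BitsandbytesPrecision(mode="nf4-dq")
fabric = Fabric(devices=1, plugins=precision)

# 8-bit weights with an explicit float16 compute dtype
precision = BitsandbytesPrecision(mode="int8-training", dtype=torch.float16)
fabric = Fabric(devices=1, plugins=precision)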


@@ -169,19 +169,20 @@ Quantization via Bitsandbytes
Both 4-bit (`paper reference <https://arxiv.org/abs/2305.14314v1>`__) and 8-bit (`paper reference <https://arxiv.org/abs/2110.02861>`__) quantization is supported.
Specifically, we support the following modes:
* nf4: Uses the normalized float 4-bit data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
* nf4-dq: "dq" stands for "Double Quantization", which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
* fp4: Uses regular float 4-bit data type.
* fp4-dq: "dq" stands for "Double Quantization", which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
* int8: Uses unsigned int8 data type.
* int8-training: Meant for int8 activations with fp16 precision weights.
* **nf4**: Uses the normalized float 4-bit data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
* **nf4-dq**: "dq" stands for "Double Quantization", which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
* **fp4**: Uses regular float 4-bit data type.
* **fp4-dq**: "dq" stands for "Double Quantization", which reduces the average memory footprint by quantizing the quantization constants. On average, this amounts to about 0.37 bits per parameter (approximately 3 GB for a 65B model).
* **int8**: Uses unsigned int8 data type.
* **int8-training**: Meant for int8 activations with fp16 precision weights.
While these techniques store weights in 4 or 8 bits, the computation still happens in 16 or 32 bits (float16, bfloat16, float32).
This is configurable via the dtype argument in the plugin.
Quantizing the model will dramatically reduce the weights' memory requirements but may have a negative impact on the model's performance or runtime.
The Trainer automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
The :class:`~lightning.pytorch.plugins.precision.bitsandbytes.BitsandbytesPrecisionPlugin` automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
.. code-block:: python
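
# --- Illustrative usage sketch; not part of the original commit. ---
# Assumes the Trainer-side plugin is the BitsandbytesPrecisionPlugin class
# referenced above, takes the same mode/dtype arguments as the Fabric plugin,
# and is passed to the Trainer through the ``plugins`` argument.
import torch

from lightning.pytorch import Trainer
from lightning.pytorch.plugins.precision.bitsandbytes import BitsandbytesPrecisionPlugin

# 4-bit NormalFloat with double quantization; the plugin picks a compatible compute dtype
precision = BitsandbytesPrecisionPlugin(mode="nf4-dq")
trainer = Trainer(devices=1, plugins=precision)

# 8-bit weights with an explicit float16 compute dtype
precision = BitsandbytesPrecisionPlugin(mode="int8-training", dtype=torch.float16)
trainer = Trainer(devices=1, plugins=precision)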