## Tensor Parallel and 2D Parallel

This example shows how to apply tensor parallelism to your model (here Llama 3 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP for 2D parallelism. Running this example requires PyTorch 2.3+ and a machine with at least 4 GPUs, each with 24 GB of memory.

```bash
pip install 'torch>=2.3'
```
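
Under the hood, `train.py` builds Fabric with a `ModelParallelStrategy` that calls a user-provided parallelization function on each process, passing in the model and a device mesh. The sketch below is a simplified illustration rather than the example's actual code: `MyModel` and the submodule names `w1`/`w2` are placeholders, and the keyword arguments and mesh dimension names follow the strategy's documentation, so verify them against your installed Lightning and PyTorch versions. The full plan for the Llama 3 architecture lives in `parallelism.py` and `model.py`.

```python
import torch.nn as nn
from torch.distributed._composable.fsdp import fully_shard  # import path may differ across PyTorch versions
from torch.distributed.device_mesh import DeviceMesh
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel, parallelize_module

from lightning.fabric import Fabric
from lightning.fabric.strategies import ModelParallelStrategy


class MyModel(nn.Module):
    """Placeholder stand-in for the Llama 3 model defined in model.py."""

    def __init__(self) -> None:
        super().__init__()
        self.w1 = nn.Linear(128, 256)
        self.w2 = nn.Linear(256, 128)

    def forward(self, x):
        return self.w2(self.w1(x).relu())


def parallelize(model: nn.Module, device_mesh: DeviceMesh) -> nn.Module:
    # Tensor parallelism: shard w1 column-wise and w2 row-wise across the
    # "tensor_parallel" mesh dimension, so the pair of matmuls needs only a
    # single all-reduce in the forward pass.
    tp_mesh = device_mesh["tensor_parallel"]
    model = parallelize_module(model, tp_mesh, {"w1": ColwiseParallel(), "w2": RowwiseParallel()})

    # 2D parallelism: additionally shard the tensor-parallel model across the
    # "data_parallel" mesh dimension with FSDP2.
    fully_shard(model, mesh=device_mesh["data_parallel"])
    return model


if __name__ == "__main__":
    # With 4 GPUs: 2 tensor-parallel groups x 2 data-parallel groups.
    strategy = ModelParallelStrategy(parallelize_fn=parallelize, data_parallel_size=2, tensor_parallel_size=2)
    fabric = Fabric(accelerator="cuda", devices=4, strategy=strategy)
    fabric.launch()

    model = fabric.setup(MyModel())  # the strategy invokes `parallelize` here
```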

Navigate to this example folder and run the training script:

```bash
cd examples/fabric/tensor_parallel
python train.py
```

You should see an output like this:

```text
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/4
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/4
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/4
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 4 processes
----------------------------------------------------------------------------------------------------

Number of model parameters: 6.7 B
Starting training ...
Iteration 0 complete
Iteration 1 complete
Iteration 2 complete
Iteration 3 complete
Iteration 4 complete
Iteration 5 complete
Iteration 6 complete
Iteration 7 complete
Saving a (distributed) checkpoint ...
Training successfully completed!
Peak memory usage: 17.95 GB
```
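
The "(distributed) checkpoint" in the log refers to a sharded checkpoint: every rank saves only its local shards, so the result is typically a checkpoint directory rather than a single file. Below is a minimal sketch of saving and restoring such a checkpoint through Fabric's regular API; the path and state keys are placeholders, and `train.py` contains the exact call used by the example.

```python
# Each rank writes only its own shards; the result is a checkpoint folder, not a single file.
state = {"model": model, "optimizer": optimizer, "iteration": iteration}
fabric.save("checkpoint/llama3-tp", state)

# Loading re-shards the weights onto the current device mesh.
fabric.load("checkpoint/llama3-tp", state)
```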
> [!NOTE]
> The `ModelParallelStrategy` is experimental and subject to change. Report issues on GitHub.