Transformers
This example contains a simple training loop for next-word prediction with a Transformer model on a subset of the WikiText2 dataset. It is a simplified version of the official PyTorch word-level language modeling example (word_language_model).
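For orientation, below is a minimal sketch of the Fabric training-loop pattern the script follows. The tiny model, toy vocabulary size, and random token data are placeholders standing in for the real WikiText2 pipeline; the actual train.py defines its own model and data loading. The sketch is runnable on its own with plain `python`.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from lightning.fabric import Fabric

VOCAB, SEQ_LEN, EMBED = 1000, 32, 64  # toy sizes, not the example's real hyperparameters

class TinyLM(nn.Module):
    """Stand-in Transformer language model (no causal mask, kept minimal)."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        layer = nn.TransformerEncoderLayer(d_model=EMBED, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(EMBED, VOCAB)

    def forward(self, x):
        return self.head(self.encoder(self.embed(x)))

def main():
    # Fabric() with no arguments picks the best available single device;
    # when the script is started via the `lightning run model` CLI, the
    # CLI's flags configure the accelerator and devices instead.
    fabric = Fabric()
    fabric.launch()

    model = TinyLM()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model, optimizer = fabric.setup(model, optimizer)

    # Random tokens stand in for WikiText2; inputs are shifted by one
    # position against the targets to form the next-word-prediction task.
    tokens = torch.randint(0, VOCAB, (256, SEQ_LEN + 1))
    loader = DataLoader(TensorDataset(tokens[:, :-1], tokens[:, 1:]), batch_size=16)
    loader = fabric.setup_dataloaders(loader)

    criterion = nn.CrossEntropyLoss()
    model.train()
    for inputs, targets in loader:
        optimizer.zero_grad()
        logits = model(inputs)
        loss = criterion(logits.reshape(-1, VOCAB), targets.reshape(-1))
        fabric.backward(loss)  # replaces loss.backward(); Fabric routes it through the active strategy
        optimizer.step()

if __name__ == "__main__":
    main()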
Train with Fabric
# CPU
lightning run model --accelerator=cpu train.py
# GPU (CUDA or Apple Silicon)
lightning run model --accelerator=gpu train.py
# Multiple GPUs
lightning run model --accelerator=gpu --devices=4 train.py
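If you prefer launching with plain `python train.py` instead of the CLI, the same settings can be passed to the Fabric constructor directly. This is a sketch of the equivalent in-code configuration, not necessarily how train.py wires up its own arguments:

from lightning.fabric import Fabric

fabric = Fabric(accelerator="gpu", devices=4)  # same effect as --accelerator=gpu --devices=4
fabric.launch()  # spawns the worker processes that the CLI would otherwise create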