diff --git a/docs/source/tpu.rst b/docs/source/tpu.rst
index 5f4c48076d..549a3a1cd2 100644
--- a/docs/source/tpu.rst
+++ b/docs/source/tpu.rst
@@ -40,7 +40,7 @@ To access TPUs, there are three main ways.
 ----------------
 
 Colab TPUs
------------
+----------
 Colab is like a jupyter notebook with a free GPU or TPU
 hosted on GCP.
 
@@ -129,8 +129,7 @@ That's it! Your model will train on all 8 TPU cores.
 ----------------
 
 TPU core training
-
-------------------------
+-----------------
 
 Lightning supports training on a single TPU core or 8 TPU cores.
 
@@ -177,7 +176,7 @@ on how to set up the instance groups and VMs needed to run TPU Pods.
 ----------------
 
 16 bit precision
-----------------
+----------------
 Lightning also supports training in 16-bit precision with TPUs.
 By default, TPU training will use 32-bit precision. To enable 16-bit,
 set the 16-bit flag.
@@ -194,6 +193,28 @@ Under the hood the xla library will use the `bfloat16 type <https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_
 
 ----------------
 
+Performance considerations
+--------------------------
+
+The TPU was designed for specific workloads and operations to carry out large volumes of matrix multiplication,
+convolution operations and other commonly used ops in applied deep learning.
+The specialization makes it a strong choice for NLP tasks, sequential convolutional networks, and under low precision operation.
+There are cases in which training on TPUs is slower when compared with GPUs, for possible reasons listed:
+
+- Too small batch size.
+- Explicit evaluation of tensors during training, e.g. ``tensor.item()``
+- XLA Graph compilation during the initial steps `Reference `_
+- Some tensor ops are not fully supported on TPU, or not supported at all. These operations will be performed on CPU (context switch).
+- PyTorch integration is still experimental. Some performance bottlenecks may simply be the result of unfinished implementation.
+
+The official PyTorch XLA `performance guide <https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#known-performance-caveats>`_
+has more detailed information on how PyTorch code can be optimized for TPU. In particular, the
+`metrics report <https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#get-a-metrics-report>`_ allows
+one to identify operations that lead to context switching.
+
+
 About XLA
 ----------
 XLA is the library that interfaces PyTorch with the TPUs.
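
The retitled "TPU core training" section states that Lightning supports training on a single TPU core or all 8 TPU cores. A minimal sketch of how those options are typically selected, assuming the ``tpu_cores`` Trainer argument and a hypothetical ``MyModel`` LightningModule:

.. code-block:: python

    import pytorch_lightning as pl

    model = MyModel()  # hypothetical LightningModule, not part of this diff

    # train on a single TPU core (the runtime picks the core)
    trainer = pl.Trainer(tpu_cores=1)

    # train on one specific core, e.g. core index 5
    trainer = pl.Trainer(tpu_cores=[5])

    # train on all 8 TPU cores
    trainer = pl.Trainer(tpu_cores=8)

    trainer.fit(model)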
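
For the "16 bit precision" section, a short sketch of what "set the 16-bit flag" typically looks like, assuming the standard ``precision`` Trainer argument; under the hood the xla library then trains with bfloat16:

.. code-block:: python

    import pytorch_lightning as pl

    # TPU training defaults to 32-bit precision;
    # passing precision=16 enables 16-bit (bfloat16) training on the TPU
    trainer = pl.Trainer(tpu_cores=8, precision=16)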
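
The added performance paragraph points readers to the XLA metrics report for identifying operations that lead to context switching. A small sketch of printing that report with ``torch_xla.debug.metrics``, assuming ``torch_xla`` is installed on the TPU host:

.. code-block:: python

    import torch_xla.debug.metrics as met

    # run a few training steps first, then inspect the report;
    # counters prefixed with "aten::" mark ops that fell back to the CPU,
    # i.e. the context switches mentioned in the new section
    print(met.metrics_report())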