diff --git a/docs/source-fabric/advanced/compile.rst b/docs/source-fabric/advanced/compile.rst
index a36384ccc9..a8e1cc2db2 100644
--- a/docs/source-fabric/advanced/compile.rst
+++ b/docs/source-fabric/advanced/compile.rst
@@ -3,7 +3,7 @@ Speed up models by compiling them
 #################################
 
 Compiling your PyTorch model can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile <https://pytorch.org/docs/stable/generated/torch.compile.html>`_ correctly in your code.
 
 .. note::
 
@@ -223,6 +223,9 @@ On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically a
 
 Numbers produced with NVIDIA A100 SXM4 40GB, PyTorch 2.2.0, CUDA 12.1.
 
+If you still see recompilation issues after dealing with the aforementioned cases, there is a `Compile Profiler in PyTorch <https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html>`_ for further investigation.
+
+
 ----
 
 
@@ -301,4 +304,18 @@ However, should you have issues compiling DDP and FSDP models, you can opt out o
 
     model = fabric.setup(model, _reapply_compile=False)
 
+----
+
+
+********************
+Additional Resources
+********************
+
+Here are a few resources for further reading after you complete this tutorial:
+
+- `PyTorch 2.0 Paper <https://pytorch.org/assets/pytorch2-2.pdf>`_
+- `GenAI with PyTorch 2.0 blog post series <https://pytorch.org/blog/accelerating-generative-ai/>`_
+- `Training Production AI Models with PyTorch 2.0 <https://pytorch.org/blog/training-production-ai-models/>`_
+- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach <https://pytorch.org/blog/empowering-models-performance/>`_
+
 |
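For context on the Fabric-side change: the ``_reapply_compile=False`` escape hatch shown in the hunk above is used as sketched below. This is a minimal sketch, not from the docs themselves; the ``torch.nn.Sequential`` stand-in model is illustrative only.

.. code-block:: python

    import torch
    import lightning as L

    fabric = L.Fabric(accelerator="auto", devices=1)
    fabric.launch()

    # Toy model, for illustration only
    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

    # Compile first; fabric.setup() normally unwraps the compiled module and
    # reapplies torch.compile on top of the DDP/FSDP strategy wrapper.
    model = torch.compile(model)

    # Opt out of that reapplication, as described in the diff above:
    model = fabric.setup(model, _reapply_compile=False)

Per the paragraph being patched, the opt-out is only meant for cases where compiling DDP/FSDP-wrapped models causes issues.
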
diff --git a/docs/source-pytorch/advanced/compile.rst b/docs/source-pytorch/advanced/compile.rst
index 6da769ee40..73d5f4fbc2 100644
--- a/docs/source-pytorch/advanced/compile.rst
+++ b/docs/source-pytorch/advanced/compile.rst
@@ -3,7 +3,7 @@ Speed up models by compiling them
 #################################
 
 Compiling your LightningModule can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile <https://pytorch.org/docs/stable/generated/torch.compile.html>`_ correctly in your code.
 
 .. note::
 
@@ -192,6 +192,8 @@ However, when this is not possible, you can request PyTorch to compile the code
 A model compiled with ``dynamic=True`` will typically be slower than a model compiled with static shapes, but it will avoid the extreme cost of recompilation every iteration.
 On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically and you should no longer need to set this.
 
+If you still see recompilation issues after dealing with the aforementioned cases, there is a `Compile Profiler in PyTorch <https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html>`_ for further investigation.
+
 
 ----
 
@@ -251,9 +253,9 @@ Always compare the speed and memory usage of the compiled model against the orig
 Limitations
 ***********
 
-There are a few limitations you should be aware of when using ``torch.compile`` in conjunction with the Trainer:
+There are a few limitations you should be aware of when using ``torch.compile`` **in conjunction with the Trainer**:
 
-* ``torch.compile`` currently does not get reapplied over DDP/FSDP, meaning distributed operations can't benefit from speed ups at the moment.
+* The Trainer currently does not reapply ``torch.compile`` over DDP/FSDP, meaning distributed operations can't benefit from speed ups at the moment.
   This limitation will be lifted in the future.
 
 * In some cases, using ``self.log()`` in your LightningModule will cause compilation errors.
@@ -270,4 +272,19 @@ There are a few limitations you should be aware of when using ``torch.compile``
 
             self.model = torch.compile(self.model)
             ...
+
+----
+
+
+********************
+Additional Resources
+********************
+
+Here are a few resources for further reading after you complete this tutorial:
+
+- `PyTorch 2.0 Paper <https://pytorch.org/assets/pytorch2-2.pdf>`_
+- `GenAI with PyTorch 2.0 blog post series <https://pytorch.org/blog/accelerating-generative-ai/>`_
+- `Training Production AI Models with PyTorch 2.0 <https://pytorch.org/blog/training-production-ai-models/>`_
+- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach <https://pytorch.org/blog/empowering-models-performance/>`_
+
 |
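For the Trainer-side ``self.log()`` limitation above, the context lines in the last hunk come from the documented workaround: compile only the inner ``nn.Module`` so logging stays outside the compiled region. A minimal sketch of that pattern follows; the ``LitModel`` class, its toy layers, and the loss are hypothetical, only the ``self.model = torch.compile(self.model)`` line comes from the docs.

.. code-block:: python

    import torch
    import lightning as L


    class LitModel(L.LightningModule):
        def __init__(self):
            super().__init__()
            # Toy inner model, for illustration only
            self.model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU())
            # Compile just the submodule instead of the whole LightningModule
            self.model = torch.compile(self.model)

        def training_step(self, batch, batch_idx):
            loss = self.model(batch).sum()
            self.log("train_loss", loss)  # runs eagerly, outside the compiled region
            return loss

Because ``self.log()`` is called from uncompiled code, this sidesteps the compilation errors mentioned in the limitations list while still compiling the compute-heavy forward path.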