The multi-GPU capabilities in Jupyter are enabled by launching processes with the 'fork' start method.
It is the only start method supported for multiprocessing in notebooks, but it also brings some limitations that you should be aware of.
Avoid initializing CUDA before launch
=====================================
Don't run torch CUDA functions in any notebook cell before calling ``fabric.launch(train)``, otherwise your code may hang or crash.
.. code-block:: python

    # BAD: Don't run CUDA-related code before `.launch()`
    # x = torch.tensor(1).cuda()
    # torch.cuda.empty_cache()
    # torch.cuda.is_available()


    def train(fabric):
        # GOOD: Move CUDA calls into the training function
        x = torch.tensor(1).cuda()
        torch.cuda.empty_cache()
        torch.cuda.is_available()
        ...


    fabric = Fabric(accelerator="cuda", devices=2)
    fabric.launch(train)
Move data loading code inside the function
==========================================
If you define or load your data in the main process before calling ``fabric.launch(train)``, you may see a slowdown or crashes (segmentation fault, SIGSEGV, etc.).
The best practice is to move your data loading code inside the training function to avoid these issues.
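A minimal sketch of this pattern, assuming the standard PyTorch ``DataLoader`` and Fabric's ``setup_dataloaders`` method (the random tensors here are placeholders standing in for a real dataset):

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset


    def train(fabric):
        # GOOD: Build the dataset and dataloader inside the launched function,
        # so each process creates its own copies after the fork
        dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        dataloader = DataLoader(dataset, batch_size=8)
        dataloader = fabric.setup_dataloaders(dataloader)
        for inputs, targets in dataloader:
            ...  # training step


    # Launch exactly as before:
    # fabric = Fabric(accelerator="cuda", devices=2)
    # fabric.launch(train)

Because the dataset is constructed after ``fabric.launch(train)`` forks the processes, no partially initialized state from the main process is carried over into the workers.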