From b79b68481eb4958f15c93ded08cce91b48bea726 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adrian=20W=C3=A4lchli?=
Date: Sun, 26 Nov 2023 11:04:36 +0100
Subject: [PATCH] Clarify setup of optimizer when using `empty_init=True`
 (#19067)

---
 docs/source-fabric/advanced/model_init.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/source-fabric/advanced/model_init.rst b/docs/source-fabric/advanced/model_init.rst
index e3098083ed..f1e5cf846b 100644
--- a/docs/source-fabric/advanced/model_init.rst
+++ b/docs/source-fabric/advanced/model_init.rst
@@ -75,6 +75,10 @@ When training sharded models with :doc:`FSDP ` or DeepSpeed
 
     model = fabric.setup(model)  # parameters get sharded and initialized at once
 
+    # Make sure to create the optimizer only after the model has been set up
+    optimizer = torch.optim.Adam(model.parameters())
+    optimizer = fabric.setup_optimizers(optimizer)
+
 .. note::
     Empty-init is experimental and the behavior may change in the future.
     For FSDP on PyTorch 2.1+, it is required that all user-defined modules that manage parameters implement a ``reset_parameters()`` method (all PyTorch built-in modules have this too).
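
For context (not part of the patch): the flow the amended docs describe looks roughly like the sketch below. It assumes a Fabric FSDP run; `MyModel`, the layer sizes, and the accelerator/devices arguments are illustrative placeholders, while `Fabric.init_module(empty_init=True)`, `fabric.setup()`, and `fabric.setup_optimizers()` are the Fabric APIs the documentation page refers to.

    import torch
    import torch.nn as nn
    import lightning as L


    # Hypothetical model; built-in PyTorch layers already provide the
    # reset_parameters() method required for FSDP with empty-init.
    class MyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(128, 128)

        def forward(self, x):
            return self.layer(x)


    fabric = L.Fabric(accelerator="cuda", devices=2, strategy="fsdp")
    fabric.launch()

    # With empty_init=True the parameters are not materialized here;
    # they are created and initialized only when the model is set up.
    with fabric.init_module(empty_init=True):
        model = MyModel()

    model = fabric.setup(model)  # parameters get sharded and initialized at once

    # As the patch emphasizes: create the optimizer only after fabric.setup(),
    # so it holds references to the sharded, materialized parameters.
    optimizer = torch.optim.Adam(model.parameters())
    optimizer = fabric.setup_optimizers(optimizer)

Creating the optimizer before `fabric.setup(model)` would bind it to the empty (meta-device) parameters, which is why the docs change pins the optimizer creation after the model setup step.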