Update Fabric introduction (#16672)

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
Adrian Wälchli 2023-02-09 19:06:29 +01:00 committed by GitHub
parent 8f8bae9ac8
commit 83296cc6cf
1 changed file with 33 additions and 11 deletions


@@ -2,19 +2,15 @@
Fabric (Beta)
#############
Fabric allows you to scale any PyTorch model with just a few lines of code!
With Fabric, you can easily scale your model to run on distributed devices using the strategy of your choice while keeping complete control over the training loop and optimization logic.
Fabric is the fast and lightweight way to scale PyTorch models without boilerplate code.
With only a few changes to your code, Fabric allows you to:
- Automatic placement of models and data onto the device
- Automatic support for mixed precision (speedup and smaller memory footprint)
- Seamless switching between hardware (CPU, GPU, TPU)
- State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed)
- Easy-to-use launch command for spawning processes (DDP, torchelastic, etc.)
- Multi-node support (TorchElastic, SLURM, and more)
- You keep complete control of your training loop
- Easily switch from running on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training
- State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the box
- Handles all the boilerplate device logic for you
- Brings useful tools to help you build a trainer (callbacks, logging, checkpoints, ...)
- Designed with multi-billion parameter models in mind
|
.. code-block:: diff
@@ -60,6 +56,32 @@ With only a few changes to your code, Fabric allows you to:
----
***********
Why Fabric?
***********
Fabric differentiates itself from a fully-fledged trainer like :doc:`Lightning Trainer <../common/trainer>` in these key aspects:
**Fast to implement**
There is no need to restructure your code: just change a few lines in your PyTorch script and you can leverage Fabric's features.
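A minimal sketch of what those few changed lines can look like, assuming a simple supervised setup (the toy model, data, and hyperparameters below are placeholders, not an official example):

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from lightning.fabric import Fabric

    fabric = Fabric(accelerator="auto", devices=1)  # choose hardware, strategy, precision here
    fabric.launch()

    # toy stand-ins for your own model and data
    dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    model = torch.nn.Linear(32, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    model, optimizer = fabric.setup(model, optimizer)  # added: let Fabric wrap model and optimizer
    dataloader = fabric.setup_dataloaders(DataLoader(dataset, batch_size=8))  # added

    for x, y in dataloader:  # batches arrive on the right device
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        fabric.backward(loss)  # added: replaces loss.backward()
        optimizer.step()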
**Maximum Flexibility**
Write your own training and/or inference logic down to the individual optimizer calls.
You aren't forced to conform to a standardized epoch-based training loop like the one in :doc:`Lightning Trainer <../common/trainer>`.
You can do flexible iteration-based training, meta-learning, cross-validation, and other kinds of optimization schemes without digging into framework internals (see the sketch below).
This also makes it easy to adopt Fabric in existing PyTorch projects to speed up and scale your models without committing to a large refactor.
Just remember: with great power comes great responsibility.
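As a rough illustration of that freedom, a k-fold cross-validation loop can stay plain Python, with Fabric only handling device placement and the backward call; the toy tensors, the fixed fold size, and the linear model here are invented for the sketch:

.. code-block:: python

    import torch
    from lightning.fabric import Fabric

    fabric = Fabric(accelerator="auto", devices=1)
    fabric.launch()

    # toy data; substitute your own
    inputs = fabric.to_device(torch.randn(100, 32))
    targets = fabric.to_device(torch.randint(0, 2, (100,)))

    num_folds, fold_size = 5, 20
    for fold in range(num_folds):
        # hold one fold out for validation -- plain Python control flow, no framework hooks
        val = slice(fold * fold_size, (fold + 1) * fold_size)
        train_mask = torch.ones(len(inputs), dtype=torch.bool, device=inputs.device)
        train_mask[val] = False

        model = torch.nn.Linear(32, 2)  # a fresh model and optimizer per fold
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        model, optimizer = fabric.setup(model, optimizer)

        for step in range(200):  # iteration-based, not epoch-based
            loss = torch.nn.functional.cross_entropy(model(inputs[train_mask]), targets[train_mask])
            optimizer.zero_grad()
            fabric.backward(loss)
            optimizer.step()

        with torch.no_grad():
            val_loss = torch.nn.functional.cross_entropy(model(inputs[val]), targets[val])
        fabric.print(f"fold {fold}: validation loss {val_loss.item():.3f}")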
**Maximum Control**
The :doc:`Lightning Trainer <../common/trainer>` has many built-in features to make research simpler with less boilerplate, but debugging it requires some familiarity with the framework internals.
In Fabric, everything is opt-in. Think of it as a toolbox: You take out the tools (Fabric functions) you need and leave the other ones behind.
This makes it easier to develop and debug your PyTorch code as you gradually add more features to it.
Fabric provides important tools to remove undesired boilerplate code (distributed, hardware, checkpoints, logging, ...), but leaves the design and orchestration fully up to you.
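For instance, logging and checkpointing are individual calls you reach for explicitly. A small sketch of that opt-in style, where the CSVLogger choice, the file name, and the contents of the state dictionary are just one possible setup rather than anything Fabric prescribes:

.. code-block:: python

    import torch
    from lightning.fabric import Fabric
    from lightning.fabric.loggers import CSVLogger

    # opt into logging by passing a logger; leave it out and nothing gets logged
    fabric = Fabric(accelerator="auto", devices=1, loggers=CSVLogger("logs"))
    fabric.launch()

    model = torch.nn.Linear(32, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    model, optimizer = fabric.setup(model, optimizer)

    for step in range(100):
        x = fabric.to_device(torch.randn(8, 32))
        y = fabric.to_device(torch.randint(0, 2, (8,)))
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        fabric.backward(loss)
        optimizer.step()
        fabric.log("train_loss", loss.item())  # opt-in: you decide what to log and when

    # opt-in checkpointing: you decide what goes into the checkpoint and when to write it
    state = {"model": model, "optimizer": optimizer, "step": step}
    fabric.save("checkpoint.ckpt", state)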
----
************
Fundamentals
************