.. _hivemind:

#####################################################
Training on unreliable mixed GPUs across the internet
#####################################################

**Audience:** Users who do not have access to top-tier multi-GPU/multi-node servers and want to scale training across different GPU types, or across the internet.

----

.. raw:: html

    <div class="display-card-container">
        <div class="row">

.. Add callout items below this line

.. displayitem::
   :header: 1: Training across multiple machines over the internet
   :description: Quick setup to start training on multiple machines.
   :col_css: col-md-4
   :button_link: hivemind_basic.html
   :height: 200
   :tag: basic

.. displayitem::
   :header: 2: Speed up training by enabling under-the-hood optimizations
   :description: Learn which flags to use with the HivemindStrategy to speed up training.
   :col_css: col-md-4
   :button_link: hivemind_intermediate.html
   :height: 200
   :tag: intermediate

.. displayitem::
   :header: 3: Optimize memory and communication using compression hooks
   :description: Enable gradient buffer optimizations and communication improvements to reduce bottlenecks.
   :col_css: col-md-4
   :button_link: hivemind_expert.html
   :height: 200
   :tag: expert

.. raw:: html

        </div>
    </div>
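
----

To give a sense of what the guides above build toward, the snippet below is a minimal sketch of enabling Hivemind in the Trainer. It assumes the ``hivemind`` package is installed and uses an illustrative ``target_batch_size`` value; the basic guide walks through the full setup.

.. code-block:: python

    import pytorch_lightning as pl
    from pytorch_lightning.strategies import HivemindStrategy

    # Peers contribute gradients until the collaborative target batch size
    # is reached, at which point an optimizer step runs across all machines.
    trainer = pl.Trainer(
        strategy=HivemindStrategy(target_batch_size=8192),  # illustrative value
        accelerator="gpu",
        devices=1,
    )

Additional machines join the same run by passing the printed ``initial_peers`` address to ``HivemindStrategy``; the intermediate and expert guides cover the flags and compression hooks that make this efficient.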