.. _hivemind:

#####################################################
Training on unreliable mixed GPUs across the internet
#####################################################
**Audience:** Users who do not have access to top-tier multi-GPU or multi-node servers and want to scale training across different GPU types or across the internet.

----

.. raw:: html

    <div class="display-card-container">
        <div class="row">

.. Add callout items below this line
.. displayitem::
   :header: 1: Training across multiple machines over the internet
   :description: Quick setup to start training on multiple machines.
   :col_css: col-md-4
   :button_link: hivemind_basic.html
   :height: 200
   :tag: basic

.. displayitem::
   :header: 2: Speed up training by enabling under-the-hood optimizations
   :description: Learn which flags to use with the HivemindStrategy to speed up training.
   :col_css: col-md-4
   :button_link: hivemind_intermediate.html
   :height: 200
   :tag: intermediate

.. displayitem::
   :header: 3: Optimize memory and communication using compression hooks
   :description: Enable gradient buffer optimizations and communication improvements to reduce communication bottlenecks.
   :col_css: col-md-4
   :button_link: hivemind_expert.html
   :height: 200
   :tag: expert

.. raw:: html

        </div>
    </div>