lightning/docs/source-pytorch/clouds/cluster_expert.rst

81 lines
2.1 KiB
ReStructuredText
Raw Normal View History

docs refactor 3/n (#12795) * updated titles + css * updated titles + css * levels structure * levels structure * levels structure * adding level indexes * finished intro guide layout * finished intro guide layout * general titles * general titles * added movie * added movie * finished 15 mins * levels * added core levels * added core levels * fixed api reference on the left * gpu guides * gpu guides * gpu guides * gpu guides * precision * hpu guide * added ipu * added ipu * added ipu * added ckpt docs * finished basic logging * intermediate * intermediate * intermediate * fixed * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * added logger stuff * added logger stuff * added logger stuff * added logger stuff * added logger stuff * ic * added inconsolata * added inconsolata * added inconsolata * added inconsolata * added inconsolata * added inconsolata * added inconsolata * updated menu * added basic cloud docs * added basic cloud docs * added basic cloud docs * added basic cloud docs * ic * ic * ic * ic * ic * ic * ic * ic * ic * ic * ic * ic * added demos folder * added demos folder * added demos folder * added demos folder * added demos folder * added demos folder * twocolumns directive * twocols * twocols * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * updated titles + css * levels structure * adding level indexes * finished intro guide layout * general titles * added movie * finished 15 mins * levels * added core levels * fixed api reference on the left * gpu guides * precision * hpu guide * added ipu * added ckpt docs * finished basic logging * intermediate * fixed margins * added logger stuff * ic * added inconsolata * updated menu * added basic cloud docs * ic * added demos folder * twocolumns directive * registry * cleaning up * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * deconflict * deconflict * deconflict * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add testsetup sections wherever needed; fix errors in building docs * pre-commit fixes * Fix duplicate label * minor nit with pre-commit * Fix labels * More changes... * require * debug & cli * prec & model & visu * fix references * fix references * fix refs * fix refs - model_parallel * fix references * prune testsetup with global * refs in index * Fix duplicate label errors * Update orphan docs * Update orphan docs * Update orphan docs * fix links * Fix genindex and search index * fix refs * fix refs * Fix index rst related issues * fix refs * inc to rst * Fix links ref * fix more references * fix refs * deconflict * errors * errors * errors * fix refs * fix refs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix warnings * Fix LightningCLI errors * Fix LightningCLI errors * Fix LightningCLI errors * Fix LightningCLI errors * fix doc build * Duplicate Label fix (docs) (#12800) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * ignore typing in demo folder * Ignore demos for mypy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: otaj <ota@grid.ai>
2022-04-19 18:15:47 +00:00
:orphan:
##################################
Run on an on-prem cluster (expert)
##################################
.. _custom-cluster:
----
**************************
Integrate your own cluster
**************************
Lightning provides an interface for providing your own definition of a cluster environment. It mainly consists of
parsing the right environment variables to access information such as world size, global and local rank (process id),
and node rank (node id). Here is an example of a custom
:class:`~pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment`:
.. code-block:: python
import os
from pytorch_lightning.plugins.environments import ClusterEnvironment
class MyClusterEnvironment(ClusterEnvironment):
@property
def creates_processes_externally(self) -> bool:
"""Return True if the cluster is managed (you don't launch processes yourself)"""
return True
def world_size(self) -> int:
return int(os.environ["WORLD_SIZE"])
def global_rank(self) -> int:
return int(os.environ["RANK"])
def local_rank(self) -> int:
return int(os.environ["LOCAL_RANK"])
def node_rank(self) -> int:
return int(os.environ["NODE_RANK"])
def main_address(self) -> str:
return os.environ["MASTER_ADDRESS"]
def main_port(self) -> int:
return int(os.environ["MASTER_PORT"])
trainer = Trainer(plugins=[MyClusterEnvironment()])
----
********
Get help
********
Setting up a cluster for distributed training is not trivial. Lightning offers lightning-grid which allows you to configure a cluster easily and run experiments via the CLI and web UI.
Try it out for free today:
.. raw:: html
<div class="display-card-container">
<div class="row">
.. Add callout items below this line
.. displayitem::
:header: Train models on the cloud
:description: Learn to run a model in the background on a cloud machine.
:col_css: col-md-6
:button_link: cloud_training.html
:height: 150
:tag: intermediate
.. raw:: html
</div>
</div