Update cloud docs (#8569)

* amp

* amp

* docs

* add guides

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* amp

* amp

* docs

* add guides

* speed guides

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete ds.txt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update conf.py

* Update docs.txt

* remove 16 bit

* remove finetune from speed guide

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* remove early stopping from speed guide

* remove early stopping from speed guide

* remove early stopping from speed guide

* fix label

* fix sync

* reviews

* Update trainer.rst

* Update trainer.rst

* Update speed.rst

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* managing data

* managing data

* amp

* amp

* docs

* sync

* sync

* amp

* amp

* add data guide

* from review

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* from review

* from review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add data guide

* add data guide

* add data guide

* sync issues

* from review

* Update docs/source/guides/data.rst

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* add info if import fails

* fix cross referencing

* Add Datamodule motivation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* grid docs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cloud_training.rst

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
edenlightning 2021-07-27 16:22:52 +03:00 committed by GitHub
parent 4b7f78e200
commit c7e5743d54
1 changed files with 31 additions and 25 deletions

@@ -4,39 +4,45 @@
Cloud Training
##############
Lightning has a native solution for training on AWS/GCP at scale.
Go to `grid.ai <https://www.grid.ai/>`_ to create an account.
Lightning makes it easy to scale your training, without the boilerplate.
If you want to train your models on the cloud, without dealing with engineering infrastructure and servers, you can try `Grid.ai <https://www.grid.ai/>`_.
We've designed Grid to work seamlessly with Lightning, without needing to make ANY code changes.
Developed by the creators of `PyTorch Lightning <https://www.pytorchlightning.ai/>`_, Grid is a platform that allows you to:
To use Grid, replace ``python`` in your regular command:
- **Scale your models to multi-GPU and multiple nodes** instantly with interactive sessions
- **Run Hyperparameter Sweeps on 100s of GPUs** in one command
- **Upload huge datasets** for availability at scale
- **Iterate faster and cheaper**: you only pay for what you need
****************
Training on Grid
****************
.. raw:: html
<video width="50%" max-width="400px" controls
poster="https://grid-docs.s3.us-east-2.amazonaws.com/grid.png"
src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/grid.mp4"></video>
|
You can launch any Lightning model on Grid using the Grid `CLI <https://pypi.org/project/lightning-grid/>`_:
.. code-block:: bash
python my_model.py --learning_rate 1e-6 --layers 2 --gpus 4
grid run --instance_type v100 --gpus 4 my_model.py --gpus 4 --learning_rate 'uniform(1e-6, 1e-1, 20)' --layers '[2, 4, 8, 16]'
To use the ``grid run`` command:
You can also start runs or interactive sessions from the `Grid platform <https://platform.grid.ai>`_, where you can upload datasets, view artifacts, logs, and costs, open TensorBoard, and more.
.. code-block:: bash
grid run --gpus 4 my_model.py --learning_rate 'uniform(1e-6, 1e-1, 20)' --layers '[2, 4, 8, 16]'
**********
Learn More
**********
The above command will launch 80 experiments (20 learning-rate samples * 4 layer settings), each running on 4 GPUs
(320 GPUs in total), with ZERO changes to your code.
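
As a rough sketch of the arithmetic behind that sweep, the following Python snippet expands the two hyperparameter specifications into a grid. It assumes that ``uniform(1e-6, 1e-1, 20)`` means 20 values sampled uniformly from the range 1e-6 to 1e-1, which is what the (20 * 4) count implies. ``expand_sweep`` is a hypothetical helper written for illustration; it is not part of the Grid CLI and does not claim to reflect how Grid builds sweeps internally.

```python
import random
from itertools import product


def expand_sweep(num_lr_samples, layer_choices, low=1e-6, high=1e-1, seed=0):
    """Hypothetical helper: build one config per (learning_rate, layers) pair."""
    rng = random.Random(seed)
    # Assumption: the sweep draws `num_lr_samples` learning rates uniformly.
    learning_rates = [rng.uniform(low, high) for _ in range(num_lr_samples)]
    # Cartesian product of sampled learning rates and the listed layer counts.
    return [
        {"learning_rate": lr, "layers": layers}
        for lr, layers in product(learning_rates, layer_choices)
    ]


experiments = expand_sweep(20, [2, 4, 8, 16])
gpus_per_experiment = 4  # matches `--gpus 4` in the command above
total_gpus = len(experiments) * gpus_per_experiment

print(f"{len(experiments)} experiments, {total_gpus} GPUs in total")
```

Under these assumptions the count is 20 * 4 = 80 experiments and 80 * 4 = 320 GPUs, matching the figures quoted above.
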
`Sign up for Grid <http://platform.grid.ai>`_ and receive free credits to get you started!
The ``uniform`` command is part of our new expressive syntax which lets you construct hyperparameter combinations
using more than 20 distributions, lists, etc. Of course, you can also configure all of this using YAML files, which
can be dynamically assembled at runtime.
`Grid in 3 minutes <https://docs.grid.ai/#introduction>`_
***************
Grid Highlights
***************
* Run any public or private repository with Grid, or use an interactive session.
* Grid allocates all the machines and GPUs you need on demand, so you only pay for what you need when you need it.
* Grid handles all the other parts of developing and training at scale: artifacts, logs, metrics, etc.
* Grid works with the experiment manager of your choice, no code changes needed.
* Use Grid Datastores: high-performance, low-latency, versioned datasets.
* Attach Datastores to a Run so you don't have to keep downloading datasets.
* Use Grid Sessions for fast prototyping on a cloud machine of your choice.
* For more information, check the `Grid documentation <https://docs.grid.ai/>`_.
`Grid.ai Terms of Service <https://www.grid.ai/terms-of-service/>`_