lightning/docs/source-pytorch/common/remote_fs.rst

58 lines
2.7 KiB
ReStructuredText
Raw Normal View History

2022-01-12 13:51:24 +00:00
.. _remote_fs:
2022-01-12 13:51:24 +00:00
##################
Remote Filesystems
##################
PyTorch Lightning enables working with data from a variety of filesystems, including local filesystems and several cloud storage providers such as
`S3 <https://aws.amazon.com/s3/>`_ on `AWS <https://aws.amazon.com/>`_, `GCS <https://cloud.google.com/storage>`_ on `Google Cloud <https://cloud.google.com/>`_,
or `ADL <https://azure.microsoft.com/solutions/data-lake/>`_ on `Azure <https://azure.microsoft.com/>`_.
This applies to saving and writing checkpoints, as well as for logging.
Working with different filesystems can be accomplished by appending a protocol like "s3:/" to file paths for writing and reading data.
.. code-block:: python
2022-01-12 13:51:24 +00:00
# `default_root_dir` is the default path used for logs and checkpoints
trainer = Trainer(default_root_dir="s3://my_bucket/data/")
trainer.fit(model)
You could pass custom paths to loggers for logging data.
.. code-block:: python
from pytorch_lightning.loggers import TensorBoardLogger
logger = TensorBoardLogger(save_dir="s3://my_bucket/logs/")
trainer = Trainer(logger=logger)
trainer.fit(model)
Additionally, you could also resume training with a checkpoint stored at a remote filesystem.
.. code-block:: python
trainer = Trainer(default_root_dir=tmpdir, max_steps=3)
trainer.fit(model, ckpt_path="s3://my_bucket/ckpts/classifier.ckpt")
2022-01-12 13:51:24 +00:00
PyTorch Lightning uses `fsspec <https://filesystem-spec.readthedocs.io/>`_ internally to handle all filesystem operations.
The most common filesystems supported by Lightning are:
* Local filesystem: ``file://`` - It's the default and doesn't need any protocol to be used. It's installed by default in Lightning.
* Amazon S3: ``s3://`` - Amazon S3 remote binary store, using the library `s3fs <https://s3fs.readthedocs.io/>`__. Run ``pip install fsspec[s3]`` to install it.
* Google Cloud Storage: ``gcs://`` or ``gs://`` - Google Cloud Storage, using `gcsfs <https://gcsfs.readthedocs.io/en/stable/>`__. Run ``pip install fsspec[gcs]`` to install it.
* Microsoft Azure Storage: ``adl://``, ``abfs://`` or ``az://`` - Microsoft Azure Storage, using `adlfs <https://github.com/fsspec/adlfs>`__. Run ``pip install fsspec[adl]`` to install it.
* Hadoop File System: ``hdfs://`` - Hadoop Distributed File System. This uses `PyArrow <https://arrow.apache.org/docs/python/>`__ as the backend. Run ``pip install fsspec[hdfs]`` to install it.
You could learn more about the available filesystems with:
.. code-block:: python
from fsspec.registry import known_implementations
print(known_implementations)
docs refactor 3/n (#12795) * updated titles + css * updated titles + css * levels structure * levels structure * levels structure * adding level indexes * finished intro guide layout * finished intro guide layout * general titles * general titles * added movie * added movie * finished 15 mins * levels * added core levels * added core levels * fixed api reference on the left * gpu guides * gpu guides * gpu guides * gpu guides * precision * hpu guide * added ipu * added ipu * added ipu * added ckpt docs * finished basic logging * intermediate * intermediate * intermediate * fixed * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * fixed margins * added logger stuff * added logger stuff * added logger stuff * added logger stuff * added logger stuff * ic * added inconsolata * added inconsolata * added inconsolata * added inconsolata * added inconsolata * added inconsolata * added inconsolata * updated menu * added basic cloud docs * added basic cloud docs * added basic cloud docs * added basic cloud docs * ic * ic * ic * ic * ic * ic * ic * ic * ic * ic * ic * ic * added demos folder * added demos folder * added demos folder * added demos folder * added demos folder * added demos folder * twocolumns directive * twocols * twocols * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * registry * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * cleaning up * updated titles + css * levels structure * adding level indexes * finished intro guide layout * general titles * added movie * finished 15 mins * levels * added core levels * fixed api reference on the left * gpu guides * precision * hpu guide * added ipu * added ckpt docs * finished basic logging * intermediate * fixed margins * added logger stuff * ic * added inconsolata * updated menu * added basic cloud docs * ic * added demos folder * twocolumns directive * registry * cleaning up * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * deconflict * deconflict * deconflict * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add testsetup sections wherever needed; fix errors in building docs * pre-commit fixes * Fix duplicate label * minor nit with pre-commit * Fix labels * More changes... * require * debug & cli * prec & model & visu * fix references * fix references * fix refs * fix refs - model_parallel * fix references * prune testsetup with global * refs in index * Fix duplicate label errors * Update orphan docs * Update orphan docs * Update orphan docs * fix links * Fix genindex and search index * fix refs * fix refs * Fix index rst related issues * fix refs * inc to rst * Fix links ref * fix more references * fix refs * deconflict * errors * errors * errors * fix refs * fix refs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix warnings * Fix LightningCLI errors * Fix LightningCLI errors * Fix LightningCLI errors * Fix LightningCLI errors * fix doc build * Duplicate Label fix (docs) (#12800) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * ignore typing in demo folder * Ignore demos for mypy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: otaj <ota@grid.ai>
2022-04-19 18:15:47 +00:00
You could also look into :ref:`CheckpointIO Plugin <checkpointing_expert>` for more details on how to customize saving and loading checkpoints.