From a110bbfe1a6befc47fde11fcedee372e55e0f237 Mon Sep 17 00:00:00 2001
From: zredeaux65 <95777719+zredeaux65@users.noreply.github.com>
Date: Tue, 1 Mar 2022 03:19:44 -0800
Subject: [PATCH] Spelling and grammar updates in documentation (#11861)

Co-authored-by: rohitgr7
---
 docs/source/guides/data.rst            | 47 ++++++++--------
 docs/source/guides/speed.rst           | 74 +++++++++++++-------------
 docs/source/starter/converting.rst     |  5 +-
 docs/source/starter/installation.rst   |  6 +--
 docs/source/starter/lightning_lite.rst | 10 ++--
 docs/source/starter/style_guide.rst    | 32 ++++++-----
 6 files changed, 89 insertions(+), 85 deletions(-)

diff --git a/docs/source/guides/data.rst b/docs/source/guides/data.rst
index 1eda7ac629..8ddfb8fde4 100644
--- a/docs/source/guides/data.rst
+++ b/docs/source/guides/data.rst
@@ -32,10 +32,10 @@ There are a few different data containers used in Lightning:
 - A :class:`~pytorch_lightning.core.datamodule.LightningDataModule` is simply a collection of: training DataLoader(s), validation DataLoader(s), test DataLoader(s) and predict DataLoader(s), along with the matching transforms and data processing/downloads steps required.
-Why use LightningDataModule?
+Why Use LightningDataModule?
 ============================
-The :class:`~pytorch_lightning.core.datamodule.LightningDataModule` was designed as a way of decoupling data-related hooks from the :class:`~pytorch_lightning.core.lightning.LightningModule` so you can develop dataset agnostic models. The :class:`~pytorch_lightning.core.datamodule.LightningDataModule` makes it easy to hot swap different datasets with your model, so you can test it and benchmark it across domains. It also makes sharing and reusing the exact data splits and transforms across projects possible.
+The :class:`~pytorch_lightning.core.datamodule.LightningDataModule` was designed as a way of decoupling data-related hooks from the :class:`~pytorch_lightning.core.lightning.LightningModule` so you can develop dataset agnostic models. The :class:`~pytorch_lightning.core.datamodule.LightningDataModule` makes it easy to hot swap different Datasets with your model, so you can test it and benchmark it across domains. It also makes sharing and reusing the exact data splits and transforms across projects possible.
 Read :ref:`this ` for more details on LightningDataModule.
@@ -50,20 +50,18 @@ Multiple Datasets
 There are a few ways to pass multiple Datasets to Lightning:
 1. Create a DataLoader that iterates over multiple Datasets under the hood.
-2. In the training loop you can pass multiple DataLoaders as a dict or list/tuple and Lightning
-   will automatically combine the batches from different DataLoaders. You can control the way how dataloaders of different length
-   are combined by the flag `multiple_trainloader_mode` of the :class:`~pytorch_lightning.trainer.Trainer`. Alternatively, you can provide
-   dataloaders via :class:`~pytorch_lightning.trainer.supporters.CombinedLoader`.
-3. In the validation, test or prediction you have the option to either return multiple DataLoaders as list/tuple, which Lightning will call sequentially,
-   or combine the dataloaders using :class:`~pytorch_lightning.trainer.supporters.CombinedLoader`, which Lightning will
+2. In the training loop, you can pass multiple DataLoaders as a dict or list/tuple, and Lightning will
+   automatically combine the batches from different DataLoaders.
+3. In the validation, test, or prediction, you have the option to return multiple DataLoaders as list/tuple, which Lightning will call sequentially
+   or combine the DataLoaders using :class:`~pytorch_lightning.trainer.supporters.CombinedLoader`, which Lightning will
    automatically combine the batches from different DataLoaders.
 Using LightningDataModule
 =========================
-You can set more than one :class:`~torch.utils.data.DataLoader` in your :class:`~pytorch_lightning.core.datamodule.LightningDataModule` using its dataloader hooks
-and Lightning will use the correct one under-the-hood.
+You can set more than one :class:`~torch.utils.data.DataLoader` in your :class:`~pytorch_lightning.core.datamodule.LightningDataModule` using its DataLoader hooks
+and Lightning will use the correct one.
 .. testcode::
@@ -90,9 +88,9 @@ Using LightningModule Hooks
 Concatenated Dataset
 --------------------
-For training with multiple datasets you can create a :class:`~torch.utils.data.DataLoader` class
-which wraps your multiple datasets using :class:`~torch.utils.data.ConcatDataset`. This of course
-also works for testing, validation and prediction datasets.
+For training with multiple Datasets, you can create a :class:`~torch.utils.data.DataLoader` class
+which wraps your multiple Datasets using :class:`~torch.utils.data.ConcatDataset`. This, of course,
+also works for testing, validation, and prediction Datasets.
 .. testcode::
@@ -122,7 +120,7 @@ Return Multiple DataLoaders
 You can set multiple DataLoaders in your :class:`~pytorch_lightning.core.lightning.LightningModule`, and Lightning will take care of batch combination.
-For more details please have a look at :paramref:`~pytorch_lightning.trainer.trainer.Trainer.multiple_trainloader_mode`
+For more details, refer to :paramref:`~pytorch_lightning.trainer.trainer.Trainer.multiple_trainloader_mode`
 .. testcode::
@@ -212,11 +210,11 @@ Multiple Validation/Test/Predict DataLoaders
 For validation, test and predict DataLoaders, you can pass a single DataLoader or a list of them. This optional named parameter can be used in conjunction with any of the above use cases. You can choose to pass the batches sequentially or simultaneously, as is done for the training step.
-The default mode for these DataLoaders is sequential. Note that when using a sequence of dataloaders you need
+The default mode for these DataLoaders is sequential. Note that when using a sequence of DataLoaders you need
 to add an additional argument ``dataloader_idx`` in their corresponding step specific hook. The corresponding loop will process
-the dataloaders in sequential order, i.e., the first dataloader will be processed completely, then the second one, and so on.
+the DataLoaders in sequential order; that is, the first DataLoader will be processed completely, then the second one, and so on.
-See the following for more details for the default sequential option:
+Refer to the following for more details for the default sequential option:
 - :meth:`~pytorch_lightning.core.hooks.DataHooks.val_dataloader`
 - :meth:`~pytorch_lightning.core.hooks.DataHooks.test_dataloader`
@@ -234,7 +232,7 @@ See the following for more details for the default sequential option:
     ...
-Evaluation dataloaders are iterated over sequentially. If you want to iterate over them in parallel, PyTorch Lightning provides a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader` object which supports collections of dataloaders such as list, tuple, or dictionary. The dataloaders can be accessed using in the same way as the provided structure:
+Evaluation DataLoaders are iterated over sequentially. If you want to iterate over them in parallel, PyTorch Lightning provides a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader` object which supports collections of DataLoaders such as list, tuple, or dictionary. The DataLoaders can be accessed using in the same way as the provided structure:
 .. testcode::
@@ -257,13 +255,13 @@ Evaluation dataloaders are iterated over sequentially. If you want to iterate ov
 Evaluate with Additional DataLoaders
 ====================================
-You can evaluate your models using additional dataloaders even if the dataloader specific hooks haven't been defined within your
+You can evaluate your models using additional DataLoaders even if the DataLoader specific hooks haven't been defined within your
 :class:`~pytorch_lightning.core.lightning.LightningModule`. For example, this would be the case if your test data set is not available at the time your model was declared. Simply pass the test set to the :meth:`~pytorch_lightning.trainer.trainer.Trainer.test` method:
 .. code-block:: python
-    # setup your data loader
+    # setup your DataLoader
     test = DataLoader(...)
     # test (pass in the loader)
@@ -291,7 +289,8 @@ In the case that you require access to the DataLoader or Dataset objects, DataLo
     # extract metadata, etc. from the dataset:
     ...
-If you are using a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader` object which allows you to fetch batches from a collection of dataloaders dataloader simultaneously which supports collections of dataloaders such as list, tuple, or dictionary. The dataloaders can be accessed using the same collection structure:
+If you are using a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader` object which allows you to fetch batches from a collection of DataLoaders
+simultaneously which supports collections of DataLoader such as list, tuple, or dictionary. The DataLoaders can be accessed using the same collection structure:
 .. code-block:: python
@@ -300,14 +299,14 @@ If you are using a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader`
     test_dl1 = ...
     test_dl2 = ...
-    # If you provided a list of dataloaders:
+    # If you provided a list of DataLoaders:
     combined_loader = CombinedLoader([test_dl1, test_dl2])
     list_of_loaders = combined_loader.loaders
     test_dl1 = list_of_loaders.loaders[0]
-    # If you provided dictionary of dataloaders:
+    # If you provided dictionary of DataLoaders:
     combined_loader = CombinedLoader({"dl1": test_dl1, "dl2": test_dl2})
     dictionary_of_loaders = combined_loader.loaders
@@ -327,7 +326,7 @@ Lightning has built in support for dealing with sequential data.
 Packed Sequences as Inputs
 ==========================
-When using :class:`~torch.nn.utils.rnn.PackedSequence`, do 2 things:
+When using :class:`~torch.nn.utils.rnn.PackedSequence`, do two things:
 1. Return either a padded tensor in dataset or a list of variable length tensors in the DataLoader's `collate_fn `_ (example shows the list implementation).
 2. Pack the sequence in forward or training and validation steps depending on use case.
diff --git a/docs/source/guides/speed.rst b/docs/source/guides/speed.rst
index e8e3207370..93ed63799a 100644
--- a/docs/source/guides/speed.rst
+++ b/docs/source/guides/speed.rst
@@ -8,7 +8,7 @@
 #######################
-Speed up Model Training
+Speed Up Model Training
 #######################
 When you are limited with the resources, it becomes hard to speed up model training and reduce the training time
@@ -26,7 +26,7 @@ With Lightning, running on GPUs, TPUs, IPUs on multiple nodes is a simple switch
 GPU Training
 ============
-Lightning supports a variety of plugins to further speed up distributed GPU training. Most notably:
+Lightning supports a variety of plugins to speed up distributed GPU training. Most notably:
 * :class:`~pytorch_lightning.strategies.DDPStrategy`
 * :class:`~pytorch_lightning.strategies.DDPShardedStrategy`
@@ -50,9 +50,9 @@ GPU Training Speedup Tips
 When training on single or multiple GPU machines, Lightning offers a host of advanced optimizations to improve throughput, memory efficiency, and model scaling. Refer to :doc:`Advanced GPU Optimized Training for more details <../advanced/advanced_gpu>`.
-Prefer DDP over DP
+Prefer DDP Over DP
 ^^^^^^^^^^^^^^^^^^
-:class:`~pytorch_lightning.strategies.dp.DataParallelStrategy` performs 3 GPU transfers for EVERY batch:
+:class:`~pytorch_lightning.strategies.dp.DataParallelStrategy` performs three GPU transfers for EVERY batch:
 1. Copy the model to the device.
 2. Copy the data to the device.
@@ -65,7 +65,7 @@ Prefer DDP over DP
 |
-Whereas :class:`~pytorch_lightning.strategies.ddp.DDPStrategy` only performs 2 transfer operations, making DDP much faster than DP:
+Whereas :class:`~pytorch_lightning.strategies.ddp.DDPStrategy` only performs two transfer operations, making DDP much faster than DP:
 1. Moving data to the device.
 2. Transfer and sync gradients.
@@ -78,11 +78,11 @@ Whereas :class:`~pytorch_lightning.strategies.ddp.DDPStrategy` only performs 2 t
 |
-When using DDP Plugins, set find_unused_parameters=False
+When Using DDP Plugins, Set find_unused_parameters=False
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-By default, we have set ``find_unused_parameters=True`` for compatibility reasons that have been observed in the past (see the `discussion `_ for more details).
-When enabled, it can result in a performance hit, and can be disabled in most cases. Read more about it `here `_.
+By default, we have set ``find_unused_parameters=True`` for compatibility reasons that have been observed in the past (refer to the `discussion `_ for more details).
+When enabled, it can result in a performance hit and can be disabled in most cases. Read more about it `here `_.
 .. tip:: It applies to all DDP strategies that support ``find_unused_parameters`` as input.
@@ -105,10 +105,10 @@ When enabled, it can result in a performance hit, and can be disabled in most ca
         strategy=DDPSpawnStrategy(find_unused_parameters=False),
     )
-When using DDP on a Multi-node Cluster, set NCCL Parameters
+When Using DDP on a Multi-node Cluster, Set NCCL Parameters
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-`NCCL `__ is the NVIDIA Collective Communications Library which is used under the hood by PyTorch to handle communication across nodes and GPUs. There are reported benefits in terms of speedups when adjusting NCCL parameters as seen in this `issue `__. In the issue we see a 30% speed improvement when training the Transformer XLM-RoBERTa and a 15% improvement in training with Detectron2.
+`NCCL `__ is the NVIDIA Collective Communications Library that is used by PyTorch to handle communication across nodes and GPUs. There are reported benefits in terms of speedups when adjusting NCCL parameters as seen in this `issue `__. In the issue, we see a 30% speed improvement when training the Transformer XLM-RoBERTa and a 15% improvement in training with Detectron2.
 NCCL parameters can be adjusted via environment variables.
@@ -125,7 +125,7 @@ NCCL parameters can be adjusted via environment variables.
     export NCCL_NSOCKS_PERTHREAD=4
     export NCCL_SOCKET_NTHREADS=2
-Dataloaders
+DataLoaders
 ^^^^^^^^^^^
 When building your DataLoader set ``num_workers>0`` and ``pin_memory=True`` (only for GPUs).
@@ -140,13 +140,13 @@ num_workers
 The question of how many workers to specify in ``num_workers`` is tricky. Here's a summary of `some references `_, and our suggestions:
 1. ``num_workers=0`` means ONLY the main process will load batches (that can be a bottleneck).
-2. ``num_workers=1`` means ONLY one worker (just not the main process) will load data but it will still be slow.
+2. ``num_workers=1`` means ONLY one worker (just not the main process) will load data, but it will still be slow.
 3. The performance of high ``num_workers`` depends on the batch size and your machine.
 4. A general place to start is to set ``num_workers`` equal to the number of CPU cores on that machine. You can get the number of CPU cores in python using ``os.cpu_count()``, but note that depending on your batch size, you may overflow RAM memory.
 .. warning:: Increasing ``num_workers`` will ALSO increase your CPU memory consumption.
-The best thing to do is to increase the ``num_workers`` slowly and stop once you see no more improvement in your training speed.
+The best thing to do is to increase the ``num_workers`` slowly and stop once there is no more improvement in your training speed.
 For debugging purposes or for dataloaders that load very small datasets, it is desirable to set ``num_workers=0``. However, this will always log a warning for every dataloader with ``num_workers <= min(2, os.cpu_count())``. In such cases, you can specifically filter this warning by using:
@@ -156,7 +156,7 @@ For debugging purposes or for dataloaders that load very small datasets, it is d
     warnings.filterwarnings("ignore", ".*Consider increasing the value of the `num_workers` argument*")
-    # or to ignore all warnings which could be false positives
+    # or to ignore all warnings that could be false positives
     from pytorch_lightning.utilities.warnings import PossibleUserWarning
     warnings.filterwarnings("ignore", category=PossibleUserWarning)
@@ -186,7 +186,7 @@ This is a limitation of Python ``.spawn()`` and PyTorch.
 TPU Training
 ============
-You can set the ``tpu_cores`` trainer flag to 1, [7] (specific core) or 8 cores.
+You can set the ``tpu_cores`` trainer flag to 1, [7] (specific core) or eight cores.
 .. code-block:: python
@@ -199,7 +199,7 @@ You can set the ``tpu_cores`` trainer flag to 1, [7] (specific core) or 8 cores.
     # train on 8 TPU cores
     trainer = Trainer(tpu_cores=8)
-To train on more than 8 cores (ie: a POD),
+To train on more than eight cores (a POD),
 submit this script using the xla_dist script. Example::
@@ -239,7 +239,7 @@ less memory bandwidth and run match operations much faster on GPUs that support
 **Use when:**
 * You want to optimize for memory usage on a GPU.
-* You have a GPU that supports 16 bit precision (NVIDIA pascal architecture or newer).
+* You have a GPU that supports 16-bit precision (NVIDIA pascal architecture or newer).
 * Your optimization algorithm (training_step) is numerically stable.
 * You want to be the cool person in the lab :p
@@ -251,7 +251,7 @@ less memory bandwidth and run match operations much faster on GPUs that support
 |
-Mixed precision combines the use of both 32 and 16 bit floating points to reduce memory footprint during model training, resulting in improved performance, achieving +3X speedups on modern GPUs.
+Mixed precision combines the use of both 32 and 16-bit floating points to reduce memory footprint during model training, resulting in improved performance, achieving upto +3X speedups on modern GPUs.
 Lightning offers mixed precision training for GPUs and CPUs, as well as bfloat16 mixed precision training for TPUs.
@@ -286,7 +286,7 @@ Setting ``min_epochs=N`` makes sure that the training will run for at least ``N`
     trainer = Trainer(min_epochs=1, max_epochs=1000)
-If running iteration based training, i.e. infinite / iterable dataloader, you can also control the number of steps with the ``min_steps`` and ``max_steps`` flags:
+If running iteration based training, i.e., infinite / iterable DataLoader, you can also control the number of steps with the ``min_steps`` and ``max_steps`` flags:
 .. testcode::
@@ -313,10 +313,10 @@ Learn more in our :ref:`trainer_flags` guide.
 Control Validation Frequency
 ****************************
-Check validation every n epochs
+Check Validation Every n Epochs
 ===============================
-**Use when:** You have a small dataset, and want to run less validation checks.
+**Use when:** You have a small dataset and want to run fewer validation checks.
 You can limit validation check to only run every n epochs using the ``check_val_every_n_epoch`` Trainer flag.
@@ -329,13 +329,13 @@ You can limit validation check to only run every n epochs using the ``check_val_
     trainer = Trainer(check_val_every_n_epoch=7)
-Validation within Training Epoch
+Validation Within Training Epoch
 ================================
-**Use when:** You have a large training dataset, and want to run mid-epoch validation checks.
+**Use when:** You have a large training dataset and want to run mid-epoch validation checks.
 For large datasets, it's often desirable to check validation multiple times within a training epoch.
-Pass in a float to check that often within 1 training epoch. Pass in an int ``K`` to check every ``K`` training batches.
+Pass in a float to check that often within one training epoch. Pass in an int ``K`` to check every ``K`` training batch.
 Must use an ``int`` if using an :class:`~torch.utils.data.IterableDataset`.
 .. testcode::
@@ -362,12 +362,12 @@ Preload Data Into RAM
 When your training or preprocessing requires many operations to be performed on entire dataset(s), it can sometimes be beneficial to store all data in RAM given there is enough space. However, loading all data at the beginning of the training script has the disadvantage that it can take a long
-time and hence it slows down the development process. Another downside is that in multiprocessing (e.g. DDP)
+time, and hence, it slows down the development process. Another downside is that in multiprocessing (e.g., DDP)
 the data would get copied in each process. One can overcome these problems by copying the data into RAM in advance.
 Most UNIX-based operating systems provide direct access to tmpfs through a mount point typically named ``/dev/shm``.
-0. Increase shared memory if necessary. Refer to the documentation of your OS how to do this.
+Increase shared memory if necessary. Refer to the documentation of your OS on how to do this.
 1. Copy training data to shared memory:
@@ -375,7 +375,7 @@ Most UNIX-based operating systems provide direct access to tmpfs through a mount
     cp -r /path/to/data/on/disk /dev/shm/
-2. Refer to the new data root in your script or command line arguments:
+2. Refer to the new data root in your script or command-line arguments:
 .. code-block:: python
@@ -393,8 +393,8 @@ distributed setting.
 Here is an explanation of what it does:
 * Considering the current optimizer as A and all other optimizers as B.
-* Toggling means that all parameters from B exclusive to A will have their ``requires_grad`` attribute set to ``False``.
-* Their original state will be restored when exiting the context manager.
+* Toggling, which means all parameters from B exclusive to A will have their ``requires_grad`` attribute set to ``False``.
+* Restoring their original state when exiting the context manager.
 When performing gradient accumulation, there is no need to perform grad synchronization during the accumulation phase. Setting ``sync_grad`` to ``False`` will block this synchronization and improve your training speed.
@@ -403,11 +403,11 @@ Setting ``sync_grad`` to ``False`` will block this synchronization and improve y
 :meth:`~pytorch_lightning.core.optimizer.LightningOptimizer.toggle_model` function as a :func:`contextlib.contextmanager` for advanced users.
-Here is an example for advanced use-case:
+Here is an example of an advanced use case:
 .. testcode::
-    # Scenario for a GAN with gradient accumulation every 2 batches and optimized for multiple gpus.
+    # Scenario for a GAN with gradient accumulation every two batches and optimized for multiple gpus.
     class SimpleGAN(LightningModule):
         def __init__(self):
             super().__init__()
@@ -469,9 +469,9 @@ Here is an example for advanced use-case:
 Set Grads to None
 *****************
-In order to modestly improve performance, you can override :meth:`~pytorch_lightning.core.lightning.LightningModule.optimizer_zero_grad`.
+In order to improve performance, you can override :meth:`~pytorch_lightning.core.lightning.LightningModule.optimizer_zero_grad`.
-For a more detailed explanation of pros / cons of this technique,
+For a more detailed explanation of the pros / cons of this technique,
 read the documentation for :meth:`~torch.optim.Optimizer.zero_grad` by the PyTorch team.
 .. testcode::
@@ -484,7 +484,7 @@ read the documentation for :meth:`~torch.optim.Optimizer.zero_grad` by the PyTor
 -----
 ***************
-Things to avoid
+Things to Avoid
 ***************
 .item(), .numpy(), .cpu()
@@ -496,9 +496,9 @@ takes a great deal of care to be optimized for this.
 Clear Cache
 ===========
-Don't call :func:`torch.cuda.empty_cache` unnecessarily! Every time you call this ALL your GPUs have to wait to sync.
+Don't call :func:`torch.cuda.empty_cache` unnecessarily! Every time you call this, ALL your GPUs have to wait to sync.
-Transferring tensors to device
+Transferring Tensors to Device
 ==============================
 LightningModules know what device they are on! Construct tensors on the device directly to avoid CPU->Device transfer.
@@ -512,7 +512,7 @@ LightningModules know what device they are on! Construct tensors on the device d
     t = torch.rand(2, 2, device=self.device)
-For tensors that need to be model attributes, it is best practice to register them as buffers in the modules's
+For tensors that need to be model attributes, it is best practice to register them as buffers in the module's
 ``__init__`` method:
 .. code-block:: python
diff --git a/docs/source/starter/converting.rst b/docs/source/starter/converting.rst
index 331fe6e5e8..23a87d94bd 100644
--- a/docs/source/starter/converting.rst
+++ b/docs/source/starter/converting.rst
@@ -6,12 +6,11 @@
 .. _converting:
-
 ######################################
-How to organize PyTorch into Lightning
+How to Organize PyTorch Into Lightning
 ######################################
-To enable your code to work with Lightning, here's how to organize PyTorch into Lightning:
+To enable your code to work with Lightning, perform the following to organize PyTorch into Lightning.
 --------
diff --git a/docs/source/starter/installation.rst b/docs/source/starter/installation.rst
index 886624d14e..d823ab2e07 100644
--- a/docs/source/starter/installation.rst
+++ b/docs/source/starter/installation.rst
@@ -56,14 +56,14 @@ You can also use `Conda Environments `_ precision:
+Here is how to train on eight GPUs with `torch.bfloat16 `_ precision:
 .. code-block:: python
     Lite(strategy="ddp", devices=8, accelerator="gpu", precision="bf16").run(10)
-Here is how to use `DeepSpeed Zero3 `_ with 8 GPUs and precision 16:
+Here is how to use `DeepSpeed Zero3 `_ with eight GPUs and precision 16:
 .. code-block:: python
@@ -148,7 +148,7 @@ Here is how to use `DeepSpeed Zero3 0`` in a DataLoader or you will bottleneck your code.
+.. warning:: Make sure not to use ``Trainer(strategy="ddp_spawn")`` with ``num_workers>0`` in the DataLoader or you will bottleneck you code.
 DataModules
 ===========
@@ -212,13 +213,18 @@ datasets with your model, so you can test it and benchmark it across domains. It
 Check out :ref:`data` document to understand data management within Lightning and its best practices.
 ------------
+* What dataset splits were used?
+* How many samples does this dataset have overall and within each split?
+* Which transforms were used?
-********
-Examples
-********
+It's for this reason that we recommend you use datamodules. This is especially important when collaborating because
+it will save your team a lot of time as well.
-Checkout the live examples to get your hands dirty:
+All they need to do is drop a datamodule into the Trainer and not worry about what was done to the data.
+This is true for both academic and corporate settings where data cleaning and ad-hoc instructions slow down the progress
+of iterating through ideas.
+
+- Checkout the live examples to get your hands dirty:
 - `Introduction to PyTorch Lightning `_
 - `Introduction to DataModules `_