Spelling and grammar updates in documentation (#11861)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Parent: 33fc3f1e11
Commit: a110bbfe1a
@@ -32,10 +32,10 @@ There are a few different data containers used in Lightning:

 - A :class:`~pytorch_lightning.core.datamodule.LightningDataModule` is simply a collection of: training DataLoader(s), validation DataLoader(s), test DataLoader(s) and predict DataLoader(s), along with the matching transforms and data processing/downloads steps required.

-Why use LightningDataModule?
+Why Use LightningDataModule?
 ============================

-The :class:`~pytorch_lightning.core.datamodule.LightningDataModule` was designed as a way of decoupling data-related hooks from the :class:`~pytorch_lightning.core.lightning.LightningModule` so you can develop dataset agnostic models. The :class:`~pytorch_lightning.core.datamodule.LightningDataModule` makes it easy to hot swap different datasets with your model, so you can test it and benchmark it across domains. It also makes sharing and reusing the exact data splits and transforms across projects possible.
+The :class:`~pytorch_lightning.core.datamodule.LightningDataModule` was designed as a way of decoupling data-related hooks from the :class:`~pytorch_lightning.core.lightning.LightningModule` so you can develop dataset agnostic models. The :class:`~pytorch_lightning.core.datamodule.LightningDataModule` makes it easy to hot swap different Datasets with your model, so you can test it and benchmark it across domains. It also makes sharing and reusing the exact data splits and transforms across projects possible.

 Read :ref:`this <datamodules>` for more details on LightningDataModule.
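To make the bullet above concrete, here is a minimal sketch of such a DataModule. It is not taken from the changed files; the MNIST dataset (assuming torchvision is available), the split sizes, and the batch size are illustrative choices.

.. code-block:: python

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, random_split
    from torchvision import transforms
    from torchvision.datasets import MNIST


    class MNISTDataModule(pl.LightningDataModule):
        def __init__(self, data_dir: str = "./data", batch_size: int = 32):
            super().__init__()
            self.data_dir = data_dir
            self.batch_size = batch_size
            self.transform = transforms.ToTensor()

        def prepare_data(self):
            # download once, from a single process
            MNIST(self.data_dir, train=True, download=True)
            MNIST(self.data_dir, train=False, download=True)

        def setup(self, stage=None):
            # splits and transforms live here, next to the DataLoaders that use them
            full = MNIST(self.data_dir, train=True, transform=self.transform)
            self.train_set, self.val_set = random_split(full, [55000, 5000])
            self.test_set = MNIST(self.data_dir, train=False, transform=self.transform)

        def train_dataloader(self):
            return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

        def val_dataloader(self):
            return DataLoader(self.val_set, batch_size=self.batch_size)

        def test_dataloader(self):
            return DataLoader(self.test_set, batch_size=self.batch_size)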
@@ -50,20 +50,18 @@ Multiple Datasets

 There are a few ways to pass multiple Datasets to Lightning:

 1. Create a DataLoader that iterates over multiple Datasets under the hood.
-2. In the training loop you can pass multiple DataLoaders as a dict or list/tuple and Lightning
-   will automatically combine the batches from different DataLoaders. You can control the way how dataloaders of different length
-   are combined by the flag `multiple_trainloader_mode` of the :class:`~pytorch_lightning.trainer.Trainer`. Alternatively, you can provide
-   dataloaders via :class:`~pytorch_lightning.trainer.supporters.CombinedLoader`.
-3. In the validation, test or prediction you have the option to either return multiple DataLoaders as list/tuple, which Lightning will call sequentially,
-   or combine the dataloaders using :class:`~pytorch_lightning.trainer.supporters.CombinedLoader`, which Lightning will
+2. In the training loop, you can pass multiple DataLoaders as a dict or list/tuple, and Lightning will
+   automatically combine the batches from different DataLoaders.
+3. In the validation, test, or prediction, you have the option to return multiple DataLoaders as list/tuple, which Lightning will call sequentially
+   or combine the DataLoaders using :class:`~pytorch_lightning.trainer.supporters.CombinedLoader`, which Lightning will
    automatically combine the batches from different DataLoaders.


 Using LightningDataModule
 =========================

-You can set more than one :class:`~torch.utils.data.DataLoader` in your :class:`~pytorch_lightning.core.datamodule.LightningDataModule` using its dataloader hooks
-and Lightning will use the correct one under-the-hood.
+You can set more than one :class:`~torch.utils.data.DataLoader` in your :class:`~pytorch_lightning.core.datamodule.LightningDataModule` using its DataLoader hooks
+and Lightning will use the correct one.

 .. testcode::
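As a concrete illustration of option 2 above, here is a small, self-contained sketch of a ``train_dataloader`` hook that returns a dict of DataLoaders. The toy ``TensorDataset`` sources and batch sizes are illustrative assumptions, not part of the changed files.

.. code-block:: python

    import torch
    from pytorch_lightning import LightningModule
    from torch.utils.data import DataLoader, TensorDataset


    class MultiSourceModel(LightningModule):
        def train_dataloader(self):
            # two toy datasets standing in for real data sources
            images = TensorDataset(torch.randn(64, 3, 32, 32))
            tokens = TensorDataset(torch.randint(0, 100, (64, 16)))
            # Lightning draws one batch from each loader and hands training_step a dict
            return {
                "images": DataLoader(images, batch_size=8),
                "tokens": DataLoader(tokens, batch_size=8),
            }

        def training_step(self, batch, batch_idx):
            image_batch = batch["images"]  # batch from the "images" loader
            token_batch = batch["tokens"]  # batch from the "tokens" loader
            ...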
@@ -90,9 +88,9 @@ Using LightningModule Hooks

 Concatenated Dataset
 --------------------

-For training with multiple datasets you can create a :class:`~torch.utils.data.DataLoader` class
-which wraps your multiple datasets using :class:`~torch.utils.data.ConcatDataset`. This of course
-also works for testing, validation and prediction datasets.
+For training with multiple Datasets, you can create a :class:`~torch.utils.data.DataLoader` class
+which wraps your multiple Datasets using :class:`~torch.utils.data.ConcatDataset`. This, of course,
+also works for testing, validation, and prediction Datasets.

 .. testcode::
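A hedged sketch of the wrapping described in this hunk; the toy tensor datasets below stand in for real training sets.

.. code-block:: python

    import torch
    from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

    # two toy datasets standing in for real training sets
    dataset_a = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))
    dataset_b = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))

    # a single DataLoader that samples from the concatenation of both datasets
    concat_loader = DataLoader(ConcatDataset([dataset_a, dataset_b]), batch_size=32, shuffle=True)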
@@ -122,7 +120,7 @@ Return Multiple DataLoaders

 You can set multiple DataLoaders in your :class:`~pytorch_lightning.core.lightning.LightningModule`, and Lightning will take care of batch combination.

-For more details please have a look at :paramref:`~pytorch_lightning.trainer.trainer.Trainer.multiple_trainloader_mode`
+For more details, refer to :paramref:`~pytorch_lightning.trainer.trainer.Trainer.multiple_trainloader_mode`

 .. testcode::
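For readers of this hunk, a short sketch of how the flag is typically passed; ``"max_size_cycle"`` and ``"min_size"`` are the two documented modes, and the choice below is illustrative.

.. code-block:: python

    from pytorch_lightning import Trainer

    # "max_size_cycle" (the default) cycles shorter loaders until the longest is exhausted;
    # "min_size" ends the epoch once the shortest loader runs out
    trainer = Trainer(multiple_trainloader_mode="min_size")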
@@ -212,11 +210,11 @@ Multiple Validation/Test/Predict DataLoaders

 For validation, test and predict DataLoaders, you can pass a single DataLoader or a list of them. This optional named
 parameter can be used in conjunction with any of the above use cases. You can choose to pass
 the batches sequentially or simultaneously, as is done for the training step.
-The default mode for these DataLoaders is sequential. Note that when using a sequence of dataloaders you need
+The default mode for these DataLoaders is sequential. Note that when using a sequence of DataLoaders you need
 to add an additional argument ``dataloader_idx`` in their corresponding step specific hook. The corresponding loop will process
-the dataloaders in sequential order, i.e., the first dataloader will be processed completely, then the second one, and so on.
+the DataLoaders in sequential order; that is, the first DataLoader will be processed completely, then the second one, and so on.

-See the following for more details for the default sequential option:
+Refer to the following for more details for the default sequential option:

 - :meth:`~pytorch_lightning.core.hooks.DataHooks.val_dataloader`
 - :meth:`~pytorch_lightning.core.hooks.DataHooks.test_dataloader`
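A minimal sketch of the sequential mode described above, showing the extra ``dataloader_idx`` argument; the toy datasets are illustrative.

.. code-block:: python

    import torch
    from pytorch_lightning import LightningModule
    from torch.utils.data import DataLoader, TensorDataset


    class MultiValModel(LightningModule):
        def val_dataloader(self):
            clean = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
            noisy = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
            # returning a list: the loaders are evaluated one after the other
            return [DataLoader(clean, batch_size=8), DataLoader(noisy, batch_size=8)]

        def validation_step(self, batch, batch_idx, dataloader_idx):
            # dataloader_idx identifies which of the two loaders produced this batch
            x, y = batch
            ...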
@@ -234,7 +232,7 @@ See the following for more details for the default sequential option:

         ...

-Evaluation dataloaders are iterated over sequentially. If you want to iterate over them in parallel, PyTorch Lightning provides a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader` object which supports collections of dataloaders such as list, tuple, or dictionary. The dataloaders can be accessed using in the same way as the provided structure:
+Evaluation DataLoaders are iterated over sequentially. If you want to iterate over them in parallel, PyTorch Lightning provides a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader` object which supports collections of DataLoaders such as list, tuple, or dictionary. The DataLoaders can be accessed in the same way as the provided structure:

 .. testcode::
@@ -257,13 +255,13 @@ Evaluation dataloaders are iterated over sequentially. If you want to iterate ov

 Evaluate with Additional DataLoaders
 ====================================

-You can evaluate your models using additional dataloaders even if the dataloader specific hooks haven't been defined within your
+You can evaluate your models using additional DataLoaders even if the DataLoader specific hooks haven't been defined within your
 :class:`~pytorch_lightning.core.lightning.LightningModule`. For example, this would be the case if your test data
 set is not available at the time your model was declared. Simply pass the test set to the :meth:`~pytorch_lightning.trainer.trainer.Trainer.test` method:

 .. code-block:: python

-    # setup your data loader
+    # setup your DataLoader
     test = DataLoader(...)

     # test (pass in the loader)
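The code block in this hunk is truncated; a hedged, self-contained sketch of the full pattern, assuming a ``Trainer.test`` signature that accepts ``dataloaders=`` and a ``model`` that is an already trained LightningModule:

.. code-block:: python

    import torch
    from pytorch_lightning import Trainer
    from torch.utils.data import DataLoader, TensorDataset

    # a test set that only became available after the model was declared
    test = DataLoader(TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=16)

    trainer = Trainer()
    trainer.test(model, dataloaders=test)  # `model` is your trained LightningModule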
@@ -291,7 +289,8 @@ In the case that you require access to the DataLoader or Dataset objects, DataLo

         # extract metadata, etc. from the dataset:
         ...

-If you are using a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader` object which allows you to fetch batches from a collection of dataloaders dataloader simultaneously which supports collections of dataloaders such as list, tuple, or dictionary. The dataloaders can be accessed using the same collection structure:
+If you are using a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader` object, which allows you to fetch batches from a collection of DataLoaders
+simultaneously and supports collections of DataLoaders such as list, tuple, or dictionary, the DataLoaders can be accessed using the same collection structure:

 .. code-block:: python
@@ -300,14 +299,14 @@ If you are using a :class:`~pytorch_lightning.trainer.supporters.CombinedLoader`

     test_dl1 = ...
     test_dl2 = ...

-    # If you provided a list of dataloaders:
+    # If you provided a list of DataLoaders:

     combined_loader = CombinedLoader([test_dl1, test_dl2])
     list_of_loaders = combined_loader.loaders
     test_dl1 = list_of_loaders.loaders[0]


-    # If you provided dictionary of dataloaders:
+    # If you provided dictionary of DataLoaders:

     combined_loader = CombinedLoader({"dl1": test_dl1, "dl2": test_dl2})
     dictionary_of_loaders = combined_loader.loaders
@@ -327,7 +326,7 @@ Lightning has built in support for dealing with sequential data.

 Packed Sequences as Inputs
 ==========================

-When using :class:`~torch.nn.utils.rnn.PackedSequence`, do 2 things:
+When using :class:`~torch.nn.utils.rnn.PackedSequence`, do two things:

 1. Return either a padded tensor in dataset or a list of variable length tensors in the DataLoader's `collate_fn <https://pytorch.org/docs/stable/data.html#dataloader-collate-fn>`_ (example shows the list implementation).
 2. Pack the sequence in forward or training and validation steps depending on use case.
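A hedged sketch of both steps; the tensor shapes are illustrative, and in practice the packing happens inside ``forward`` or ``training_step`` before an RNN layer.

.. code-block:: python

    import torch
    from torch.nn.utils.rnn import pack_sequence


    def collate_fn(batch):
        # step 1: keep variable-length tensors as a plain list instead of padding them
        return list(batch)


    # step 2: pack the list before feeding it to an RNN layer
    sequences = [torch.randn(5, 8), torch.randn(3, 8)]  # two sequences of different lengths
    packed = pack_sequence(sequences, enforce_sorted=False)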
@@ -8,7 +8,7 @@


 #######################
-Speed up Model Training
+Speed Up Model Training
 #######################

 When you are limited with the resources, it becomes hard to speed up model training and reduce the training time
@@ -26,7 +26,7 @@ With Lightning, running on GPUs, TPUs, IPUs on multiple nodes is a simple switch

 GPU Training
 ============

-Lightning supports a variety of plugins to further speed up distributed GPU training. Most notably:
+Lightning supports a variety of plugins to speed up distributed GPU training. Most notably:

 * :class:`~pytorch_lightning.strategies.DDPStrategy`
 * :class:`~pytorch_lightning.strategies.DDPShardedStrategy`
@@ -50,9 +50,9 @@ GPU Training Speedup Tips

 When training on single or multiple GPU machines, Lightning offers a host of advanced optimizations to improve throughput, memory efficiency, and model scaling.
 Refer to :doc:`Advanced GPU Optimized Training for more details <../advanced/advanced_gpu>`.

-Prefer DDP over DP
+Prefer DDP Over DP
 ^^^^^^^^^^^^^^^^^^
-:class:`~pytorch_lightning.strategies.dp.DataParallelStrategy` performs 3 GPU transfers for EVERY batch:
+:class:`~pytorch_lightning.strategies.dp.DataParallelStrategy` performs three GPU transfers for EVERY batch:

 1. Copy the model to the device.
 2. Copy the data to the device.
@@ -65,7 +65,7 @@ Prefer DDP over DP


-Whereas :class:`~pytorch_lightning.strategies.ddp.DDPStrategy` only performs 2 transfer operations, making DDP much faster than DP:
+Whereas :class:`~pytorch_lightning.strategies.ddp.DDPStrategy` only performs two transfer operations, making DDP much faster than DP:

 1. Moving data to the device.
 2. Transfer and sync gradients.
@@ -78,11 +78,11 @@ Whereas :class:`~pytorch_lightning.strategies.ddp.DDPStrategy` only performs 2 t


-When using DDP Plugins, set find_unused_parameters=False
+When Using DDP Plugins, Set find_unused_parameters=False
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-By default, we have set ``find_unused_parameters=True`` for compatibility reasons that have been observed in the past (see the `discussion <https://github.com/PyTorchLightning/pytorch-lightning/discussions/6219>`_ for more details).
-When enabled, it can result in a performance hit, and can be disabled in most cases. Read more about it `here <https://pytorch.org/docs/stable/notes/ddp.html#internal-design>`_.
+By default, we have set ``find_unused_parameters=True`` for compatibility reasons that have been observed in the past (refer to the `discussion <https://github.com/PyTorchLightning/pytorch-lightning/discussions/6219>`_ for more details).
+When enabled, it can result in a performance hit and can be disabled in most cases. Read more about it `here <https://pytorch.org/docs/stable/notes/ddp.html#internal-design>`_.

 .. tip::
     It applies to all DDP strategies that support ``find_unused_parameters`` as input.
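A minimal sketch of disabling the flag; the accelerator and device counts are illustrative, and the flag is only safe to turn off when every parameter receives a gradient in each step.

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.strategies import DDPStrategy

    # skip the unused-parameter search that DDP would otherwise run every step
    trainer = Trainer(
        accelerator="gpu",
        devices=2,
        strategy=DDPStrategy(find_unused_parameters=False),
    )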
@@ -105,10 +105,10 @@ When enabled, it can result in a performance hit, and can be disabled in most ca

         strategy=DDPSpawnStrategy(find_unused_parameters=False),
     )

-When using DDP on a Multi-node Cluster, set NCCL Parameters
+When Using DDP on a Multi-node Cluster, Set NCCL Parameters
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-`NCCL <https://developer.nvidia.com/nccl>`__ is the NVIDIA Collective Communications Library which is used under the hood by PyTorch to handle communication across nodes and GPUs. There are reported benefits in terms of speedups when adjusting NCCL parameters as seen in this `issue <https://github.com/PyTorchLightning/pytorch-lightning/issues/7179>`__. In the issue we see a 30% speed improvement when training the Transformer XLM-RoBERTa and a 15% improvement in training with Detectron2.
+`NCCL <https://developer.nvidia.com/nccl>`__ is the NVIDIA Collective Communications Library that is used by PyTorch to handle communication across nodes and GPUs. There are reported benefits in terms of speedups when adjusting NCCL parameters as seen in this `issue <https://github.com/PyTorchLightning/pytorch-lightning/issues/7179>`__. In the issue, we see a 30% speed improvement when training the Transformer XLM-RoBERTa and a 15% improvement in training with Detectron2.

 NCCL parameters can be adjusted via environment variables.
@@ -125,7 +125,7 @@ NCCL parameters can be adjusted via environment variables.

     export NCCL_NSOCKS_PERTHREAD=4
     export NCCL_SOCKET_NTHREADS=2

-Dataloaders
+DataLoaders
 ^^^^^^^^^^^

 When building your DataLoader set ``num_workers>0`` and ``pin_memory=True`` (only for GPUs).
@@ -140,13 +140,13 @@ num_workers

 The question of how many workers to specify in ``num_workers`` is tricky. Here's a summary of `some references <https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813>`_, and our suggestions:

 1. ``num_workers=0`` means ONLY the main process will load batches (that can be a bottleneck).
-2. ``num_workers=1`` means ONLY one worker (just not the main process) will load data but it will still be slow.
+2. ``num_workers=1`` means ONLY one worker (just not the main process) will load data, but it will still be slow.
 3. The performance of high ``num_workers`` depends on the batch size and your machine.
 4. A general place to start is to set ``num_workers`` equal to the number of CPU cores on that machine. You can get the number of CPU cores in python using ``os.cpu_count()``, but note that depending on your batch size, you may overflow RAM memory.

 .. warning:: Increasing ``num_workers`` will ALSO increase your CPU memory consumption.

-The best thing to do is to increase the ``num_workers`` slowly and stop once you see no more improvement in your training speed.
+The best thing to do is to increase the ``num_workers`` slowly and stop once there is no more improvement in your training speed.

 For debugging purposes or for dataloaders that load very small datasets, it is desirable to set ``num_workers=0``. However, this will always log a warning for every dataloader with ``num_workers <= min(2, os.cpu_count())``. In such cases, you can specifically filter this warning by using:
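A short sketch combining the two knobs this file recommends (``num_workers`` and ``pin_memory``); the toy dataset and batch size are illustrative.

.. code-block:: python

    import os

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(1_000, 10), torch.randint(0, 2, (1_000,)))

    # a common starting point: one worker per CPU core, pinned memory for faster host-to-GPU copies
    train_loader = DataLoader(
        dataset,
        batch_size=64,
        num_workers=os.cpu_count(),
        pin_memory=True,  # only helps when training on GPUs
    )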
@@ -156,7 +156,7 @@ For debugging purposes or for dataloaders that load very small datasets, it is d

     warnings.filterwarnings("ignore", ".*Consider increasing the value of the `num_workers` argument*")

-    # or to ignore all warnings which could be false positives
+    # or to ignore all warnings that could be false positives
     from pytorch_lightning.utilities.warnings import PossibleUserWarning

     warnings.filterwarnings("ignore", category=PossibleUserWarning)
@@ -186,7 +186,7 @@ This is a limitation of Python ``.spawn()`` and PyTorch.

 TPU Training
 ============

-You can set the ``tpu_cores`` trainer flag to 1, [7] (specific core) or 8 cores.
+You can set the ``tpu_cores`` trainer flag to 1, [7] (specific core) or eight cores.

 .. code-block:: python
@@ -199,7 +199,7 @@ You can set the ``tpu_cores`` trainer flag to 1, [7] (specific core) or 8 cores.

     # train on 8 TPU cores
     trainer = Trainer(tpu_cores=8)

-To train on more than 8 cores (ie: a POD),
+To train on more than eight cores (a POD),
 submit this script using the xla_dist script.

 Example::
@@ -239,7 +239,7 @@ less memory bandwidth and run match operations much faster on GPUs that support

 **Use when:**

 * You want to optimize for memory usage on a GPU.
-* You have a GPU that supports 16 bit precision (NVIDIA pascal architecture or newer).
+* You have a GPU that supports 16-bit precision (NVIDIA pascal architecture or newer).
 * Your optimization algorithm (training_step) is numerically stable.
 * You want to be the cool person in the lab :p
@@ -251,7 +251,7 @@ less memory bandwidth and run match operations much faster on GPUs that support


-Mixed precision combines the use of both 32 and 16 bit floating points to reduce memory footprint during model training, resulting in improved performance, achieving +3X speedups on modern GPUs.
+Mixed precision combines the use of both 32 and 16-bit floating points to reduce memory footprint during model training, resulting in improved performance, achieving up to +3X speedups on modern GPUs.

 Lightning offers mixed precision training for GPUs and CPUs, as well as bfloat16 mixed precision training for TPUs.
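A minimal sketch of enabling the precision settings mentioned here; the accelerator and device choices are illustrative.

.. code-block:: python

    from pytorch_lightning import Trainer

    # 16-bit mixed precision on a GPU
    trainer = Trainer(accelerator="gpu", devices=1, precision=16)

    # bfloat16 mixed precision, e.g. for TPUs or recent GPUs
    trainer = Trainer(precision="bf16")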
@@ -286,7 +286,7 @@ Setting ``min_epochs=N`` makes sure that the training will run for at least ``N`

     trainer = Trainer(min_epochs=1, max_epochs=1000)


-If running iteration based training, i.e. infinite / iterable dataloader, you can also control the number of steps with the ``min_steps`` and ``max_steps`` flags:
+If running iteration based training, i.e., infinite / iterable DataLoader, you can also control the number of steps with the ``min_steps`` and ``max_steps`` flags:

 .. testcode::
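For illustration, a short sketch of the step-based limits (the numbers are arbitrary):

.. code-block:: python

    from pytorch_lightning import Trainer

    # run at least 100 optimization steps and never more than 10,000
    trainer = Trainer(min_steps=100, max_steps=10_000)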
@@ -313,10 +313,10 @@ Learn more in our :ref:`trainer_flags` guide.

 Control Validation Frequency
 ****************************

-Check validation every n epochs
+Check Validation Every n Epochs
 ===============================

-**Use when:** You have a small dataset, and want to run less validation checks.
+**Use when:** You have a small dataset and want to run fewer validation checks.

 You can limit validation check to only run every n epochs using the ``check_val_every_n_epoch`` Trainer flag.
@@ -329,13 +329,13 @@ You can limit validation check to only run every n epochs using the ``check_val_

     trainer = Trainer(check_val_every_n_epoch=7)


-Validation within Training Epoch
+Validation Within Training Epoch
 ================================

-**Use when:** You have a large training dataset, and want to run mid-epoch validation checks.
+**Use when:** You have a large training dataset and want to run mid-epoch validation checks.

 For large datasets, it's often desirable to check validation multiple times within a training epoch.
-Pass in a float to check that often within 1 training epoch. Pass in an int ``K`` to check every ``K`` training batches.
+Pass in a float to check that often within one training epoch. Pass in an int ``K`` to check every ``K`` training batches.
 Must use an ``int`` if using an :class:`~torch.utils.data.IterableDataset`.

 .. testcode::
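A brief sketch of the two forms of ``val_check_interval`` described above (the values are illustrative):

.. code-block:: python

    from pytorch_lightning import Trainer

    # check validation four times per training epoch
    trainer = Trainer(val_check_interval=0.25)

    # check validation every 1000 training batches (use an int for IterableDatasets)
    trainer = Trainer(val_check_interval=1000)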
@@ -362,12 +362,12 @@ Preload Data Into RAM

 When your training or preprocessing requires many operations to be performed on entire dataset(s), it can
 sometimes be beneficial to store all data in RAM given there is enough space.
 However, loading all data at the beginning of the training script has the disadvantage that it can take a long
-time and hence it slows down the development process. Another downside is that in multiprocessing (e.g. DDP)
+time, and hence, it slows down the development process. Another downside is that in multiprocessing (e.g., DDP)
 the data would get copied in each process.
 One can overcome these problems by copying the data into RAM in advance.
 Most UNIX-based operating systems provide direct access to tmpfs through a mount point typically named ``/dev/shm``.

-0. Increase shared memory if necessary. Refer to the documentation of your OS how to do this.
+Increase shared memory if necessary. Refer to the documentation of your OS on how to do this.

 1. Copy training data to shared memory:
@@ -375,7 +375,7 @@ Most UNIX-based operating systems provide direct access to tmpfs through a mount

     cp -r /path/to/data/on/disk /dev/shm/

-2. Refer to the new data root in your script or command line arguments:
+2. Refer to the new data root in your script or command-line arguments:

 .. code-block:: python
@@ -393,8 +393,8 @@ distributed setting.

 Here is an explanation of what it does:

 * Considering the current optimizer as A and all other optimizers as B.
-* Toggling means that all parameters from B exclusive to A will have their ``requires_grad`` attribute set to ``False``.
-* Their original state will be restored when exiting the context manager.
+* Toggling, which means all parameters from B exclusive to A will have their ``requires_grad`` attribute set to ``False``.
+* Restoring their original state when exiting the context manager.

 When performing gradient accumulation, there is no need to perform grad synchronization during the accumulation phase.
 Setting ``sync_grad`` to ``False`` will block this synchronization and improve your training speed.
@@ -403,11 +403,11 @@ Setting ``sync_grad`` to ``False`` will block this synchronization and improve y

 :meth:`~pytorch_lightning.core.optimizer.LightningOptimizer.toggle_model` function as a
 :func:`contextlib.contextmanager` for advanced users.

-Here is an example for advanced use-case:
+Here is an example of an advanced use case:

 .. testcode::

-    # Scenario for a GAN with gradient accumulation every 2 batches and optimized for multiple gpus.
+    # Scenario for a GAN with gradient accumulation every two batches and optimized for multiple gpus.
     class SimpleGAN(LightningModule):
         def __init__(self):
             super().__init__()
@@ -469,9 +469,9 @@ Here is an example for advanced use-case:

 Set Grads to None
 *****************

-In order to modestly improve performance, you can override :meth:`~pytorch_lightning.core.lightning.LightningModule.optimizer_zero_grad`.
+In order to improve performance, you can override :meth:`~pytorch_lightning.core.lightning.LightningModule.optimizer_zero_grad`.

-For a more detailed explanation of pros / cons of this technique,
+For a more detailed explanation of the pros / cons of this technique,
 read the documentation for :meth:`~torch.optim.Optimizer.zero_grad` by the PyTorch team.

 .. testcode::
|
|||
-----
|
||||
|
||||
***************
|
||||
Things to avoid
|
||||
Things to Avoid
|
||||
***************
|
||||
|
||||
.item(), .numpy(), .cpu()
|
||||
|
@@ -496,9 +496,9 @@ takes a great deal of care to be optimized for this.

 Clear Cache
 ===========

-Don't call :func:`torch.cuda.empty_cache` unnecessarily! Every time you call this ALL your GPUs have to wait to sync.
+Don't call :func:`torch.cuda.empty_cache` unnecessarily! Every time you call this, ALL your GPUs have to wait to sync.

-Transferring tensors to device
+Transferring Tensors to Device
 ==============================

 LightningModules know what device they are on! Construct tensors on the device directly to avoid CPU->Device transfer.
|
|||
t = torch.rand(2, 2, device=self.device)
|
||||
|
||||
|
||||
For tensors that need to be model attributes, it is best practice to register them as buffers in the modules's
|
||||
For tensors that need to be model attributes, it is best practice to register them as buffers in the module's
|
||||
``__init__`` method:
|
||||
|
||||
.. code-block:: python
|
||||
|
|
|
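The code block referenced by this hunk is not shown; a hedged sketch of the buffer registration it refers to (the buffer name and value are illustrative):

.. code-block:: python

    import torch
    from pytorch_lightning import LightningModule


    class MyModel(LightningModule):
        def __init__(self):
            super().__init__()
            # buffers move to the right device together with the module
            self.register_buffer("sigma", torch.eye(3))

        def training_step(self, batch, batch_idx):
            # self.sigma already lives on self.device; no manual .to() call needed
            ...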
@@ -6,12 +6,11 @@

 .. _converting:


 ######################################
-How to organize PyTorch into Lightning
+How to Organize PyTorch Into Lightning
 ######################################

-To enable your code to work with Lightning, here's how to organize PyTorch into Lightning:
+To enable your code to work with Lightning, perform the following to organize PyTorch into Lightning.

 --------
@@ -56,14 +56,14 @@ You can also use `Conda Environments <https://docs.conda.io/projects/conda/en/la

 Installation from Source
 ************************

-Install nightly from the source. Note that it contains all the bugfixes and newly released features that
-are not published yet. This is the bleeding edge so use it at your own discretion.
+Install nightly from the source. Note that it contains all the bug fixes and newly released features that
+are not published yet. This is the bleeding edge, so use it at your own discretion.

 .. code-block:: bash

     pip install https://github.com/PyTorchLightning/pytorch-lightning/archive/master.zip

-Install future patch release from the source. Note that patch release contains only the bugfixes for the recent major release.
+Install future patch releases from the source. Note that the patch release contains only the bug fixes for the recent major release.

 .. code-block:: bash
@@ -29,7 +29,7 @@ Learn by example

 ****************


-My existing PyTorch code
+My Existing PyTorch Code
 ========================

 The ``run`` function contains custom training loop used to train ``MyModel`` on ``MyDataset`` for ``num_epochs`` epochs.
@@ -129,13 +129,13 @@ That's all. You can now train on any kind of device and scale your training.

 :class:`~pytorch_lightning.lite.LightningLite` takes care of device management, so you don't have to.
 You should remove any device-specific logic within your code.

-Here is how to train on 8 GPUs with `torch.bfloat16 <https://pytorch.org/docs/1.10.0/generated/torch.Tensor.bfloat16.html>`_ precision:
+Here is how to train on eight GPUs with `torch.bfloat16 <https://pytorch.org/docs/1.10.0/generated/torch.Tensor.bfloat16.html>`_ precision:

 .. code-block:: python

     Lite(strategy="ddp", devices=8, accelerator="gpu", precision="bf16").run(10)

-Here is how to use `DeepSpeed Zero3 <https://www.deepspeed.ai/news/2021/03/07/zero3-offload.html>`_ with 8 GPUs and precision 16:
+Here is how to use `DeepSpeed Zero3 <https://www.deepspeed.ai/news/2021/03/07/zero3-offload.html>`_ with eight GPUs and precision 16:

 .. code-block:: python
@@ -148,7 +148,7 @@ Here is how to use `DeepSpeed Zero3 <https://www.deepspeed.ai/news/2021/03/07/ze

     Lite(devices="auto", accelerator="auto", precision=16).run(10)

 You can also easily use distributed collectives if required.
-Here is an example while running on 256 GPUs (8 GPUs times 32 nodes).
+Here is an example while running on 256 GPUs (eight GPUs times 32 nodes).

 .. code-block:: python
@@ -199,7 +199,7 @@ utility to move an object to the current device.

 .. tip::

     If you have hundreds or thousands of lines within your :meth:`~pytorch_lightning.lite.LightningLite.run` function
-    and you are feeling weird about it then this is right feeling.
+    and you are feeling unsure about them, then that is the correct feeling.
     In 2019, our :class:`~pytorch_lightning.core.lightning.LightningModule` was getting larger
     and we got the same feeling, so we started to organize our code for simplicity, interoperability and standardization.
     This is definitely a good sign that you should consider refactoring your code and / or switching to
@@ -80,6 +80,7 @@ Here's a LightningModule that defines a model. Although, we do not recommend to

 Self-contained
 ==============

 A Lightning module should be self-contained. To see how self-contained your model is, a good test is to ask
 yourself this question:
@@ -90,7 +91,6 @@ a specific learning rate scheduler to work well.

 Init
 ====

 The first place where LightningModules tend to stop being self-contained is in the init. Try to define all the relevant
 sensible defaults in the init so that the user doesn't have to guess.
@@ -103,7 +103,7 @@ Here's an example where a user will have to go hunt through files to figure out

         self.lr = params.lr
         self.coef_x = params.coef_x

-Models defined as such leave you with many questions, such as what is coef_x? Is it a string? A float? What is the range?
+Models defined as such leave you with many questions, such as what is ``coef_x``? Is it a string? A float? What is the range?
 Instead, be explicit in your init

 .. testcode::
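The ``.. testcode::`` body is not shown in this hunk; a hedged sketch of what an explicit init looks like (the parameter names echo the snippet above, the defaults are illustrative):

.. code-block:: python

    from torch import nn
    from pytorch_lightning import LightningModule


    class LitModel(LightningModule):
        def __init__(self, encoder: nn.Module, lr: float = 1e-3, coef_x: float = 0.2):
            super().__init__()
            # explicit arguments with sensible defaults: no hunting through config files
            self.encoder = encoder
            self.lr = lr
            self.coef_x = coef_x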
@@ -118,8 +118,7 @@ user can see the value immediately.

 Method Order
 ============

-At the bare minimum, the only required methods in the LightningModule to configure a training pipeline are:
+The only required methods in the LightningModule are:

 * init
 * training_step
@@ -168,10 +167,12 @@ In practice, the code looks like this:

     def any_extra_hook(...):


 Forward vs training_step
 ========================

-We recommend using forward for inference/predictions and keeping ``training_step`` independent.
+We recommend using :meth:`~pytorch_lightning.core.lightning.LightningModule.forward` for inference/predictions and keeping
+:meth:`~pytorch_lightning.core.lightning.LightningModule.training_step` independent.

 .. code-block:: python
@@ -181,7 +182,7 @@ We recommend using forward for inference/predictions and keeping ``training_step


     def training_step(self, batch, batch_idx):
-        x, y = batch
+        x, _ = batch
         z = self.encoder(x)
         pred = self.decoder(z)
         ...
@@ -195,13 +196,13 @@ Data

 These are best practices for handling data.

-Dataloaders
+DataLoaders
 ===========

 Lightning uses :class:`~torch.utils.data.DataLoader` to handle all the data flow through the system. Whenever you structure dataloaders,
 make sure to tune the number of workers for maximum efficiency.

-.. warning:: Make sure not to use ``Trainer(strategy="ddp_spawn")`` with ``num_workers>0`` in a DataLoader or you will bottleneck your code.
+.. warning:: Make sure not to use ``Trainer(strategy="ddp_spawn")`` with ``num_workers>0`` in the DataLoader or you will bottleneck your code.

 DataModules
 ===========
@@ -212,13 +213,18 @@ datasets with your model, so you can test it and benchmark it across domains. It

 Check out :ref:`data` document to understand data management within Lightning and its best practices.

 ------------

 * What dataset splits were used?
 * How many samples does this dataset have overall and within each split?
 * Which transforms were used?

 ********
 Examples
 ********

 It's for this reason that we recommend you use datamodules. This is especially important when collaborating because
 it will save your team a lot of time as well.

 Checkout the live examples to get your hands dirty:
 All they need to do is drop a datamodule into the Trainer and not worry about what was done to the data.

 This is true for both academic and corporate settings where data cleaning and ad-hoc instructions slow down the progress
 of iterating through ideas.

 - Checkout the live examples to get your hands dirty:
 - `Introduction to PyTorch Lightning <https://pytorch-lightning.readthedocs.io/en/stable/notebooks/lightning_examples/mnist-hello-world.html>`_
 - `Introduction to DataModules <https://pytorch-lightning.readthedocs.io/en/stable/notebooks/lightning_examples/datamodules.html>`_