Some docs update (#3794)

* docs update

* docs update

* suggestions

* Update docs/source/introduction_guide.rst

Co-authored-by: William Falcon <waf2107@columbia.edu>
Jeff Yang 2020-10-03 18:45:07 +06:30 committed by GitHub
parent a677833f84
commit 62320632d4
4 changed files with 72 additions and 55 deletions


@ -34,7 +34,7 @@ Move the model architecture and forward pass to your :class:`~pytorch_lightning.
2. Move the optimizer(s) and schedulers
=======================================
Move your optimizers to :func:`pytorch_lightning.core.LightningModule.configure_optimizers` hook. Make sure to use the hook parameters (self in this case).
Move your optimizers to the :func:`~pytorch_lightning.core.LightningModule.configure_optimizers` hook.
.. testcode::
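For instance, a minimal version of this hook might look like the following (a sketch; the ``Adam`` optimizer and the learning rate are only examples):

.. code-block:: python

    from torch.optim import Adam
    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):

        def configure_optimizers(self):
            # return the optimizer(s) used to train this module
            return Adam(self.parameters(), lr=1e-3)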
@ -46,7 +46,8 @@ Move your optimizers to :func:`pytorch_lightning.core.LightningModule.configure_
3. Find the train loop "meat"
=============================
Lightning automates most of the trining for you, the epoch and batch iterations, all you need to keep is the training step logic. This should go into :func:`pytorch_lightning.core.LightningModule.training_step` hook (make sure to use the hook parameters, self in this case):
Lightning automates most of the training for you (the epoch and batch iterations); all you need to keep is the training step logic.
This should go into the :func:`~pytorch_lightning.core.LightningModule.training_step` hook (make sure to use the hook parameters, ``batch`` and ``batch_idx`` in this case):
.. testcode::
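A minimal sketch of such a ``training_step`` (it assumes the model's ``forward`` from step 1 and a cross-entropy loss, both only examples):

.. code-block:: python

    import torch.nn.functional as F
    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):

        def training_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)                    # calls forward()
            loss = F.cross_entropy(y_hat, y)
            return loss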
@ -60,7 +61,8 @@ Lightning automates most of the trining for you, the epoch and batch iterations,
4. Find the val loop "meat"
===========================
To add an (optional) validation loop add logic to :func:`pytorch_lightning.core.LightningModule.validation_step` hook (make sure to use the hook parameters, self in this case).
To add an (optional) validation loop, add logic to the
:func:`~pytorch_lightning.core.LightningModule.validation_step` hook (make sure to use the hook parameters, ``batch`` and ``batch_idx`` in this case).
.. testcode::
@ -72,11 +74,12 @@ To add an (optional) validation loop add logic to :func:`pytorch_lightning.core.
val_loss = F.cross_entropy(y_hat, y)
return val_loss
.. note:: model.eval() and torch.no_grad() are called automatically for validation
.. note:: ``model.eval()`` and ``torch.no_grad()`` are called automatically for validation
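Put together, the full hook might look roughly like this (a sketch; the ``self.log`` call is optional and only shown as an example):

.. code-block:: python

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        val_loss = F.cross_entropy(y_hat, y)
        self.log('val_loss', val_loss)   # optional: log for checkpointing or early stopping
        return val_loss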
5. Find the test loop "meat"
============================
To add an (optional) test loop add logic to :func:`pytorch_lightning.core.LightningModule.test_step` hook (make sure to use the hook parameters, self in this case).
To add an (optional) test loop, add logic to the
:func:`~pytorch_lightning.core.LightningModule.test_step` hook (make sure to use the hook parameters, ``batch`` and ``batch_idx`` in this case).
.. testcode::
@ -88,7 +91,7 @@ To add an (optional) test loop add logic to :func:`pytorch_lightning.core.Lightn
loss = F.cross_entropy(y_hat, y)
return loss
.. note:: model.eval() and torch.no_grad() are called automatically for testing.
.. note:: ``model.eval()`` and ``torch.no_grad()`` are called automatically for testing.
The test loop will not be used until you call:
@ -96,7 +99,7 @@ The test loop will not be used until you call.
trainer.test()
.. note:: .test() loads the best checkpoint automatically
.. tip:: .test() loads the best checkpoint automatically
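For example (a sketch; ``dm`` stands for a DataModule that also defines a ``test_dataloader``):

.. code-block:: python

    from pytorch_lightning import Trainer

    trainer = Trainer(max_epochs=5)
    trainer.fit(model, dm)

    # runs the test loop; the best checkpoint is loaded automatically
    trainer.test()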
6. Remove any .cuda() or .to(device) calls
==========================================
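Lightning moves the model and batches to the right device for you, so lines like these can simply be deleted (a before/after sketch; ``Net`` is a placeholder model):

.. code-block:: python

    # before (plain PyTorch)
    model = Net().cuda()
    x = x.to(device)

    # after (Lightning) - no manual device calls needed
    model = Net()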


@ -98,8 +98,8 @@ Let's first start with the model. In this case we'll design a 3-layer neural net
x = F.log_softmax(x, dim=1)
return x
Notice this is a :class:`~pytorch_lightning.core.LightningModule` instead of a `torch.nn.Module`. A LightningModule is
equivalent to a pure PyTorch Module except it has added functionality. However, you can use it EXACTLY the same as you would a PyTorch Module.
Notice this is a :class:`~pytorch_lightning.core.LightningModule` instead of a ``torch.nn.Module``. A LightningModule is
equivalent to a pure PyTorch Module except it has added functionality. However, you can use it **EXACTLY** the same as you would a PyTorch Module.
.. testcode::
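For example, you can instantiate it and call it exactly like any other module (a sketch assuming the ``LitMNIST`` model defined above):

.. code-block:: python

    import torch

    net = LitMNIST()
    x = torch.randn(1, 1, 28, 28)
    out = net(x)
    print(out.shape)   # torch.Size([1, 10])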
@ -274,8 +274,8 @@ Using DataModules allows easier sharing of full dataset definitions.
model = LitModel(num_classes=imagenet_dm.num_classes)
trainer.fit(model, imagenet_dm)
.. note:: `prepare_data` is called only one 1 GPU in distributed training (automatically)
.. note:: `setup` is called on every GPU (automatically)
.. note:: ``prepare_data()`` is called on only one GPU in distributed training (automatically)
.. note:: ``setup()`` is called on every GPU (automatically)
Models defined by data
^^^^^^^^^^^^^^^^^^^^^^
@ -292,10 +292,12 @@ When your models need to know about the data, it's best to process the data befo
trainer.fit(model, dm)
1. use `prepare_data` to download and process the dataset.
2. use `setup` to do splits, and build your model internals
1. use ``prepare_data()`` to download and process the dataset.
2. use ``setup()`` to do splits, and build your model internals
An alternative to using a DataModule is to defer initialization of the models modules to the `setup` method of your LightningModule as follows:
|
An alternative to using a DataModule is to defer initialization of the model's modules to the ``setup`` method of your LightningModule as follows:
.. testcode::
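A rough sketch of that pattern (the ``num_classes`` value and the linear layer are placeholders for whatever your data dictates):

.. code-block:: python

    import torch.nn as nn
    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):

        def setup(self, stage):
            # called on every GPU; the dataset is available at this point
            self.num_classes = 10                      # e.g. derived from the downloaded data
            self.classifier = nn.Linear(128, self.num_classes)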
@ -326,7 +328,7 @@ In PyTorch we do it as follows:
optimizer = Adam(LitMNIST().parameters(), lr=1e-3)
In Lightning we do the same but organize it under the configure_optimizers method.
In Lightning we do the same but organize it under the :func:`~pytorch_lightning.core.LightningModule.configure_optimizers` method.
.. testcode::
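The hook can also return learning rate schedulers alongside the optimizer(s), for example (a sketch; the ``StepLR`` schedule is only illustrative):

.. code-block:: python

    from torch.optim import Adam
    from torch.optim.lr_scheduler import StepLR

    def configure_optimizers(self):
        optimizer = Adam(self.parameters(), lr=1e-3)
        scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
        return [optimizer], [scheduler]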
@ -379,8 +381,8 @@ In the case of MNIST we do the following
optimizer.step()
optimizer.zero_grad()
In Lightning, everything that is in the training step gets organized under the `training_step` function
in the LightningModule
In Lightning, everything that is in the training step gets organized under the
:func:`~pytorch_lightning.core.LightningModule.training_step` function in the LightningModule.
.. testcode::
@ -546,7 +548,7 @@ Or multiple nodes
Refer to the :ref:`distributed computing guide for more details <multi_gpu>`.
train on TPUs
Train on TPUs
^^^^^^^^^^^^^
Did you know you can use PyTorch on TPUs? It's very hard to do, but we've
worked with the xla team to use their awesome library to get this to work
@ -578,11 +580,11 @@ In distributed training (multiple GPUs and multiple TPU cores) each GPU or TPU c
of this program. This means that without taking any care you will download the dataset N times which
will cause all sorts of issues.
To solve this problem, make sure your download code is in the `prepare_data` method in the DataModule.
To solve this problem, make sure your download code is in the ``prepare_data`` method in the DataModule.
In this method we do all the preparation we need to do once (instead of on every gpu).
`prepare_data` can be called in two ways, once per node or only on the root node
(`Trainer(prepare_data_per_node=False)`).
``prepare_data`` can be called in two ways, once per node or only on the root node
(``Trainer(prepare_data_per_node=False)``).
.. code-block:: python
@ -619,7 +621,7 @@ In this method we do all the preparation we need to do once (instead of on every
def test_dataloader(self):
return DataLoader(self.test_dataset, batch_size=self.batch_size)
The `prepare_data` method is also a good place to do any data processing that needs to be done only
The ``prepare_data`` method is also a good place to do any data processing that needs to be done only
once (i.e. download, tokenize, etc.).
.. note:: Lightning inserts the correct DistributedSampler for distributed training. No need to add yourself!
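Sketched end to end, such a DataModule could look like this (MNIST, the transforms and the split sizes are placeholders for your own dataset):

.. code-block:: python

    from torch.utils.data import DataLoader, random_split
    from torchvision import transforms
    from torchvision.datasets import MNIST
    import pytorch_lightning as pl

    class MNISTDataModule(pl.LightningDataModule):

        def prepare_data(self):
            # download only (runs on a single process)
            MNIST('./data', train=True, download=True)
            MNIST('./data', train=False, download=True)

        def setup(self, stage=None):
            # transforms and splits (runs on every GPU)
            transform = transforms.ToTensor()
            mnist_full = MNIST('./data', train=True, transform=transform)
            self.train_dataset, self.val_dataset = random_split(mnist_full, [55000, 5000])
            self.test_dataset = MNIST('./data', train=False, transform=transform)

        def train_dataloader(self):
            return DataLoader(self.train_dataset, batch_size=64)

        def val_dataloader(self):
            return DataLoader(self.val_dataset, batch_size=64)

        def test_dataloader(self):
            return DataLoader(self.test_dataset, batch_size=64)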
@ -657,7 +659,7 @@ Validating
For most cases, we stop training the model when the performance on a validation
split of the data reaches a minimum.
Just like the `training_step`, we can define a `validation_step` to check whatever
Just like the ``training_step``, we can define a ``validation_step`` to check whatever
metrics we care about, generate samples or add more to our logs.
.. code-block:: python
@ -676,7 +678,7 @@ Now we can train with a validation loop as well.
trainer = Trainer(tpu_cores=8)
trainer.fit(model, train_loader, val_loader)
You may have noticed the words `Validation sanity check` logged. This is because Lightning runs 2 batches
You may have noticed the words **Validation sanity check** logged. This is because Lightning runs 2 batches
of validation before starting to train. This is a kind of unit test to make sure that if you have a bug
in the validation loop, you won't need to potentially wait a full epoch to find out.
@ -744,7 +746,7 @@ Just like the validation loop, we define a test loop
However, to make sure the test set isn't used inadvertently, Lightning has a separate API to run tests.
Once you train your model simply call `.test()`.
Once you train your model, simply call ``.test()``.
.. code-block:: python
@ -794,8 +796,8 @@ and use it for prediction.
x = torch.randn(1, 1, 28, 28)
out = model(x)
On the surface, it looks like `forward` and `training_step` are similar. Generally, we want to make sure that
what we want the model to do is what happens in the `forward`. whereas the `training_step` likely calls forward from
On the surface, it looks like ``forward`` and ``training_step`` are similar. Generally, we want to make sure that
what we want the model to do is what happens in ``forward``, whereas ``training_step`` likely calls ``forward`` from
within it.
.. testcode::
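A sketch of that split for a small autoencoder (the model itself is only illustrative): ``forward`` does what we want the model to do at prediction time, while ``training_step`` calls it as part of the training logic.

.. code-block:: python

    import torch.nn as nn
    import torch.nn.functional as F
    import pytorch_lightning as pl

    class LitAutoEncoder(pl.LightningModule):

        def __init__(self):
            super().__init__()
            self.encoder = nn.Linear(28 * 28, 64)
            self.decoder = nn.Linear(64, 28 * 28)

        def forward(self, x):
            # prediction behaviour: produce embeddings
            return self.encoder(x)

        def training_step(self, batch, batch_idx):
            x, y = batch
            x = x.view(x.size(0), -1)
            z = self(x)                      # calls forward()
            x_hat = self.decoder(z)
            loss = F.mse_loss(x_hat, x)
            return loss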
@ -879,7 +881,7 @@ Or maybe we have a model that we use to do generation
z = sample_noise()
generated_imgs = model(z)
How you split up what goes in `forward` vs `training_step` depends on how you want to use this model for
How you split up what goes in ``forward`` vs ``training_step`` depends on how you want to use this model for
prediction.
----------------
@ -977,7 +979,7 @@ And pass the callbacks into the trainer
Starting to init trainer!
Trainer is init now
.. note::
.. tip::
See full list of 12+ hooks in the :ref:`callbacks`.
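A callback producing the output above could be sketched as follows (``on_init_start`` and ``on_init_end`` are two of the available hooks):

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import Callback

    class MyPrintingCallback(Callback):

        def on_init_start(self, trainer):
            print('Starting to init trainer!')

        def on_init_end(self, trainer):
            print('Trainer is init now')

    trainer = Trainer(callbacks=[MyPrintingCallback()])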
----------------
@ -1142,4 +1144,4 @@ the data to build your models.
In Lightning this code is organized inside a :ref:`datamodules`.
.. note:: DataModules are optional but encouraged, otherwise you can use standard DataModules
.. tip:: DataModules are optional but encouraged, otherwise you can use standard DataLoaders


@ -286,7 +286,7 @@ a forward method or trace only the sub-models you need.
********************
Using CPUs/GPUs/TPUs
********************
It's trivial to use CPUs, GPUs or TPUs in Lightning. There's NO NEED to change your code, simply change the :class:`~pytorch_lightning.trainer.Trainer` options.
It's trivial to use CPUs, GPUs or TPUs in Lightning. There's **NO NEED** to change your code, simply change the :class:`~pytorch_lightning.trainer.Trainer` options.
.. code-block:: python
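For example (a sketch; pick whichever flag matches your hardware):

.. code-block:: python

    from pytorch_lightning import Trainer

    # CPU (default)
    trainer = Trainer()

    # train on 1 GPU
    trainer = Trainer(gpus=1)

    # train on multiple GPUs with DDP
    trainer = Trainer(gpus=4, distributed_backend='ddp')

    # train on 8 TPU cores
    trainer = Trainer(tpu_cores=8)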
@ -377,6 +377,7 @@ If you prefer to do it manually, here's the equivalent
Data flow
*********
Each loop (training, validation, test) has three hooks you can implement:
- x_step
- x_step_end
- x_epoch_end
@ -434,7 +435,7 @@ The lightning equivalent is:
gpu_1_loss = losses[1]
return (gpu_0_loss + gpu_1_loss) * 1/2
The validation and test loops have the same structure.
.. tip:: The validation and test loops have the same structure.
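For the training loop those hooks look roughly like this (a sketch assuming two GPUs with dp/ddp2, mirroring the example above; ``F`` is ``torch.nn.functional``):

.. code-block:: python

    # inside your LightningModule
    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return loss

    def training_step_end(self, losses):
        # under dp/ddp2 each GPU returns a loss for its sub-batch; average them
        return (losses[0] + losses[1]) / 2

    def training_epoch_end(self, training_step_outputs):
        # called once at the end of the epoch with all the collected outputs
        ...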
-----------------
@ -467,6 +468,10 @@ you can override the default behavior by manually setting the flags
def training_step(self, batch, batch_idx):
self.log('my_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
.. note::
The loss value shown in the progress bar is smoothed (averaged) over the last values,
so it differs from the actual loss returned in train/validation step.
You can also use any method of your logger directly:
.. code-block:: python
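For instance, with the default TensorBoard logger the underlying ``SummaryWriter`` is available as ``self.logger.experiment`` (a sketch; ``self.layer`` is a placeholder for one of your submodules):

.. code-block:: python

    def training_step(self, batch, batch_idx):
        ...
        # log a weight histogram straight to TensorBoard
        self.logger.experiment.add_histogram('weights', self.layer.weight, self.global_step)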
@ -481,6 +486,10 @@ Once your training starts, you can view the logs by using your favorite logger o
tensorboard --logdir ./lightning_logs
.. note::
Lightning automatically shows the loss value returned from ``training_step`` in the progress bar.
So, no need to explicitly log like this ``self.log('loss', loss, prog_bar=True)``.
Read more about :ref:`loggers`.
----------------
@ -668,8 +677,9 @@ Or read our :ref:`introduction_guide` to learn more!
**********
Community
**********
Out community of core maintainers and thousands of expert researchers is active on our Slack and Forum. Drop by to
hang out, ask Lightning questions or even discuss research!
Our community of core maintainers and thousands of expert researchers is active on our
`Slack <https://join.slack.com/t/pytorch-lightning/shared_invite/zt-f6bl2l0l-JYMK3tbAgAmGRrlNr00f1A>`_
and `Forum <https://forums.pytorchlightning.ai/>`_. Drop by to hang out, ask Lightning questions or even discuss research!
Masterclass
===========


@ -8,7 +8,7 @@ Here are some best practices to increase your performance.
Dataloaders
-----------
When building your Dataloader set `num_workers` > 0 and `pin_memory=True` (only for GPUs).
When building your DataLoader, set ``num_workers > 0`` and ``pin_memory=True`` (only for GPUs).
.. code-block:: python
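For example (a sketch; the dataset, batch size and worker count are placeholders):

.. code-block:: python

    from torch.utils.data import DataLoader

    train_loader = DataLoader(
        train_dataset,
        batch_size=64,
        num_workers=8,        # > 0 so loading happens in worker processes
        pin_memory=True,      # only useful when training on GPUs
    )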
@ -16,23 +16,23 @@ When building your Dataloader set `num_workers` > 0 and `pin_memory=True` (only
num_workers
^^^^^^^^^^^
The question of how many `num_workers` is tricky. Here's a summary of
The question of how many ``num_workers`` to use is tricky. Here's a summary of
some references, [`1 <https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813>`_], and our suggestions.
1. num_workers=0 means ONLY the main process will load batches (that can be a bottleneck).
2. num_workers=1 means ONLY one worker (just not the main process) will load data but it will still be slow.
3. The num_workers depends on the batch size and your machine.
4. A general place to start is to set `num_workers` equal to the number of CPUs on that machine.
1. ``num_workers=0`` means ONLY the main process will load batches (that can be a bottleneck).
2. ``num_workers=1`` means ONLY one worker (just not the main process) will load data but it will still be slow.
3. The best ``num_workers`` value depends on the batch size and your machine.
4. A general place to start is to set ``num_workers`` equal to the number of CPUs on that machine.
.. warning:: Increasing num_workers will ALSO increase your CPU memory consumption.
.. warning:: Increasing ``num_workers`` will ALSO increase your CPU memory consumption.
The best thing to do is to increase the `num_workers` slowly and stop once you see no more improvement in your training speed.
The best thing to do is to increase the ``num_workers`` slowly and stop once you see no more improvement in your training speed.
Spawn
^^^^^
When using `distributed_backend=ddp_spawn` (the ddp default) or TPU training, the way multiple GPUs/TPU cores are used is by calling `.spawn()` under the hood.
The problem is that PyTorch has issues with `num_workers` > 0 when using .spawn(). For this reason we recommend you
use `distributed_backend=ddp` so you can increase the `num_workers`, however your script has to be callable like so:
When using ``distributed_backend=ddp_spawn`` (the ddp default) or TPU training, the way multiple GPUs/TPU cores are used is by calling ``.spawn()`` under the hood.
The problem is that PyTorch has issues with ``num_workers > 0`` when using ``.spawn()``. For this reason we recommend you
use ``distributed_backend=ddp`` so you can increase the ``num_workers``, however your script has to be callable like so:
.. code-block:: bash
@ -42,7 +42,7 @@ use `distributed_backend=ddp` so you can increase the `num_workers`, however you
.item(), .numpy(), .cpu()
-------------------------
Don't call .item() anywhere on your code. Use `.detach()` instead to remove the connected graph calls. Lightning
Don't call ``.item()`` anywhere in your code. Use ``.detach()`` instead to remove the connected graph calls. Lightning
takes a great deal of care to be optimized for this.
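For example, when accumulating a value for logging (a sketch):

.. code-block:: python

    # bad: .item() forces a GPU -> CPU sync on every call
    running_loss += loss.item()

    # better: keep the value as a tensor, detached from the graph
    running_loss += loss.detach()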
----------
@ -67,7 +67,7 @@ LightningModules know what device they are on! Construct tensors on the device d
For tensors that need to be model attributes, it is best practice to register them as buffers in the module's
`__init__` method:
``__init__`` method:
.. code-block:: python
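A minimal sketch of registering a buffer (the tensor here is only an example):

.. code-block:: python

    import torch
    from pytorch_lightning import LightningModule

    class LitModel(LightningModule):

        def __init__(self):
            super().__init__()
            # buffers are moved to the correct device together with the module
            self.register_buffer('sigma', torch.eye(3))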
@ -87,25 +87,27 @@ DP performs three GPU transfers for EVERY batch:
2. Copy data to device.
3. Copy outputs of each device back to master.
|
Whereas DDP only performs 1 transfer to sync gradients. Because of this, DDP is MUCH faster than DP.
----------
16-bit precision
----------------
Use 16-bit to decrease the memory (and thus increase your batch size). On certain GPUs (V100s, 2080tis), 16-bit calculations are also faster.
Use 16-bit to decrease the memory consumption (and thus increase your batch size). On certain GPUs (V100s, 2080tis), 16-bit calculations are also faster.
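Enabling it is a one-line change on the Trainer (a sketch):

.. code-block:: python

    from pytorch_lightning import Trainer

    # with PyTorch 1.6+ Lightning uses native AMP automatically
    trainer = Trainer(gpus=1, precision=16)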
However, know that 16-bit and multi-processing (any DDP) can have issues. Here are some common problems.
1. `CUDA error: an illegal memory access was encountered <https://github.com/pytorch/pytorch/issues/21819>`_.
The solution is likely setting a specific CUDA, CUDNN, PyTorch version combination.
2. `CUDA error: device-side assert triggered`. This is a general catch-all error. To see the actual error run your script like so:
2. ``CUDA error: device-side assert triggered``. This is a general catch-all error. To see the actual error run your script like so:
.. code-block:: bash
.. code-block:: bash
# won't see what the error is
python main.py
# won't see what the error is
python main.py
# will see what the error is
CUDA_LAUNCH_BLOCKING=1 python main.py
# will see what the error is
CUDA_LAUNCH_BLOCKING=1 python main.py
We also recommend using 16-bit native found in PyTorch 1.6. Just install this version and Lightning will automatically use it.
.. tip:: We also recommend using the native 16-bit support found in PyTorch 1.6. Just install this version and Lightning will automatically use it.