[docs] Add NCCL environment variable docs (#8345)
* Add nccl env variable docs
* Wording
* Update docs/source/guides/speed.rst

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
This commit is contained in: parent 0dfc265e2f, commit 31fca1658d
@@ -90,6 +90,26 @@ This by default comes with a performance hit, and can be disabled in most cases.
        plugins=DDPPlugin(find_unused_parameters=False),
    )

When using DDP on a multi-node cluster, set NCCL parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`NCCL <https://developer.nvidia.com/nccl>`__ is the NVIDIA Collective Communications Library, used under the hood by PyTorch to handle communication across nodes and GPUs. Adjusting NCCL parameters can yield substantial speedups, as reported in this `issue <https://github.com/PyTorchLightning/pytorch-lightning/issues/7179>`__: a 30% training speedup with the XLM-RoBERTa transformer and a 15% speedup when training with Detectron2.

NCCL parameters can be adjusted via environment variables.

.. note::

    AWS and GCP already set default values for these environment variables on their clusters, so adjusting them is typically most useful for custom cluster setups.

* `NCCL_NSOCKS_PERTHREAD <https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-nsocks-perthread>`__
* `NCCL_SOCKET_NTHREADS <https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-socket-nthreads>`__
* `NCCL_MIN_NCHANNELS <https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-min-nchannels>`__

.. code-block:: bash

    export NCCL_NSOCKS_PERTHREAD=4
    export NCCL_SOCKET_NTHREADS=2
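
The same variables can also be set from Python before the ``Trainer`` starts, since NCCL reads them when the process group is initialized. A minimal sketch (the values are illustrative, not recommendations; on a multi-node cluster every rank's environment needs them, e.g. via your job script):

.. code-block:: python

    import os

    # NCCL picks these up during process group initialization,
    # so they must be set before training begins.
    os.environ["NCCL_NSOCKS_PERTHREAD"] = "4"
    os.environ["NCCL_SOCKET_NTHREADS"] = "2"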

Dataloaders
^^^^^^^^^^^
When building your DataLoader, set ``num_workers > 0`` and ``pin_memory=True`` (the latter only when training on GPUs).
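
For example, a minimal sketch using a dummy ``TensorDataset`` as a stand-in for your own data (``num_workers=4`` is illustrative; a good value depends on your machine, often on the order of the number of CPU cores):

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Dummy dataset; the DataLoader arguments are what matter here.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))

    train_loader = DataLoader(
        dataset,
        batch_size=64,
        num_workers=4,    # > 0 moves data loading into worker processes
        pin_memory=True,  # page-locked memory speeds up host-to-GPU copies
    )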