diff --git a/docs/source/fast_training.rst b/docs/source/fast_training.rst
index b741107ca1..970e948617 100644
--- a/docs/source/fast_training.rst
+++ b/docs/source/fast_training.rst
@@ -42,45 +42,26 @@ Must use an int if using an IterableDataset.
     # check every 100 train batches (ie: for IterableDatasets or fixed frequency)
     trainer = Trainer(val_check_interval=100)
 
-Use training data subset
-------------------------
-If you don't want to check 100% of the training set (for debugging or if it's huge), set this flag.
+Use data subset for training, validation and test
+-------------------------------------------------
+If you don't want to check 100% of the training/validation/test set (for debugging or if it's huge), set these flags.
 
 .. code-block:: python
 
     # DEFAULT
-    trainer = Trainer(train_percent_check=1.0)
+    trainer = Trainer(
+        train_percent_check=1.0,
+        val_percent_check=1.0,
+        test_percent_check=1.0
+    )
 
-    # check 10% only
-    trainer = Trainer(train_percent_check=0.1)
+    # check 10%, 20%, 30% only, respectively for training, validation and test set
+    trainer = Trainer(
+        train_percent_check=0.1,
+        val_percent_check=0.2,
+        test_percent_check=0.3
+    )
 
-.. note:: ``train_percent_check`` will be overwritten by ``overfit_pct`` if ``overfit_pct`` > 0.
+.. note:: ``train_percent_check``, ``val_percent_check`` and ``test_percent_check`` will be overwritten by ``overfit_pct`` if ``overfit_pct`` > 0. ``val_percent_check`` will be ignored if ``fast_dev_run=True``.
 
-Use test data subset
---------------------
-If you don't want to check 100% of the test set (for debugging or if it's huge), set this flag.
-
-.. code-block:: python
-
-    # DEFAULT
-    trainer = Trainer(test_percent_check=1.0)
-
-    # check 10% only
-    trainer = Trainer(test_percent_check=0.1)
-
-.. note:: ``test_percent_check`` will be overwritten by ``overfit_pct`` if ``overfit_pct`` > 0.
-
-Use validation data subset
---------------------------
-If you don't want to check 100% of the validation set (for debugging or if it's huge), set this flag.
-
-.. code-block:: python
-
-    # DEFAULT
-    trainer = Trainer(val_percent_check=1.0)
-
-    # check 10% only
-    trainer = Trainer(val_percent_check=0.1)
-
-.. note:: ``val_percent_check`` will be overwritten by ``overfit_pct`` if ``overfit_pct`` > 0 and ignored if
-    ``fast_dev_run=True``.
\ No newline at end of file
+.. note:: If you set ``val_percent_check=0``, validation will be disabled.
diff --git a/docs/source/slurm.rst b/docs/source/slurm.rst
index 6c0bf190cf..2bac01b6f0 100644
--- a/docs/source/slurm.rst
+++ b/docs/source/slurm.rst
@@ -11,17 +11,14 @@ To train a model using multiple-nodes do the following:
 
 1. Design your LightningModule.
 
-2. Add `torch.DistributedSampler `_
-   which enables access to a subset of your full dataset to each GPU.
-
-3. Enable ddp in the trainer
+2. Enable ddp in the trainer
 
 .. code-block:: python
 
     # train on 32 GPUs across 4 nodes
     trainer = Trainer(gpus=8, num_nodes=4, distributed_backend='ddp')
 
-4. It's a good idea to structure your train.py file like this:
+3. It's a good idea to structure your train.py file like this:
 
 .. code-block:: python
 
@@ -91,6 +88,8 @@ To train a model using multiple-nodes do the following:
 
     sbatch submit.sh
 
+.. note:: using :class:`~torch.utils.data.distributed.DistributedSampler` is already handled by Lightning.
+
 Walltime auto-resubmit
 -----------------------------------
 When you use Lightning in a SLURM cluster, lightning automatically detects when it is about