"In this notebook, we'll go over the flags available in the `Trainer` object. Note that not everything will work in the Colab environment (multi-gpu, etc). This notebook accompanies the Trainer videos we'll be putting out.\n",
"\n",
"---\n",
" - Give us a ⭐ [on Github](https://www.github.com/PytorchLightning/pytorch-lightning/)\n",
" - Check out [the documentation](https://pytorch-lightning.readthedocs.io/en/latest/)\n",
" - Join us [on Slack](https://join.slack.com/t/pytorch-lightning/shared_invite/zt-f6bl2l0l-JYMK3tbAgAmGRrlNr00f1A)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jKj5lgdr5j48"
},
"source": [
"--- \n",
"### Setup \n",
"First thing first, we need to install Lightning. Simply ```pip install pytorch-lightning```"
"Were gonna define a simple Lightning model so we can play with all the settings of the Lightning Trainer.\n",
"\n",
"LightningModule is simply pure Pytorch reorganized into hooks, that represents all the steps in the training process.\n",
"\n",
"You can use LightningModule hooks to control every part of your model, but for the purpose of this video we will use a very simple MNIST classifier, a model that takes 28*28 grayscale images of hand written images, and can predict the digit between 0-9.\n",
"\n",
"The LightningModule can encompass a single model, like an image classifier, or a deep learning system composed of multiple models, like this auto encoder that contains an encoder and a decoder.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "x-34xKCI40yW"
},
"outputs": [],
"source": [
"class LitAutoEncoder(pl.LightningModule):\n",
"\n",
" def __init__(self, batch_size=32, lr=1e-3):\n",
" super().__init__()\n",
" self.encoder = nn.Sequential(\n",
" nn.Linear(28 * 28, 64),\n",
" nn.ReLU(),\n",
" nn.Linear(64, 3)\n",
" )\n",
" self.decoder = nn.Sequential(\n",
" nn.Linear(3, 64),\n",
" nn.ReLU(),\n",
" nn.Linear(64, 28 * 28)\n",
" )\n",
" self.batch_size=batch_size\n",
" self.learning_rate=lr\n",
"\n",
" def forward(self, x):\n",
" # in lightning, forward defines the prediction/inference actions\n",
"You'll notice the LightningModule doesn't have epoch and batch loops, we're not calling model.train() and model.eval(), and no mentions of CUDA or hardware. That's because it is all automated by the Lightning Trainer. All the engineering boilerplate is automated by the trainer: \n",
"\n",
"* Training loops\n",
"* Evaluation and test loops\n",
"* Calling model.train(), model.eval(), no_grad at the right time\n",
"* CUDA or to_device calls\n",
"\n",
"It also allows you to train your models on different hardware like GPUs and TPUs without changing your code!\n",
"\n",
"\n",
"### To use the lightning trainer simply:\n",
"\n",
"1. init your LightningModule and datasets\n",
"\n",
"2. init lightning trainer\n",
"\n",
"3. call trainer.fit\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HOk9c4_35FKg"
},
"outputs": [],
"source": [
"#####################\n",
"# 1. Init Model\n",
"#####################\n",
"\n",
"model = LitAutoEncoder()\n",
"\n",
"#####################\n",
"# 2. Init Trainer\n",
"#####################\n",
"\n",
"# these 2 flags are explained in the later sections...but for short explanation:\n",
"# - progress_bar_refresh_rate: limits refresh rate of tqdm progress bar so Colab doesn't freak out\n",
"# - max_epochs: only run 2 epochs instead of default of 1000\n",
"Our model is training just like that, using the Lightning defaults. The beauty of Lightning is that everything is easily configurable.\n",
"In our next videos were going to show you all the ways you can control your Trainer to do things like controlling your training, validation and test loops, running on GPUs and TPUs, checkpointing, early stopping, and a lot more.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "z_Wry2MckQkI"
},
"source": [
"# Training loop and eval loop Flags"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0MkI1xB2vsLj"
},
"source": [
"\n",
"To really scale up your networks, you can use accelerators like GPUs. GPUs or Graphical Processing Units, parallelize matrix multiplications which enable speed ups of at least 100x over training on CPUs.\n",
"\n",
"Let's say you have a machine with 8 GPUs on it. You can set this flag to 1, 4, or 8 GPUs and lightning will automatically distribute your training for you.\n",
"\n",
"```\n",
"trainer = pl.Trainer(gpus=1)\n",
"```\n",
"\n",
"---------\n",
"\n",
"Lightning makes your code hardware agnostic... This means, you can switch between CPUs, GPUs without code changes.\n",
"\n",
"However, it requires forming good PyTorch habits:\n",
"\n",
"1. First, remove the .cuda() or .to() calls in your code.\n",
"2. Second, when you initialize a new tensor, set the device=self.device in the call since every lightningModule knows what gpu index or TPU core it is on.\n",
"\n",
"You can also use type_as and or you can register the tensor as a buffer in your module’s __init__ method with register_buffer().\n",
" # you can now access self.sigma anywhere in your module\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hw6jJhhjvlSL"
},
"source": [
"Lightning Trainer automates all the engineering boilerplate like iterating over epochs and batches, training eval and test loops, CUDA and to(device) calls, calling model.train and model.eval.\n",
"\n",
"You still have full control over the loops, by using the following trainer flags:"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pT5-ETH9eUg6"
},
"source": [
"## Calling validation steps\n",
"Sometimes, training an epoch may be pretty fast, like minutes per epoch. In this case, you might not need to validate on every epoch. Instead, you can actually validate after a few epochs.\n",
"\n",
"Use `check_val_every_n_epoch` flag to control the frequency of validation step:"
"In some cases where your epoch is very long, you might want to check validation within an epoch.\n",
"\n",
"You can also run validation step within your training epochs, by setting `val_check_interval` flag.\n",
"\n",
"Set `val_check_interval` to a float between [0.0 to 1.0] to check your validation set within a training epoch. For example, setting it to 0.25 will check your validation set 4 times during a training epoch.\n",
"\n",
"Default is set to 1.0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9kbUbvrUVLrT"
},
"outputs": [],
"source": [
"# check validation set 4 times during a training epoch\n",
"When you have iterable data sets, or when streaming data for production use cases, it is useful to check the validation set every number of steps. \n",
"Set val_check_interval to an int:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "psn6DVb5Vi85"
},
"outputs": [],
"source": [
"# check validation set every 1000 training batches\n",
"# use this when using iterableDataset and your dataset has no length\n",
"You can set limits on how much of training, validation and test dataset you want your model to check. This is useful if you have really large validation or tests sets, for debugging or testing something that happens at the end of an epoch.\n",
"\n",
"Set the flag to int to specify the number of batches to run\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XiK5cFKL1rcA"
},
"outputs": [],
"source": [
"# run for only 10 batches\n",
"trainer = pl.Trainer(limit_test_batches=10)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y4LK0g65RrBm"
},
"source": [
"For example, some metrics need to be computed on the entire validation results, such as AUC ROC. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8MmeRs2DR3dD"
},
"outputs": [],
"source": [
"trainer = pl.Trainer(limit_val_batches=10)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xmigcNa1A2Vy"
},
"source": [
"You can use a float to limit the batches be percentage of the set on every epoch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "W7uGJt8nA4tv"
},
"outputs": [],
"source": [
"# run through only 25% of the test set each epoch\n",
"You can use all the GPUs you have available by setting `gpus=-1`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "r6cKQijYrtPe"
},
"outputs": [],
"source": [
"# trainer = Trainer(gpus='-1') - equivalent\n",
"trainer = pl.Trainer(gpus=-1)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2C-fNLm3UGCV"
},
"source": [
"Lightning uses the PCI bus_id as the index for ordering GPUs."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_V75s7EhOFhE"
},
"source": [
"### `auto_select_gpus`\n",
"\n",
"You can save on GPUs by running in “exclusive mode”, meaning only one process at a time can access them. If your not sure which GPUs you should use when running exclusive mode, Lightning can automatically find unoccupied GPUs for you. \n",
"\n",
"Simply specify the number of gpus as an integer `gpus=k`, and set the trainer flag `auto_select_gpus=True`. Lightning will automatically help you find k gpus that are not occupied by other processes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_Sd3XFsAOIwd"
},
"outputs": [],
"source": [
"# enable auto selection (will find two available gpus on system)\n",
"This is useful to analyze the memory usage of your GPUs.\n",
"\n",
"To get the GPU memory usage for every GPU on the master node, set the flag to log_gpu_memory=all.\n",
"\n",
"Under the hood, lightning uses the nvidia-smi command which may slow your training down.\n",
"\n",
"Your logs can become overwhelmed if you log the usage from many GPUs at once. In this case, you can also set the flag to min_max which will log only the min and max usage across all the GPUs of the master node.\n",
"\n",
"Note that lightning is not logging the usage across all nodes for performance reasons."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "idus3ZGahOki"
},
"outputs": [],
"source": [
"# log all the GPUs (on master node only)\n",
"trainer = Trainer(log_gpu_memory='all')\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-mevgiy_hkip"
},
"source": [
"To avoid the performance decrease you can also set `log_gpu_memory=min_max` to only log the min and max memory on the master node.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SlvLJnWyhs7J"
},
"outputs": [],
"source": [
"# log only the min and max memory on the master node\n",
"trainer = Trainer(log_gpu_memory='min_max')\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "K82FLLIJVQG3"
},
"source": [
"\n",
"But what if you want to train on multiple machines and not just one?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YViQ6PXesAue"
},
"source": [
"# Training on multiple GPUs"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WacbBQUivxQq"
},
"source": [
"Lightning makes your models hardware agnostic, and you can run on GPUs with a flip of a flag. Lightning also supports training on multiple GPUs across many machines.\n",
"\n",
"You can do this by setting the num_nodes flag.\n",
"\n",
"The world size, or the total number of GPUs you are using, will be gpus*num_nodes.\n",
"\n",
"If i set gpus=8 and num_nodes=32 then I will be training on 256 GPUs."
"DDP is the fastest and recommended way to distribute your training, but you can pass in other backends to `accelerator` trainer flag, when DDP is not supported.\n",
"We STRONGLY discourage this use because it has limitations (due to Python and PyTorch):\n",
"\n",
"* Since .spawn() trains the model in subprocesses, the model on the main process does not get updated.\n",
"\n",
"* Dataloader(num_workers=N), where N is large, bottlenecks training with DDP… ie: it will be VERY slow or won’t work at all. This is a PyTorch limitation.\n",
"\n",
"* Forces everything to be picklable.\n",
"\n",
"DDP is MUCH faster than DDP_spawn. To be able to use DDP we recommend you: \n",
"\n",
"1. Install a top-level module for your project using setup.py\n",
"\n",
"```\n",
"# setup.py\n",
"#!/usr/bin/env python\n",
"\n",
"from setuptools import setup, find_packages\n",
"\n",
"setup(name='src',\n",
" version='0.0.1',\n",
" description='Describe Your Cool Project',\n",
" author='',\n",
" author_email='',\n",
" url='https://github.com/YourSeed', # REPLACE WITH YOUR OWN GITHUB PROJECT LINK\n",
"If you're using windows, DDP is not supported. You can use `dp` for DataParallel instead: DataParallel uses multithreading, instead of multiprocessing. It splits a batch across k GPUs. That is, if you have a batch of 32 and use DP with 2 gpus, each GPU will process 16 samples, after which the root node will aggregate the results.\n",
"\n",
"DP use is discouraged by PyTorch and Lightning. Use DDP which is more stable and at least 3x faster.\n"
"In certain cases, it’s advantageous to use ***all*** batches on the same machine, instead of a subset. For instance, in self-supervised learning, a common performance boost comes from increasing the number of negative samples.\n",
"\n",
"In this case, we can use DDP2 which behaves like DP in a machine and DDP across nodes. DDP2 does the following:\n",
"- The second mode is ddp_spawn. This works like ddp, but instead of calling your script multiple times, lightning will use multiprocessing spawn to start a subprocess per GPU. \n",
"\n",
"However, you should be careful of mixing this mode with num_workers > 0 in your dataloaders because it will bottleneck your training. This is a current known limitation of PyTorch which is why we recommend using our ddp implementation instead.\n"
"Testing or debugging DDP can be hard, so we have a distributed backend that simulates ddp on cpus to make it easier. Set `num_processes` to a number greater than 1 when using accelerator=\"ddp_cpu\" to mimic distributed training on a machine without GPUs. Note that while this is useful for debugging, it will not provide any speedup, since single-process Torch already makes efficient use of multiple CPUs."
"Another option for accelerating your training is using TPUs.\n",
"A TPU is a Tensor processing unit, designed specifically for deep learning. Each TPU has 8 cores where each core is optimized for 128x128 matrix multiplies. Google estimates that 8 TPU cores are about as fast as 4 V100 GPUs!\n",
"\n",
"A TPU pod hosts many TPUs on it. Currently, TPU pod v2 has 2048 cores! You can request a full pod from Google cloud or a “slice” which gives you some subset of those 2048 cores.\n",
"\n",
"At this moment, TPUs are available on Google Cloud (GCP), Google Colab and Kaggle Environments.\n",
"\n",
"Lightning supports training on TPUs without any code adjustments to your model. Just like when using GPUs, Lightning automatically inserts the correct samplers - no need to do this yourself!\n",
"\n",
"Under the hood, lightning uses the XLA framework developed jointly by the facebook and google XLA teams. And we want to recognize their efforts in advancing TPU adoption of PyTorch.\n",
"\n",
"## tpu_cores\n",
"To train on TPUs, set the tpu_cores flag.\n",
"\n",
"When using colab or kaggle, the allowed values are 1 or 8 cores. When using google cloud, any value above 8 is allowed.\n",
"\n",
"Your effective batch size is the batch size passed into a dataloader times the total number of tpu cores."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "itP9y70gmD9M"
},
"outputs": [],
"source": [
"# int: train on a single core\n",
"trainer = pl.Trainer(tpu_cores=1)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NJKnzPb3mKEg"
},
"outputs": [],
"source": [
"# int: train on all cores few cores\n",
"trainer = pl.Trainer(tpu_cores=8)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8a4exfWUmOHq"
},
"source": [
"You can also choose which TPU core to train on, by passing a list [1-8]. This is not an officially supported use case but we are working with the XLA team to improve this user experience.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "S6OrjE_bmT-_"
},
"outputs": [],
"source": [
"# list: train on a single selected core\n",
"trainer = pl.Trainer(tpu_cores=[2])\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Afqx3sFUmfWD"
},
"source": [
"To train on more than 8 cores (ie: a POD), submit this script using the xla_dist script.\n",
"\n",
"\n",
"\n",
"```\n",
"python -m torch_xla.distributed.xla_dist\n",
"--tpu=$TPU_POD_NAME\n",
"--conda-env=torch-xla-nightly\n",
"--env=XLA_USE_BF16=1\n",
"-- python your_trainer_file.py\n",
"```\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ncPvbUVQqKOh"
},
"source": [
"# Advanced distributed training\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4MP7bEgnv7qK"
},
"source": [
"\n",
"Lightning supports distributed training across multiple GPUs and TPUs out of the box by setting trainer flags, but it also allows you to control the way sampling is done if you need to."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wdHiTfAMepKH"
},
"source": [
"## replace_sampler_ddp\n",
"In PyTorch, you must use torch.nn.DistributedSampler for multi-node or GPU training. The sampler makes sure each GPU sees the appropriate part of your data.\n",
"\n",
"```\n",
"# without lightning\n",
"def train_dataloader(self):\n",
" dataset = MNIST(...)\n",
" sampler = None\n",
"\n",
" if self.on_tpu:\n",
" sampler = DistributedSampler(dataset)\n",
"\n",
" return DataLoader(dataset, sampler=sampler)\n",
"```\n",
"Lightning adds the correct samplers when needed, so no need to explicitly add samplers. By default it will add `shuffle=True` for train sampler and `shuffle=False` for val/test sampler.\n",
"\n",
"If you want to customize this behaviour, you can set `replace_sampler_ddp=False` and add your own distributed sampler.\n",
"\n",
"(note: For iterable datasets, we don’t do this automatically.)\n"
"When doing multi NODE training, if your nodes share the same file system, then you don't want to download data more than once to avoid possible collisions. \n",
"\n",
"Lightning automatically calls the prepare_data hook on the root GPU of the master node (ie: only a single GPU).\n",
"\n",
"In some cases where your nodes don't share the same file system, you need to download the data on each node. In this case you can set this flag to true and lightning will download the data on the root GPU of each node.\n",
"Lightning offers a couple of flags to make debugging your models easier:\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AKoS3fdml4Jx"
},
"source": [
"## Fast Dev Run\n",
"\n",
"To help you save time debugging, your first run should use the fast_dev_run flag.\n",
"\n",
"This won't generate logs or save checkpoints but will touch every line of your code to make sure that it is working as intended.\n",
"\n",
"Think about this flag like a compiler. You make changes to your code, and run Trainer with this flag to verify that your changes are bug free.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "L5vuG7GSmhzK"
},
"outputs": [],
"source": [
"trainer = pl.Trainer(fast_dev_run=True)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HRP1qQR5nT4p"
},
"source": [
"## overfit_batches\n",
"\n",
"Uses this much data of the training set. If nonzero, will use the same training set for validation and testing. If the training dataloaders have shuffle=True, Lightning will automatically disable it.\n",
"\n",
"Useful for quickly debugging or trying to overfit on purpose."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NTM-dqGMnXms"
},
"outputs": [],
"source": [
"# use only 1% of the train set (and use the train set for val and test)\n",
"trainer = pl.Trainer(overfit_batches=0.01)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c0LV0gC3nl1X"
},
"outputs": [],
"source": [
"# overfit on 10 of the same batches\n",
"trainer = pl.Trainer(overfit_batches=10)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lt3UHU6WgtS_"
},
"source": [
"Or a float to represent percentage of data to run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "K3yUqADhgnkf"
},
"outputs": [],
"source": [
"# run through only 25% of the test set each epoch\n",
"In the case of multiple test dataloaders, the limit applies to each dataloader individually.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8aQx5SLeMz1R"
},
"source": [
"# accumulate_grad_batches\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "g8GczZXFwKC7"
},
"source": [
"The batch size controls the accuracy of the estimate of the gradients. Small batch size use less memory, but decrease accuracy. When training large models, such as NLP transformers, it is useful to accumulate gradients before calling backwards(). It allows for bigger batch sizes than what can actually fit on a GPU/TPU in a single step.\n",
"\n",
"Use accumulate_grad_batches to accumulate gradients every k batches or as set up in the dict. Trainer also calls optimizer.step() for the last indivisible step number.\n",
"\n",
"For example, set accumulate_grad_batches to 4 to accumulate every 4 batches. In this case the effective batch size is batch_size*4, so if your batch size is 32, effectively it will be 128."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2jB6-Z_yPhhf"
},
"outputs": [],
"source": [
"# accumulate every 4 batches (effective batch size is batch*4)\n",
"You can also pass a dictionary to specify different accumulation per epoch. We can set it to `{5: 3, 10: 20}` to have no accumulation for epochs 1 to 4, accumulate 3 batches for epoch 5 to 10, and 20 batches after that."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "X3xsoZ3YPgBv"
},
"outputs": [],
"source": [
"# no accumulation for epochs 1-4. accumulate 3 for epochs 5-10. accumulate 20 after that\n",
"Most deep learning frameworks like PyTorch, train with 32-bit floating point arithmetic. \n",
"\n",
"But many models can still achieve full accuracy using half the precision.\n",
"\n",
"In 2017, NVIDIA researchers successfully used a combination of 32 and 16 bit precision (also known as mixed precision) and achieved the same accuracy as 32 bit precision training.\n",
"\n",
"The main two advantages are:\n",
"\n",
"- a reduction in memory requirements which enables larger batch sizes and models.\n",
"- and a speed up in compute. On ampere, turing and volta architectures 16 bit precision models can train at least 3 times faster.\n",
"\n",
"As of PyTorch 1.6, NVIDIA and Facebook moved mixed precision functionality into PyTorch core as the AMP package, torch.cuda.amp. \n",
"\n",
"This package supersedes the apex package developed by NVIDIA."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TjNypZPHnxvJ"
},
"source": [
"## precision\n",
"\n",
"Use precision flag to switch between full precision (32) to half precision (16). Can be used on CPU, GPU or TPUs.\n",
"\n",
"When using PyTorch 1.6+ Lightning uses the native amp implementation to support 16-bit.\n",
"\n",
"If used on TPU will use torch.bfloat16 but tensor printing will still show torch.float32"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kBZKMVx1nw-D"
},
"outputs": [],
"source": [
"# 16-bit precision\n",
"trainer = pl.Trainer(gpus=1, precision=16)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VJGj3Jh7oQXU"
},
"source": [
"In earlier version of Lightning, we use NVIDIA Apex for 16-bit precision. Apex was the first library to attempt 16-bit and the automatic mixed precision library (amp), has since been merged into core PyTorch as of 1.6.\n",
"\n",
"If you insist in using Apex, you can set the amp_backend flag to 'apex' and install Apex on your own."
"O1 (Conservative Mixed Precision): only some whitelist ops are done in FP16.\n",
"O2 (Fast Mixed Precision): this is the standard mixed precision training. It maintains FP32 master weights and optimizer.step acts directly on the FP32 master weights.\n",
"O3 (FP16 training): full FP16. Passing keep_batchnorm_fp32=True can speed things up as cudnn batchnorm is faster anyway.\n"
"Lightning can help you improve your model by using auto_scale_batch_size flag, which tries to find the largest batch size that fits into memory, before you start your training.\n",
"Larger batch size often yields better estimates of gradients, but may also result in longer training time. \n",
"\n",
"Set it to True to initially run a batch size finder trying to find the largest batch size that fits into memory. The result will be stored in self.batch_size in the LightningModule.\n"
"You can set the value to `power`. `power` scaling starts from a batch size of 1 and keeps doubling the batch size until an out-of-memory (OOM) error is encountered.\n"
"This feature expects that a batch_size field in the hparams of your model, i.e., model.hparams.batch_size should exist and will be overridden by the results of this algorithm. \n",
"\n",
"Additionally, your train_dataloader() method should depend on this field for this feature to work.\n",
"\n",
"The algorithm in short works by:\n",
"1. Dumping the current state of the model and trainer\n",
"\n",
"2. Iteratively until convergence or maximum number of tries max_trials (default 25) has been reached:\n",
"* Call fit() method of trainer. This evaluates steps_per_trial (default 3) number of training steps. Each training step can trigger an OOM error if the tensors (training batch, weights, gradients etc.) allocated during the steps have a too large memory footprint.\n",
" * If an OOM error is encountered, decrease the batch size\n",
" * Else increase it.\n",
"* How much the batch size is increased/decreased is determined by the chosen strategy.\n",
"\n",
"3. The found batch size is saved to model.hparams.batch_size\n",
"\n",
"4. Restore the initial state of model and trainer\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "q4CvxfZmOWBd"
},
"source": [
"# `auto_lr_find`\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "j85e8usNwdBV"
},
"source": [
"Selecting a good learning rate for your deep learning training is essential for both better performance and faster convergence.\n",
"\n",
"Even optimizers such as Adam that are self-adjusting the learning rate can benefit from more optimal choices.\n",
"\n",
"To reduce the amount of guesswork concerning choosing a good initial learning rate, you can use Lightning auto learning rate finder.\n",
"\n",
"The learning rate finder does a small run where the learning rate is increased after each processed batch and the corresponding loss is logged. The result of this is a lr vs. loss plot that can be used as guidance for choosing an optimal initial lr.\n",
"\n",
"\n",
"warning: For the moment, this feature only works with models having a single optimizer. LR support for DDP is not implemented yet, it is coming soon.\n",
"\n",
"\n",
"***auto_lr_find=***\n",
"\n",
"In the most basic use case, this feature can be enabled during trainer construction with Trainer(auto_lr_find=True).\n",
"When .fit(model) is called, the LR finder will automatically run before any training is done. The lr that is found and used will be written to the console and logged together with all other hyperparameters of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iuhve9RBOfFh"
},
"outputs": [],
"source": [
"# default used by the Trainer (no learning rate finder)\n",
"Under the hood, when you call tune it runs the learning rate finder.\n",
"\n",
"If you want to inspect the results of the learning rate finder before doing any actual training or just play around with the parameters of the algorithm, this can be done by invoking the lr_find method of the trainer. A typical example of this would look like\n",
"\n",
"\n",
"```\n",
"trainer = pl.Trainer(auto_lr_find=True)\n",
"\n",
"# Run learning rate finder\n",
"lr_finder = trainer.lr_find(model)\n",
"\n",
"# Results can be found in\n",
"lr_finder.results\n",
"\n",
"# Plot with\n",
"fig = lr_finder.plot(suggest=True)\n",
"fig.show()\n",
"\n",
"# Pick point based on plot, or get suggestion\n",
"new_lr = lr_finder.suggestion()\n",
"\n",
"# update hparams of the model\n",
"model.hparams.lr = new_lr\n",
"\n",
"# Fit model\n",
"trainer.fit(model)\n",
"```\n",
"\n",
"The figure produced by lr_finder.plot() should look something like the figure below. It is recommended to not pick the learning rate that achieves the lowest loss, but instead something in the middle of the sharpest downward slope (red point). This is the point returned py lr_finder.suggestion().\n",
"You can try to speed your system by setting `benchmark=True`, which enables cudnn.benchmark. This flag is likely to increase the speed of your system if your input sizes don’t change. This flag makes cudnn auto-tuner look for the optimal set of algorithms for the given hardware configuration. This usually leads to faster runtime.\n",
"But if your input sizes changes at each iteration, then cudnn will benchmark every time a new size appears, possibly leading to worse runtime performances."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dWr-OCBgQCeb"
},
"outputs": [],
"source": [
"trainer = pl.Trainer(gpus=1, benchmark=True)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qwAvSKYGa24K"
},
"source": [
"# `deterministic`\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tl5mfmafwmat"
},
"source": [
"PyTorch does not guarantee reproducible results, even when using identical seeds. To guarentee reproducible results, you can remove most of the randomness from your process by setting the `deterministic` flag to True.\n",
"You can debug your grad norm to identify exploding or vanishing gradients using the `track_grad_norm` flag.\n",
"\n",
"Set value to 2 to track the 2-norm. or p to any p-norm."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2taHUir8rflR"
},
"outputs": [],
"source": [
"# track the 2-norm\n",
"trainer = pl.Trainer(track_grad_norm=2)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3vHKxmruk62f"
},
"source": [
"May be set to ‘inf’ infinity-norm."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "g7TbD6SxlAjP"
},
"outputs": [],
"source": [
"trainer = pl.Trainer(track_grad_norm='inf')\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TcMlRe7ywpe6"
},
"source": [
"## Gradient clipping\n",
"\n",
"\n",
"Exploding gradients refer to the problem that the gradients get too large and overflow in training, making the model unstable. Gradient clipping will ‘clip’ the gradients or cap them to a Threshold value to prevent the gradients from getting too large. To avoid this, we can set `gradient_clip_val` (default is set to 0.0).\n",
"\n",
"[when to use it, what are relevant values]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jF9JwmbOgOWF"
},
"outputs": [],
"source": [
"trainer = pl.Trainer(gradient_clip_val=0.1)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ggb4MkkQrr1h"
},
"source": [
"# truncated_bptt_steps\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "s1Iu6PyAw9_r"
},
"source": [
"If you have a large recurrent model, you can use truncated_bptt_steps flag to split up the backprop over portions of the sequence. This flag will automatically truncate your batches and the trainer will apply Truncated Backprop to it.\n",
"\n",
"Make sure your batches have a sequence dimension.\n",
"\n",
"Lightning takes care of splitting your batch along the time-dimension.\n",
"```\n",
"# we use the second as the time dimension\n",
"# (batch, time, ...)\n",
"sub_batch = batch[0, 0:t, ...]\n",
"Using this feature requires updating your LightningModule’s pytorch_lightning.core.LightningModule.training_step() to include a hiddens arg with the hidden\n",
"Lightning Callbacks are self-contained programs that can be reused across projects.\n",
"Callbacks should capture NON-ESSENTIAL logic that is NOT required for your LightningModule to run. Lightning includes some a few built-in callbacks that can be used with flags like early stopping and Model Checkpointing, but you can also create your own callbacks to add any functionality to your models.\n",
"\n",
"The callback API includes hooks that allow you to add logic at every point of your training:\n",
"setup, teardown, on_epoch_start, on_epoch_end, on_batch_start, on_batch_end, on_init_start, on_keyboard_interrupt etc. \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1t84gvDNsUuh"
},
"source": [
"## callbacks\n",
"\n",
"Use **callbacks=** to pass a list of user defined callbacks. These callbacks DO NOT replace the built-in callbacks (loggers or EarlyStopping). \n",
"\n",
"In this example, we create a dummy callback that prints a message when training starts and ends, using on_train_start and on_train_end hooks."
"Checkpoints capture the exact value of all parameters used by a model.\n",
"\n",
"Checkpointing your training allows you to resume a training process in case it was interrupted, fine-tune a model or use a pre-trained model for inference without having to retrain the model.\n",
"\n",
"Lightning automates saving and loading checkpoints so you restore a training session, saving all the required parameters including: \n",
"* 16-bit scaling factor (apex)\n",
"* Current epoch\n",
"* Global step\n",
"* Model state_dict\n",
"* State of all optimizers\n",
"* State of all learningRate schedulers\n",
"* State of all callbacks\n",
"* The hyperparameters used for that model if passed in as hparams (Argparse.Namespace)\n",
"\n",
"By default Lightning will save a checkpoint in the working directory, which will be updated every epoch.\n",
"\n",
"### Automatic saving\n",
"By default Lightning will save a checkpoint in the end of the first epoch in the working directory, which will be updated every epoch."
"You can also have Lightning update your checkpoint based on a specific metric that you are logging (using self.log), by passing the key to `monitor=`. For example, if we want to save checkpoint based on the validation loss, logged as `val_loss`, you can pass:\n",
"## Restoring Training State (resume_from_checkpoint)\n",
"If your training was cut short for some reason, you can resume exactly from where you left off using the `resume_from_checkpoint` flag, which will automatically restore model, epoch, step, LR schedulers, apex, etc..."
"The EarlyStopping callback can be used to monitor a validation metric and stop the training when no improvement is observed, to help you avoid overfitting.\n",
"\n",
"To enable Early Stopping you can init the EarlyStopping callback, and pass it to `callbacks=` trainer flag. The callback will look for a logged metric to early stop on.\n",
"The EarlyStopping callback runs at the end of every validation epoch, which, under the default configuration, happens after every training epoch. However, the frequency of validation can be modified by setting various parameters on the Trainer, for example check_val_every_n_epoch and val_check_interval. It must be noted that the patience parameter counts the number of validation epochs with no improvement, and not the number of training epochs. Therefore, with parameters check_val_every_n_epoch=10 and patience=3, the trainer will perform at least 40 training epochs before being stopped."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VoKrX2ENh9Fg"
},
"source": [
"# Logging"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-CQTPKd7iKLm"
},
"source": [
"Lightning has built in integration with various loggers such as TensorBoard, wandb, commet, etc.\n",
"\n",
"\n",
"You can pass any metrics you want to log during training to `self.log`, such as loss or accuracy. Similarly, pass in to self.log any metric you want to log during validation step.\n",
"\n",
"These values will be passed in to the logger of your choise. simply pass in any supported logger to logger trainer flag.\n",
"\n",
"\n",
"\n",
"Use the as`logger=` trainer flag to pass in a Logger, or iterable collection of Loggers, for experiment tracking.\n",
"How often to add logging rows (does not write to disk)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HkqD7D_0w1Tt"
},
"outputs": [],
"source": [
"trainer = pl.Trainer(log_every_n_steps=1000)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9uw0gfe422CT"
},
"source": [
"# info logging"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dQXpt0aatDGo"
},
"source": [
"### default_root_dir\n",
"\n",
"---\n",
"\n",
"\n",
"\n",
"Default path for logs and weights when no logger or pytorch_lightning.callbacks.ModelCheckpoint callback passed. On certain clusters you might want to separate where logs and checkpoints are stored. If you don’t then use this argument for convenience. Paths can be local paths or remote paths such as s3://bucket/path or ‘hdfs://path/’. Credentials will need to be set up to use remote filepaths."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CMmID2Bts5W3"
},
"source": [
"## weights_summary\n",
"Prints a summary of the weights when training begins. Default is set to `top`- print summary of top level modules.\n",
"\n",
"Options: ‘full’, ‘top’, None."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KTl6EdwDs6j2"
},
"outputs": [],
"source": [
"\n",
"# print full summary of all modules and submodules\n",
"trainer = pl.Trainer(weights_summary='full')\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "R57cSLl9w9ma"
},
"outputs": [],
"source": [
"# don't print a summary\n",
"trainer = Trainer(weights_summary=None)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bSc2hU5AotAP"
},
"source": [
"# progress bar"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GgvbyDsBxcH6"
},
"source": [
"## process_position\n",
"\n",
"Orders the progress bar. Useful when running multiple trainers on the same node.\n",
"\n",
"(This argument is ignored if a custom callback is passed to callbacks)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6ekz8Es8owDn"
},
"outputs": [],
"source": [
"# default used by the Trainer\n",
"trainer = pl.Trainer(process_position=0)\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "itivQFgEphBU"
},
"source": [
"## progress_bar_refresh_rate\n",
"\n",
"How often to refresh the progress bar (in steps). In notebooks, faster refresh rates (lower number) is known to crash them because of their screen refresh rates, so raise it to 50 or more."
"You can also use Lightning AdvancedProfiler if you want more detailed information about time spent in each function call recorded during a given action. The output is quite verbose and you should only use this if you want very detailed reports.\n",
" <h1> <strong> Congratulations - Time to Join the Community! </strong> </h1>\n",
"</code>\n",
"\n",
"Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the Lightning movement, you can do so in the following ways!\n",
"\n",
"### Star [Lightning](https://github.com/PyTorchLightning/pytorch-lightning) on GitHub\n",
"The easiest way to help our community is just by starring the GitHub repos! This helps raise awareness of the cool tools we're building.\n",
"\n",
"* Please, star [Lightning](https://github.com/PyTorchLightning/pytorch-lightning)\n",
"The best way to keep up to date on the latest advancements is to join our community! Make sure to introduce yourself and share your interests in `#general` channel\n",
"\n",
"### Interested by SOTA AI models ! Check out [Bolt](https://github.com/PyTorchLightning/pytorch-lightning-bolts)\n",
"Bolts has a collection of state-of-the-art models, all implemented in [Lightning](https://github.com/PyTorchLightning/pytorch-lightning) and can be easily integrated within your own projects.\n",
"\n",
"* Please, star [Bolt](https://github.com/PyTorchLightning/pytorch-lightning-bolts)\n",
"\n",
"### Contributions !\n",
"The best way to contribute to our community is to become a code contributor! At any time you can go to [Lightning](https://github.com/PyTorchLightning/pytorch-lightning) or [Bolt](https://github.com/PyTorchLightning/pytorch-lightning-bolts) GitHub Issues page and filter for \"good first issue\". \n",
"\n",
"* [Lightning good first issue](https://github.com/PyTorchLightning/pytorch-lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)\n",
"* [Bolt good first issue](https://github.com/PyTorchLightning/pytorch-lightning-bolts/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)\n",
"* You can also contribute your own notebooks with useful examples !\n",
"\n",
"### Great thanks from the entire Pytorch Lightning Team for your interest !\n",