simplify examples structure (#1247)
* simplify examples structure
* update changelog
* fix imports
* rename example
* rename scripts
* changelog
parent 16f4cc9ff0
commit 22bedf9b57
@@ -8,7 +8,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Added

-- Added parity test between a vanilla MNIST model and lightning model ([#1284](https://github.com/PyTorchLightning/pytorch-lightning/pull/1284))
+- Added parity test between a vanilla MNIST model and lightning model ([#1284](https://github.com/PyTorchLightning/pytorch-lightning/pull/1284))
+- Added parity test between a vanilla RNN model and lightning model ([#1351](https://github.com/PyTorchLightning/pytorch-lightning/pull/1351))
 - Added Reinforcement Learning - Deep Q-network (DQN) lightning example ([#1232](https://github.com/PyTorchLightning/pytorch-lightning/pull/1232))
 - Added support for hierarchical `dict` ([#1152](https://github.com/PyTorchLightning/pytorch-lightning/pull/1152))
 - Added `TrainsLogger` class ([#1122](https://github.com/PyTorchLightning/pytorch-lightning/pull/1122))
@@ -40,6 +41,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Give warnings for unimplemented required lightning methods ([#1317](https://github.com/PyTorchLightning/pytorch-lightning/pull/1317))
 - Enhanced load_from_checkpoint to also forward params to the model ([#1307](https://github.com/PyTorchLightning/pytorch-lightning/pull/1307))
 - Made `evaluate` method private >> `Trainer._evaluate(...)`. ([#1260](https://github.com/PyTorchLightning/pytorch-lightning/pull/1260))
+- Simplify the PL examples structure (shallower and more readable) ([#1247](https://github.com/PyTorchLightning/pytorch-lightning/pull/1247))

 ### Deprecated
@@ -1,14 +1,67 @@
 # Examples
-This folder has 4 sections:
+This folder has 3 sections:

-### Basic examples
-These show the most common use of Lightning for either CPU or GPU training.
+## Basic Examples
+Use these examples to test how lightning works.

-### Domain templates
-These are templates to show common approaches such as GANs and RL.
+#### Test on CPU
+```bash
+python cpu_template.py
+```

-### Full examples
-Contains examples demonstrating ImageNet training, Semantic Segmentation, etc.
+---
+#### Train on a single GPU
+```bash
+python gpu_template.py --gpus 1
+```

-### Multi-node examples
-These show how to run jobs on a GPU cluster using lightning.
+---
+#### DataParallel (dp)
+Train on multiple GPUs using DataParallel.
+
+```bash
+python gpu_template.py --gpus 2 --distributed_backend dp
+```
+
+---
+#### DistributedDataParallel (ddp)
+
+Train on multiple GPUs using DistributedDataParallel
+```bash
+python gpu_template.py --gpus 2 --distributed_backend ddp
+```
+
+---
+#### DistributedDataParallel+DP (ddp2)
+
+Train on multiple GPUs using DistributedDataParallel + dataparallel.
+On a single node, uses all GPUs for 1 model. Then shares gradient information
+across nodes.
+```bash
+python gpu_template.py --gpus 2 --distributed_backend ddp2
+```
+
+## Multi-node example
+
+This demo launches a job using 2 GPUs on 2 different nodes (4 GPUs total).
+To run this demo do the following:
+
+1. Log into the jumphost node of your SLURM-managed cluster.
+2. Create a conda environment with Lightning and a GPU PyTorch version.
+3. Choose a script to submit
+
+### DDP
+Submit this job to run with DistributedDataParallel (2 nodes, 2 gpus each)
+```bash
+sbatch ddp_job_submit.sh YourEnv
+```
+
+### DDP2
+Submit this job to run with a different implementation of DistributedDataParallel.
+In this version, each node acts like DataParallel but syncs across nodes like DDP.
+```bash
+sbatch ddp2_job_submit.sh YourEnv
+```
+
+## Domain templates
+These are templates to show common approaches such as GANs and RL.
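Note on the README commands: the template scripts are invoked with `--gpus` and `--distributed_backend` (`dp`, `ddp`, or `ddp2`). The scripts themselves are not part of this diff, so the following is only a minimal sketch of how such flags could be parsed with `argparse`. The function names `build_parser` and `describe`, the defaults, and the help strings are assumptions for illustration; only the flag names and backend values come from the README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names are taken from the README commands,
    # defaults and help texts are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="Sketch of a template CLI")
    parser.add_argument("--gpus", type=int, default=0,
                        help="number of GPUs to use (0 means CPU)")
    parser.add_argument("--distributed_backend", type=str, default=None,
                        choices=["dp", "ddp", "ddp2"],
                        help="multi-GPU strategy named in the README")
    return parser


def describe(args: argparse.Namespace) -> str:
    # Summarise the requested run mode as a short string.
    if args.gpus == 0:
        return "cpu"
    backend = args.distributed_backend or "default"
    return f"{args.gpus} gpu(s), backend={backend}"


if __name__ == "__main__":
    print(describe(build_parser().parse_args()))
```

Run with `--gpus 2 --distributed_backend ddp`, the sketch would report a two-GPU run with the `ddp` backend; how the real templates forward these values to the trainer is not shown in this commit.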
@@ -140,7 +140,7 @@ Hyperparameter search on a SLURM HPC cluster

 """

-from .basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

 __all__ = [
     'LightningTemplateModel'
@@ -1,4 +1,4 @@
-# Basic Examples
+## Basic Examples
 Use these examples to test how lightning works.

 #### Test on CPU
@@ -36,4 +36,27 @@ On a single node, uses all GPUs for 1 model. Then shares gradient information
 across nodes.
 ```bash
 python gpu_template.py --gpus 2 --distributed_backend ddp2
-```
+```
+
+
+# Multi-node example
+
+This demo launches a job using 2 GPUs on 2 different nodes (4 GPUs total).
+To run this demo do the following:
+
+1. Log into the jumphost node of your SLURM-managed cluster.
+2. Create a conda environment with Lightning and a GPU PyTorch version.
+3. Choose a script to submit
+
+#### DDP
+Submit this job to run with DistributedDataParallel (2 nodes, 2 gpus each)
+```bash
+sbatch ddp_job_submit.sh YourEnv
+```
+
+#### DDP2
+Submit this job to run with a different implementation of DistributedDataParallel.
+In this version, each node acts like DataParallel but syncs across nodes like DDP.
+```bash
+sbatch ddp2_job_submit.sh YourEnv
+```
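Note on the multi-node section: the README states that 2 GPUs on each of 2 nodes gives 4 GPUs in total. The arithmetic can be sketched as below, assuming the usual DDP convention of one process per GPU with ranks numbered node by node. The helper names are hypothetical and nothing here is taken from the `ddp_job_submit.sh` script, which this diff does not show.

```python
def world_size(num_nodes: int, gpus_per_node: int) -> int:
    # Total number of GPU processes across the whole job.
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    # Assumed convention: ranks are assigned node by node,
    # then by GPU index within each node.
    return node_rank * gpus_per_node + local_rank


if __name__ == "__main__":
    nodes, gpus = 2, 2  # the configuration described in the README
    print("world size:", world_size(nodes, gpus))
    for node in range(nodes):
        for gpu in range(gpus):
            print(f"node {node}, gpu {gpu} -> rank {global_rank(node, gpu, gpus)}")
```

Under `ddp2` the README describes each node acting like DataParallel, so the per-GPU rank layout above would apply to the DDP job only.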
@@ -8,7 +8,7 @@ import numpy as np
 import torch

 import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

 SEED = 2334
 torch.manual_seed(SEED)
@@ -8,7 +8,7 @@ import numpy as np
 import torch

 import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

 SEED = 2334
 torch.manual_seed(SEED)
@@ -8,7 +8,7 @@ import numpy as np
 import torch

 import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

 SEED = 2334
 torch.manual_seed(SEED)
@@ -8,7 +8,7 @@ import numpy as np
 import torch

 import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

 SEED = 2334
 torch.manual_seed(SEED)
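Note on the import changes: the same one-line import move repeats across several files in this commit. For code outside the repository that still uses the old paths, a small rewrite helper along the following lines might be useful. This is a sketch, not part of the commit: `MOVED_MODULES` and `rewrite_imports` are hypothetical names, and the mapping contains only the two absolute module moves visible in this diff.

```python
import re

# Old module path -> new module path, as they appear in this commit's hunks.
MOVED_MODULES = {
    "pl_examples.basic_examples.lightning_module_template":
        "pl_examples.models.lightning_template",
    "models.unet.model": "pl_examples.models.unet",
}


def rewrite_imports(source: str) -> str:
    """Rewrite `from <old> import ...` lines to use the new module paths."""
    for old, new in MOVED_MODULES.items():
        # Match only at the start of a line so unrelated text is untouched.
        pattern = re.compile(
            r"^([ \t]*from[ \t]+)" + re.escape(old) + r"([ \t]+import\b)",
            flags=re.MULTILINE,
        )
        source = pattern.sub(
            lambda m, new=new: m.group(1) + new + m.group(2), source
        )
    return source


if __name__ == "__main__":
    old_line = ("from pl_examples.basic_examples.lightning_module_template "
                "import LightningTemplateModel")
    print(rewrite_imports(old_line))
```

The sketch deliberately ignores relative imports such as the `from .basic_examples...` line in the package `__init__`, since resolving those depends on where the importing file lives.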
@@ -1,6 +1,6 @@
 """
 To run this template just do:
-python gan.py
+python generative_adversarial_net.py

 After a few epochs, launch TensorBoard to see the images being generated at every batch:
@@ -6,10 +6,10 @@ import torch
 import torch.nn.functional as F
 import torchvision.transforms as transforms
 from PIL import Image
-from models.unet.model import UNet
 from torch.utils.data import DataLoader, Dataset

 import pytorch_lightning as pl
+from pl_examples.models.unet import UNet


 class KITTI(Dataset):
@@ -1,44 +0,0 @@
-import torch.nn as nn
-
-from models.unet.parts import DoubleConv, Down, Up
-
-
-class UNet(nn.Module):
-    """
-    Architecture based on U-Net: Convolutional Networks for Biomedical Image Segmentation
-    Link - https://arxiv.org/abs/1505.04597
-
-    Parameters:
-        num_classes (int): Number of output classes required (default 19 for KITTI dataset)
-        bilinear (bool): Whether to use bilinear interpolation or transposed
-            convolutions for upsampling.
-    """
-
-    def __init__(self, num_classes=19, bilinear=False):
-        super().__init__()
-        self.layer1 = DoubleConv(3, 64)
-        self.layer2 = Down(64, 128)
-        self.layer3 = Down(128, 256)
-        self.layer4 = Down(256, 512)
-        self.layer5 = Down(512, 1024)
-
-        self.layer6 = Up(1024, 512, bilinear=bilinear)
-        self.layer7 = Up(512, 256, bilinear=bilinear)
-        self.layer8 = Up(256, 128, bilinear=bilinear)
-        self.layer9 = Up(128, 64, bilinear=bilinear)
-
-        self.layer10 = nn.Conv2d(64, num_classes, kernel_size=1)
-
-    def forward(self, x):
-        x1 = self.layer1(x)
-        x2 = self.layer2(x1)
-        x3 = self.layer3(x2)
-        x4 = self.layer4(x3)
-        x5 = self.layer5(x4)
-
-        x6 = self.layer6(x5, x4)
-        x6 = self.layer7(x6, x3)
-        x6 = self.layer8(x6, x2)
-        x6 = self.layer9(x6, x1)
-
-        return self.layer10(x6)
@@ -3,6 +3,47 @@ import torch.nn as nn
 import torch.nn.functional as F


+class UNet(nn.Module):
+    """
+    Architecture based on U-Net: Convolutional Networks for Biomedical Image Segmentation
+    Link - https://arxiv.org/abs/1505.04597
+
+    Parameters:
+        num_classes (int): Number of output classes required (default 19 for KITTI dataset)
+        bilinear (bool): Whether to use bilinear interpolation or transposed
+            convolutions for upsampling.
+    """
+
+    def __init__(self, num_classes=19, bilinear=False):
+        super().__init__()
+        self.layer1 = DoubleConv(3, 64)
+        self.layer2 = Down(64, 128)
+        self.layer3 = Down(128, 256)
+        self.layer4 = Down(256, 512)
+        self.layer5 = Down(512, 1024)
+
+        self.layer6 = Up(1024, 512, bilinear=bilinear)
+        self.layer7 = Up(512, 256, bilinear=bilinear)
+        self.layer8 = Up(256, 128, bilinear=bilinear)
+        self.layer9 = Up(128, 64, bilinear=bilinear)
+
+        self.layer10 = nn.Conv2d(64, num_classes, kernel_size=1)
+
+    def forward(self, x):
+        x1 = self.layer1(x)
+        x2 = self.layer2(x1)
+        x3 = self.layer3(x2)
+        x4 = self.layer4(x3)
+        x5 = self.layer5(x4)
+
+        x6 = self.layer6(x5, x4)
+        x6 = self.layer7(x6, x3)
+        x6 = self.layer8(x6, x2)
+        x6 = self.layer9(x6, x1)
+
+        return self.layer10(x6)
+
+
 class DoubleConv(nn.Module):
     """
     Double Convolution and BN and ReLU
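Note on the `UNet` class moved in this commit: its channel widths are visible in `__init__`, but the bodies of `Down` and `Up` are not shown in the diff. The sketch below traces the feature-map shape through the ten layers in plain Python, assuming (as in the standard U-Net design) that each `Down` halves the spatial size, each `Up` doubles it, and the convolutions preserve it. `trace_unet` is a hypothetical helper for illustration only, and input sizes are assumed divisible by 16.

```python
from typing import List, Tuple

# Output channel widths copied from UNet.__init__ in the diff above.
ENCODER = [("layer2", 128), ("layer3", 256), ("layer4", 512), ("layer5", 1024)]
DECODER = [("layer6", 512), ("layer7", 256), ("layer8", 128), ("layer9", 64)]


def trace_unet(height: int, width: int,
               num_classes: int = 19) -> List[Tuple[str, int, int, int]]:
    """Return (layer, channels, height, width) after each layer."""
    stages = [("layer1", 64, height, width)]  # DoubleConv(3, 64)
    h, w = height, width
    for name, channels in ENCODER:
        # Assumption: every Down block halves height and width.
        h, w = h // 2, w // 2
        stages.append((name, channels, h, w))
    for name, channels in DECODER:
        # Assumption: every Up block doubles height and width.
        h, w = h * 2, w * 2
        stages.append((name, channels, h, w))
    # Final 1x1 convolution maps 64 channels to num_classes.
    stages.append(("layer10", num_classes, h, w))
    return stages


if __name__ == "__main__":
    for stage in trace_unet(256, 512):
        print(stage)
```

Under these assumptions the output has the same height and width as the input with `num_classes` channels, which is what a per-pixel segmentation head needs.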
@@ -1,21 +0,0 @@
-# Multi-node example
-
-This demo launches a job using 2 GPUs on 2 different nodes (4 GPUs total).
-To run this demo do the following:
-
-1. Log into the jumphost node of your SLURM-managed cluster.
-2. Create a conda environment with Lightning and a GPU PyTorch version.
-3. Choose a script to submit
-
-#### DDP
-Submit this job to run with DistributedDataParallel (2 nodes, 2 gpus each)
-```bash
-sbatch ddp_job_submit.sh YourEnv
-```
-
-#### DDP2
-Submit this job to run with a different implementation of DistributedDataParallel.
-In this version, each node acts like DataParallel but syncs across nodes like DDP.
-```bash
-sbatch ddp2_job_submit.sh YourEnv
-```