changed README

This commit is contained in:
parent 3f684858f2
commit ac11d37b5b

@@ -16,41 +16,62 @@
- Multi-gpu example
- SLURM cluster grid search example
###### Training loop

- Accumulate gradients
- Check GPU usage
- Check which gradients are nan
- Check validation every n epochs
- Display metrics in progress bar
- Force training for min or max epochs
- Inspect gradient norms
- Hooks
- Learning rate annealing
- Make model overfit on subset of data
- Multiple optimizers (like GANs)
- Set how much of the training set to check (1-100%)
- training_step function (see the sketch after this list)
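The `training_step` function above is the core hook of a LightningModule. A minimal sketch, assuming current pytorch-lightning naming (the 2019-era API these docs describe may differ in details); `LitClassifier` is a hypothetical model used only to illustrate the hooks:

```python
import torch
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def training_step(self, batch, batch_idx):
        # Called once per training batch; returning the loss is enough,
        # Lightning runs backward() and optimizer.step() for you.
        x, y = batch
        logits = self.layer(x.view(x.size(0), -1))
        return torch.nn.functional.cross_entropy(logits, y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```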
###### Validation loop

- Display metrics in progress bar
- Hooks
- Set how much of the validation set to check (1-100%)
- Set validation check frequency within 1 training epoch (1-100%)
- validation_step function (see the sketch after this list)
- Why does validation run first for 5 steps?
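The `validation_step` hook mirrors `training_step`; the five steps that run before training are Lightning's validation sanity check. A sketch extending the hypothetical `LitClassifier` above, using the `self.log` API from pytorch-lightning 1.0+ (the releases these docs cover returned a dict instead):

```python
    def validation_step(self, batch, batch_idx):
        # Runs on each validation batch; prog_bar=True surfaces the
        # metric in the progress bar.
        x, y = batch
        logits = self.layer(x.view(x.size(0), -1))
        loss = torch.nn.functional.cross_entropy(logits, y)
        self.log("val_loss", loss, prog_bar=True)
```

The number of sanity-check batches is configurable through the Trainer (see the validation-loop sketch further down).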
###### Distributed training

- Single-gpu
- Multi-gpu
- Multi-node
- 16-bit mixed precision (see the Trainer sketch after this list)
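All four options are selected on the Trainer. A sketch using flag names from current pytorch-lightning releases (`accelerator`, `devices`, `num_nodes`, `precision`), which are assumptions here and may differ from the 2019 flag names:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",  # single- or multi-GPU
    devices=4,          # GPUs per node
    num_nodes=2,        # multi-node
    precision=16,       # 16-bit mixed precision
)
# trainer.fit(model) with any LightningModule, e.g. the LitClassifier above.
```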
###### Checkpointing

- Model saving
- Model loading (see the sketch after this list)
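Saving happens automatically through the checkpoint callback; loading restores weights and hyperparameters from a `.ckpt` file. A sketch assuming the current `ModelCheckpoint` / `load_from_checkpoint` API:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the best model by validation loss; path and monitor key are examples.
checkpoint_cb = ModelCheckpoint(monitor="val_loss", dirpath="checkpoints/")
trainer = pl.Trainer(callbacks=[checkpoint_cb])

# Later, restore the trained model from disk:
# model = LitClassifier.load_from_checkpoint("checkpoints/best.ckpt")
```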
###### Computing cluster (SLURM)

- Automatic checkpointing
- Automatic saving, loading
- Running grid search on a cluster
- Walltime auto-resubmit (see the sketch after this list)
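In current pytorch-lightning releases the SLURM integration lives in a cluster-environment plugin, and `auto_requeue` covers the walltime auto-resubmit behaviour. This is a sketch of that modern API, not the test-tube-based flow the 2019 docs describe:

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins.environments import SLURMEnvironment

# Inside an sbatch job, Lightning reads node/task counts from the
# SLURM environment; auto_requeue resubmits the job near walltime.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    num_nodes=2,
    plugins=[SLURMEnvironment(auto_requeue=True)],
)
```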
###### Debugging

- [Fast dev run](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#fast-dev-run) (see the sketch after this list)
- [Inspect gradient norms](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#inspect-gradient-norms)
- [Log GPU usage](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#Log-gpu-usage)
- [Make model overfit on subset of data](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#make-model-overfit-on-subset-of-data)
- [Print the parameter count by layer](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#print-the-parameter-count-by-layer)
- [Print which gradients are nan](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#print-which-gradients-are-nan)
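Most of these map to single Trainer flags. A sketch with flag names from current releases (`fast_dev_run`, `overfit_batches`); the 2019 names behind the links may differ:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    fast_dev_run=True,      # run one train/val batch to smoke-test the loops
    # overfit_batches=0.01, # alternatively, overfit on 1% of the data
)
```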
###### Distributed training

- [16-bit mixed precision](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#16-bit-mixed-precision)
- [Multi-GPU](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#Multi-GPU)
- [Multi-node](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#Multi-node)
- [Single GPU](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#single-gpu)
- [Self-balancing architecture](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#self-balancing-architecture)
###### Experiment Logging

- [Display metrics in progress bar](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#display-metrics-in-progress-bar)
- Log arbitrary metrics (see the sketch after this list)
- [Log metric row every k batches](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#log-metric-row-every-k-batches)
- [Process position](Logging/#process-position)
- [Save a snapshot of all hyperparameters](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#save-a-snapshot-of-all-hyperparameters)
- [Snapshot code for a training run](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#snapshot-code-for-a-training-run)
- [Write logs file to csv every k batches](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#write-logs-file-to-csv-every-k-batches)
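Arbitrary metrics are logged from inside the step hooks. A sketch using `self.log` from pytorch-lightning 1.0+ (the releases these docs cover returned a `'log'` dict from `training_step` instead), again extending the hypothetical `LitClassifier`:

```python
    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(
            self.layer(x.view(x.size(0), -1)), y
        )
        # logger=True writes the row to the experiment logger,
        # prog_bar=True also shows it in the progress bar.
        self.log("train_loss", loss, prog_bar=True, logger=True)
        return loss
```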
###### Training loop

- [Accumulate gradients](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#accumulated-gradients)
- [Anneal learning rate](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#anneal-learning-rate)
- [Force training for min or max epochs](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#force-training-for-min-or-max-epochs)
- [Force disable early stop](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#force-disable-early-stop)
- [Use multiple optimizers (like GANs)](https://williamfalcon.github.io/pytorch-lightning/Pytorch-Lightning/LightningModule/#configure_optimizers) (see the sketch after this list)
- [Set how much of the training set to check (1-100%)](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#set-how-much-of-the-training-set-to-check)
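Multiple optimizers are declared in `configure_optimizers`, one per sub-network. A GAN-style sketch with hypothetical generator/discriminator modules; note that how the optimizers are stepped (an `optimizer_idx` argument vs. manual optimization) has changed across releases:

```python
import torch
import pytorch_lightning as pl


class GAN(pl.LightningModule):
    def __init__(self, generator, discriminator):
        super().__init__()
        self.generator = generator
        self.discriminator = discriminator

    def configure_optimizers(self):
        # One optimizer per sub-network, returned together.
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=2e-4)
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=2e-4)
        return opt_g, opt_d


# Gradient accumulation and epoch bounds are plain Trainer flags
# (current names, assumed here):
trainer = pl.Trainer(min_epochs=5, max_epochs=100, accumulate_grad_batches=4)
```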
###### Validation loop

- [Check validation every n epochs](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#check-validation-every-n-epochs)
- [Set how much of the validation set to check](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#set-how-much-of-the-validation-set-to-check)
- [Set how much of the test set to check](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#set-how-much-of-the-test-set-to-check)
- [Set validation check frequency within 1 training epoch](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#set-validation-check-frequency-within-1-training-epoch)
- [Set the number of validation sanity steps](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#set-the-number-of-validation-sanity-steps) (see the sketch after this list)
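All five of these are Trainer arguments. A sketch with the current flag names, which are assumed to match what the linked pages describe:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    check_val_every_n_epoch=2,  # validate every 2 training epochs
    limit_val_batches=0.25,     # check 25% of the validation set
    limit_test_batches=0.25,    # check 25% of the test set
    val_check_interval=0.5,     # also validate halfway through each epoch
    num_sanity_val_steps=5,     # the validation sanity steps that run first
)
```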