From ac11d37b5b5cfe304bcfec7251ba546b22e65098 Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Fri, 28 Jun 2019 14:53:43 -0400
Subject: [PATCH] changed read me

---
 docs/index.md | 79 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 29 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 74efd674d1..74a1b6b6d3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -16,41 +16,62 @@
 - Multi-gpu example
 - SLURM cluster grid search example
 
-###### Training loop
-- Accumulate gradients
-- Check GPU usage
-- Check which gradients are nan
-- Check validation every n epochs
-- Display metrics in progress bar
-- Force training for min or max epochs
-- Inspect gradient norms
-- Hooks
-- Learning rate annealing
-- Make model overfit on subset of data
-- Multiple optimizers (like GANs)
-- Set how much of the training set to check (1-100%)
-- training_step function
-###### Validation loop
-- Display metrics in progress bar
-- hooks
-- Set how much of the validation set to check (1-100%)
-- Set validation check frequency within 1 training epoch (1-100%)
-- validation_step function
-- Why does validation run first for 5 steps?
+###### Checkpointing
 
-###### Distributed training
-- Single-gpu
-- Multi-gpu
-- Multi-node
-- 16-bit mixed precision
-
-###### Checkpointing
 - Model saving
 - Model loading
 
-###### Computing cluster (SLURM)
+###### Computing cluster (SLURM)
+
 - Automatic checkpointing
 - Automatic saving, loading
 - Running grid search on a cluster
 - Walltime auto-resubmit
+
+###### Debugging
+
+- [Fast dev run](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#fast-dev-run)
+- [Inspect gradient norms](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#inspect-gradient-norms)
+- [Log GPU usage](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#Log-gpu-usage)
+- [Make model overfit on subset of data](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#make-model-overfit-on-subset-of-data)
+- [Print the parameter count by layer](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#print-the-parameter-count-by-layer)
+- [Print which gradients are nan](https://williamfalcon.github.io/pytorch-lightning/Trainer/debugging/#print-which-gradients-are-nan)
+
+
+###### Distributed training
+
+- [16-bit mixed precision](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#16-bit-mixed-precision)
+- [Multi-GPU](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#Multi-GPU)
+- [Multi-node](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#Multi-node)
+- [Single GPU](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#single-gpu)
+- [Self-balancing architecture](https://williamfalcon.github.io/pytorch-lightning/Trainer/Distributed%20training/#self-balancing-architecture)
+
+
+###### Experiment Logging
+
+- [Display metrics in progress bar](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#display-metrics-in-progress-bar)
+- Log arbitrary metrics
+- [Log metric row every k batches](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#log-metric-row-every-k-batches)
+- [Process position](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#process-position)
+- [Save a snapshot of all hyperparameters](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#save-a-snapshot-of-all-hyperparameters)
+- [Snapshot code for a training run](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#snapshot-code-for-a-training-run)
+- [Write logs file to csv every k batches](https://williamfalcon.github.io/pytorch-lightning/Trainer/Logging/#write-logs-file-to-csv-every-k-batches)
+
+###### Training loop
+
+- [Accumulate gradients](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#accumulated-gradients)
+- [Anneal learning rate](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#anneal-learning-rate)
+- [Force training for min or max epochs](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#force-training-for-min-or-max-epochs)
+- [Force disable early stop](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#force-disable-early-stop)
+- [Use multiple optimizers (like GANs)](https://williamfalcon.github.io/pytorch-lightning/Pytorch-Lightning/LightningModule/#configure_optimizers)
+- [Set how much of the training set to check (1-100%)](https://williamfalcon.github.io/pytorch-lightning/Trainer/Training%20Loop/#set-how-much-of-the-training-set-to-check)
+
+###### Validation loop
+
+- [Check validation every n epochs](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#check-validation-every-n-epochs)
+- [Set how much of the validation set to check](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#set-how-much-of-the-validation-set-to-check)
+- [Set how much of the test set to check](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#set-how-much-of-the-test-set-to-check)
+- [Set validation check frequency within 1 training epoch](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#set-validation-check-frequency-within-1-training-epoch)
+- [Set the number of validation sanity steps](https://williamfalcon.github.io/pytorch-lightning/Trainer/Validation%20loop/#set-the-number-of-validation-sanity-steps)
+
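
Nearly all of the features indexed in the new `docs/index.md` are switched on through `Trainer` constructor flags rather than code changes in the model. As a rough illustration (not part of the patch itself), here is a minimal sketch combining a few of the listed training-loop, validation-loop, and debugging options; the flag names reflect the 2019-era PyTorch Lightning API referenced by these docs and may differ in other releases, and `CoolModel` is a hypothetical `LightningModule` subclass:

```python
# Minimal sketch (illustrative, not from the patch): enabling several of the
# features listed above via Trainer flags. Flag names are assumptions based
# on the 2019-era API and may differ across versions.
from pytorch_lightning import Trainer

from my_project import CoolModel  # hypothetical LightningModule subclass

model = CoolModel()

trainer = Trainer(
    fast_dev_run=False,          # True runs one batch of train/val to smoke-test the wiring
    accumulate_grad_batches=4,   # "Accumulate gradients": step the optimizer every 4 batches
    track_grad_norm=2,           # "Inspect gradient norms": log the L2 norm of gradients
    check_val_every_n_epoch=1,   # "Check validation every n epochs"
    val_check_interval=0.25,     # validation check frequency: 4 times per training epoch
)
trainer.fit(model)
```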