Commit 32e74b8f36
* Adds the ddp2 option, where each node runs a single process that uses all of that node's GPUs
* Adds a ddp2 test
* Adds ddp2 docs
* Updates Distributed training.md
* Deletes references to the old update_training_log_metrics
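Since the core change is the ddp2 behavior described above (one process per node, driving all of that node's GPUs), here is a minimal sketch of how it might be selected. It assumes a PyTorch Lightning-style `Trainer` that accepts a `distributed_backend` argument and a hypothetical `LightningModule` subclass named `MyModel`; exact argument names differ between Lightning versions.

```python
# Minimal sketch: selecting the ddp2 backend on a multi-node, multi-GPU job.
# `MyModel` is a hypothetical LightningModule; argument names are assumptions
# and vary across Lightning versions.
from pytorch_lightning import Trainer

from my_project import MyModel  # hypothetical LightningModule subclass

model = MyModel()

# With ddp2, each node launches a single process that uses all of its GPUs
# (data-parallel within the node, distributed data-parallel across nodes).
trainer = Trainer(
    gpus=8,                      # GPUs per node
    num_nodes=4,                 # number of nodes in the cluster
    distributed_backend='ddp2',  # one process per node, all GPUs per process
)
trainer.fit(model)
```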
Files in this directory:

* Checkpointing.md
* Distributed training.md
* Logging.md
* SLURM Managed Cluster.md
* Testing loop.md
* Training Loop.md
* Validation loop.md
* debugging.md
* hooks.md
* index.md