6 Commits

Author SHA1 Message Date
edenlightning 1c196da309
Update fault_tolerant_training_basic.rst (#16012)
2022-12-22 07:16:02 +00:00
Adrian Wälchli 7a1e0e801e
Fix typo in definition of world size in docs (#15954)
2022-12-08 18:06:12 +00:00
Adrian Wälchli ff3c5b7b9d
Docs section for SLURM troubleshooting (#14873)
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-29 12:41:31 +00:00
Max Ehrlich e5998e6bf2
Make the SLURM Preemption/Timeout Signal Configurable (#14626)
* Add parameter to change the preemption signal
* Make the signal connector use the custom signal from SLURMEnvironment

Signed-off-by: Max Ehrlich <max.ehr@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-12 19:24:35 +00:00
Rohit Gupta e21490b9bb
Update old PL links (#13349)
2022-06-21 16:38:04 +02:00
Jirka Borovec b58577fd4d
Future 3/n: docs adjustment (#13299)
* docs: rename source >> source-PL

* docs: fix typing

* readthedocs

* update paths & codeowners

* source-pytorch

* ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-06-15 10:54:53 -04:00