Atharva Phatak
cdb7006b98
Fix ddp_spawn -> ddp fallback logic when on LSF cluster ( #15657 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-12 17:26:16 +00:00
Adrian Wälchli
18288eb3f3
Checkpoint migration for `ModelCheckpoint` state-key changes ( #15606 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-11-11 13:06:25 +00:00
Adrian Wälchli
75b5042081
Validate that state-key is unique when using multiple callbacks of the same type ( #15634 )
2022-11-11 05:15:03 -05:00
Rohit Gupta
f4ca5623d2
Make checkpointing on train epoch end condition dynamic ( #15300 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-11-09 14:27:53 +00:00
Yuxuan Lu
ee8a57da0f
Fix usage of fs.listdir in CheckpointConnector ( #15413 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-11-04 20:21:52 +00:00
Adrian Wälchli
38a9e69543
Extend the detection of interactive mode ( #15293 )
...
* extend interactive mode detection
* update test names
* changelog
* test
2022-10-26 15:24:11 +00:00
Adrian Wälchli
576757fd79
Validate SRUN variables when launching in SLURM ( #15011 )
2022-10-19 21:42:11 +00:00
Carlos Mocholí
24c26f7db2
Standardize Lite's filenames ( #15058 )
2022-10-19 14:09:41 +02:00
Rohit Gupta
eb17dc9839
Deprecate tuning enum and trainer properties ( #15100 )
2022-10-13 13:29:50 +00:00
Max Ehrlich
5a3007cd6c
Support Slurm Autorequeue for Array Jobs ( #15040 )
...
Signed-off-by: Max Ehrlich <max.ehr@gmail.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-10-10 13:43:57 +02:00
Adrian Wälchli
c76a95ea12
More tests for TPU accelerator in Lite ( #14960 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-08 15:42:21 +00:00
Carlos Mocholí
7ef87464dd
Refactor XLA and TPU checks across codebase ( #14550 )
2022-10-04 22:54:14 +00:00
otaj
5f0c4aad12
Introduce `ckpt_path="hpc"` keyword for checkpoint loading ( #14911 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 12:45:51 +00:00
Rohit Gupta
d1a3a3ebf5
Add BatchSizeFinder callback ( #11089 )
...
* add BatchSizeFinderCallback callback
* temp rm from init
* skip with lr_finder tests
* restore loops and integrate early exit
* enable fast_dev_run test
* add docs and tests
* keep tune and remove early_exit
* add more tests
* patch lr finder
* disable skip
* force_save and fix test
* mypy and circular import fix
* fix mypy
* fix
* updates
* rebase
* address reviews
* add more exceptions for unsupported functionalities
* move exception to setup
* chlog
* unit test
* address reviews
* Apply suggestions from code review
* update
* update
* mypy
* fix
* use it as a util func
* license
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* mypy
* mypy
* review
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
* updates
* updates
* fix import
* Protect callback attrs
* don't reset val dataloader
* update test
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-27 08:54:37 -04:00
Adrian Wälchli
dc1dc0df36
Attempt to query device count via NVML ( #14631 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-22 09:57:13 +00:00
Carlos Mocholí
e9c571d39f
Move accelerator-specific parsing functions with their accelerators ( #14753 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-18 22:48:45 +00:00
Adrian Wälchli
35c65b0287
Fix test suite when running on MPS-enabled hardware ( #14708 )
2022-09-16 19:21:36 +00:00
Adrian Wälchli
47f0d336f1
Standalone Lite: Update LightningLite ( #14726 )
2022-09-16 17:25:27 +00:00
Adrian Wälchli
619e76f22d
Remove silent behavior when `num_slurm_tasks` does not correspond to number of processes in Trainer ( #14300 )
...
* simplify logic
* remove hpc
* update
* add changelog
* more tests
* update test
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-16 11:00:09 +00:00
Adrian Wälchli
19a1274093
Better error message when dataloader and datamodule is None (V2) ( #14637 )
2022-09-13 12:26:03 +00:00
Max Ehrlich
e5998e6bf2
Make the SLURM Preemption/Timeout Signal Configurable ( #14626 )
...
* Add parameter to change the preemption signal
* Make the signal connector use the custom signal from SLURMEnvironment
Signed-off-by: Max Ehrlich <max.ehr@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-12 19:24:35 +00:00
Adrian Wälchli
d013bcc5bf
Standalone Lite: Accelerators ( #14578 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-12 16:00:14 +00:00
Adrian Wälchli
024e7b8204
Standalone Lite: Cluster Environments ( #14509 )
2022-09-12 12:20:08 +02:00
Adrian Wälchli
d2459df2ff
Standalone Lite: Remaining Utilities ( #14492 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Felonious-Spellfire <felonious.spellfire@gmail.com>
2022-09-07 15:25:23 +00:00
Adrian Wälchli
250c06e406
Remove deprecated HPC model hooks ( #14315 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 20:59:32 +00:00
Adrian Wälchli
fafd254678
Fix device parser logic to avoid creating CUDA context ( #14319 )
...
* let environment disable forking
* add helper function and error messages
* tests
* changelog
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 15:41:38 +00:00
Rohit Gupta
c8e22b4572
Avoid raising the sampler warning if num_replicas=1 ( #14097 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-08-12 08:44:21 +00:00
Adrian Wälchli
807f9d8c96
Replace unwrapping logic in strategies ( #13738 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-12 08:24:04 +00:00
Rohit Gupta
2d9e00fab6
Profile batch transfer and gradient clipping hooks ( #14069 )
2022-08-11 23:21:53 +00:00
Carlos Mocholí
3dc08b1ef5
Fix flaky test caused by weak reference ( #14157 )
2022-08-11 09:33:19 +02:00
Adrian Wälchli
a7cebf2416
Fix entry point test for Python 3.10 ( #14154 )
2022-08-11 01:32:32 +02:00
Rohit Gupta
a4e4cab7a6
Deprecate `amp_level` from `Trainer` ( #13898 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-05 08:31:19 +00:00
Adrian Wälchli
e6a8283e9c
Organize accelerator tests ( #13986 )
2022-08-03 13:49:55 +00:00
Rohit Gupta
c67b075cf5
Use `global_step` while restoring logging step for old checkpoints ( #13645 )
...
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-07-19 18:53:22 +00:00
otaj
33bd270845
Adds Sampler Wrappers for custom samplers in distributed environment ( #12959 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-06-22 12:17:53 +02:00
Jirka Borovec
ab59f308b1
Future 4/n: test & legacy in test/ folder ( #13295 )
...
* move: legacy >> test/
* move: tests >> test/
* rename unittests
* update CI
* tests4pl
* tests_pytorch
* proxi
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci
* link
* cli
* standalone
* fixing
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* .
* Apply suggestions from code review
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* alone
* test -> tests
* Standalone fixes
* ci
* Update
* More fixes
* Fix coverage
* Fix mypy
* mypy
* Empty-Commit
* Fix
* mypy just for pl
* Fix standalone
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-06-15 18:10:49 -04:00