lightning

Commit Graph

Author	SHA1	Message	Date
Carlos Mocholí	02074f16c7	Fix PyTorch versions in Lite CI (#15338 ) * replace oldest in lite * Fix PyTorch versions in Lite CI * This will be moved to install pkg workflow in the mirror PR * 1.13 fixes * Windows fix * sorting Co-authored-by: otaj <ota@lightning.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-10-26 15:09:08 -04:00
Adrian Wälchli	38a9e69543	Extend the detection of interactive mode (#15293 ) * extend interactive mode detection * update test names * changelog * test	2022-10-26 15:24:11 +00:00
Adrian Wälchli	0f9156374d	Mark internal Lite APIs as protected (#15307 ) * mark internal lite apis as protected * formatting * docs update Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-10-26 12:51:50 +00:00
otaj	76e462a0be	Do not lose references of trainer in test (#15272 ) * Fix reference error * Skip flaky hanging test * . Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-25 09:23:15 -04:00
Dan Dale	27585a9bcf	Fix and refactor `test_deepspeed_engine_is_steppable` test (#15251 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-10-23 21:25:36 +00:00
Carlos Mocholí	961e395677	Resolve collectives test issues (#15195 ) Co-authored-by: otaj <ota@lightning.ai>	2022-10-21 01:08:38 +00:00
Carlos Mocholí	b866dc3a6a	Collective's PREMUL_SUM support with PyTorch 1.13 (#15201 ) * Collective's PREMUL_SUM support with PyTorch 1.13 * Fix test * Skip under 1.13	2022-10-20 12:36:06 +00:00
Carlos Mocholí	bf458701de	Avoid underscore suffix in filenames (#15189 )	2022-10-20 07:39:19 -04:00
otaj	741462f373	[LAI] Make lite tests safe for combined package (#15204 ) Make lite tests safe for combined package Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2022-10-20 09:10:39 +00:00
Adrian Wälchli	576757fd79	Validate SRUN variables when launching in SLURM (#15011 )	2022-10-19 21:42:11 +00:00
Adrian Wälchli	045c2f5715	Efficient gradient accumulation in LightningLite (#14966 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-19 19:55:12 +00:00
Jirka Borovec	d0b092fda8	Lite: setting extras & fix CI (#15192 ) * extras * test.txt * doctest * Apply suggestions from code review * Fix imports * Oops Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-19 19:05:23 +00:00
Carlos Mocholí	24c26f7db2	Standardize Lite's filenames (#15058 )	2022-10-19 14:09:41 +02:00
Carlos Mocholí	0e18266023	Fix collective tests with PyTorch 1.13 (#15167 )	2022-10-18 14:31:48 +02:00
Justus Schock	27965cc36b	Fix locally failing lite tests (#15137 )	2022-10-18 09:49:14 +00:00
Adrian Wälchli	ed891e5049	Force NVML-based CUDA check in PyTorch 1.14+ (#15110 )	2022-10-13 13:10:29 -04:00
Carlos Mocholí	da25d1d30d	Remove unused Lite code (#15000 ) * Remove unused Lite code * Remove duplicate import * Group variable * Fix monkeypatch	2022-10-10 22:16:56 +00:00
Carlos Mocholí	c334b7766c	Remove old testing artifacts (#15052 )	2022-10-10 17:34:18 +00:00
Carlos Mocholí	d15bd1520e	[Lite] precision_plugin -> precision (#15001 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-10-10 15:00:32 +00:00
Carlos Mocholí	0b04aa879f	Resolve interactions between CUDA tests (#15042 )	2022-10-09 06:20:40 -04:00
Adrian Wälchli	c76a95ea12	More tests for TPU accelerator in Lite (#14960 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-10-08 15:42:21 +00:00
Carlos Mocholí	62ca073a41	Introduce base collective and main subclasses (#15016 ) Co-authored-by: otaj <ota@lightning.ai>	2022-10-07 19:53:19 +00:00
Dan Dale	3b75c52869	Support ddp_fork strategy with native AMP by attempting NVML-based CUDA availability assessment (#14984 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-10-05 18:52:06 -04:00
Dan Dale	ab1eb6531e	Fix fork tests failing in environments with CUDA available (#14982 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-05 00:02:55 +00:00
Carlos Mocholí	7ef87464dd	Refactor XLA and TPU checks across codebase (#14550 )	2022-10-04 22:54:14 +00:00
Carlos Mocholí	3028fd287d	Fix TPU test CI (#14926 ) * Fix TPU test CI * +x first * Lite first to uncovert errors faster * Fixes * One more * Simplify XLALauncher wrapping to avoid pickle error * debug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug commit successful. Trying local definitions * Require tpu for mock test * ValueError: The number of devices must be either 1 or 8, got 4 instead * Fix mock test * Simplify call, rely on defaults * Skip OSError for now. Maybe upgrading will help * Simplify launch tests, move some to lite * Stricter typing * RuntimeError: Accessing the XLA device before processes have spawned is not allowed. * Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed." This reverts commit `f65107ebf3`. * Alternative boring solution to the reverted commit * Fix failing test on CUDA machine * Workarounds * Try latest mkl * Revert "Try latest mkl" This reverts commit `d06813aa67`. * Wrong exception * xfail * Mypy * Comment change * Spawn launch refactor * Accept that we cannot lazy init now * Fix mypy and launch test failures * The base dockerfile already includes mkl-2022.1.0 - what if we use it? * try a different mkl version * Revert mkl version changes Co-authored-by: awaelchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-10-03 09:13:33 -04:00
Adrian Wälchli	d7af8ce2a5	Simplify root node resolution for SLURM environment (#14912 ) Co-authored-by: Seppo Enarvi <seppo.git@marjaniemi.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-09-30 15:40:43 +00:00
Adrian Wälchli	cd9247a782	Introduce primitives for input/output dtype conversion in Lite Precision (#14792 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>	2022-09-30 15:29:03 +00:00
Carlos Mocholí	6256a318d7	Refactor launching tests to use our launchers (#14954 )	2022-09-30 09:57:18 +02:00
Atharva Phatak	fdcb5cc90b	Hydra changes to lightning-lite (#14950 ) Co-authored-by: awaelchli <aedu.waelchli@gmail.com>	2022-09-29 21:59:35 -04:00
Adrian Wälchli	498cb60417	Fairscale integration tests for Lite (#14921 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-09-29 17:46:49 +00:00
Adrian Wälchli	5b446aec4d	DeepSpeed integration tests for Lite (#14901 )	2022-09-29 16:39:32 +00:00
Adrian Wälchli	ea5e817973	Better error message when trying to re-initialize CUDA in forked subprocess (#14709 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-09-28 05:07:33 -04:00
Carlos Mocholí	9fc4ff3278	Move logic to error out on deprecation warnings into conftest (#14902 )	2022-09-27 17:49:25 +02:00
Adrian Wälchli	d572a7e2ec	Fix double precision support in Lite (#14827 )	2022-09-27 08:38:20 +00:00
Adrian Wälchli	d7404c775a	Integration tests for Precision in Lite (#14815 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2022-09-26 18:50:11 +00:00
Adrian Wälchli	dc1dc0df36	Attempt to query device count via NVML (#14631 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-09-22 09:57:13 +00:00
otaj	5ee2b86c44	Tests for fixed TypeError (#14821 ) * tests for 14809 * Apply suggestions from code review Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-09-22 09:04:27 +02:00
Carlos Mocholí	7e803ba53e	Clean-up dtype management (#14823 ) Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2022-09-22 00:07:36 +00:00
Adrian Wälchli	3f0fec591d	Update device attribute in Lite's module wrapper (#14822 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-09-21 19:06:10 +00:00
Carlos Mocholí	abc805f9ef	Remove the model argument from Lite's `optimizer_step` via structural typing (#14810 ) Co-authored-by: awaelchli <aedu.waelchli@gmail.com>	2022-09-21 19:28:45 +02:00
awaelchli	c0ff7a1b77	Add backward-compatibility for LightningLite in PL (#14735 )	2022-09-20 13:31:56 +02:00
awaelchli	e3e71670e6	Move src/pytorch_lightning/lite to src/lightning_lite (#14735 )	2022-09-20 13:31:56 +02:00
Carlos Mocholí	e9c571d39f	Move accelerator-specific parsing functions with their accelerators (#14753 ) Co-authored-by: awaelchli <aedu.waelchli@gmail.com>	2022-09-18 22:48:45 +00:00
Adrian Wälchli	1092265140	Remove check `num_slurm_tasks` in Lite (#14761 )	2022-09-18 14:01:49 -04:00
Adrian Wälchli	35c65b0287	Fix test suite when running on MPS-enabled hardware (#14708 )	2022-09-16 19:21:36 +00:00
Adrian Wälchli	47f0d336f1	Standalone Lite: Update LightningLite (#14726 )	2022-09-16 17:25:27 +00:00
Adrian Wälchli	619e76f22d	Remove silent behavior when `num_slurm_tasks` does not correspond to number of processes in Trainer (#14300 ) * simplify logic * remove hpc * update * add changelog * more tests * update test Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-09-16 11:00:09 +00:00
Adrian Wälchli	38d89713a5	Standalone Lite: Connector (#14692 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2022-09-15 14:14:51 +00:00
Adrian Wälchli	d3dcd68852	Standalone Lite: DDP Spawn Strategy Family (#14675 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-09-15 10:51:12 +00:00


68 Commits