lightning

Commit Graph

Author	SHA1	Message	Date
awaelchli	95d6b6b9da	Disable skipping training step in distributed training (#19918 )	2024-05-30 11:54:48 -04:00
awaelchli	5d7932546d	Update code owners file (#19925 ) update	2024-05-30 11:50:02 -04:00
awaelchli	014cdd84ed	Update code owners file (#19922 ) * update code owners * update * Update .github/CODEOWNERS Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com> --------- Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>	2024-05-30 06:12:41 -04:00
awaelchli	98005bbed0	Add Studio badge to tensor parallel docs (#19913 )	2024-05-28 09:04:55 -04:00
awaelchli	896c2a656a	Error for unsupported precision types with ModelParallelStrategy (#19902 )	2024-05-23 13:43:46 -04:00
awaelchli	c09356db1e	(10/10) Support 2D Parallelism - Port Fabric docs to PL (#19899 )	2024-05-23 08:55:52 -04:00
awaelchli	7874cd08ec	[TPU] Fix test assertion error from artifacts (#19825 )	2024-05-23 07:11:28 -04:00
Jirka Borovec	e0d7ede643	docs: prune unused `linkcode` (#19897 )	2024-05-23 11:35:53 +02:00
awaelchli	414c86332e	(9/n) Support 2D Parallelism - Remaining Checkpoint Logic (#19888 ) Co-authored-by: Luca Antiga <luca.antiga@gmail.com>	2024-05-22 18:13:41 -04:00
Jirka Borovec	fa1126ea53	docs: fix link to CLIP (#19896 ) * docs: fix link to CLIP * www * ignore	2024-05-22 17:46:51 -04:00
awaelchli	341474aaac	(8/n) Support 2D Parallelism - 2D Parallel Fabric Docs (#19887 )	2024-05-22 13:47:55 -04:00
awaelchli	8fc7b4ae94	Remove the requirement for FSDPStrategy subclasses to only support GPU (#19894 )	2024-05-22 18:31:40 +02:00
awaelchli	987c2c4093	(7/n) Support 2D Parallelism - TP Fabric Docs (#19884 ) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2024-05-22 06:20:40 -04:00
awaelchli	7e87ce05c8	Fix state dict loading in bitsandbytes plugin when checkpoint is already quantized (#19886 ) * bugfix * add test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update * add chlog --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-05-21 13:46:01 -04:00
Gilles Peiffer	b1bb3f3173	Update `LearningRateMonitor` docs and tests for `log_weight_decay` (#19805 )	2024-05-21 13:31:54 -04:00
awaelchli	d76feef0d6	Enable loss-parallel in example (#19882 )	2024-05-20 13:19:38 +02:00
awaelchli	82e6e61bea	Remove redundant code to set the device on the LightningModule (#19877 ) Co-authored-by: Luca Antiga <luca.antiga@gmail.com>	2024-05-20 06:29:37 +02:00
Luca Antiga	d5bf4b9ed3	[App] Extend retry to 4xx except 400, 401, 403, 404 (#19842 ) * Extend retry to 4xx except 400, 401, 403, 404 * Remove unused intersphinx mapping for app --------- Co-authored-by: awaelchli <aedu.waelchli@gmail.com>	2024-05-18 22:03:16 -04:00
awaelchli	c8059d7bfd	(6/n) Support 2D Parallelism - Trainer example (#19879 ) * Add 2D parallel example * replace with torchtitan code	2024-05-18 20:35:58 -04:00
awaelchli	32e241870b	(5/n) Support 2D Parallelism in Lightning Trainer (#19878 ) * ModelParallelStrategy for Lightning Trainer * mypy * import fix * fix torchscript errors * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix docs issue * fix test execution * Update src/lightning/pytorch/strategies/model_parallel.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Luca Antiga <luca.antiga@gmail.com>	2024-05-17 19:03:31 -04:00
awaelchli	1d0c6aae96	(4/n) Support 2D Parallelism - Loading optimizer states correctly (#19872 ) * Load optimizer state * move to utility * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-05-17 17:17:32 -04:00
awaelchli	cd8acc26c3	(3/n) Support 2D Parallelism - Efficient loading of full-state checkpoints (#19870 ) * memory-optimized loading of full checkpoints into dist model * simplify * handle buffers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * handle strict loading, buffers, and add test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * chlog --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-05-15 13:07:31 -04:00
awaelchli	9455871c93	(2/n) Support 2D Parallelism - Distributed Checkpoints (#19852 ) * distributed checkpoints * use decorator * refactor if-strict * update example * filter non-persistent buffers (todo, add test) * simplify checkpoint loading for model	2024-05-15 08:19:08 -04:00
thomas chaton	90d04b5b86	Update Lightning Cloud 0.5.69 (#19857 )	2024-05-09 16:12:30 +01:00
thomas chaton	8453e31028	Reduce queue fetching (#19856 ) * update * update	2024-05-09 07:46:27 -04:00
awaelchli	e0307277a0	Add function to explicitly mark forward methods in Fabric (#19690 ) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-05-08 16:58:33 -04:00
awaelchli	0c8a193d3c	(1/n) Support 2D Parallelism (#19846 )	2024-05-07 17:02:58 -04:00
Adrian Wälchli	0f12271d7f	bump lightning cloud	2024-05-01 18:45:35 -04:00
Luca Antiga	d623708192	xfail tests for deprecated functionality	2024-05-01 17:51:51 -04:00
Luca Antiga	4219f30c96	Fix formatting	2024-05-01 17:51:51 -04:00
Luca Antiga	8103bd7e01	Make sure the HTTP client for queues retries for POST and 5xx	2024-05-01 17:51:51 -04:00
Adrian Wälchli	d1949766f8	Fix TensorBoardLogger test on Windows (#19824 )	2024-04-29 08:51:56 -04:00
Adrian Wälchli	49ed2b102b	Add PyTorch 2.3 to CI matrix (#19708 )	2024-04-29 07:16:13 -04:00
Adrian Wälchli	29136332d6	Avoid interactions through test artifacts (#19821 )	2024-04-28 11:56:40 -04:00
Adrian Wälchli	5e0e02b79e	Remove support for PyTorch 1.13 (#19706 )	2024-04-27 01:24:07 -04:00
Adrian Wälchli	b9680a364d	Update changelog after 2.2.2 release (#19770 )	2024-04-22 13:52:43 -04:00
thomas chaton	a2b3dddf1d	Update Lightning Cloud to 0.5.67 (#19795 )	2024-04-22 17:47:04 +01:00
awaelchli	c235f20e71	Remove the requirement for FSDPStrategy subclasses to only support GPU (#19781 )	2024-04-17 01:28:44 +02:00
David de la Iglesia Castro	58ad56afec	Use `step` interval in `estimated_stepping_batches` docs example (#19774 )	2024-04-15 10:16:17 -04:00
awaelchli	ce90b3898a	Sanitize hparams that can't be json-serialized in `WandbLogger.log_hyperparameters()` (#19769 )	2024-04-14 15:01:58 +02:00
PL Ghost	67b270bd4d	Adding test for legacy checkpoint created with 2.2.2 (#19760 )	2024-04-12 09:19:39 -04:00
Jirka Borovec	f642d68508	ci/lint: simlify prettier (#19742 )	2024-04-12 13:11:21 +02:00
pre-commit-ci[bot]	3f97e16cd4	[pre-commit.ci] pre-commit suggestions (#19723 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>	2024-04-12 06:40:25 -04:00
awaelchli	dcb91d53d2	Fix initialized weights resetting in `Fabric.setup()` when using FSDP (#19755 )	2024-04-11 05:52:28 -04:00
awaelchli	316cc71c2b	Skip tests that cause CLI argparse errors on Python 3.11.9 (#19756 )	2024-04-11 05:01:27 -04:00
Dominic Kerr	76b691d80c	Support pathlib.Path file paths when saving ONNX models (#19727 ) Co-authored-by: dominicgkerr <dominicgkerr1@gmail.co>	2024-04-03 20:42:25 -04:00
Alexander Jipa	ce88483c6f	Add synchronous parameter to MLflowLogger (#19639 ) Co-authored-by: Alexander Jipa <azzhipa@amazon.com>	2024-04-03 18:16:14 -04:00
awaelchli	8947d135d6	Skip test with compile error on torch=2.2.2 on Windows (#19734 )	2024-04-03 17:53:46 -04:00
dependabot[bot]	d25014dbda	build(deps): bump Lightning-AI/utilities from 0.11.0 to 0.11.2 (#19719 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-01 10:38:05 -04:00
awaelchli	438f29f07a	Relax restrictions on wrapping a custom batch sampler in predict (#19678 )	2024-03-27 23:45:50 +01:00

10455 Commits
