Commit Graph

322 commits. Each entry below lists the author, abbreviated SHA-1, commit message (with PR number), and commit date.
Corwin Joy 631911c004
Add special logic for 'step' in _optimizer_to_device (#20019)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2024-08-05 17:17:06 -04:00
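
The `step` special-casing can be sketched as follows. This is a simplified illustration of the idea, not the library's implementation, and the function name is illustrative: PyTorch's non-fused optimizers keep the `step` counter as a CPU scalar tensor, so it must be excluded when the rest of the state is moved to the accelerator.

```python
import torch

def optimizer_state_to_device(optimizer: torch.optim.Optimizer, device: torch.device) -> None:
    # Move all optimizer state tensors to `device`, except the `step`
    # counter: non-fused optimizers expect it as a CPU scalar tensor,
    # and moving it can break training resumption.
    for state in optimizer.state.values():
        for key, value in state.items():
            if isinstance(value, torch.Tensor) and key != "step":
                state[key] = value.to(device)
```
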
awaelchli 345450b0c3
Fix parameter count in ModelSummary when parameters are DTensors (#20163) 2024-08-05 10:57:31 -04:00
awaelchli 6c70dd7cf0
Fix attribute error on `_NotYetLoadedTensor` after loading checkpoint into quantized model with `_lazy_load()` (#20121) 2024-07-24 05:39:40 -04:00
awaelchli d0a6b34ea9
Avoid printing the seed info message multiple times (#20108) 2024-07-20 20:25:11 +02:00
awaelchli e214395d31
Remove confusing warning "Missing logger folder" (#20109) 2024-07-20 20:24:38 +02:00
awaelchli bdafe5e739
Add Python 3.12 to the CPU test matrix (#20078) 2024-07-13 06:07:35 -04:00
awaelchli 7d1a70752f
Update PyTorch 2.4 tests (#20079) 2024-07-13 05:09:09 -04:00
Abhishek Singh d5ae9ec568
Make numpy an optional dependency in `utilities/seed.py` (#20055)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-07-12 17:24:04 -04:00
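
The optional-dependency pattern introduced here looks roughly like this (a minimal sketch; the helper name is hypothetical):

```python
def _seed_numpy_if_available(seed: int) -> None:
    # numpy is no longer a hard requirement for seeding, so guard the
    # import and only touch numpy's RNG when the package is installed.
    try:
        import numpy as np
    except ImportError:
        return
    np.random.seed(seed)
```
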
awaelchli 9987d993a0
Remove support for Python 3.8 (#20071) 2024-07-12 10:33:35 -04:00
awaelchli 5829ef8ab3
Set `weights_only` in tests to avoid warnings in PyTorch 2.4 (#20057) 2024-07-08 04:38:27 -04:00
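
For context: PyTorch 2.4 emits a FutureWarning whenever `torch.load` is called without an explicit `weights_only` argument, so the tests now pass it explicitly, e.g.:

```python
import torch

torch.save({"weight": torch.ones(3)}, "checkpoint.pt")
# Passing weights_only=True opts into the restricted unpickler that
# only reconstructs tensors and plain Python containers, and silences
# the PyTorch 2.4 warning about the unset default.
state = torch.load("checkpoint.pt", weights_only=True)
```
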
awaelchli 693c21ac1b
Add testing for PyTorch 2.4 (Fabric) (#20028) 2024-07-02 18:01:03 -04:00
awaelchli 14493c0685
Drop PyTorch 2.0 from the test matrix (#20009) 2024-06-30 18:02:00 -04:00
awaelchli e330da5870
Fix torch-numpy compatibility conflict in tests (#20004) 2024-06-21 20:20:59 -04:00
Douwe den Blanken 4f96c83ba0
Sanitize argument-free object params before logging (#19771)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-06 14:51:48 -04:00
Liyang90 7668a6bf59
Flexible and easy to use HSDP setting (#19504)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-05 20:15:03 -04:00
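
Judging from the PR title, usage looks roughly like the sketch below; the 8-GPU layout (2 replica groups of 4 shards) and the exact arguments are assumptions, not a verbatim excerpt from the PR:

```python
from lightning.fabric.strategies import FSDPStrategy

# Hybrid Sharded Data Parallel: shard parameters within each group of
# 4 GPUs and replicate across the 2 groups. The 2-tuple is shorthand
# for a (replicate, shard) device mesh.
strategy = FSDPStrategy(sharding_strategy="HYBRID_SHARD", device_mesh=(2, 4))
```
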
awaelchli 1a6786d682
Destroy process group in atexit handler (#19931) 2024-06-05 19:31:43 -04:00
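
The pattern here is the standard `atexit` hook around `torch.distributed` teardown, roughly:

```python
import atexit
import torch.distributed as dist

def _destroy_process_group() -> None:
    # Release NCCL/Gloo resources at interpreter exit even when the
    # user never tears the process group down manually.
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()

atexit.register(_destroy_process_group)
```
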
awaelchli 896c2a656a
Error for unsupported precision types with ModelParallelStrategy (#19902) 2024-05-23 13:43:46 -04:00
awaelchli 8fc7b4ae94
Remove the requirement for FSDPStrategy subclasses to only support GPU (#19894) 2024-05-22 18:31:40 +02:00
awaelchli 7e87ce05c8
Fix state dict loading in bitsandbytes plugin when checkpoint is already quantized (#19886)
* bugfix

* add test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* add chlog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-21 13:46:01 -04:00
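
A sketch of the code path this fixes, assuming a CUDA machine with bitsandbytes installed (saving produces already-quantized weights; reloading them is what used to fail):

```python
import torch
from lightning.fabric import Fabric
from lightning.fabric.plugins import BitsandbytesPrecision

fabric = Fabric(devices=1, plugins=BitsandbytesPrecision(mode="nf4"))
model = fabric.setup(torch.nn.Linear(16, 16))

fabric.save("quantized.ckpt", {"model": model})  # weights are stored quantized
fabric.load("quantized.ckpt", {"model": model})  # loading them back is what this commit fixes
```
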
awaelchli 32e241870b
(5/n) Support 2D Parallelism in Lightning Trainer (#19878)
* ModelParallelStrategy for Lightning Trainer

* mypy

* import fix

* fix torchscript errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix docs issue

* fix test execution

* Update src/lightning/pytorch/strategies/model_parallel.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2024-05-17 19:03:31 -04:00
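
Based on the commit titles in this series, the strategy is used roughly as below; the argument values are an assumed 4-GPU layout, and the model is expected to apply its parallelisms in its `configure_model` hook:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import ModelParallelStrategy

# 2-way data parallelism combined with 2-way tensor parallelism
# across 4 GPUs (a 2 x 2 device mesh).
strategy = ModelParallelStrategy(data_parallel_size=2, tensor_parallel_size=2)
trainer = Trainer(accelerator="cuda", devices=4, strategy=strategy)
```
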
awaelchli 1d0c6aae96
(4/n) Support 2D Parallelism - Loading optimizer states correctly (#19872)
* Load optimizer state

* move to utility

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-17 17:17:32 -04:00
awaelchli cd8acc26c3
(3/n) Support 2D Parallelism - Efficient loading of full-state checkpoints (#19870)
* memory-optimized loading of full checkpoints into dist model

* simplify

* handle buffers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* handle strict loading, buffers, and add test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chlog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-15 13:07:31 -04:00
awaelchli 9455871c93
(2/n) Support 2D Parallelism - Distributed Checkpoints (#19852)
* distributed checkpoints

* use decorator

* refactor if-strict

* update example

* filter non-persistent buffers (todo, add test)

* simplify checkpoint loading for model
2024-05-15 08:19:08 -04:00
awaelchli e0307277a0
Add function to explicitly mark forward methods in Fabric (#19690)
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-05-08 16:58:33 -04:00
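
The function referenced here is `mark_forward_method` on the Fabric-wrapped module; a minimal sketch:

```python
import torch
from lightning.fabric import Fabric

class LitModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

    def generate(self, x: torch.Tensor) -> torch.Tensor:
        return self.forward(x) + 1

fabric = Fabric(accelerator="cpu")
model = fabric.setup(LitModel())
# Route `generate` through the same strategy machinery (precision
# casts, etc.) that already wraps `forward`:
model.mark_forward_method("generate")
out = model.generate(torch.randn(2))
```
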
awaelchli 0c8a193d3c
(1/n) Support 2D Parallelism (#19846) 2024-05-07 17:02:58 -04:00
Adrian Wälchli 49ed2b102b
Add PyTorch 2.3 to CI matrix (#19708) 2024-04-29 07:16:13 -04:00
Adrian Wälchli 29136332d6
Avoid interactions through test artifacts (#19821) 2024-04-28 11:56:40 -04:00
Adrian Wälchli 5e0e02b79e
Remove support for PyTorch 1.13 (#19706) 2024-04-27 01:24:07 -04:00
awaelchli ce90b3898a
Sanitize hparams that can't be json-serialized in `WandbLogger.log_hyperparameters()` (#19769) 2024-04-14 15:01:58 +02:00
awaelchli dcb91d53d2
Fix initialized weights resetting in `Fabric.setup()` when using FSDP (#19755) 2024-04-11 05:52:28 -04:00
Carlos Mocholí ca6c94c208
Fix monkeypatching of `_FabricModule` methods (#19705) 2024-03-27 11:03:21 -04:00
awaelchli 14e98ecbf2
Fix `torch.compile` patching when applied as decorator (#19627) 2024-03-15 08:12:48 -04:00
Carlos Mocholí 06eb3cc28b
Pass `enabled` down to `_BackwardSyncControl` (#19577) 2024-03-08 11:48:16 +01:00
awaelchli b3c869f636
Revise checkpoint consolidation with PyTorch 2.3 (#19561) 2024-03-04 10:13:31 -05:00
awaelchli 13f15b38fc
Support consolidating sharded checkpoints with the `fabric` CLI (#19560) 2024-03-04 08:01:33 -05:00
awaelchli 7880c110e3
Alternative mechanism to detect missing `Fabric.backward()` call (#19493) 2024-02-27 17:57:32 +01:00
awaelchli ea89133c65
Rename `fabric run model` to `fabric run` (#19527) 2024-02-27 11:36:46 -05:00
awaelchli a41528c2a6
Update tests for PyTorch 2.2.1 (#19521) 2024-02-23 13:11:34 -05:00
Jirka Borovec 99fe6563ef
precommit: ruff-format (#19434)
* precommit: ruff-format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* manual update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* manual update

* order

* mypy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* mypy

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-15 13:39:17 -05:00
awaelchli 265025bd5d
Inform the user about a missing `fabric.backward()` call (#19447) 2024-02-14 17:49:11 -05:00
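
The pattern the check enforces, in a minimal CPU-only sketch:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu")
module = torch.nn.Linear(4, 1)
model, optimizer = fabric.setup(module, torch.optim.SGD(module.parameters(), lr=0.1))

loss = model(torch.randn(2, 4)).sum()
# Calling loss.backward() directly would bypass the strategy (gradient
# scaling, sharded-gradient sync, ...); Fabric now warns about this and
# points the user to:
fabric.backward(loss)
optimizer.step()
```
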
Carlos Mocholí 67459944ea
Avoid FSDP deprecations during save/load with newer torch versions (#19463)
* Avoid FSDP deprecations during save/load with newer torch versions

* Refactor

* Tests
2024-02-14 19:43:59 +01:00
awaelchli 3fbc29ba21
Fix `CSVLogger` trying to append to file from previous run in same version folder (#19446) 2024-02-13 13:59:04 -05:00
awaelchli 3c5a465cfc
Create barrier without timeout in `prepare_data()` (#19448) 2024-02-13 12:10:07 +01:00
awaelchli e950bb4828
Remove the Graphcore IPU integration (#19405)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-02-12 16:16:02 -05:00
Justus Schock 2ed7282f7c
Rename Lightning Fabric CLI (#19442)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-02-12 17:22:53 +01:00
nik777 7a56ac5182
Support shortcut name for DeepSpeed stage 1 offload (#19075) 2024-02-05 20:53:18 -05:00
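
The shortcut presumably mirrors the existing `deepspeed_stage_2_offload` and `deepspeed_stage_3_offload` registry names; the exact string below is an assumption based on the PR title:

```python
from lightning.pytorch import Trainer

# Assumed equivalent of DeepSpeedStrategy(stage=1, offload_optimizer=True).
trainer = Trainer(accelerator="cuda", devices=2, strategy="deepspeed_stage_1_offload")
```
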
awaelchli fb0ce03a9c
Fix input validation to support passing `device_mesh` to FSDP (#19392)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-02-02 06:48:12 -05:00
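
With this fix, an explicitly constructed `DeviceMesh` can be handed to the strategy; a sketch assuming 8 GPUs and PyTorch >= 2.2:

```python
from torch.distributed.device_mesh import init_device_mesh
from lightning.fabric.strategies import FSDPStrategy

# Build a 2D (replicate x shard) mesh and pass it through to FSDP
# instead of letting the strategy derive one from a tuple.
mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
strategy = FSDPStrategy(sharding_strategy="HYBRID_SHARD", device_mesh=mesh)
```
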
awaelchli 01f8531c9d
Refactor BoringFabric in tests (#19364) 2024-01-30 23:32:45 +01:00
awaelchli 6018b0743c
Error message to inform bitsandbytes is only supported on CUDA (#19360) 2024-01-29 19:52:28 -05:00
awaelchli 1a59097ab2
Drop support for PyTorch 1.12 (#19300)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-01-26 11:44:24 -05:00