Corwin Joy
631911c004
Add special logic for 'step' in _optimizer_to_device ( #20019 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2024-08-05 17:17:06 -04:00
awaelchli
345450b0c3
Fix parameter count in ModelSummary when parameters are DTensors ( #20163 )
2024-08-05 10:57:31 -04:00
awaelchli
6c70dd7cf0
Fix attribute error on `_NotYetLoadedTensor` after loading checkpoint into quantized model with `_lazy_load()` ( #20121 )
2024-07-24 05:39:40 -04:00
awaelchli
d0a6b34ea9
Avoid printing the seed info message multiple times ( #20108 )
2024-07-20 20:25:11 +02:00
awaelchli
e214395d31
Remove confusing warning "Missing logger folder" ( #20109 )
2024-07-20 20:24:38 +02:00
awaelchli
bdafe5e739
Add Python 3.12 to the CPU test matrix ( #20078 )
2024-07-13 06:07:35 -04:00
awaelchli
7d1a70752f
Update PyTorch 2.4 tests ( #20079 )
2024-07-13 05:09:09 -04:00
Abhishek Singh
d5ae9ec568
Make numpy an optional dependency in `utilities/seed.py` ( #20055 )
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-07-12 17:24:04 -04:00
awaelchli
9987d993a0
Remove support for Python 3.8 ( #20071 )
2024-07-12 10:33:35 -04:00
awaelchli
5829ef8ab3
Set `weights_only` in tests to avoid warnings in PyTorch 2.4 ( #20057 )
2024-07-08 04:38:27 -04:00
awaelchli
693c21ac1b
Add testing for PyTorch 2.4 (Fabric) ( #20028 )
2024-07-02 18:01:03 -04:00
awaelchli
14493c0685
Drop PyTorch 2.0 from the test matrix ( #20009 )
2024-06-30 18:02:00 -04:00
awaelchli
e330da5870
Fix torch-numpy compatibility conflict in tests ( #20004 )
2024-06-21 20:20:59 -04:00
Douwe den Blanken
4f96c83ba0
Sanitize argument-free object params before logging ( #19771 )
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-06 14:51:48 -04:00
Liyang90
7668a6bf59
Flexible and easy to use HSDP setting ( #19504 )
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-05 20:15:03 -04:00
awaelchli
1a6786d682
Destroy process group in atexit handler ( #19931 )
2024-06-05 19:31:43 -04:00
awaelchli
896c2a656a
Error for unsupported precision types with ModelParallelStrategy ( #19902 )
2024-05-23 13:43:46 -04:00
awaelchli
8fc7b4ae94
Remove the requirement for FSDPStrategy subclasses to only support GPU ( #19894 )
2024-05-22 18:31:40 +02:00
awaelchli
7e87ce05c8
Fix state dict loading in bitsandbytes plugin when checkpoint is already quantized ( #19886 )
* bugfix
* add test
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update
* add chlog
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-21 13:46:01 -04:00
awaelchli
32e241870b
(5/n) Support 2D Parallelism in Lightning Trainer ( #19878 )
* ModelParallelStrategy for Lightning Trainer
* mypy
* import fix
* fix torchscript errors
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix docs issue
* fix test execution
* Update src/lightning/pytorch/strategies/model_parallel.py
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2024-05-17 19:03:31 -04:00
awaelchli
1d0c6aae96
(4/n) Support 2D Parallelism - Loading optimizer states correctly ( #19872 )
* Load optimizer state
* move to utility
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-17 17:17:32 -04:00
awaelchli
cd8acc26c3
(3/n) Support 2D Parallelism - Efficient loading of full-state checkpoints ( #19870 )
* memory-optimized loading of full checkpoints into dist model
* simplify
* handle buffers
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* handle strict loading, buffers, and add test
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* chlog
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-15 13:07:31 -04:00
awaelchli
9455871c93
(2/n) Support 2D Parallelism - Distributed Checkpoints ( #19852 )
* distributed checkpoints
* use decorator
* refactor if-strict
* update example
* filter non-persistent buffers (todo, add test)
* simplify checkpoint loading for model
2024-05-15 08:19:08 -04:00
awaelchli
e0307277a0
Add function to explicitly mark forward methods in Fabric ( #19690 )
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-05-08 16:58:33 -04:00
awaelchli
0c8a193d3c
(1/n) Support 2D Parallelism ( #19846 )
2024-05-07 17:02:58 -04:00
Adrian Wälchli
49ed2b102b
Add PyTorch 2.3 to CI matrix ( #19708 )
2024-04-29 07:16:13 -04:00
Adrian Wälchli
29136332d6
Avoid interactions through test artifacts ( #19821 )
2024-04-28 11:56:40 -04:00
Adrian Wälchli
5e0e02b79e
Remove support for PyTorch 1.13 ( #19706 )
2024-04-27 01:24:07 -04:00
awaelchli
ce90b3898a
Sanitize hparams that can't be json-serialized in `WandbLogger.log_hyperparameters()` ( #19769 )
2024-04-14 15:01:58 +02:00
awaelchli
dcb91d53d2
Fix initialized weights resetting in `Fabric.setup()` when using FSDP ( #19755 )
2024-04-11 05:52:28 -04:00
Carlos Mocholí
ca6c94c208
Fix monkeypatching of `_FabricModule` methods ( #19705 )
2024-03-27 11:03:21 -04:00
awaelchli
14e98ecbf2
Fix `torch.compile` patching when applied as decorator ( #19627 )
2024-03-15 08:12:48 -04:00
Carlos Mocholí
06eb3cc28b
Pass `enabled` down to `_BackwardSyncControl` ( #19577 )
2024-03-08 11:48:16 +01:00
awaelchli
b3c869f636
Revise checkpoint consolidation with PyTorch 2.3 ( #19561 )
2024-03-04 10:13:31 -05:00
awaelchli
13f15b38fc
Support consolidating sharded checkpoints with the `fabric` CLI ( #19560 )
2024-03-04 08:01:33 -05:00
awaelchli
7880c110e3
Alternative mechanism to detect missing `Fabric.backward()` call ( #19493 )
2024-02-27 17:57:32 +01:00
awaelchli
ea89133c65
Rename `fabric run model` to `fabric run` ( #19527 )
2024-02-27 11:36:46 -05:00
awaelchli
a41528c2a6
Update tests for PyTorch 2.2.1 ( #19521 )
2024-02-23 13:11:34 -05:00
Jirka Borovec
99fe6563ef
precommit: ruff-format ( #19434 )
* precommit: ruff-format
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* manual update
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* manual update
* order
* mypy
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* mypy
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-15 13:39:17 -05:00
awaelchli
265025bd5d
Inform the user about a missing `fabric.backward()` call ( #19447 )
2024-02-14 17:49:11 -05:00
Carlos Mocholí
67459944ea
Avoid FSDP deprecations during save/load with newer torch versions ( #19463 )
* Avoid FSDP deprecations during save/load with newer torch versions
* Refactor
* Tests
2024-02-14 19:43:59 +01:00
awaelchli
3fbc29ba21
Fix `CSVLogger` trying to append to file from previous run in same version folder ( #19446 )
2024-02-13 13:59:04 -05:00
awaelchli
3c5a465cfc
Create barrier without timeout in `prepare_data()` ( #19448 )
2024-02-13 12:10:07 +01:00
awaelchli
e950bb4828
Remove the Graphcore IPU integration ( #19405 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-02-12 16:16:02 -05:00
Justus Schock
2ed7282f7c
Rename Lightning Fabric CLI ( #19442 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-02-12 17:22:53 +01:00
nik777
7a56ac5182
Support shortcut name for DeepSpeed stage 1 offload ( #19075 )
2024-02-05 20:53:18 -05:00
awaelchli
fb0ce03a9c
Fix input validation to support passing `device_mesh` to FSDP ( #19392 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-02-02 06:48:12 -05:00
awaelchli
01f8531c9d
Refactor BoringFabric in tests ( #19364 )
2024-01-30 23:32:45 +01:00
awaelchli
6018b0743c
Error message to inform bitsandbytes is only supported on CUDA ( #19360 )
2024-01-29 19:52:28 -05:00
awaelchli
1a59097ab2
Drop support for PyTorch 1.12 ( #19300 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-01-26 11:44:24 -05:00