86 Commits

Author SHA1 Message Date
awaelchli 6c70dd7cf0
Fix attribute error on `_NotYetLoadedTensor` after loading checkpoint into quantized model with `_lazy_load()` (#20121) 2024-07-24 05:39:40 -04:00
awaelchli 7d1a70752f
Update PyTorch 2.4 tests (#20079) 2024-07-13 05:09:09 -04:00
awaelchli 693c21ac1b
Add testing for PyTorch 2.4 (Fabric) (#20028) 2024-07-02 18:01:03 -04:00
awaelchli 14493c0685
Drop PyTorch 2.0 from the test matrix (#20009) 2024-06-30 18:02:00 -04:00
awaelchli 7e87ce05c8
Fix state dict loading in bitsandbytes plugin when checkpoint is already quantized (#19886)
* bugfix

* add test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* add chlog

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-21 13:46:01 -04:00
Adrian Wälchli 49ed2b102b
Add PyTorch 2.3 to CI matrix (#19708) 2024-04-29 07:16:13 -04:00
Adrian Wälchli 5e0e02b79e
Remove support for PyTorch 1.13 (#19706) 2024-04-27 01:24:07 -04:00
awaelchli a41528c2a6
Update tests for PyTorch 2.2.1 (#19521) 2024-02-23 13:11:34 -05:00
Jirka Borovec 99fe6563ef
precommit: ruff-format (#19434)
* precommit: ruff-format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* manual update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* manual update

* order

* mypy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* mypy

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-15 13:39:17 -05:00
awaelchli 1a59097ab2
Drop support for PyTorch 1.12 (#19300)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-01-26 11:44:24 -05:00
Jirka Borovec 3bd133b107
CI: enable testing with coming PT 2.2 (#19289)
* ci: build dockers for PT 2.2
* py3.12
* --pre --extra-index-url
* typing-extensions
* bump jsonargparse
* install latest jsonargparse
* Add windows skips for Fabric
* convert to xfail
* add pytorch skips
* skip checkpoint consolidation test
* set max torch

---------

Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-01-26 16:42:09 +01:00
Carlos Mocholí a1dd9efcf7
Drop XLA XRT support (#19232)
* Drop XLA XRT support
* update test
* set launched
* update conftest
* xla available check
---------

Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-01-10 18:39:20 +01:00
Carlos Mocholí 6dfa5cca9d
Support 4bit BNB layers meta-device materialization (#19150) 2023-12-20 22:13:18 +01:00
Carlos Mocholí 97469c600f
TransformerEngine fallback compute dtype (#19082)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-12-14 03:02:09 +01:00
Adrian Wälchli 197b22586a
Fix comm initialization in `MPIEnvironment` (#19074) 2023-11-28 16:14:46 -05:00
Adrian Wälchli 7a5b7f5561
Skip hanging collective test (#18908) 2023-11-01 15:45:25 +01:00
Adrian Wälchli 018a308269
Enable RUF018 rule for walrus assignments in asserts (#18886) 2023-10-30 21:16:02 -04:00
Adrian Wälchli 079544a902
Rename PrecisionPlugin -> Precision (#18840) 2023-10-30 16:53:13 -04:00
Carlos Mocholí 78ad390b5b
Restore support for builds without distributed (#18859)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-10-25 02:48:44 +02:00
Carlos Mocholí 5a83f541da
Minor strategy fixes [TPU] (#18774) 2023-10-11 15:26:30 +02:00
Carlos Mocholí 27ad9e9243
xfail collective tests (#18779) 2023-10-11 05:54:55 +02:00
Adrian Wälchli 377534072b
Split `Precision.init_context` (#18734) 2023-10-09 12:34:30 -04:00
Carlos Mocholí 4c83ffd04c
Avoid importing bitsandbytes unless requested (#18680) 2023-10-05 01:10:10 +02:00
Carlos Mocholí e3960749d8
Forbid init_module on-device instantiation with bnb ignored modules (#18704) 2023-10-05 00:57:07 +02:00
Adrian Wälchli d31ef1f7d3
Drop support for PyTorch 1.11 (#18691)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-04 20:30:44 +02:00
pre-commit-ci[bot] c0ec0decec
[pre-commit.ci] pre-commit suggestions (#18697)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-03 22:07:21 +02:00
Carlos Mocholí 5120ad20f2
Bitsandbytes precision plugin (#18655)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-09-29 19:17:18 +02:00
Adrian Wälchli 3cd463efa8
Remove outdated workaround for PyTorch autocast bug (#18634) 2023-09-29 08:33:43 -04:00
Jirka Borovec 830a62a722
ruff: replace isort with ruff +TPU (#17684)
* ruff: replace isort with ruff

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing & imports

* lines in warning test

* docs

* fix enum import

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing

* import

* fix lines

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* type ClusterEnvironment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-26 11:54:55 -04:00
Adrian Wälchli 894952d33e
Avoid redundant input-type casting in FSDP precision (#18630)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-26 08:55:13 -04:00
Carlos Mocholí d8e9eba606
[Fabric] Replace `@contextlib.contextmanager` (#18557) 2023-09-15 17:27:29 +02:00
Carlos Mocholí eb3b96d8bd
Avoid modifying the default dtype on exception (#18500) 2023-09-14 15:32:32 +02:00
Jirka Borovec dbe7ed46a3
replace tests skip with soft xfail (#18486)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 23:11:03 +02:00
Dan Dale c081b48324
Accommodate FSDP full-precision `param_dtype` training with PyTorch < 2.0 (#18278) 2023-08-14 12:22:26 +02:00
Adrian Wälchli 3142ed5e44
Integration tests for XLA precision (#18286) 2023-08-13 09:20:26 -04:00
Adrian Wälchli c95dbac2e8
Validate Trainer settings against cluster environment (#18292) 2023-08-12 21:26:37 +02:00
Adrian Wälchli 7fe8756917
[TPU] Proper half-precision implementation for XLA (#18213)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-11 11:37:41 -04:00
Adrian Wälchli 888466b144
Support true 16-bit precision with FSDP in Trainer (#18219) 2023-08-10 04:15:35 -04:00
Jirka Borovec efa7b2f9ef
docformatter: config with black (#18064)
* docformatter: config with black

* additional_dependencies: [tomli]

* 119

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-09 10:44:20 -04:00
Adrian Wälchli 41f0425a8d
Disable auto-detection of Kubeflow environment (#18137) 2023-07-28 05:03:48 -04:00
Carlos Mocholí 4c57c0bc07
[TPU] Do not cancel all jobs when one fails (#18052)
* Update tpu-tests.yml

* Update tpu-tests.yml

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Needs

* if:

* missed this

* Fix issue on multinode

* Latest fixes

* last fix?

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-25 14:24:50 +02:00
Carlos Mocholí 0e7e6b31c5
Fix [TPU] tests (#18140)
* Fix [TPU] tests

* More
2023-07-24 15:13:36 +02:00
Carlos Mocholí 3d573d5e79
Fix [TPU] tests (#18136)
* Debug [TPU] tests

* -U

* Uninstall typing extensions

* Minor simplifications

* Silly cancelling logic

* pip3?

* sudo

* More

* Revert "Silly cancelling logic"

This reverts commit ce31d874f3.
2023-07-23 13:39:00 +02:00
Carlos Mocholí e9c42ed11f
More XLA fixes for nightly support (#18085) 2023-07-15 01:16:42 +02:00
Carlos Mocholí 3a55f0c0a1
Minor miscellaneous fixes (#18068) 2023-07-13 06:01:58 -04:00
Adrian Wälchli acc70d0ae5
Support all half-precision modes in FSDP precision plugin (#17807)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-07-09 18:40:46 +00:00
Adrian Wälchli c03dd38c6c
Refactor more Fabric tests that use the old .run() method (#17930)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-03 16:26:58 +02:00
Carlos Mocholí f78db4c674
Remove automatic sharding support with `Fabric.run` or `fabric.launch(fn)` (#17832)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-06-15 16:02:09 +00:00
Alexander Kreuzer f111bd483b
Fix to Parameters to `MixedPrecisionPlugin` are not validated and do not match doc string (#17687)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-06-07 14:35:54 +00:00
Leng Yue 2c8758f0a8
Fix Mix Precision settings for FSDP Plugins (#17670) 2023-05-23 11:35:37 -04:00