Adrian Wälchli
da79480054
PyTest random order for Fabric tests ( #19040 )
2023-11-22 16:41:49 -05:00
Adrian Wälchli
d4614d043e
Address test flakiness ( #19022 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-11-21 17:11:00 -05:00
Adrian Wälchli
e3be762538
Re-enable dynamo tests that were fixed in PyTorch 2.1 ( #19038 )
2023-11-21 16:30:20 -05:00
Adrian Wälchli
f652e6c00e
Fix `rank_zero_only` rank not set in ddp-spawn based strategies ( #19030 )
2023-11-20 10:49:14 -05:00
Adrian Wälchli
45c2fcb341
Add AttributeDict container for Fabric ( #18943 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-11-18 09:25:26 -05:00
Adrian Wälchli
340961a6ec
Fix test interactions ( #18994 )
2023-11-13 12:35:46 -05:00
Carlos Mocholí
466f772e3e
Fix precision default from environment ( #18928 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-11-10 23:03:51 +01:00
Carlos Mocholí
d9aa833628
Add more CUDA card FLOPs ( #18958 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-07 04:13:20 +01:00
Adrian Wälchli
195a3bf5b5
Fix parsing v100s in `get_available_flops` ( #18952 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-11-06 21:50:11 +01:00
Jason Won
8d68607cef
Flatten dataclass hyperparameters for logging ( #18906 )
...
Co-authored-by: jaswon <jason@jwon.xyz>
2023-11-03 19:30:19 -04:00
Carlos Mocholí
2b6b594dab
Rename Throughput flops argument ( #18924 )
2023-11-02 16:06:40 +01:00
Carlos Mocholí
5f6669f6b3
Add batches argument to throughput ( #18905 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-11-02 04:15:03 +01:00
Adrian Wälchli
98685c332b
Fix parsing of version in TensorBoardLogger and CSVLogger ( #18897 )
2023-11-01 12:48:36 -04:00
Adrian Wälchli
7a5b7f5561
Skip hanging collective test ( #18908 )
2023-11-01 15:45:25 +01:00
Adrian Wälchli
018a308269
Enable RUF018 rule for walrus assignments in asserts ( #18886 )
2023-10-30 21:16:02 -04:00
Adrian Wälchli
079544a902
Rename PrecisionPlugin -> Precision ( #18840 )
2023-10-30 16:53:13 -04:00
Carlos Mocholí
800b87eb46
Add throughput utilities to Fabric and the Trainer ( #18848 )
2023-10-30 17:10:29 +01:00
Adrian Wälchli
e66be675d2
Refined FSDP saving logic and error messaging when path exists ( #18884 )
2023-10-30 10:05:28 -04:00
Adrian Wälchli
9e75bc9572
Fix failing lightning cli entry point ( #18821 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-24 20:51:11 -04:00
Carlos Mocholí
78ad390b5b
Restore support for builds without distributed ( #18859 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-10-25 02:48:44 +02:00
Adrian Wälchli
6bfde6a80c
Change dangerous default random seed selection ( #18846 )
2023-10-24 19:59:38 -04:00
Adrian Wälchli
97303b0168
Avoid false-positive warnings about method calls on the Fabric-wrapped module ( #18819 )
2023-10-22 22:26:28 -04:00
Carlos Mocholí
5a83f541da
Minor strategy fixes [TPU] ( #18774 )
2023-10-11 15:26:30 +02:00
Carlos Mocholí
27ad9e9243
xfail collective tests ( #18779 )
2023-10-11 05:54:55 +02:00
Adrian Wälchli
e02bb391af
Utility to disable all instances of `PossibleUserWarning` ( #18744 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-10 06:53:32 -04:00
Adrian Wälchli
acc0cf02cf
Refinements to the num-workers warning ( #18737 )
2023-10-09 22:17:47 -04:00
Adrian Wälchli
377534072b
Split `Precision.init_context` ( #18734 )
2023-10-09 12:34:30 -04:00
Adrian Wälchli
87dff9928e
Handle edge case for `find_usable_cuda_devices(0)` ( #18722 )
2023-10-06 23:44:33 -04:00
Adrian Wälchli
5d819c91fb
Remove `fsdp_overlap_step_with_backward` in favor of native solution ( #18726 )
2023-10-06 08:11:41 -04:00
Adrian Wälchli
c514f1cbea
Enable PyTorch 2.1 ( #18718 )
2023-10-06 07:17:03 -04:00
Carlos Mocholí
71aed751f7
Forbid passing precision and a precision plugin ( #18671 )
2023-10-05 17:41:36 +02:00
Carlos Mocholí
31a1dad099
Fix BNB int8-training support ( #18721 )
2023-10-05 16:01:59 +02:00
Adrian Wälchli
09a0fb26d2
Set an upper limit on CPU threads in distributed training ( #18677 )
2023-10-04 19:57:37 -04:00
Carlos Mocholí
4c83ffd04c
Avoid importing bitsandbytes unless requested ( #18680 )
2023-10-05 01:10:10 +02:00
Carlos Mocholí
e3960749d8
Forbid init_module on-device instantiation with bnb ignored modules ( #18704 )
2023-10-05 00:57:07 +02:00
Adrian Wälchli
d31ef1f7d3
Drop support for PyTorch 1.11 ( #18691 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-04 20:30:44 +02:00
pre-commit-ci[bot]
c0ec0decec
[pre-commit.ci] pre-commit suggestions ( #18697 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-03 22:07:21 +02:00
Adrian Wälchli
256f16ed42
Enable passing `load_state_dict(..., assign=True|False)` in FabricModule ( #18690 )
2023-10-03 13:49:39 -04:00
Carlos Mocholí
5120ad20f2
Bitsandbytes precision plugin ( #18655 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-09-29 19:17:18 +02:00
Adrian Wälchli
3cd463efa8
Remove outdated workaround for PyTorch autocast bug ( #18634 )
2023-09-29 08:33:43 -04:00
Adrian Wälchli
d05cd3fa0a
Fix KeyError when calling `Fabric.load_raw` before setting up an FSDP model ( #18647 )
2023-09-29 07:35:27 -04:00
Carlos Mocholí
70a11d9739
Forbid non-FSDP precision plugins with FSDP ( #18664 )
2023-09-29 10:07:51 +02:00
Jirka Borovec
830a62a722
ruff: replace isort with ruff +TPU ( #17684 )
...
* ruff: replace isort with ruff
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fixing & imports
* lines in warning test
* docs
* fix enum import
* fixing
* import
* fix lines
* type ClusterEnvironment
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-26 11:54:55 -04:00
Jirka Borovec
358336268f
enable codespell for docs & fixing +TPU ( #18629 )
...
* precommit/codespell
* run
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* disable
* more fixing
* Apply suggestions from code review
* more fixing
* json
* note
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-26 11:54:44 -04:00
Adrian Wälchli
894952d33e
Avoid redundant input-type casting in FSDP precision ( #18630 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-26 08:55:13 -04:00
Adrian Wälchli
38764f0746
Enable launching via torchrun in slurm environment ( #18618 )
2023-09-26 07:40:22 -04:00
Adrian Wälchli
f83ad093e5
Utility function to check shared filesystem ( #18586 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-25 15:49:52 -04:00
Adrian Wälchli
57f5268eb3
Improve the suggested `num_workers` warning ( #18591 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-21 09:38:25 -04:00
Adrian Wälchli
66f15cf327
Input validation for `num_nodes` argument ( #18598 )
2023-09-20 11:09:50 -04:00
Adrian Wälchli
8094855137
Avoid passing process group to enable FSDP's hybrid-shard ( #18583 )
2023-09-19 13:46:24 -04:00