4037 Commits

Author SHA1 Message Date
thomas chaton 85933f355a
Improve map and chunkify (#18901)
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-01 09:35:35 +00:00
Adrian Wälchli 31b8777350
Update CLI tests to no longer require 3rd party logger dependencies (#18899) 2023-10-31 09:22:17 -04:00
Adrian Wälchli 018a308269
Enable RUF018 rule for walrus assignments in asserts (#18886) 2023-10-30 21:16:02 -04:00
Adrian Wälchli 079544a902
Rename PrecisionPlugin -> Precision (#18840) 2023-10-30 16:53:13 -04:00
thomas chaton 6de491605c
Add DataRecipe (#18892)
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-30 18:26:13 +00:00
Carlos Mocholí 800b87eb46
Add throughput utilities to Fabric and the Trainer (#18848) 2023-10-30 17:10:29 +01:00
Adrian Wälchli e66be675d2
Refined FSDP saving logic and error messaging when path exists (#18884) 2023-10-30 10:05:28 -04:00
thomas chaton 2526c9081f
Prevent leaking the thread to the workers (#18891)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-30 12:36:41 +00:00
thomas chaton c1437ccadf
Improve Streaming Dataset API (#18882)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-10-27 18:19:17 +01:00
thomas chaton 0843041d1d
Add broadcast to Dataset Optimizer with multiple nodes (#18860)
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-10-27 00:42:46 +01:00
Carlos Mocholí 182c30b129
Update Habana integration to 1.2 (#18877)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-26 19:08:50 -04:00
BoringDonut e50b68aae3
Bugfix/18394 batch size finder max val batches (#18854)
Co-authored-by: Oleksandra Sokol <o.sokol@samsung.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-25 15:03:42 -04:00
thomas chaton 874825857f
Add distributed support for StreamingDataset (#18850)
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-25 03:18:20 +01:00
Adrian Wälchli 9e75bc9572
Fix failing lightning cli entry point (#18821)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-24 20:51:11 -04:00
Carlos Mocholí 78ad390b5b
Restore support for builds without distributed (#18859)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-10-25 02:48:44 +02:00
Adrian Wälchli 6bfde6a80c
Change dangerous default random seed selection (#18846) 2023-10-24 19:59:38 -04:00
Mauricio Villegas c5a731c3cd
LightningCLI now will not allow setting a class instance as a default (#18822) 2023-10-23 20:21:06 -04:00
thomas chaton e59dc41c8e
Improve DatasetOptimizer API (#18827)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-10-23 18:06:48 +01:00
Adrian Wälchli 97303b0168
Avoid false-positive warnings about method calls on the Fabric-wrapped module (#18819) 2023-10-22 22:26:28 -04:00
thomas chaton e7afe04ee8
Tiny fixes for the Cache & DatasetOptimizer (#18817)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-10-19 05:41:35 -07:00
thomas chaton c68ff6482f
Add support for text (#18807)
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-18 20:28:12 -04:00
thomas chaton 3f86ad7ba7
Add name and version (#18796)
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-16 06:33:50 -07:00
thomas chaton 142977d3eb
Introduce Dataset Optimizer (#18788)
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-13 14:52:23 +01:00
hiaoxui e0f2be0055
Fix bug when removing last checkpoint with deepspeed (#18793)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-13 06:32:55 -04:00
Adrian Wälchli 6f6c07dddf
Revert removal of empty-parameters check for `configure_optimizers()` with FSDP (#18785) 2023-10-12 04:36:49 -04:00
Adrian Wälchli c5e3c4518f
Save ModelCheckpoint's `last.ckpt` as symlink if possible (#18748)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-10-11 11:35:13 -04:00
Carlos Mocholí 7434c47fe7
Raise an exception when calling `fit` twice with spawn (#18776)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-10-11 16:08:59 +02:00
Carlos Mocholí 5a83f541da
Minor strategy fixes [TPU] (#18774) 2023-10-11 15:26:30 +02:00
Carlos Mocholí 27ad9e9243
xfail collective tests (#18779) 2023-10-11 05:54:55 +02:00
Adrian Wälchli c39f680160
Fix deletion of resumed checkpoints (#18750)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-10-10 09:08:36 -04:00
Adrian Wälchli e02bb391af
Utility to disable all instances of `PossibleUserWarning` (#18744)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-10 06:53:32 -04:00
Adrian Wälchli acc0cf02cf
Refinements to the num-workers warning (#18737) 2023-10-09 22:17:47 -04:00
Adrian Wälchli a26424e89e
Fix zero-grad behavior when entering the validation loop (#18710)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-09 18:07:54 -04:00
Adrian Wälchli 377534072b
Split `Precision.init_context` (#18734) 2023-10-09 12:34:30 -04:00
thomas chaton 1d5851ffe2
Introduce Cache 1/n (#18642)
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-09 16:06:32 +01:00
Adrian Wälchli 87dff9928e
Handle edge case for `find_usable_cuda_devices(0)` (#18722) 2023-10-06 23:44:33 -04:00
Adrian Wälchli 5d819c91fb
Remove `fsdp_overlap_step_with_backward` in favor of native solution (#18726) 2023-10-06 08:11:41 -04:00
Adrian Wälchli c514f1cbea
Enable PyTorch 2.1 (#18718) 2023-10-06 07:17:03 -04:00
Carlos Mocholí 71aed751f7
Forbid passing precision and a precision plugin (#18671) 2023-10-05 17:41:36 +02:00
Carlos Mocholí 31a1dad099
Fix BNB int8-training support (#18721) 2023-10-05 16:01:59 +02:00
Adrian Wälchli 09a0fb26d2
Set an upper limit on CPU threads in distributed training (#18677) 2023-10-04 19:57:37 -04:00
Carlos Mocholí 4c83ffd04c
Avoid importing bitsandbytes unless requested (#18680) 2023-10-05 01:10:10 +02:00
Carlos Mocholí e3960749d8
Forbid init_module on-device instantiation with bnb ignored modules (#18704) 2023-10-05 00:57:07 +02:00
Adrian Wälchli d31ef1f7d3
Drop support for PyTorch 1.11 (#18691)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-04 20:30:44 +02:00
pre-commit-ci[bot] c0ec0decec
[pre-commit.ci] pre-commit suggestions (#18697)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-03 22:07:21 +02:00
Carlos Mocholí 9c4a3fd68a
Always pass the correct batch index to the automatic optimization loop (#18619) 2023-10-03 21:23:36 +02:00
dependabot[bot] 74b2ff8196
Update fsspec[http] requirement from <2023.7.0,>2021.06.0 to >2021.06.0,<2023.10.0 in /requirements (#18469)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-03 20:52:20 +02:00
Adrian Wälchli 256f16ed42
Enable passing `load_state_dict(..., assign=True|False)` in FabricModule (#18690) 2023-10-03 13:49:39 -04:00
Jirka Borovec d84ee36e98
test: fix compatibility with `onnxruntime` 0.16+ (#18692) 2023-10-03 19:42:42 +02:00
Adrian Wälchli b69f3c6d10
Maintain float32 precision at minimum in ResultMetric (#18686) 2023-10-03 13:18:59 -04:00
Jirka Borovec df959aeb4f
fix `pydantic` compatibility for 2.0+ & allow new `fastAPI` (#18676) 2023-09-30 07:43:29 +02:00
Carlos Mocholí 5120ad20f2
Bitsandbytes precision plugin (#18655)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-09-29 19:17:18 +02:00
Ethan Harris 02780f2cb3
App: Fix dispatch return value (#18674) 2023-09-29 15:12:31 +01:00
Adrian Wälchli e4b75d16d1
Add a warning for problematic dataloader settings when `reload_dataloaders_every_n_epochs>0` (#18672)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-29 09:31:34 -04:00
Adrian Wälchli 996e7684a1
Update `persistent_workers` recommendation when using spawn launcher (#18649)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-29 08:39:33 -04:00
Adrian Wälchli 3cd463efa8
Remove outdated workaround for PyTorch autocast bug (#18634) 2023-09-29 08:33:43 -04:00
Adrian Wälchli d05cd3fa0a
Fix KeyError when calling `Fabric.load_raw` before setting up an FSDP model (#18647) 2023-09-29 07:35:27 -04:00
Carlos Mocholí 70a11d9739
Forbid non-FSDP precision plugins with FSDP (#18664) 2023-09-29 10:07:51 +02:00
nik777 ac713656da
Updated check on model step output types from dict to Mapping (#18657) 2023-09-28 20:31:40 -04:00
Ethan Harris 363da4aa85
App: Drop actions (#18660) 2023-09-28 14:37:37 +01:00
Jirka Borovec 830a62a722
ruff: replace isort with ruff +TPU (#17684)
* ruff: replace isort with ruff

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing & imports

* lines in warning test

* docs

* fix enum import

* fixing

* import

* fix lines

* type ClusterEnvironment

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-26 11:54:55 -04:00
Jirka Borovec 358336268f
enable codespell for docs & fixing +TPU (#18629)
* precommit/codespell

* run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable

* more fixing

* Apply suggestions from code review

* more fixing

* json

* note

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-26 11:54:44 -04:00
Adrian Wälchli 894952d33e
Avoid redundant input-type casting in FSDP precision (#18630)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-26 08:55:13 -04:00
Adrian Wälchli 38764f0746
Enable launching via torchrun in slurm environment (#18618) 2023-09-26 07:40:22 -04:00
Adrian Wälchli f83ad093e5
Utility function to check shared filesystem (#18586)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-25 15:49:52 -04:00
Jirka Borovec d579cfed57
precommit: unify formatting with prettier (#18605)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-25 14:34:41 +02:00
Jirka Borovec 6d7019ca81
replace `tmpdir` by `tmp_path` in tests_data/ (#18604) 2023-09-22 11:08:28 +02:00
Jirka Borovec 3fbf1540b0
docs: 3/3 enable Sphinx nitpicky [app] (#18603)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-21 22:44:59 +02:00
Adrian Wälchli 57f5268eb3
Improve the suggested `num_workers` warning (#18591)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-21 09:38:25 -04:00
ioangatop 3e2cd24a3d
Fix `Trainer`'s `log_dir` method for `CSVLogger` (#18548)
Co-authored-by: Tianshu Wang <github@wang.tianshu.me>
Co-authored-by: Tianshu Wang <hi@wang.tianshu.me>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-09-20 19:46:19 -04:00
Taylor 3a594622c1
Raise exception when `load_from_checkpoint` called from instance (#18432)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-09-20 13:45:12 -04:00
Adrian Wälchli 66f15cf327
Input validation for `num_nodes` argument (#18598) 2023-09-20 11:09:50 -04:00
Carlos Mocholí 3bfd7b2558
Support callback subclasses in `configure_callbacks` (#18508)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-20 10:56:05 -04:00
PL Ghost e10ddf9799
Adding test for legacy checkpoint created with 2.0.9 (#18560)
Co-authored-by: Borda <Borda@users.noreply.github.com>
2023-09-19 15:55:32 -04:00
Adrian Wälchli 8094855137
Avoid passing process group to enable FSDP's hybrid-shard (#18583) 2023-09-19 13:46:24 -04:00
Adrian Wälchli bc85c5fd14
Prettier warning output (#18288) 2023-09-19 02:11:58 +02:00
Adrian Wälchli 69119b068a
Avoid rewriting the metrics file in CSVLogger unless necessary (#18567) 2023-09-18 09:12:48 -04:00
Adrian Wälchli 2b9f0ae640
Lazily import dependencies for NeptuneLogger (#18573) 2023-09-18 08:02:41 -04:00
Adrian Wälchli 9d7bc82139
Move `_KINETO_AVAILABLE` check to profiler (#18575) 2023-09-17 19:23:33 +02:00
Adrian Wälchli 3aa08b087f
Remove reference to training in checkpoint loading error message (#18554) 2023-09-15 19:45:27 -04:00
Adrian Wälchli c1ee22a687
Optimize import paths for optional dependencies (#18561)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-15 17:04:19 -04:00
Carlos Mocholí d8e9eba606
[Fabric] Replace `@contextlib.contextmanager` (#18557) 2023-09-15 17:27:29 +02:00
Adrian Wälchli 13760cdee7
Lazily import dependencies for CometLogger (#18540) 2023-09-14 16:47:43 -04:00
Adrian Wälchli ac576ffb80
Refactor NeptuneLogger tests from unittest to pytest (#18549) 2023-09-14 19:09:26 +02:00
Carlos Mocholí eb3b96d8bd
Avoid modifying the default dtype on exception (#18500) 2023-09-14 15:32:32 +02:00
Adrian Wälchli 670b490b64
Avoid warning about logging interval for fast dev run (#18550) 2023-09-14 06:27:14 -04:00
Justus Schock 1cee84ca2d
Replace LightningClient with import from lightning_cloud (#18544) 2023-09-13 14:55:05 +02:00
Adrian Wälchli b1c83de086
Lazily import dependencies for WandbLogger (#18538) 2023-09-13 06:34:12 -04:00
Jirka Borovec dbe7ed46a3
replace tests skip with soft xfail (#18486)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 23:11:03 +02:00
Adrian Wälchli 6ab443f9c8
Lazily import dependencies for MLFlowLogger (#18528) 2023-09-12 09:17:57 -04:00
Adrian Wälchli c959df74b8
Support saving and loading stateful objects in Fabric (#18513)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 07:58:52 -04:00
Adrian Wälchli e958c6f1dc
More robust FabricOptimizer/LightningOptimizer wrapping logic (#18516) 2023-09-12 09:30:44 +02:00
qunhong zneg a0d6fcc212
Make the batch_idx argument optional in all step methods (#18512)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-09-11 14:39:38 +02:00
Adrian Wälchli 4dfc09c2fe
Change auto-device selection for Jupyter notebook environments (#18291)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-08 11:49:31 -04:00
Carlos Mocholí 756e481969
Support the TransformerEngine precision plugin with the Trainer (#18459)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-07 19:21:00 +02:00
Carlos Mocholí e1c5c5ae4a
[TPU] Set the compute_dtype with XLAFSDP (#18497) 2023-09-07 18:43:21 +02:00
Adrian Wälchli cc18bd781e
Refactor assertions that use walrus (#18496) 2023-09-07 11:49:04 -04:00
Adrian Wälchli 8381ed37c7
Set limits for `fetcher.done` (#18441)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-07 10:46:49 -04:00
Adrian Wälchli cf437ed7c6
Refresh `_FabricOptimizer.__dict__` when loading a state dict (#18488) 2023-09-06 11:51:42 -04:00
Adrian Wälchli 2d10dc4dc8
Fix progress bar display `v_num` when running with fast-dev-run (#18491)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-06 11:51:14 -04:00