thomas chaton
85933f355a
Improve map and chunkify ( #18901 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-01 09:35:35 +00:00
Adrian Wälchli
31b8777350
Update CLI tests to no longer require 3rd party logger dependencies ( #18899 )
2023-10-31 09:22:17 -04:00
Adrian Wälchli
018a308269
Enable RUF018 rule for walrus assignments in asserts ( #18886 )
2023-10-30 21:16:02 -04:00
Adrian Wälchli
079544a902
Rename PrecisionPlugin -> Precision ( #18840 )
2023-10-30 16:53:13 -04:00
thomas chaton
6de491605c
Add DataRecipe ( #18892 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-30 18:26:13 +00:00
Carlos Mocholí
800b87eb46
Add throughput utilities to Fabric and the Trainer ( #18848 )
2023-10-30 17:10:29 +01:00
Adrian Wälchli
e66be675d2
Refined FSDP saving logic and error messaging when path exists ( #18884 )
2023-10-30 10:05:28 -04:00
thomas chaton
2526c9081f
Prevent leaking the thread to the workers ( #18891 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-30 12:36:41 +00:00
thomas chaton
c1437ccadf
Improve Streaming Dataset API ( #18882 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-10-27 18:19:17 +01:00
thomas chaton
0843041d1d
Add broadcast to Dataset Optimizer with multiple nodes ( #18860 )
...
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-10-27 00:42:46 +01:00
Carlos Mocholí
182c30b129
Update Habana integration to 1.2 ( #18877 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-26 19:08:50 -04:00
BoringDonut
e50b68aae3
Bugfix/18394 batch size finder max val batches ( #18854 )
...
Co-authored-by: Oleksandra Sokol <o.sokol@samsung.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-25 15:03:42 -04:00
thomas chaton
874825857f
Add distributed support for StreamingDataset ( #18850 )
...
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-25 03:18:20 +01:00
Adrian Wälchli
9e75bc9572
Fix failing lightning cli entry point ( #18821 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-24 20:51:11 -04:00
Carlos Mocholí
78ad390b5b
Restore support for builds without distributed ( #18859 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-10-25 02:48:44 +02:00
Adrian Wälchli
6bfde6a80c
Change dangerous default random seed selection ( #18846 )
2023-10-24 19:59:38 -04:00
Mauricio Villegas
c5a731c3cd
LightningCLI now will not allow setting a class instance as a default ( #18822 )
2023-10-23 20:21:06 -04:00
thomas chaton
e59dc41c8e
Improve DatasetOptimizer API ( #18827 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-10-23 18:06:48 +01:00
Adrian Wälchli
97303b0168
Avoid false-positive warnings about method calls on the Fabric-wrapped module ( #18819 )
2023-10-22 22:26:28 -04:00
thomas chaton
e7afe04ee8
Tiny fixes for the Cache & DatasetOptimizer ( #18817 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-10-19 05:41:35 -07:00
thomas chaton
c68ff6482f
Add support for text ( #18807 )
...
* update
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-18 20:28:12 -04:00
thomas chaton
3f86ad7ba7
Add name and version ( #18796 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-16 06:33:50 -07:00
thomas chaton
142977d3eb
Introduce Dataset Optimizer ( #18788 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-13 14:52:23 +01:00
hiaoxui
e0f2be0055
Fix bug when removing last checkpoint with deepspeed ( #18793 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-13 06:32:55 -04:00
Adrian Wälchli
6f6c07dddf
Revert removal of empty-parameters check for `configure_optimizers()` with FSDP ( #18785 )
2023-10-12 04:36:49 -04:00
Adrian Wälchli
c5e3c4518f
Save ModelCheckpoint's `last.ckpt` as symlink if possible ( #18748 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-10-11 11:35:13 -04:00
Carlos Mocholí
7434c47fe7
Raise an exception when calling `fit` twice with spawn ( #18776 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-10-11 16:08:59 +02:00
Carlos Mocholí
5a83f541da
Minor strategy fixes [TPU] ( #18774 )
2023-10-11 15:26:30 +02:00
Carlos Mocholí
27ad9e9243
xfail collective tests ( #18779 )
2023-10-11 05:54:55 +02:00
Adrian Wälchli
c39f680160
Fix deletion of resumed checkpoints ( #18750 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-10-10 09:08:36 -04:00
Adrian Wälchli
e02bb391af
Utility to disable all instances of `PossibleUserWarning` ( #18744 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-10 06:53:32 -04:00
Adrian Wälchli
acc0cf02cf
Refinements to the num-workers warning ( #18737 )
2023-10-09 22:17:47 -04:00
Adrian Wälchli
a26424e89e
Fix zero-grad behavior when entering the validation loop ( #18710 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-09 18:07:54 -04:00
Adrian Wälchli
377534072b
Split `Precision.init_context` ( #18734 )
2023-10-09 12:34:30 -04:00
thomas chaton
1d5851ffe2
Introduce Cache 1/n ( #18642 )
...
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-09 16:06:32 +01:00
Adrian Wälchli
87dff9928e
Handle edge case for `find_usable_cuda_devices(0)` ( #18722 )
2023-10-06 23:44:33 -04:00
Adrian Wälchli
5d819c91fb
Remove `fsdp_overlap_step_with_backward` in favor of native solution ( #18726 )
2023-10-06 08:11:41 -04:00
Adrian Wälchli
c514f1cbea
Enable PyTorch 2.1 ( #18718 )
2023-10-06 07:17:03 -04:00
Carlos Mocholí
71aed751f7
Forbid passing precision and a precision plugin ( #18671 )
2023-10-05 17:41:36 +02:00
Carlos Mocholí
31a1dad099
Fix BNB int8-training support ( #18721 )
2023-10-05 16:01:59 +02:00
Adrian Wälchli
09a0fb26d2
Set an upper limit on CPU threads in distributed training ( #18677 )
2023-10-04 19:57:37 -04:00
Carlos Mocholí
4c83ffd04c
Avoid importing bitsandbytes unless requested ( #18680 )
2023-10-05 01:10:10 +02:00
Carlos Mocholí
e3960749d8
Forbid init_module on-device instantiation with bnb ignored modules ( #18704 )
2023-10-05 00:57:07 +02:00
Adrian Wälchli
d31ef1f7d3
Drop support for PyTorch 1.11 ( #18691 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-04 20:30:44 +02:00
pre-commit-ci[bot]
c0ec0decec
[pre-commit.ci] pre-commit suggestions ( #18697 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-03 22:07:21 +02:00
Carlos Mocholí
9c4a3fd68a
Always pass the correct batch index to the automatic optimization loop ( #18619 )
2023-10-03 21:23:36 +02:00
dependabot[bot]
74b2ff8196
Update fsspec[http] requirement from <2023.7.0,>2021.06.0 to >2021.06.0,<2023.10.0 in /requirements ( #18469 )
...
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-10-03 20:52:20 +02:00
Adrian Wälchli
256f16ed42
Enable passing `load_state_dict(..., assign=True|False)` in FabricModule ( #18690 )
2023-10-03 13:49:39 -04:00
Jirka Borovec
d84ee36e98
test: fix compatibility with `onnxruntime` 0.16+ ( #18692 )
2023-10-03 19:42:42 +02:00
Adrian Wälchli
b69f3c6d10
Maintain float32 precision at minimum in ResultMetric ( #18686 )
2023-10-03 13:18:59 -04:00
Jirka Borovec
df959aeb4f
fix `pydantic` compatibility for 2.0+ & allow new `fastAPI` ( #18676 )
2023-09-30 07:43:29 +02:00
Carlos Mocholí
5120ad20f2
Bitsandbytes precision plugin ( #18655 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-09-29 19:17:18 +02:00
Ethan Harris
02780f2cb3
App: Fix dispatch return value ( #18674 )
2023-09-29 15:12:31 +01:00
Adrian Wälchli
e4b75d16d1
Add a warning for problematic dataloader settings when `reload_dataloaders_every_n_epochs>0` ( #18672 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-29 09:31:34 -04:00
Adrian Wälchli
996e7684a1
Update `persistent_workers` recommendation when using spawn launcher ( #18649 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-29 08:39:33 -04:00
Adrian Wälchli
3cd463efa8
Remove outdated workaround for PyTorch autocast bug ( #18634 )
2023-09-29 08:33:43 -04:00
Adrian Wälchli
d05cd3fa0a
Fix KeyError when calling `Fabric.load_raw` before setting up an FSDP model ( #18647 )
2023-09-29 07:35:27 -04:00
Carlos Mocholí
70a11d9739
Forbid non-FSDP precision plugins with FSDP ( #18664 )
2023-09-29 10:07:51 +02:00
nik777
ac713656da
Updated check on model step output types from dict to Mapping ( #18657 )
2023-09-28 20:31:40 -04:00
Ethan Harris
363da4aa85
App: Drop actions ( #18660 )
2023-09-28 14:37:37 +01:00
Jirka Borovec
830a62a722
ruff: replace isort with ruff +TPU ( #17684 )
...
* ruff: replace isort with ruff
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fixing & imports
* lines in warning test
* docs
* fix enum import
* fixing
* import
* fix lines
* type ClusterEnvironment
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-26 11:54:55 -04:00
Jirka Borovec
358336268f
enable codespell for docs & fixing +TPU ( #18629 )
...
* precommit/codespell
* run
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* disable
* more fixing
* Apply suggestions from code review
* more fixing
* json
* note
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-26 11:54:44 -04:00
Adrian Wälchli
894952d33e
Avoid redundant input-type casting in FSDP precision ( #18630 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-26 08:55:13 -04:00
Adrian Wälchli
38764f0746
Enable launching via torchrun in slurm environment ( #18618 )
2023-09-26 07:40:22 -04:00
Adrian Wälchli
f83ad093e5
Utility function to check shared filesystem ( #18586 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-25 15:49:52 -04:00
Jirka Borovec
d579cfed57
precommit: unify formatting with prettier ( #18605 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-25 14:34:41 +02:00
Jirka Borovec
6d7019ca81
replace `tmpdir` by `tmp_path` in tests_data/ ( #18604 )
2023-09-22 11:08:28 +02:00
Jirka Borovec
3fbf1540b0
docs: 3/3 enable Sphinx nitpicky [app] ( #18603 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-21 22:44:59 +02:00
Adrian Wälchli
57f5268eb3
Improve the suggested `num_workers` warning ( #18591 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-21 09:38:25 -04:00
ioangatop
3e2cd24a3d
Fix `Trainer`'s `log_dir` method for `CSVLogger` ( #18548 )
...
Co-authored-by: Tianshu Wang <github@wang.tianshu.me>
Co-authored-by: Tianshu Wang <hi@wang.tianshu.me>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-09-20 19:46:19 -04:00
Taylor
3a594622c1
Raise exception when `load_from_checkpoint` called from instance ( #18432 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-09-20 13:45:12 -04:00
Adrian Wälchli
66f15cf327
Input validation for `num_nodes` argument ( #18598 )
2023-09-20 11:09:50 -04:00
Carlos Mocholí
3bfd7b2558
Support callback subclasses in `configure_callbacks` ( #18508 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-20 10:56:05 -04:00
PL Ghost
e10ddf9799
Adding test for legacy checkpoint created with 2.0.9 ( #18560 )
...
Co-authored-by: Borda <Borda@users.noreply.github.com>
2023-09-19 15:55:32 -04:00
Adrian Wälchli
8094855137
Avoid passing process group to enable FSDP's hybrid-shard ( #18583 )
2023-09-19 13:46:24 -04:00
Adrian Wälchli
bc85c5fd14
Prettier warning output ( #18288 )
2023-09-19 02:11:58 +02:00
Adrian Wälchli
69119b068a
Avoid rewriting the metrics file in CSVLogger unless necessary ( #18567 )
2023-09-18 09:12:48 -04:00
Adrian Wälchli
2b9f0ae640
Lazily import dependencies for NeptuneLogger ( #18573 )
2023-09-18 08:02:41 -04:00
Adrian Wälchli
9d7bc82139
Move `_KINETO_AVAILABLE` check to profiler ( #18575 )
2023-09-17 19:23:33 +02:00
Adrian Wälchli
3aa08b087f
Remove reference to training in checkpoint loading error message ( #18554 )
2023-09-15 19:45:27 -04:00
Adrian Wälchli
c1ee22a687
Optimize import paths for optional dependencies ( #18561 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-15 17:04:19 -04:00
Carlos Mocholí
d8e9eba606
[Fabric] Replace `@contextlib.contextmanager` ( #18557 )
2023-09-15 17:27:29 +02:00
Adrian Wälchli
13760cdee7
Lazily import dependencies for CometLogger ( #18540 )
2023-09-14 16:47:43 -04:00
Adrian Wälchli
ac576ffb80
Refactor NeptuneLogger tests from unittest to pytest ( #18549 )
2023-09-14 19:09:26 +02:00
Carlos Mocholí
eb3b96d8bd
Avoid modifying the default dtype on exception ( #18500 )
2023-09-14 15:32:32 +02:00
Adrian Wälchli
670b490b64
Avoid warning about logging interval for fast dev run ( #18550 )
2023-09-14 06:27:14 -04:00
Justus Schock
1cee84ca2d
Replace LightningClient with import from lightning_cloud ( #18544 )
2023-09-13 14:55:05 +02:00
Adrian Wälchli
b1c83de086
Lazily import dependencies for WandbLogger ( #18538 )
2023-09-13 06:34:12 -04:00
Jirka Borovec
dbe7ed46a3
replace tests skip with soft xfail ( #18486 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 23:11:03 +02:00
Adrian Wälchli
6ab443f9c8
Lazily import dependencies for MLFlowLogger ( #18528 )
2023-09-12 09:17:57 -04:00
Adrian Wälchli
c959df74b8
Support saving and loading stateful objects in Fabric ( #18513 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 07:58:52 -04:00
Adrian Wälchli
e958c6f1dc
More robust FabricOptimizer/LightningOptimizer wrapping logic ( #18516 )
2023-09-12 09:30:44 +02:00
qunhong zneg
a0d6fcc212
Make the batch_idx argument optional in all step methods ( #18512 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-09-11 14:39:38 +02:00
Adrian Wälchli
4dfc09c2fe
Change auto-device selection for Jupyter notebook environments ( #18291 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-08 11:49:31 -04:00
Carlos Mocholí
756e481969
Support the TransformerEngine precision plugin with the Trainer ( #18459 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-09-07 19:21:00 +02:00
Carlos Mocholí
e1c5c5ae4a
[TPU] Set the compute_dtype with XLAFSDP ( #18497 )
2023-09-07 18:43:21 +02:00
Adrian Wälchli
cc18bd781e
Refactor assertions that use walrus ( #18496 )
2023-09-07 11:49:04 -04:00
Adrian Wälchli
8381ed37c7
Set limits for `fetcher.done` ( #18441 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-09-07 10:46:49 -04:00
Adrian Wälchli
cf437ed7c6
Refresh `_FabricOptimizer.__dict__` when loading a state dict ( #18488 )
2023-09-06 11:51:42 -04:00
Adrian Wälchli
2d10dc4dc8
Fix progress bar display `v_num` when running with fast-dev-run ( #18491 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-06 11:51:14 -04:00