awaelchli
abae4c903b
Update Lightning AI multi-node guide (Trainer) ( #19530 )
* update
* update
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* configure_model
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-28 08:35:53 -05:00
awaelchli
a6c0a31e57
Fix infinite recursion error in precision plugin graveyard ( #19542 )
2024-02-27 19:19:33 +01:00
awaelchli
7880c110e3
Alternative mechanism to detect missing `Fabric.backward()` call ( #19493 )
2024-02-27 17:57:32 +01:00
awaelchli
ea89133c65
Rename `fabric run model` to `fabric run` ( #19527 )
2024-02-27 11:36:46 -05:00
awaelchli
e461e90f84
Update the Multi-GPU docs ( #19525 )
2024-02-26 22:29:26 -05:00
Jirka Borovec
a89ea11799
lint: drop yesqa, covered with RUF100 ( #19532 )
* drop yesqa, covered with RUF100
* fixing
* flaky test_snap_shotting
* xfail test_lit_drive
* flaky test_connect_disconnect_local
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-26 19:54:13 +01:00
dependabot[bot]
0520d94c71
Bump codecov/codecov-action from 3 to 4 ( #19406 )
bump
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2024-02-26 11:19:10 -05:00
Jirka Borovec
e3b6af5e38
ci: install stable, follow-up `litdata` release ( #19533 )
2024-02-26 16:20:53 +01:00
Jirka Borovec
cf3553cdb5
docs: enable Sphinx linter & fixing ( #19515 )
* docs: enable Sphinx linter
* fixes
2024-02-26 16:20:33 +01:00
thomas chaton
e43820a4be
migrate Data subpackage ( #19523 )
* update
* update
* update
* update
* Update checkgroup.yml
* More
* Add note
* Labeller should be kept as long as we have the stubs
* update
* update
* update
* Apply suggestions from code review
* init
* ci fix
* pin version range
* https://www.neptune.ai/
---------
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2024-02-26 08:25:00 -05:00
awaelchli
2a827f3f6f
Docs fixes ( #19529 )
2024-02-26 12:06:08 +01:00
awaelchli
2e512d4b2e
Remove the Colossal AI integration ( #19528 )
2024-02-26 10:59:15 +01:00
Kyle Gorman
63188f95dd
Adds more robust timer duration parsing ( #19513 )
2024-02-24 01:47:23 +01:00
thomas chaton
0f4522cbde
Switch to new package name lightning_data -> litdata ( #19522 )
2024-02-23 20:43:41 +00:00
Rafał Jankowski
f2f3ef5d3d
Proper support for Remote Stop and Remote Abort with NeptuneLogger ( #19130 )
2024-02-23 20:33:17 +01:00
Matthias Weigand
0235543ffe
Make `CSVLogger(name: ...)` optional `str` ( #19518 )
2024-02-23 20:31:38 +01:00
awaelchli
a41528c2a6
Update tests for PyTorch 2.2.1 ( #19521 )
2024-02-23 13:11:34 -05:00
Mauricio Villegas
623ec5824f
`load_from_checkpoint` support for LightningCLI when using dependency injection ( #18105 )
2024-02-23 10:55:07 +01:00
thomas chaton
a6273d1787
Add Lightning Data + Update README ( #19512 )
2024-02-22 14:07:03 +00:00
thomas chaton
eb0bbde04f
Add support for using the streaming dataloader in map or optimize for large scale inference ( #19510 )
2024-02-22 13:37:27 +00:00
thomas chaton
4175e1aef3
Hot fix: Fix path resolution ( #19508 )
2024-02-21 16:53:42 +00:00
thomas chaton
39a86f8692
Resolve compression, add support for torchaudio ( #19503 )
2024-02-21 00:05:13 +00:00
thomas chaton
2394e2f7b5
Resolve s3 credentials wrongly defined ( #19506 )
2024-02-20 23:40:06 +00:00
awaelchli
c5ab34876b
Document optional steps for converting Fabric code ( #19486 )
2024-02-18 00:37:35 +01:00
thomas chaton
bb35e8e0d3
Add batch_size to map, optimize ( #19489 )
2024-02-16 20:54:39 +00:00
thomas chaton
bbc5488a62
Enable no op optimize ( #19490 )
2024-02-16 20:27:20 +00:00
thomas chaton
53ea76a75c
Prevent dataset to break if it already exists ( #19491 )
2024-02-16 20:04:46 +00:00
dependabot[bot]
ddf2ac4df9
Bump actions/cache from 3 to 4 ( #19323 )
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3...v4)
---
updated-dependencies:
- dependency-name: actions/cache
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-16 18:33:16 +01:00
Jirka Borovec
6497e36b3d
bump: transmission to use `neptune` only, drop `neptune-client` ( #19265 )
* bump: min version `neptune>=1.0.0`
* Apply suggestions from code review
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-02-16 17:57:10 +01:00
Jirka Borovec
5998dd12e8
docs: ignore mall behave link ( #19488 )
2024-02-16 17:48:51 +01:00
PL Ghost
61ba180e5f
docs: Bump HPU ref `1.4.0` ( #19484 )
Co-authored-by: jerome-habana <jerome-habana@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-02-16 16:28:16 +01:00
PL Ghost
5fbd7c39dc
Adding test for legacy checkpoint created with 2.2.0.post0 ( #19428 )
2024-02-15 15:47:31 -05:00
awaelchli
0e25b1d01a
Avoid warning when resuming mid-epoch checkpoint and using stateful dataloader ( #19475 )
2024-02-15 15:20:13 -05:00
awaelchli
120c87f8f7
Include the training mode in the ModelSummary ( #19468 )
2024-02-15 15:13:35 -05:00
awaelchli
19675473b6
Fix `log_every_n_steps` check in ThroughputMonitor ( #19470 )
2024-02-15 15:12:49 -05:00
Jirka Borovec
99fe6563ef
precommit: ruff-format ( #19434 )
* precommit: ruff-format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* manual update
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* manual update
* order
* mypy
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* mypy
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-15 13:39:17 -05:00
thomas chaton
6cb5813a5e
Fix the number of nodes not defined properly ( #19482 )
2024-02-15 17:35:26 +00:00
thomas chaton
b28b673e68
Make StreamingDataLoader shuffle to set shuffle to datasets. ( #19481 )
2024-02-15 17:22:34 +00:00
thomas chaton
5c9a6fa072
Improve DataProcessor worker assignment ( #19480 )
2024-02-15 17:13:05 +00:00
thomas chaton
b024e7a73b
Better default for drop_last in a distributed setting ( #19478 )
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2024-02-15 18:11:45 +01:00
awaelchli
265025bd5d
Inform the user about a missing `fabric.backward()` call ( #19447 )
2024-02-14 17:49:11 -05:00
Carlos Mocholí
67459944ea
Avoid FSDP deprecations during save/load with newer torch versions ( #19463 )
* Avoid FSDP deprecations during save/load with newer torch versions
* Refactor
* Tests
2024-02-14 19:43:59 +01:00
awaelchli
59e45d6f6d
Update `all_gather` docs ( #19469 )
2024-02-14 19:37:50 +01:00
thomas chaton
1d04c10e2d
Update Readme 2/n ( #19466 )
update
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2024-02-14 14:10:18 +00:00
thomas chaton
517af7124e
Add number of uploaders ( #19473 )
2024-02-14 13:11:03 +00:00
thomas chaton
71f44775c9
Resolve boto3 unable to local credentials ( #19472 )
2024-02-14 10:54:01 +00:00
Sebastian Raschka
d61f6fecd4
Fix Fabric's ThroughputMonitor docs ( #19464 )
2024-02-13 14:59:25 -05:00
awaelchli
3fbc29ba21
Fix `CSVLogger` trying to append to file from previous run in same version folder ( #19446 )
2024-02-13 13:59:04 -05:00
thomas chaton
aa6e0850cf
Add Lightning Data README 1/N ( #19455 )
2024-02-13 17:50:54 +00:00
thomas chaton
b097a4df3f
Improve data processing to enable downloading LAOIN 400M ( #19452 )
2024-02-13 13:23:39 +00:00