Commit Graph

10188 Commits

Author SHA1 Message Date
awaelchli abae4c903b
Update Lightning AI multi-node guide (Trainer) (#19530)
* update

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* configure_model

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-28 08:35:53 -05:00
awaelchli a6c0a31e57
Fix infinite recursion error in precision plugin graveyard (#19542) 2024-02-27 19:19:33 +01:00
awaelchli 7880c110e3
Alternative mechanism to detect missing `Fabric.backward()` call (#19493) 2024-02-27 17:57:32 +01:00
awaelchli ea89133c65
Rename `fabric run model` to `fabric run` (#19527) 2024-02-27 11:36:46 -05:00
awaelchli e461e90f84
Update the Multi-GPU docs (#19525) 2024-02-26 22:29:26 -05:00
Jirka Borovec a89ea11799
lint: drop yesqa, covered with RUF100 (#19532)
* drop yesqa, covered with RUF100
* fixing
* flaky test_snap_shotting
* xfail test_lit_drive
* flaky test_connect_disconnect_local

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-26 19:54:13 +01:00
dependabot[bot] 0520d94c71
Bump codecov/codecov-action from 3 to 4 (#19406)
bump

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2024-02-26 11:19:10 -05:00
Jirka Borovec e3b6af5e38
ci: install stable, follow-up `litdata` release (#19533) 2024-02-26 16:20:53 +01:00
Jirka Borovec cf3553cdb5
docs: enable Sphinx linter & fixing (#19515)
* docs: enable Sphinx linter
* fixes
2024-02-26 16:20:33 +01:00
thomas chaton e43820a4be
migrate Data subpackage (#19523)
* update

* update

* update

* update

* Update checkgroup.yml

* More

* Add note

* Labeller should be kept as long as we have the stubs

* update

* update

* update

* Apply suggestions from code review

* init

* ci fix

* pin version range

* https://www.neptune.ai/

---------

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2024-02-26 08:25:00 -05:00
awaelchli 2a827f3f6f
Docs fixes (#19529) 2024-02-26 12:06:08 +01:00
awaelchli 2e512d4b2e
Remove the Colossal AI integration (#19528) 2024-02-26 10:59:15 +01:00
Kyle Gorman 63188f95dd
Adds more robust timer duration parsing (#19513) 2024-02-24 01:47:23 +01:00
thomas chaton 0f4522cbde
Switch to new package name lightning_data -> litdata (#19522) 2024-02-23 20:43:41 +00:00
Rafał Jankowski f2f3ef5d3d
Proper support for Remote Stop and Remote Abort with NeptuneLogger (#19130) 2024-02-23 20:33:17 +01:00
Matthias Weigand 0235543ffe
Make `CSVLogger(name: ...)` optional `str` (#19518) 2024-02-23 20:31:38 +01:00
awaelchli a41528c2a6
Update tests for PyTorch 2.2.1 (#19521) 2024-02-23 13:11:34 -05:00
Mauricio Villegas 623ec5824f
`load_from_checkpoint` support for LightningCLI when using dependency injection (#18105) 2024-02-23 10:55:07 +01:00
thomas chaton a6273d1787
Add Lightning Data + Update README (#19512) 2024-02-22 14:07:03 +00:00
thomas chaton eb0bbde04f
Add support for using the streaming dataloader in map or optimize for large scale inference (#19510) 2024-02-22 13:37:27 +00:00
thomas chaton 4175e1aef3
Hot fix: Fix path resolution (#19508) 2024-02-21 16:53:42 +00:00
thomas chaton 39a86f8692
Resolve compression, add support for torchaudio (#19503) 2024-02-21 00:05:13 +00:00
thomas chaton 2394e2f7b5
Resolve s3 credentials wrongly defined (#19506) 2024-02-20 23:40:06 +00:00
awaelchli c5ab34876b
Document optional steps for converting Fabric code (#19486) 2024-02-18 00:37:35 +01:00
thomas chaton bb35e8e0d3
Add batch_size to map, optimize (#19489) 2024-02-16 20:54:39 +00:00
thomas chaton bbc5488a62
Enable no op optimize (#19490) 2024-02-16 20:27:20 +00:00
thomas chaton 53ea76a75c
Prevent dataset to break if it already exists (#19491) 2024-02-16 20:04:46 +00:00
dependabot[bot] ddf2ac4df9
Bump actions/cache from 3 to 4 (#19323)
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-16 18:33:16 +01:00
Jirka Borovec 6497e36b3d
bump: transmission to use `neptune` only, drop `neptune-client` (#19265)
* bump: min version `neptune>=1.0.0`
* Apply suggestions from code review

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-02-16 17:57:10 +01:00
Jirka Borovec 5998dd12e8
docs: ignore mall behave link (#19488) 2024-02-16 17:48:51 +01:00
PL Ghost 61ba180e5f
docs: Bump HPU ref `1.4.0` (#19484)
Co-authored-by: jerome-habana <jerome-habana@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-02-16 16:28:16 +01:00
PL Ghost 5fbd7c39dc
Adding test for legacy checkpoint created with 2.2.0.post0 (#19428) 2024-02-15 15:47:31 -05:00
awaelchli 0e25b1d01a
Avoid warning when resuming mid-epoch checkpoint and using stateful dataloader (#19475) 2024-02-15 15:20:13 -05:00
awaelchli 120c87f8f7
Include the training mode in the ModelSummary (#19468) 2024-02-15 15:13:35 -05:00
awaelchli 19675473b6
Fix `log_every_n_steps` check in ThroughputMonitor (#19470) 2024-02-15 15:12:49 -05:00
Jirka Borovec 99fe6563ef
precommit: ruff-format (#19434)
* precommit: ruff-format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* manual update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* manual update

* order

* mypy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* mypy

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-15 13:39:17 -05:00
thomas chaton 6cb5813a5e
Fix the number of nodes not defined properly (#19482) 2024-02-15 17:35:26 +00:00
thomas chaton b28b673e68
Make StreamingDataLoader shuffle to set shuffle to datasets. (#19481) 2024-02-15 17:22:34 +00:00
thomas chaton 5c9a6fa072
Improve DataProcessor worker assignment (#19480) 2024-02-15 17:13:05 +00:00
thomas chaton b024e7a73b
Better default for drop_last in a distributed setting (#19478)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2024-02-15 18:11:45 +01:00
awaelchli 265025bd5d
Inform the user about a missing `fabric.backward()` call (#19447) 2024-02-14 17:49:11 -05:00
Carlos Mocholí 67459944ea
Avoid FSDP deprecations during save/load with newer torch versions (#19463)
* Avoid FSDP deprecations during save/load with newer torch versions

* Refactor

* Tests
2024-02-14 19:43:59 +01:00
awaelchli 59e45d6f6d
Update `all_gather` docs (#19469) 2024-02-14 19:37:50 +01:00
thomas chaton 1d04c10e2d
Update Readme 2/n (#19466)
update

Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2024-02-14 14:10:18 +00:00
thomas chaton 517af7124e
Add number of uploaders (#19473) 2024-02-14 13:11:03 +00:00
thomas chaton 71f44775c9
Resolve boto3 unable to locate credentials (#19472) 2024-02-14 10:54:01 +00:00
Sebastian Raschka d61f6fecd4
Fix Fabric's ThroughputMonitor docs (#19464) 2024-02-13 14:59:25 -05:00
awaelchli 3fbc29ba21
Fix `CSVLogger` trying to append to file from previous run in same version folder (#19446) 2024-02-13 13:59:04 -05:00
thomas chaton aa6e0850cf
Add Lightning Data README 1/N (#19455) 2024-02-13 17:50:54 +00:00
thomas chaton b097a4df3f
Improve data processing to enable downloading LAION 400M (#19452) 2024-02-13 13:23:39 +00:00