1908 Commits

Author SHA1 Message Date
awaelchli 13f15b38fc
Support consolidating sharded checkpoints with the `fabric` CLI (#19560) 2024-03-04 08:01:33 -05:00
awaelchli d9113b61cc
Add additional references in compile guides (#19550) 2024-03-04 08:00:50 -05:00
Leng Yue 34f036917d
Document `ddp_find_unused_parameters_true` in Fabric (#19564) 2024-03-04 06:10:38 -05:00
awaelchli 48c39ce24f
Compile guide for Trainer (#19531)
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2024-02-28 09:15:33 -05:00
awaelchli abae4c903b
Update Lightning AI multi-node guide (Trainer) (#19530)
* update

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* configure_model

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-02-28 08:35:53 -05:00
awaelchli ea89133c65
Rename `fabric run model` to `fabric run` (#19527) 2024-02-27 11:36:46 -05:00
awaelchli e461e90f84
Update the Multi-GPU docs (#19525) 2024-02-26 22:29:26 -05:00
Jirka Borovec cf3553cdb5
docs: enable Sphinx linter & fixing (#19515)
* docs: enable Sphinx linter
* fixes
2024-02-26 16:20:33 +01:00
thomas chaton e43820a4be
migrate Data subpackage (#19523)
* update

* update

* update

* update

* Update checkgroup.yml

* More

* Add note

* Labeller should be kept as long as we have the stubs

* update

* update

* update

* Apply suggestions from code review

* init

* ci fix

* pin version range

* https://www.neptune.ai/

---------

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2024-02-26 08:25:00 -05:00
awaelchli 2a827f3f6f
Docs fixes (#19529) 2024-02-26 12:06:08 +01:00
Mauricio Villegas 623ec5824f
`load_from_checkpoint` support for LightningCLI when using dependency injection (#18105) 2024-02-23 10:55:07 +01:00
awaelchli c5ab34876b
Document optional steps for converting Fabric code (#19486) 2024-02-18 00:37:35 +01:00
Jirka Borovec 5998dd12e8
docs: ignore mall behave link (#19488) 2024-02-16 17:48:51 +01:00
PL Ghost 61ba180e5f
docs: Bump HPU ref `1.4.0` (#19484)
Co-authored-by: jerome-habana <jerome-habana@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-02-16 16:28:16 +01:00
awaelchli 120c87f8f7
Include the training mode in the ModelSummary (#19468) 2024-02-15 15:13:35 -05:00
awaelchli 59e45d6f6d
Update `all_gather` docs (#19469) 2024-02-14 19:37:50 +01:00
awaelchli 4bcc4f1cf7
Document the return value of `Fabric.clip_gradients()` (#19457) 2024-02-13 11:32:47 +01:00
Sebastian Raschka 0a77aa2cc5
Some clarifications in the torch.compile Fabric docs (#19456) 2024-02-13 06:52:29 +01:00
awaelchli e950bb4828
Remove the Graphcore IPU integration (#19405)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-02-12 16:16:02 -05:00
Justus Schock 2ed7282f7c
Rename Lightning Fabric CLI (#19442)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-02-12 17:22:53 +01:00
Justus Schock 0acd5f9810
Rename Lightning App CLI (#19440) 2024-02-09 16:54:54 +01:00
awaelchli 9c8cd4ce68
Update to 2.3.0dev (#19430)
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-02-08 06:34:51 -05:00
Carlos Mocholí db2cc8a88e
Fix CLI docs typo (#19426) 2024-02-07 10:53:58 -05:00
Carlos Mocholí 78b7a39e72
Update throughput docs (#19415) 2024-02-06 16:26:10 -05:00
awaelchli 130b05fe0c
Fix dead link in docs (#19387) 2024-02-06 11:54:17 +01:00
awaelchli 9624aae07e
Support non-strict loading in Trainer (#19404) 2024-02-05 19:57:43 -05:00
awaelchli 89ff87def0
Reapply compile in `Fabric.setup()` by default (#19382) 2024-02-01 15:06:18 -05:00
Adam J. Stewart 509b2ca560
Docs: fix FSDP acronym (#19384) 2024-02-01 16:02:59 +01:00
awaelchli c346f4d159
Compile guide for Fabric (#19330) 2024-01-31 14:57:07 -05:00
Jirka Borovec 6421dd8d4f
precommit: drop Black in favor of Ruff (#19380) 2024-01-31 17:09:39 +00:00
awaelchli 1a59097ab2
Drop support for PyTorch 1.12 (#19300)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-01-26 11:44:24 -05:00
awaelchli 7cc79fe7ba
Reapply `torch.compile` in Fabric.setup() (#19280) 2024-01-23 21:17:41 -05:00
awaelchli 1faddcb24c
Update Lightning AI multi-node guide (#19324)
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2024-01-23 18:23:49 -05:00
awaelchli b1127e3608
Utility to consolidate sharded checkpoints (#19213)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-01-23 17:15:22 -05:00
awaelchli 93c1ab0653
Dedicated docs page for distributed checkpoints (Trainer) (#19299) 2024-01-17 12:20:12 +01:00
awaelchli a4ecf8d5c8
Dedicated docs page for distributed checkpoints (#19287) 2024-01-16 08:44:10 -05:00
awaelchli 1bd27447d9
Fix typo in `log_graph` docs (#19254) 2024-01-10 19:00:27 +01:00
Carlos Mocholí a1dd9efcf7
Drop XLA XRT support (#19232)
* Drop XLA XRT support
* update test
* set launched
* update conftest
* xla available check
---------

Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-01-10 18:39:20 +01:00
Jirka Borovec 79b082a2ba
docs: fix link to colossalai (#19263) 2024-01-10 18:11:27 +01:00
Jirka Borovec f62e312185
docs: update references to ext. integrations (#19248)
* drop stale projects
* hpu 1.3.0
* copy & index
* prune

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-01-09 13:12:53 +01:00
Carlos Mocholí 97469c600f
TransformerEngine fallback compute dtype (#19082)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-12-14 03:02:09 +01:00
Jirka Borovec e769e233aa
ci: build app's docs with RTFD (#19147)
* ci: build app's docs with RTFD
* set -ex
* .[app]
2023-12-13 07:37:36 +01:00
Jirka Borovec 7fee4ae3a6
docs: fix broken link awesome-panel (#19140) 2023-12-12 09:39:47 +00:00
Jirka Borovec 4af5ec0d34
docs: drop legacy links (#19116) 2023-12-05 19:02:02 +01:00
Carlos Mocholí 7d04de697e
Reorder `configure_model` (#19060) 2023-12-05 02:29:32 +01:00
Aliaksei Urbanski c5363af1c2
Fix the "our" word duplication in the docs (#19055)
There were a bit inconsistent sentences with
a word duplication issue in the documentation.

These changes also:
  * unify references to the Compatibility matrix
  * add a reference to the Compatibility matrix
    into PyTorch Lightning's Installation Docs
2023-12-01 11:28:29 -05:00
Woon-Ha Yeo 294db45294
Fix missing module alias in Trainer reference (#19089) 2023-12-01 10:58:39 -05:00
Adrian Wälchli 9d366046b9
Clarify requirements for `Trainer.fit(ckpt_path="last")` (#19066) 2023-11-27 11:03:45 -05:00
Adrian Wälchli b79b68481e
Clarify setup of optimizer when using `empty_init=True` (#19067) 2023-11-26 11:04:36 +01:00
Adrian Wälchli 90043798e4
Clarify `self.log(..., rank_zero_only=True|False)` (#19056)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-11-23 13:02:21 -05:00