3455 Commits

Author SHA1 Message Date
Carlos Mocholí 21d8fbfb2f
Fix broken links after reverse mirror changes (#16600) 2023-02-01 20:00:44 +00:00
Carlos Mocholí b2b5598b31
Replace custom `AllGather` implementation (#11531) 2023-02-01 19:53:43 +00:00
Carlos Mocholí ef2a6088ff
Drop support for PyTorch 1.10 (#16492)
* Drop support for PyTorch 1.10

* CHANGELOG

* READMEs

* mypy

* ls

* New poplar version

* Fixed tests

* links

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* skip azure badges

* Table

* Matching dockerfiles

* Drop unnecessary channels and packages

* Push nightly

* Undo unrelated changes

* Revert "Push nightly"

This reverts commit 9618f737c4.

---------

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-02-01 14:09:12 -05:00
Adrian Wälchli 6a56586492
Make manual optimization mandatory for multiple optimizers (#16539)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-02-01 16:21:01 +00:00
Carlos Mocholí df09370827
Run `on_train_epoch_end` after the LM for callbacks that monitor (#16567) 2023-02-01 15:27:16 +01:00
Jirka Borovec cc49e4a31f
tests: switch imports for apps (#16554) 2023-02-01 11:07:00 +00:00
Jirka Borovec 34140c0603
move lightning_app >> lightning/app (#16553)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-02-01 06:29:16 +01:00
Carlos Mocholí dc298f2340
Drop support for Python 3.7 (#16579)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2023-02-01 01:36:42 +00:00
Carlos Mocholí c2b5b77e23
Ignore leaked XLA environment variables (#16582) 2023-01-31 22:33:52 +01:00
Jirka Borovec 95e3117a0c
ci/hotfix: replace SD with Flashy (#16584)
ci: replace SD with Flashy
2023-01-31 13:06:58 -05:00
Carlos Mocholí 9f8043e16f
Cascade SIGTERM to subprocesses (#16525) 2023-01-31 17:24:58 +01:00
Ethan Harris 3001c7c1d5
[App] Fix app name in URL (#16575) 2023-01-31 07:14:56 -05:00
Ethan Harris 0ec93f2125
[App] Update app URLs to latest format (#16568) 2023-01-30 20:52:46 +00:00
Adrian Wälchli c2aa8c9e10
Remove `optimizer_idx` from `toggle/untoggle_optimizer` methods (#16560)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-01-30 18:06:42 +00:00
Adrian Wälchli 8fc4fb18e6
Switch multi-optimizer tests to manual optimization (#16559) 2023-01-30 17:18:37 +00:00
Carlos Mocholí a78412f11d
Remove DataLoader serialization (under fault tolerance) (#16533) 2023-01-30 16:12:01 +01:00
Adrian Wälchli 8aca46a192
Remove `using_lbfgs` argument from `optimizer_step` module hook (#16538) 2023-01-30 12:49:35 +00:00
Adrian Wälchli 1008f313e8
Remove `on_tpu` argument from `optimizer_step` module hook (#16537)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-01-30 13:17:20 +01:00
Jirka Borovec dc38663a03
store: update API in messages (#16535) 2023-01-30 11:19:57 +00:00
Jirka Borovec d521f2bc99
store: mock/fixture home (#16536)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-01-30 12:02:15 +01:00
Carlos Mocholí 07cda8c94e
Ensure SIGTERM handlers other than ours can be added (#16534) 2023-01-29 03:14:38 +01:00
Carlos Mocholí 405bb75553
Move progress file (#16524) 2023-01-27 17:49:52 +00:00
Carlos Mocholí 226290cfc1
PyTorch 2.0 switched the `set_to_none` default (#16531)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-01-27 16:51:56 +00:00
Carlos Mocholí e74a8378b4
Remove result serialization (under fault tolerance) (#16516) 2023-01-27 17:41:37 +01:00
Adrian Wälchli b216a114a7
Decouple Tuner from Trainer (#16462)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-01-27 15:08:40 +00:00
Jirka Borovec 2251055be4
store: drop `requirements_file_path` (#16527) 2023-01-27 08:39:07 -05:00
Carlos Mocholí d562319a61
Make the `FaultToleranceCheckpoint` callback opt-in (#16512) 2023-01-27 13:02:14 +00:00
Kushashwa Ravi Shrimali d738ab17e6
Init: Models store API (#15811)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-01-27 12:27:04 +01:00
Carlos Mocholí c854e4d1e2
SIGTERM handling is now unrelated to fault tolerance (#16501) 2023-01-27 02:59:42 +00:00
Carlos Mocholí b2387136ba
Fix `torch.compile` tests (#16503) 2023-01-27 02:41:45 +00:00
Adrian Wälchli 23e71a880a
Fabric checkpointing 3/n: Implement missing `get_module_state_dict` for strategies (#16487) 2023-01-26 13:10:14 +00:00
Jirka Borovec 50fd12f841
fabric: test with tbX (#16511) 2023-01-26 12:52:02 +00:00
Adrian Wälchli c68cfd686e
Rename LiteMultiNode to FabricMultiNode (#16505) 2023-01-26 11:36:27 +00:00
Jirka Borovec f812cb8339
ci: move Flagships to GH (#16420)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-01-26 09:28:30 +00:00
Carlos Mocholí d78cf99176
Remove the "native" suffix from the codebase (#16490) 2023-01-25 14:09:09 +00:00
thomas chaton b8eaabe3c9
[App] Add interruptible work (#16399)
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-01-25 13:48:27 +00:00
Adrian Wälchli 96b7ed77e6
Enable more shorthand strategy names in the Fabric CLI (#16485) 2023-01-25 09:52:03 +00:00
Adrian Wälchli c87bb71fa8
Add `Fabric.all_reduce` (#16459) 2023-01-24 22:35:00 +00:00
Adrian Wälchli 7603dd09cb
Fabric checkpointing 2/n: DeepSpeed implementation (#16452)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-01-24 18:53:26 +01:00
Ethan Harris 4a802e00a8
[App] Add `lightning open` command (#16482) 2023-01-24 15:58:15 +00:00
Akihiro Nitta d6b62da4d5
[App] Wrap LightningClient methods with retry wrapper by default (#16382)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-01-23 18:30:23 +00:00
Carlos Mocholí 5891cdc940
Mark the loop classes as protected (#16445) 2023-01-23 16:30:13 +00:00
thomas chaton 404fc0c8b7
[App] Resolved root_folder not parsed properly (#16454)
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-01-23 15:03:54 +00:00
Carlos Mocholí 39b7cb80ca
Remove the FairScale integration (#16400)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-01-23 13:39:04 +00:00
Seppo Enarvi 9346151359
Two fixes for handling edge cases in MLflow logging (#16451)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-01-23 14:29:58 +01:00
Ethan Harris 04886ed7f1
[App] Refactor cloud dispatch and update to new API (#16456)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-01-23 12:41:33 +00:00
Adrian Wälchli 9faa25f86f
Test that connector defaults match the ones in Trainer/Fabric (#16463) 2023-01-23 05:09:45 -05:00
Nikhil Shenoy 81914c7167
LightningFabric: Error handling for accelerator="mps" and ddp strategy pairing (#16455)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-01-22 17:57:24 +00:00
Peutlefaire 6fd914f40b
Solved minor bug with MLFlow logger (#16418)
Resolves https://github.com/Lightning-AI/lightning/issues/16411
2023-01-20 00:15:32 +00:00
Carlos Mocholí d3de5c64d7
Remove the deprecated code in `pl.utilities.data` (#16440) 2023-01-20 01:03:55 +01:00