Commit Graph

10354 Commits

Author SHA1 Message Date
awaelchli 2254bfee19
cheat 2024-07-01 02:13:03 +02:00
awaelchli 76d315e6d9
upgrade xla version 2024-07-01 02:07:09 +02:00
awaelchli 14493c0685
Drop PyTorch 2.0 from the test matrix (#20009) 2024-06-30 18:02:00 -04:00
awaelchli 5636fe4a9c
CI: replace macOS-11 with macOS-14 (#20029) 2024-06-30 16:19:38 -04:00
PL Ghost 2524864b3c
Adding test for legacy checkpoint created with 2.3.1 (#20023) 2024-06-28 14:48:15 +02:00
PL Ghost fa5af16424
docs: Bump HPU ref `1.6.0` (#20026)
---------

Co-authored-by: jerome-habana <jerome-habana@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-28 14:47:45 +02:00
PL Ghost aa2da72ab9
docs: Bump HPU ref `1.5.0` (#19843)
* bumping HPU version -> (1.5.0)
* fix build warning
* the HPU also need some images
* Apply suggestions from code review

---------

Co-authored-by: jerome-habana <jerome-habana@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2024-06-28 14:28:01 +02:00
awaelchli b6be13ce30
Fix dependency issues with omegaconf and hydra (#20025) 2024-06-28 04:43:26 -04:00
awaelchli c2a96e88ba
Set development version for 2.4 (#20022) 2024-06-27 13:16:43 -04:00
Corwin Joy 967413a5b9
Add atomic save to checkpoint routine (#20011) 2024-06-27 16:19:01 +02:00
awaelchli 3f69134479
Fix seed in test to avoid interactions on global random state (#20014) 2024-06-27 15:29:13 +02:00
thomas chaton df0d462738
Add support for batch stop (#20017) 2024-06-26 17:20:10 +01:00
thomas chaton d53e107fb5
Scale mmt (#19984) 2024-06-26 11:53:41 +01:00
awaelchli d0d01d3ff9
Fix package build dependencies (#20015) 2024-06-25 18:44:29 -04:00
dependabot[bot] 55b95f26ad
build(deps): bump docker/build-push-action from 5 to 6 (#20007)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 5 to 6.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v5...v6)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-24 07:46:29 +02:00
awaelchli 9304a2c72e
Convert tensors to bytes instead of numpy in multiprocessing result-queue (#20005) 2024-06-23 19:36:57 +02:00
awaelchli e330da5870
Fix torch-numpy compatibility conflict in tests (#20004) 2024-06-21 20:20:59 -04:00
liambsmith 709a2a9d3b
Updated Fabric trainer example to not call `self.trainer.model` during validation (#19993) 2024-06-21 10:43:30 -04:00
Mauricio Villegas 5981aebfcc
Update `test_lightning_cli_help` for future change in jsonargparse (#20002) 2024-06-21 10:38:42 -04:00
SW Yoo d3a0ada4ff
Fix dtype for MPS in reinforcement learning example (#19982) 2024-06-21 10:36:10 -04:00
elmuz cec6ae123d
Fix typo `scrict` -> `strict` in types.py (#19998) 2024-06-20 10:57:35 -04:00
Etay Livne 1e83a1bd32
Check if CometLogger experiment is alive (#19915)
Co-authored-by: Etay Livne <etay.livne@mobileye.com>
2024-06-18 13:15:12 -04:00
liambsmith 394c42aaf6
Fix callback call in Fabric Trainer example (#19986) 2024-06-18 13:14:32 -04:00
awaelchli c1af4d0527
Better graceful shutdown for KeyboardInterrupt (#19976) 2024-06-16 10:43:42 -04:00
PL Ghost b16e998a6e
Adding test for legacy checkpoint created with 2.3.0 (#19974) 2024-06-16 09:37:39 -04:00
Samuel Larkin bb511b0baf
Fix minor typo in Trainer's documentation (#19969) 2024-06-13 18:26:46 -04:00
awaelchli a42484cf8e
Fix failing app tests (#19971) 2024-06-13 20:58:34 +01:00
awaelchli f6fd046552
Release 2.3.0 (#19954) 2024-06-11 12:38:56 -04:00
William Falcon a97814af13
Update README.md 2024-06-11 11:01:22 -04:00
William Falcon fa5da26e39
Update README.md (#19968) 2024-06-11 10:04:51 -04:00
Alexander Jipa 06ea3a0571
Fix resetting epoch loop restarting flag in LearningRateFinder (#19819) 2024-06-07 10:52:58 -04:00
Björn Barz 5fa32d95e3
Ignore parameters causing ValueError when dumping to YAML (#19804) 2024-06-06 18:36:28 -04:00
Douwe den Blanken 4f96c83ba0
Sanitize argument-free object params before logging (#19771)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-06 14:51:48 -04:00
Bhavay Malhotra a611de0c15
Removing numpy requirement from all files in examples/pytorch/domain_templates (#19947) 2024-06-06 11:02:01 -04:00
Mario Vasilev 812ffdec84
Fix `save_last` type annotation for ModelCheckpoint (#19808) 2024-06-05 20:24:45 -04:00
Liyang90 7668a6bf59
Flexible and easy to use HSDP setting (#19504)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-05 20:15:03 -04:00
awaelchli 1a6786d682
Destroy process group in atexit handler (#19931) 2024-06-05 19:31:43 -04:00
Gilles Peiffer b9f215d7fd
Replace usage of `grep -P` with `perl` in `run_standalone_tests.sh` (#19942) 2024-06-05 12:32:56 -04:00
Jirka Borovec e0b7c04e63
ci/docs: enable dispatch build without warning as errors (#19948) 2024-06-05 12:32:36 -04:00
Yurij Mikhalevich 5aadfa6250
fix(docs): fix broken link to ensure the docs can be built (#19941)
* fix(docs): fix broken link to ensure the docs can be built

* nit
2024-06-04 22:11:20 -04:00
awaelchli 8bfbe0c908
Fix strict loading from distributed checkpoints vs PyTorch nightly (#19946)
* strict loading

* docstring
2024-06-04 22:09:01 -04:00
Federico Berto 19f0fb978c
Set `_choose_auto_accelerator` to `staticmethod` (#19822) 2024-06-04 21:12:27 -04:00
Alex Spies 351bec7625
Fix typo on `estimated_stepping_batches` property (#19847) 2024-06-04 21:06:16 -04:00
Gilles Peiffer 785f15d148
Remove `numpy` dependencies in `src/lightning/pytorch` (#19841) 2024-06-04 19:45:05 -04:00
Matthew Hoffman bac82b83a8
Remove unknown `[metadata]` table from `pyproject.toml` (#19904) 2024-06-04 19:43:18 -04:00
Gilles Peiffer fd86ea7356
Fix typos in CONTRIBUTING.md (#19937) 2024-06-03 21:20:01 +02:00
PL Ghost a99a6d3af1
Adding test for legacy checkpoint created with 2.2.5 (#19806) 2024-05-31 12:53:54 -04:00
awaelchli 427fdfaf6e
Update docstring for `self.log` about keys in distributed training (#19917) 2024-05-30 19:47:48 +02:00
Ivan Yashchuk dffc0f96ec
Update FlopCounterMode usage in throughput.py (#19926)
`mods` argument is not needed anymore for `FlopCounterMode`:
ffe506e853/torch/utils/flop_counter.py (L595-L596)
2024-05-30 12:14:56 -04:00
awaelchli 95d6b6b9da
Disable skipping training step in distributed training (#19918) 2024-05-30 11:54:48 -04:00