Commit Graph

10334 Commits

Author SHA1 Message Date
elmuz cec6ae123d
Fix typo `scrict` -> `strict` in types.py (#19998) 2024-06-20 10:57:35 -04:00
Etay Livne 1e83a1bd32
Check if CometLogger experiment is alive (#19915)
Co-authored-by: Etay Livne <etay.livne@mobileye.com>
2024-06-18 13:15:12 -04:00
liambsmith 394c42aaf6
Fix callback call in Fabric Trainer example (#19986) 2024-06-18 13:14:32 -04:00
awaelchli c1af4d0527
Better graceful shutdown for KeyboardInterrupt (#19976) 2024-06-16 10:43:42 -04:00
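The graceful-shutdown commit above is about catching Ctrl-C and tearing down cleanly instead of dying mid-step. A minimal sketch of the pattern, with hypothetical names (`run_loop`, `on_interrupt`) that are not Lightning's actual API:

```python
def run_loop(steps, on_interrupt):
    """Run a sequence of step callables; on Ctrl-C, invoke cleanup
    and return what completed instead of propagating mid-step."""
    completed = []
    try:
        for step in steps:
            completed.append(step())
    except KeyboardInterrupt:
        # Graceful path: close loggers, flush checkpoints, etc.,
        # rather than letting the interrupt kill the process abruptly.
        on_interrupt()
    return completed
```

The key design point is that the interrupt is caught at the loop boundary, so teardown always runs exactly once.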
PL Ghost b16e998a6e
Adding test for legacy checkpoint created with 2.3.0 (#19974) 2024-06-16 09:37:39 -04:00
Samuel Larkin bb511b0baf
Fix minor typo in Trainer's documentation (#19969) 2024-06-13 18:26:46 -04:00
awaelchli a42484cf8e
Fix failing app tests (#19971) 2024-06-13 20:58:34 +01:00
awaelchli f6fd046552
Release 2.3.0 (#19954) 2024-06-11 12:38:56 -04:00
William Falcon a97814af13
Update README.md 2024-06-11 11:01:22 -04:00
William Falcon fa5da26e39
Update README.md (#19968) 2024-06-11 10:04:51 -04:00
Alexander Jipa 06ea3a0571
Fix resetting epoch loop restarting flag in LearningRateFinder (#19819) 2024-06-07 10:52:58 -04:00
Björn Barz 5fa32d95e3
Ignore parameters causing ValueError when dumping to YAML (#19804) 2024-06-06 18:36:28 -04:00
Douwe den Blanken 4f96c83ba0
Sanitize argument-free object params before logging (#19771)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-06 14:51:48 -04:00
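The "sanitize argument-free object params" fix above concerns hyperparameter logging choking on raw object instances. One way such sanitization can work, sketched with `inspect` — this is an illustrative stand-in, not Lightning's actual helper:

```python
import inspect

def sanitize_params(params):
    """Replace objects whose __init__ takes no arguments with their
    class name, so hyperparameter logging gets a plain string instead
    of an unserializable instance. Illustrative sketch only."""
    out = {}
    for key, value in params.items():
        if not isinstance(value, (int, float, str, bool, type(None))):
            sig = inspect.signature(type(value).__init__)
            # Only `self` in the signature means the object is argument-free.
            if len(sig.parameters) == 1:
                out[key] = type(value).__name__
                continue
        out[key] = value
    return out
```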
Bhavay Malhotra a611de0c15
Removing numpy requirement from all files in examples/pytorch/domain_templates (#19947) 2024-06-06 11:02:01 -04:00
Mario Vasilev 812ffdec84
Fix `save_last` type annotation for ModelCheckpoint (#19808) 2024-06-05 20:24:45 -04:00
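For context on the `save_last` annotation fix above: `ModelCheckpoint.save_last` accepts more than a plain bool — notably the special string `"link"` and `None`. A hedged sketch of the widened annotation (the stand-in function is hypothetical, not Lightning's code):

```python
from typing import Literal, Optional, Union

# Sketch of the corrected annotation: True/False, the special
# string "link", or None -- not just `bool`.
SaveLast = Optional[Union[bool, Literal["link"]]]

def make_checkpoint_config(save_last: SaveLast = None) -> dict:
    """Tiny stand-in showing how the annotation is consumed."""
    if save_last not in (None, True, False, "link"):
        raise ValueError(f"Invalid save_last: {save_last!r}")
    return {"save_last": save_last}
```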
Liyang90 7668a6bf59
Flexible and easy to use HSDP setting (#19504)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-06-05 20:15:03 -04:00
awaelchli 1a6786d682
Destroy process group in atexit handler (#19931) 2024-06-05 19:31:43 -04:00
Gilles Peiffer b9f215d7fd
Replace usage of `grep -P` with `perl` in `run_standalone_tests.sh` (#19942) 2024-06-05 12:32:56 -04:00
Jirka Borovec e0b7c04e63
ci/docs: enable dispatch build without warning as errors (#19948) 2024-06-05 12:32:36 -04:00
Yurij Mikhalevich 5aadfa6250
fix(docs): fix broken link to ensure the docs can be built (#19941)
* fix(docs): fix broken link to ensure the docs can be built

* nit
2024-06-04 22:11:20 -04:00
awaelchli 8bfbe0c908
Fix strict loading from distributed checkpoints vs PyTorch nightly (#19946)
* strict loading

* docstring
2024-06-04 22:09:01 -04:00
Federico Berto 19f0fb978c
Set `_choose_auto_accelerator` to `staticmethod` (#19822) 2024-06-04 21:12:27 -04:00
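The `staticmethod` change above reflects a common refactor: a method that never touches `self` can be a `staticmethod`, callable without constructing the class. A generic sketch (the class and return value are hypothetical, not Lightning's real detection logic):

```python
class Connector:
    @staticmethod
    def choose_auto_accelerator() -> str:
        """Pure decision logic with no instance state, so it is
        callable as Connector.choose_auto_accelerator() directly."""
        # Illustrative placeholder decision only.
        return "cpu"
```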
Alex Spies 351bec7625
Fix typo on `estimated_stepping_batches` property (#19847) 2024-06-04 21:06:16 -04:00
Gilles Peiffer 785f15d148
Remove `numpy` dependencies in `src/lightning/pytorch` (#19841) 2024-06-04 19:45:05 -04:00
Matthew Hoffman bac82b83a8
Remove unknown `[metadata]` table from `pyproject.toml` (#19904) 2024-06-04 19:43:18 -04:00
Gilles Peiffer fd86ea7356
Fix typos in CONTRIBUTING.md (#19937) 2024-06-03 21:20:01 +02:00
PL Ghost a99a6d3af1
Adding test for legacy checkpoint created with 2.2.5 (#19806) 2024-05-31 12:53:54 -04:00
awaelchli 427fdfaf6e
Update docstring for `self.log` about keys in distributed training (#19917) 2024-05-30 19:47:48 +02:00
Ivan Yashchuk dffc0f96ec
Update FlopCounterMode usage in throughput.py (#19926)
The `mods` argument is no longer needed for `FlopCounterMode`:
ffe506e853/torch/utils/flop_counter.py (L595-L596)
2024-05-30 12:14:56 -04:00
awaelchli 95d6b6b9da
Disable skipping training step in distributed training (#19918) 2024-05-30 11:54:48 -04:00
awaelchli 5d7932546d
Update code owners file (#19925)
2024-05-30 11:50:02 -04:00
awaelchli 014cdd84ed
Update code owners file (#19922)
* update code owners

* update

* Update .github/CODEOWNERS

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>


2024-05-30 06:12:41 -04:00
awaelchli 98005bbed0
Add Studio badge to tensor parallel docs (#19913) 2024-05-28 09:04:55 -04:00
awaelchli 896c2a656a
Error for unsupported precision types with ModelParallelStrategy (#19902) 2024-05-23 13:43:46 -04:00
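The commit above adds an early error for precision types the strategy cannot handle. The fail-fast validation pattern looks roughly like this — the supported set below is an assumption for illustration, not the strategy's actual list:

```python
SUPPORTED_PRECISION = {"32-true", "16-mixed", "bf16-mixed"}  # assumed set

def validate_precision(precision: str) -> str:
    """Raise a clear error up front instead of misbehaving later.
    Illustrative sketch; the real supported set may differ."""
    if precision not in SUPPORTED_PRECISION:
        raise ValueError(
            f"Unsupported precision {precision!r}; "
            f"choose from {sorted(SUPPORTED_PRECISION)}"
        )
    return precision
```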
awaelchli c09356db1e
(10/10) Support 2D Parallelism - Port Fabric docs to PL (#19899) 2024-05-23 08:55:52 -04:00
awaelchli 7874cd08ec
[TPU] Fix test assertion error from artifacts (#19825) 2024-05-23 07:11:28 -04:00
Jirka Borovec e0d7ede643
docs: prune unused `linkcode` (#19897) 2024-05-23 11:35:53 +02:00
awaelchli 414c86332e
(9/n) Support 2D Parallelism - Remaining Checkpoint Logic (#19888)
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2024-05-22 18:13:41 -04:00
Jirka Borovec fa1126ea53
docs: fix link to CLIP (#19896)
* docs: fix link to CLIP

* www

* ignore
2024-05-22 17:46:51 -04:00
awaelchli 341474aaac
(8/n) Support 2D Parallelism - 2D Parallel Fabric Docs (#19887) 2024-05-22 13:47:55 -04:00
awaelchli 8fc7b4ae94
Remove the requirement for FSDPStrategy subclasses to only support GPU (#19894) 2024-05-22 18:31:40 +02:00
awaelchli 987c2c4093
(7/n) Support 2D Parallelism - TP Fabric Docs (#19884)
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2024-05-22 06:20:40 -04:00
awaelchli 7e87ce05c8
Fix state dict loading in bitsandbytes plugin when checkpoint is already quantized (#19886)
* bugfix

* add test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* add chlog


Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-21 13:46:01 -04:00
Gilles Peiffer b1bb3f3173
Update `LearningRateMonitor` docs and tests for `log_weight_decay` (#19805) 2024-05-21 13:31:54 -04:00
awaelchli d76feef0d6
Enable loss-parallel in example (#19882) 2024-05-20 13:19:38 +02:00
awaelchli 82e6e61bea
Remove redundant code to set the device on the LightningModule (#19877)
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2024-05-20 06:29:37 +02:00
Luca Antiga d5bf4b9ed3
[App] Extend retry to 4xx except 400, 401, 403, 404 (#19842)
* Extend retry to 4xx except 400, 401, 403, 404

* Remove unused intersphinx mapping for app


Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2024-05-18 22:03:16 -04:00
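The retry-policy commit above encodes a simple rule: retry transient server errors (5xx) and most 4xx responses, but treat 400, 401, 403, and 404 as permanent client errors. A sketch of that predicate (hypothetical helper, not the App framework's actual code):

```python
def should_retry(status: int) -> bool:
    """Retry on 5xx and on 4xx, except the client errors the
    commit lists as permanent: 400, 401, 403, 404."""
    if status in (400, 401, 403, 404):
        return False
    return 400 <= status < 600
```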
awaelchli c8059d7bfd
(6/n) Support 2D Parallelism - Trainer example (#19879)
* Add 2D parallel example

* replace with torchtitan code
2024-05-18 20:35:58 -04:00
awaelchli 32e241870b
(5/n) Support 2D Parallelism in Lightning Trainer (#19878)
* ModelParallelStrategy for Lightning Trainer

* mypy

* import fix

* fix torchscript errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks


* fix docs issue

* fix test execution

* Update src/lightning/pytorch/strategies/model_parallel.py


Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
2024-05-17 19:03:31 -04:00
awaelchli 1d0c6aae96
(4/n) Support 2D Parallelism - Loading optimizer states correctly (#19872)
* Load optimizer state

* move to utility

* [pre-commit.ci] auto fixes from pre-commit.com hooks



Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-17 17:17:32 -04:00