lightning

Commit Graph

Author	SHA1	Message	Date
Boris Dayma	1e36cffbca	feat(wandb): support distributed modes (#11650 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2022-02-09 19:53:21 +01:00
Carlos Mocholí	8394770d4a	Move data fetcher ownership to the loops (#11621 )	2022-02-09 20:04:24 +05:30
Biho-Kim	24de29974c	bug fix #10872 (#10965 ) Co-authored-by: louie.kim <louie.kim@kakaocorp.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-02-09 14:15:49 +00:00
Carlos Mocholí	8822117200	Return the output of the optimizer step (#11711 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2022-02-09 09:37:13 +00:00
Danielle Pintz	9e63281a4c	remove todos (#11804 )	2022-02-09 08:30:27 +00:00
ananthsub	9d4de3a863	Faster callback configuration validator checks (#11785 ) Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2022-02-09 08:24:14 +00:00
Rohit Gupta	182c18d319	Configure native deepspeed schedulers with interval='step' (#11788 )	2022-02-09 08:20:50 +00:00
jjenniferdai	1203094a20	Introduce `Stateful` DataModule (#11637 )	2022-02-07 21:13:24 +01:00
circlecrystal	43a89eb132	bug fix: restore_optimizers correctly handles non-mapping values in optimizer.state.values() (#11757 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-02-07 14:55:06 +00:00
Rohit Gupta	9ed44dee0d	Fix to avoid moving batch to device for DataParallel (#11780 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2022-02-07 14:26:18 +00:00
Rohit Gupta	581bf7f2f2	Deprecate `on_epoch_start/on_epoch_end` hook (#11578 )	2022-02-07 14:15:27 +00:00
ananthsub	bbf27ed09a	Use fsspec in checkpoint connector for fault-tolerant training (#11776 )	2022-02-07 13:29:41 +01:00
ananthsub	0ba25d3cac	Update DDPStrategy to use optimizers property from within class (#11777 )	2022-02-07 13:28:37 +01:00
Rohit Gupta	7ec1e66e17	reduce only loss with dp (#11594 ) Co-authored-by: Aki Nitta <nitta@akihironitta.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-02-07 17:00:29 +05:30
Krishna Kalyan	f509e40ae3	Deprecate `on_before_accelerator_backend_setup` callback hook (#11655 ) Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2022-02-07 11:07:21 +00:00
ananthsub	a64438c897	Centralize rank_zero_only utilities into their own module (#11747 ) * Centralize rank_zero_only utilities into their own module Fixes #11746 * PossibleUserWarning * Update test_warnings.py * update imports * more imports * Update CHANGELOG.md * Update mlflow.py * Update cli.py * Update api_references.rst * Update meta.py * add deprecation tests * debug standalone * fix standalone tests * Update CHANGELOG.md	2022-02-07 08:09:55 +00:00
Danielle Pintz	34c454c756	Small improvements to TB and CSV loggers (#11764 ) * small improvements to TB and CSV loggers * addr comments * remove redundant lines and update tests Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Aki Nitta <nitta@akihironitta.com>	2022-02-07 14:59:39 +09:00
ananthsub	7900aabe62	Keep `is_global_zero` definitions in sync across strategy and trainer (#11761 )	2022-02-07 01:33:32 +05:30
ananthsub	dfda970572	Update TPU Spawn to use root_device instead of LightningModule's device (#11750 )	2022-02-06 06:26:38 +00:00
Dan Dale	9d8faecdb2	Allow Horovod `teardown()` to complete gracefully if exception thrown in callback setup (#11752 )	2022-02-05 11:13:21 -08:00
ananthsub	819a747031	Use `root_device` in XLAStatsMonitor callback (#11749 )	2022-02-05 10:09:08 -08:00
ananthsub	7d9454a3e9	Use `root_device` in DeviceStatsMonitor callback (#11748 ) * Use trainer.strategy.root_device in favor of LightningModule.device in DeviceStatsMonitor Minor refactor to use the strategy's own `root_device` instead of the LightningModule's device property. Attempts at manual model parallelization by extending this plugin will face difficulties with the assumption that the LightningModule has all of its parameters on the same device. For those use cases, it is critical to remove the assumption that the module has a device property (device in general goes against PyTorch module's design principles: - https://github.com/pytorch/pytorch/issues/7460 - https://github.com/PyTorchLightning/pytorch-lightning/pull/1790#discussion_r423459412	2022-02-05 11:20:15 +01:00
ananthsub	241c97e6eb	Update HorovodStrategy to use optimizers property from within class (#11728 )	2022-02-05 10:04:55 +01:00
Adrian Wälchli	cc43d07db1	Remove legacy dead code in DDP script launch (#11678 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-02-05 11:40:16 +05:30
Dan Dale	3bc2407239	Allow access to ckpt_path within context of fit() (#11696 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-05 05:23:16 +01:00
Carlos Mocholí	7da931d1ca	Support no pre-fetching (#11606 )	2022-02-05 03:59:46 +00:00
Danielle Pintz	c71a1d7ea2	Remove `self._log_dir` from `BaseProfiler` (#11740 )	2022-02-05 04:45:48 +01:00
ananthsub	72db64d294	Use the strategy's `root_device` instead of the LightningModule's device property (#11734 )	2022-02-05 04:33:25 +01:00
Andres Algaba	58324b5197	Improve the result printing at the end of evaluation (#11332 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-02-05 03:03:22 +01:00
NathanGodey	8a1b1eeef8	WandbLogger's log_image can use step argument (#11716 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-05 01:02:41 +00:00
wangraying	8c07d8bf90	Add `Trainer(strategy="bagua")` (#11146 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Sean Naren <sean@grid.ai> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai>	2022-02-04 17:02:09 +00:00
ananthsub	2eca957b29	Minor refactors to `init_dist_connection` (#11733 )	2022-02-04 13:33:49 +01:00
Rohit Gupta	4d72110b51	Deprecate `on_batch_start/on_batch_end` callback hooks (#11577 )	2022-02-03 19:51:56 +00:00
Rohit Gupta	400201712f	added warning for distributedsampler in case of evaluation (#11479 )	2022-02-03 18:42:13 +00:00
Rohit Gupta	01abe72278	Fix to avoid val progress bar disappear after validate (#11700 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-03 13:35:38 +00:00
Rohit Gupta	e9065e9d42	Fix rich with uneven refresh rate tracking (#11668 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-03 10:27:05 +00:00
Rohit Gupta	7948ed703d	Avoid enforcing `shuffle=False` for eval dataloaders (#11575 )	2022-02-03 09:35:31 +00:00
Danielle Pintz	9ebd7df22a	Move progress bar disabling out of the Trainer (#11377 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-02-03 06:29:32 +00:00
Rohit Gupta	0cb64fb8ba	Fix mid-epoch warning call while resuming (#11556 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-02-03 05:42:31 +00:00
four4fish	d43fd0d4d6	Lazy initialize Strategy.parallel_devices (#11572 ) Co-authored-by: Aki Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-03 04:25:16 +00:00
Rohit Gupta	eceefdc602	Fix rich progress bar render only on main pbar (#11690 )	2022-02-03 04:18:07 +00:00
Krishna Kalyan	6291af5c19	Replace occurrences of `on_before_accelerator_backend_setup_called` with `setup` (#11568 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-02-03 04:14:33 +00:00
Peter Franek	ed8a5dadce	Improving instructions in finetuning docstring (#10484 )	2022-02-03 04:13:06 +00:00
Anton Schwaighofer	f935319622	Allow a `CombinedLoader` as the training data in DDP (#11648 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-02-03 04:01:20 +00:00
Sebastian Raschka	0e17f16438	Clarify what the default values for log are based on hooks (#11611 )	2022-02-03 03:55:42 +00:00
Jirka Borovec	c5de105276	fix available modules (#11526 )	2022-02-03 03:38:16 +00:00
Sebastian Raschka	9934569373	Fix typo in `TensorBoardLogger.log_metrics` error message (#11595 )	2022-02-03 03:18:54 +00:00
Carlos Mocholí	3d3172d3da	[CLI] Support shorthand for loggers (#11533 )	2022-02-03 02:58:14 +00:00
Bhadresh Savani	0ea48416cd	Removed subsection in `LightningDataModule` (#11675 )	2022-02-03 02:53:43 +00:00
DuYicong515	0816a1997e	Add typing for utilities/memory.py (#11545 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-02-03 02:34:05 +00:00
Piyush Hirapara	72f0e5bfae	Deprecate `on_configure_sharded_model` callback hook for v1.6 (#11627 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-02-03 02:29:26 +00:00
Krishna Kalyan	6586dd23b7	Mark `CheckpointConnector` as protected (#11550 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-03 02:26:08 +00:00
DuYicong515	06e2635c71	Refactor get_filesystem to use native fsspec API (#11708 )	2022-02-03 01:55:24 +00:00
Akash Kwatra	d5aa7717aa	Remove experiment property from abstract class (#11603 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-03 01:51:34 +00:00
Rohit Gupta	ee049e123d	Fix rich progress bar metric render on epoch end (#11689 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-02-03 01:43:48 +00:00
jjenniferdai	ec1379da2c	Rename `_SupportsStateDict` --> `_Stateful` Protocol (#11469 )	2022-02-02 23:45:59 +01:00
Carlos Mocholí	b8e360dafa	[CLI] Fix bug that forces overriding `configure_optimizers` (#11672 )	2022-02-02 22:44:00 +00:00
Akash Kwatra	115a5d08e8	Decouple utilities from `LightningLoggerBase` (#11484 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-02-02 23:29:01 +01:00
Aki Nitta	fbc1f9f1d9	Rename `Strategy.lr_schedulers` to `Strategy.lr_scheduler_configs` (#11549 )	2022-02-02 22:10:01 +00:00
Nithin Rao	b8d2c65a37	Set the state before saving "last" or "none" checkpoints (#11481 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-02-02 23:07:05 +01:00
Carlos Mocholí	d7944a13cd	Teardown all internal components on exception (#11620 )	2022-02-02 21:10:19 +00:00
Rohit Gupta	3eee8f18cf	Sort simple profiler summary based on mean duration (#11671 )	2022-02-02 20:44:42 +00:00
Rohit Gupta	76175217e4	Fix val_loop run on restart (#11552 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-02 20:19:34 +00:00
Carlos Mocholí	a44881cd90	Changes in preparation to #8578 (#11562 )	2022-02-02 19:57:08 +00:00
Carlos Mocholí	79a3ff690b	Add typing to data fetching (#11515 )	2022-02-02 20:53:50 +01:00
Chunyang Wen	fe34bf2a65	Remove useless pass and abc (#11522 )	2022-01-24 08:19:57 +00:00
Chunyang Wen	350c88e621	Let Accelerator inherit from ABC to make sure abstractmethod takes effect (#11521 )	2022-01-23 20:47:43 +01:00
Carlos Mocholí	623dc974f5	Construct the hook kwargs inside each loop (#11511 )	2022-01-22 15:57:12 +00:00
Carlos Mocholí	5ad5ba54c0	Refactor fetching function (#11516 )	2022-01-20 20:06:58 +01:00
Carlos Mocholí	075b8801c9	Fix checkpoint values when saving and resetting the tuner state (#11518 )	2022-01-20 18:54:40 +00:00
Carlos Mocholí	7295457a7b	[CLI] Save only the configuration used (#11532 )	2022-01-20 12:35:43 +00:00
Rafał Jankowski	e78d658c8d	Remove access to `_short_id` in NeptuneLogger (#11517 )	2022-01-20 12:07:42 +00:00
Maaz Karim	16a04b29eb	Mark SignalConnector as protected (#11513 ) Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2022-01-20 08:39:59 +01:00
ananthsub	1bd6fc979e	Remove `Strategy.on_tpu` property (#11536 )	2022-01-20 08:25:26 +01:00
ananthsub	f41d1e5e5e	Remove `Strategy.on_gpu` (#11537 )	2022-01-19 21:27:12 +00:00
Rohit Gupta	f7f835fa0e	improve simple profiler output (#11414 )	2022-01-18 19:58:34 +00:00
Carlos Mocholí	62818dbace	Use a dataclass as the scheduler config (#11443 )	2022-01-18 20:23:32 +01:00
Carlos Mocholí	344ab1e0a5	Move the `lightning_optimizers` ownership to the `Strategy` (#11444 )	2022-01-18 12:58:56 +01:00
Rohit Gupta	033dba1494	Disable attaching samplers when using `IterableDataset` (#11507 )	2022-01-17 23:33:57 +01:00
Gautam R Gare	ef4677ae7b	Change the default `prog_bar=False` to `True` in `LightningModule.log_grad_norm` (#11472 ) * Reset on_step flag to True in log_grad_norm * updated change log Co-authored-by: Aki Nitta <nitta@akihironitta.com>	2022-01-18 02:34:50 +09:00
Carlos Mocholí	9cf9ded73b	Simplify data fetching (#11466 )	2022-01-17 14:46:55 +00:00
Rohit Gupta	cad604211b	update load_from_checkpoint docstrings (#11467 )	2022-01-16 20:48:27 +00:00
Carlos Mocholí	18bbb39eef	Set `Loop.restarting` recursively (#11442 ) * Set `Loop.restarting` recursively * Docs * CHANGELOG * Update pytorch_lightning/loops/epoch/training_epoch_loop.py Co-authored-by: Aki Nitta <nitta@akihironitta.com>	2022-01-14 19:25:23 +09:00
Rohit Gupta	9771e7dff6	Update introduction docs (#11140 )	2022-01-13 21:11:43 +00:00
Carlos Mocholí	a80da35d5d	Fix compatibility with old checkpoints and fault-tolerance enabled (#11439 )	2022-01-13 14:53:17 +01:00
Rohit Gupta	96a53382ac	Update utilities API references (#11450 )	2022-01-13 13:22:58 +00:00
Carlos Mocholí	5914fb748f	Add typing to accelerators/gpu.py (#11333 )	2022-01-12 19:44:51 +00:00
Rohit Gupta	00d1758bac	Update training tricks docs (#11169 )	2022-01-12 16:26:03 +00:00
Carlos Mocholí	f5bbc2cf17	Avoid in-place ops during logging result updates (#11401 ) Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2022-01-12 09:09:36 +01:00
Rohit Gupta	221091afc4	move profiler docs (#11431 )	2022-01-12 05:56:16 +00:00
Aki Nitta	8dc36c3745	Fix inconsistent exceptions raised with no `rich` installed (#11360 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-01-12 03:55:51 +00:00
Rohit Gupta	82c8875f33	Add `LightningModule.lr_scheduler_step` (#10249 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-01-12 03:53:49 +00:00
Carlos Mocholí	9771040621	Add typing to `TQDMProgressBar` (#11369 )	2022-01-12 01:07:30 +00:00
edward-io	6107ce8e0d	Add DETAIL logs for batch use cases (#11008 )	2022-01-12 01:22:48 +01:00
Rohit Gupta	06b8f82b8a	Update API references in doc (#11357 )	2022-01-07 15:56:17 +01:00
Carlos Mocholí	59a7ba7605	Move `epoch_{start,end}` hooks from `TrainingEpochLoop` to `FitLoop` (#11201 )	2022-01-06 15:13:18 +00:00
Danielle Pintz	57567edeab	Move newly added Trainer methods to be with other methods (#11335 )	2022-01-06 14:10:21 +00:00
Kaushik B	42a1c72660	Add Accelerators section to Lightning docs (#10755 )	2022-01-06 19:12:44 +05:30
Carlos Mocholí	8a549a550c	Integrate progress tracking into the progress bar (#11213 )	2022-01-06 14:29:48 +01:00
Adrian Wälchli	3a2df4f75d	Fix typing in `pl.callbacks.xla_stats_monitor` (#11219 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-01-06 12:51:02 +00:00
NathanGodey	9b873dcfcc	Changed hook doctstring (#11345 )	2022-01-06 12:37:11 +00:00
Adrian Wälchli	9c8f52ccd1	Fix restoring lr scheduler states with deepspeed strategy (#11322 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai>	2022-01-06 12:34:16 +00:00
Carlos Mocholí	5693a94c32	Extend the deprecation of `Trainer(resume_from_checkpoint)` (#11334 )	2022-01-06 13:18:37 +01:00
Kaushik B	e15579a4f3	Rename `_distrib_type` to `_strategy_type` (#11328 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-01-06 06:32:50 +00:00
Abhishek Saroha	43c140c8e5	Fix frozen dataclass instance error in `apply_to_collection` (#10927 ) Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-01-05 23:29:03 +01:00
Danielle Pintz	5b59c951e2	Deprecate `TrainerDataLoadingMixin` and move logic to `DataConnector` (#11282 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Aki Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-01-05 21:23:57 +01:00
Carlos Mocholí	c0726bacbe	Update `LightningCLI(trainer_defaults=...)` doc (#11309 ) Co-authored-by: Mauricio Villegas <mauricio_ville@yahoo.com>	2022-01-05 19:43:35 +00:00
Adrian Wälchli	9906a1a54d	Update optimizer configuration info message in `DeepSpeedStrategy` (#11327 )	2022-01-05 18:20:06 +00:00
Carlos Mocholí	1b6f851880	Add typing to some utility files (#11316 )	2022-01-05 17:14:22 +00:00
Kaushik B	70c975a9f3	Fix exception message for FSDP running on CPU (#11325 )	2022-01-05 18:02:31 +01:00
Rohit Gupta	8955081aaf	Update precision docs (#11010 ) Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2022-01-05 14:58:04 +00:00
Andrew Tritt	dbf1acd5a5	Modify LSFEnvironment to use more reliable environment variable (#10825 ) Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-01-05 12:45:25 +00:00
Kaushik B	93223ff5ce	Introduce StrategyRegistry (#11233 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-01-05 17:14:18 +05:30
Carlos Mocholí	5ac129e95a	Rename ttp -> strategy (#11312 )	2022-01-05 12:12:25 +01:00
Carlos Mocholí	33c3490685	Fix min/max logging default value (#11310 )	2022-01-05 11:42:03 +01:00
Adrian Wälchli	a8bd7ac73f	Fix lr scheduler state not being dumped to checkpoint in deepspeed strategy (#11307 )	2022-01-05 08:38:08 +00:00
Rohit Gupta	7eab379da2	Raise a warning if evaulation is triggered with best ckpt in case of multiple checkpoint callbacks (#11274 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-01-04 17:22:32 +00:00
Carlos Mocholí	a610e043d7	Add typing for utilities/enums.py (#11298 )	2022-01-04 13:30:56 +01:00
Carlos Mocholí	e9009d6058	Reset the total fit-validation batch progress on epoch (#11244 )	2022-01-04 12:04:20 +01:00
Danielle Pintz	7fa1aebcc9	Remove `profile("training_step_and_backward")` (#11222 )	2022-01-04 11:50:11 +01:00
Rohit Gupta	997da52f73	Update logic to make sure logged_metrics always contain tensors (#11270 )	2022-01-04 10:32:44 +00:00
Rohit Gupta	98ea79b8b0	Add `opt_idx` to scheduler config if not assigned by user (#11247 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-01-04 14:57:15 +09:00
Ed Pizzi	cf32127e7e	Avoid non-blocking GPU->CPU copies. (#11288 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-01-03 22:17:50 +00:00
Kaushik B	5a4df4ec7d	Update strategy import statements (#11238 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-01-03 15:54:46 +00:00
ananthsub	05ed9a201c	Group metrics generated by `DeviceStatsMonitor` for better visualization (#11254 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-01-03 13:26:17 +00:00
Adrian Wälchli	17cb3c70f7	Fix data fetcher selection (#11294 )	2022-01-03 13:49:17 +01:00
Danielle Pintz	b082715103	Remove `Strategy.optimizer_zero_grad` (#11246 )	2022-01-03 13:46:57 +01:00
Adrian Wälchli	4eede7c30b	Add deprecation path for renamed training type plugins (#11227 ) Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-01-03 13:41:05 +01:00
jjenniferdai	4b5761539e	Remove `hpc_save` (#11101 )	2022-01-03 12:23:13 +00:00
Adam Viola	1fc046cde2	Fix `_should_reload_dl_epoch` causing inconsistent validation dataloader reloading (#11036 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-28 02:20:57 +01:00
Danielle Pintz	ca9b25db80	Remove `Strategy.init_optimizers` (#11236 )	2021-12-23 18:48:21 +00:00
Danielle Pintz	ba6a8ddcad	refactor _configure_schedulers (#11245 )	2021-12-23 10:03:28 -08:00
Carlos Mocholí	f44b209e72	Fix CLI race condition saving the config (#11199 )	2021-12-23 16:45:06 +00:00
Carlos Mocholí	30236c837f	Reset the progress tracking state after sanity checking (#11218 )	2021-12-23 16:36:03 +00:00
Kaushik B	0adcd6a048	Rename training_type_plugin file to strategy (#11239 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-23 14:01:23 +00:00
Adrian Wälchli	c210e338ef	Update strategy import statements (#11231 )	2021-12-23 08:26:28 +01:00
Danielle Pintz	a6a28e08d2	Deprecate `TrainerOptimizersMixin` and move functionality to `core/optimizer.py` (#11155 )	2021-12-22 17:56:37 -08:00
four4fish	81301dbba7	Rename `AcceleratorConnector.training_type_plugin` to `AcceleratorConnector.strategy` (#11212 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-23 01:36:23 +00:00
twsl	0b9034baef	Return only unique names/versions for LoggerCollection (#10976 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-12-23 00:35:38 +00:00
Kaushik B	576a5d62a0	Introduce strategies directory for Training Strategies (#11226 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-22 20:23:30 +00:00
Carlos Mocholí	eb5b350f9a	Remove explicit isinstance checks in strategies for checkpoint io (#11177 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-22 04:41:45 +00:00
Adrian Wälchli	b6dd1a3878	Fix typing in `pl.callbacks.lr_monitor` (#10802 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-12-22 03:50:00 +00:00
Adrian Wälchli	ba8e7cd787	Fix BF16 teardown for TPU precision plugin (#10990 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: thomas chaton <thomas@grid.ai>	2021-12-22 03:47:14 +00:00
four4fish	cf5ef32f7b	Deprecate Trainer.training_type_plugin in favor of trainer.strategy (#11141 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-22 02:11:43 +00:00
Adrian Wälchli	17ad1a4c00	Rename `ParallelPlugin` to `ParallelStrategy` (#11123 )	2021-12-22 01:09:17 +00:00
four4fish	4bfe5bda0f	Rename the DDPSpawnShardedPlugin to DDPSpawnShardeedStrategy (#11210 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-22 00:27:36 +00:00
Aki Nitta	28ce9105e4	Rename `SingleDevicePlugin` to `SingleDeviceStrategy` (#11181 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-21 23:56:14 +00:00
four4fish	f98cd78e9e	Renamed the `DDPSpawnPlugin` to `DDPSpawnStrategy` (#11145 )	2021-12-21 23:06:14 +00:00
four4fish	0c69c757d4	Rename the `DataParallelPlugin` to `DataParallelStrategy` (#11183 )	2021-12-21 22:00:24 +00:00
Aki Nitta	c3cd4d050f	Rename `SingleTPUPlugin` to `SingleTPUStrategy` (#11182 )	2021-12-21 20:09:30 +00:00
four4fish	1c5a5c3dfe	Renamed the DDP2Plugin to DDP2Strategy (#11185 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-21 19:21:00 +00:00
Carlos Mocholí	b2c3d01b3e	Fix master import conflict (#11203 )	2021-12-21 18:47:56 +00:00
Danielle Pintz	ac8dc2c2f3	Deprecate `TrainerCallbackHookMixin` (#11148 )	2021-12-21 09:47:08 -08:00
four4fish	caab69aabb	Renamed DDPShardPlugin to DDPShardStrategy (#11187 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-21 17:18:25 +00:00
Carlos Mocholí	f696326060	Remove `should_rank_save_checkpoint` property from TTP (#11070 )	2021-12-21 18:11:20 +01:00
Carlos Mocholí	3692eba807	Drop Python 3.6 support (#11117 )	2021-12-21 17:06:15 +00:00
Aki Nitta	9da78a94bd	Rename `TPUSpawnPlugin` to `TPUSpawnStrategy` (#11190 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-21 16:36:16 +00:00
Danielle Pintz	1177389d5a	Move `TrainerCallbackHookMixin.on_save/load_checkpoint` to `Trainer` and rename for clarity (#11179 )	2021-12-21 17:30:01 +01:00
Kaushik B	2e947a88e0	Rename IPUPlugin to IPUStrategy (#11193 )	2021-12-21 15:55:41 +00:00
Kaushik B	283bdece0a	Rename DeepSpeedPlugin to DeepSpeedStrategy (#11194 )	2021-12-21 15:18:01 +00:00
Oliver Borchert	17aceafa80	Suppress Warning in `PredictionEpochLoop` (#11189 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-12-21 14:40:41 +00:00
Kaushik B	ba0c901395	Rename HorovodPlugin to HorovodStrategy (#11195 )	2021-12-21 14:31:41 +01:00
Rohit Gupta	93ce2d7cc9	Avoid torch amp cuda warning with bf16 on cpu (#11161 ) Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-21 18:24:26 +05:30
four4fish	b64dea9dc3	Rename `DDPPlugin` to `DDPStrategy` (#11142 ) * Raname DDPPlugin to DDPStrategy * Change ddp_plugin to ddp_strategy * update changelog * rename occurences in docs * rename more occurrences * fix line too long * more fixes Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-21 08:55:51 +00:00
jjenniferdai	31f39c9578	Move `CheckpointConnector.fault_tolerant_auto_save_path` out of `CheckpointConnector.hpc_resume_path` (#11092 )	2021-12-21 02:24:01 +01:00
Rohit Gupta	787f41eff6	update optimizer_step example in docs (#10420 )	2021-12-21 08:19:40 +09:00
Adrian Wälchli	08e661ff72	Rename `restore_checkpoint_after_pre_dispatch` to `restore_checkpoint_after_setup` (#11166 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-12-20 17:16:52 +00:00
Carlos Mocholí	e8169bbd46	Fix setter usage for checkpoint io and precision in TTP (#11071 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-12-20 17:45:32 +01:00
Adrian Wälchli	f5c2881b68	3/n Simplify spawn plugins: Merge `pre_dispatch` and `setup` logic (#11137 )	2021-12-20 17:41:22 +01:00
Adrian Wälchli	2e47e2f4ae	Set spawn_method on initialization (#11162 )	2021-12-20 17:39:54 +01:00
four4fish	0ee78e96ef	Rename `DDPFullyShardedPlugin` to `DDPFullyShardedStrategy` (#11143 ) * Rename DDPFullyShardedPlugin to DDPFullyShardedStrategy * update fsdp_plugin to fsdp_strategy * update changelog Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-12-20 17:11:20 +01:00
ORippler	86a3c5e2a3	Add required states for resumed ModelCheckpoint GC (#10995 ) * Add required states for resumed ModelCheckpoint GC * Add backwards compatibility with legacy cktps Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * Add test to check if attrs are written to ckpt Note that we do not yet check for proper loading/reinstantiation of ModelCheckpooint based on the ckpt written to disk * Test if attributes are restored properly from ckpt * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix broken `test_callbacks_state_fit_ckpt_path` `ModelCheckpoint` is configured to save after every epoch, but `trainer.fit` is called with `max_steps = 1` Note there may be a better way of doing this, where `ModelCheckpoint` is called after `training_step` * Update test_restore.py * Update test_restore.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Check that all attributes are restored properly * revert changes, use fix on master * Convert to proper unit test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor `test_mode_checkpoint_saveload_ckpt` * First save, then load ckpt. * Instantiate ModelCheckpoint twice. Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-12-20 17:05:15 +01:00
Danielle Pintz	b1baf460d9	Include hook's object name when profiling (#11026 )	2021-12-20 15:18:24 +01:00
Adrian Wälchli	29eb9cccf2	Rename the `TrainingTypePlugin` base to `Strategy` (#11120 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>	2021-12-20 12:50:11 +00:00
guyang3532	cc4a978bf6	Safely disable profiler (#11167 )	2021-12-20 11:51:46 +00:00
Carlos Mocholí	7ed3dbf191	Fix evaluation logging on epoch end with multiple dataloaders (#11132 )	2021-12-19 15:51:01 +01:00
Danielle Pintz	f95976d602	rename _call_ttp_hook to _call_strategy_hook (#11150 )	2021-12-18 17:53:03 -08:00
Rohit Gupta	3461af0ddb	Add support for returning callback from `LightningModule.configure_callbacks` (#11060 )	2021-12-18 10:46:35 +00:00
Rafał Jankowski	3cc69f992b	Fixed NeptuneLogger when using DDP (#11030 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-12-18 01:40:13 +00:00
Carlos Mocholí	62f1e82e03	Fix CVE-2020-1747 and CVE-2020-14343 (#11099 )	2021-12-17 20:27:15 +00:00
Carlos Mocholí	8508cce37d	Mark all result classes as protected (#11130 )	2021-12-17 19:35:17 +00:00
Rohit Gupta	860959fb3f	Enable logging hparams only if there are any (#11105 )	2021-12-17 19:40:56 +01:00
Carlos Mocholí	dbb7f56b35	Deprecate `Trainer.verbose_evaluate` (#10931 )	2021-12-17 19:26:32 +01:00
Carlos Mocholí	75d96d9897	Reset the current progress tracking state during double evaluation (#11119 )	2021-12-17 19:20:11 +01:00
Adrian Wälchli	978f5e6ad6	Fix AttributeError when using CombinedLoader in prediction (#11111 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-12-17 18:02:25 +00:00
quancs	179b4dd415	remove redundant methods in RichProgressBar (#11100 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-12-17 17:40:31 +00:00
Carlos Mocholí	7e10f6d41f	Save the loop progress state by default (#10784 )	2021-12-17 16:00:27 +00:00
Carlos Mocholí	fa6d17c96f	Fix typing for utilities.warnings (#11115 )	2021-12-17 15:07:27 +01:00
Adrian Wälchli	6582249a0c	Fix signal teardown outside main thread (#11124 )	2021-12-17 14:12:02 +01:00
Carlos Mocholí	5956a0716b	Track the evaluation loop outputs in the loop (#10928 )	2021-12-17 14:00:47 +01:00
Adrian Wälchli	210ff845c1	Mark `Trainer.run_stage` as protected (#11000 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-12-17 13:46:03 +01:00
Sean Naren	c66cd12445	Remove partitioning of model in ZeRO 3 (#10655 )	2021-12-17 12:36:53 +00:00
Carlos Mocholí	4415677994	Add typing for `trainer.logger` (#11114 )	2021-12-17 13:34:18 +01:00
Carlos Mocholí	5932f52b2f	Avoid the deprecated `onnx.export(example_outputs=...)` in torch 1.10 (#11116 )	2021-12-17 10:11:11 +01:00
Adrian Wälchli	e19d93f69e	Initialize ModelCheckpoint state as early as possible (#11108 )	2021-12-17 00:18:29 +01:00
Adrian Wälchli	262aefc8df	Remove obsolete `pre_dispatch` in `DDPSpawnShardedPlugin` (#10988 )	2021-12-16 21:43:15 +01:00
Adrian Wälchli	2b0075a47e	Teardown sync-batchnorm after training (#11078 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-12-16 18:58:44 +00:00
Carlos Mocholí	46d6fbf11b	Add `Loop.replace` (#10324 ) Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-12-16 17:41:38 +00:00
Adrian Wälchli	c335a7891d	Remove redundant special case for disabling the progress bar on TPU (#11061 )	2021-12-16 18:02:50 +01:00
Carlos Mocholí	f37bd4677d	Update mypy (#11096 )	2021-12-16 17:53:12 +01:00

4112 Commits