lightning

Commit Graph

Author	SHA1	Message	Date
Carlos Mocholí	26977043bf	Add separate CI job for slow tests (#10830)	2021-12-01 19:58:18 +00:00
Carlos Mocholí	a7aed2af7a	[CLI] Add support for `ReduceLROnPlateau` (#10860)	2021-12-01 15:41:22 +00:00
Rafał Jankowski	c6478414ee	Fixed uploading best model checkpoint in NeptuneLogger (#10369)	2021-12-01 13:58:54 +00:00
Aka.Fido	72cc8b7ca9	Disable validation completely when `overfit_batches>0` (#9709) Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-12-01 13:57:57 +00:00
Adrian Wälchli	e6cc99ef90	Fix selection of standalone tests (#10857) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-12-01 09:48:37 +01:00
Kaushik B	ec0fb2fd95	Raise exception if rich is less than 10.2.2 (#10839)	2021-12-01 06:14:19 +00:00
Andres Algaba	1a26af1519	Add job_name as a staticmethod in SLURMEnvironment class (#10698) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-12-01 00:01:44 +00:00
Mauricio Villegas	f3b0a06e90	Fix `SignalConnector._has_already_handler` check for callable type (#10483) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-30 22:47:52 +00:00
Adrian Wälchli	25473acddb	Restore signals on teardown (#10611) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-30 22:07:14 +00:00
Rohit Gupta	1437be5e98	Disable batch_size extraction for torchmetric instances (#10815) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-30 20:47:05 +00:00
Carlos Mocholí	0061619e0a	Improve typing for loops (#10780)	2021-11-30 20:28:55 +00:00
Abhinav Arora	f63222d966	Remove references to torchtext.legacy from PyTorch Lightning (#10724) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-11-30 19:32:07 +00:00
Carlos Mocholí	8e1b9b306c	Skip hanging spawn tests (#10838) * Skip hanging spawn tests * Docstring fix * Add back to TPU spawn	2021-11-30 18:36:12 +00:00
Carlos Mocholí	38ed26ec5a	Do not require omegaconf to run tests (#10832)	2021-11-30 14:48:03 +00:00
Adrian Wälchli	a81accb2ad	Update LiteOptimizer signature after optimizer changes in TrainingTypePlugin (#10708)	2021-11-30 15:16:59 +01:00
Carlos Mocholí	1b43e43e9f	Minor changes in preparation for saving the loops state (#10783)	2021-11-30 19:37:04 +05:30
Carlos Mocholí	4710734f14	Improve `@RunIf` docs (#10828)	2021-11-30 14:21:38 +01:00
Andres Algaba	e0474f8f0f	Add test for `job_id` (#10774)	2021-11-30 11:53:55 +01:00
four4fish	1d2878523a	2/n Move Precision Plugin into strategy - move optimizer related logics (#10596) Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-11-30 08:31:23 +00:00
four4fish	8bf7f9cce7	1/n Move Accelerator into strategy - move batch_to_device to strategy (#10649) * 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy * add changelog * add model is not none check * Apply suggestions from code review Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update CHANGELOG.md * Update test_datamodules.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_hooks.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update dp.py Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-29 12:11:21 -08:00
Rohit Gupta	753cc4dfad	Fix default logging levels for train step specific hooks (#10756)	2021-11-29 19:51:17 +00:00
Carlos Mocholí	d3b7492bd0	[CLI] Add support for `--key.help=class` (#10767)	2021-11-29 14:12:53 +00:00
Adrian Wälchli	97e52619ea	Fix typing in `pl.overrides.data_parallel` (#10796) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-29 10:58:23 +01:00
Carlos Mocholí	724a92b065	Mark outputs as protected in the evaluation loops (#10781) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-11-28 20:09:30 +00:00
Adrian Wälchli	c752060712	Consolidate state when retrieving sharded state dict in Lite (#10746) Co-authored-by: thomas chaton <thomas@grid.ai>	2021-11-27 04:54:45 +00:00
thomas chaton	e94aff1c5b	Fault Tolerant: Add support for fault tolerant dataloader validator (#10465)	2021-11-26 19:33:47 +00:00
Carlos Mocholí	31bb6e69ca	Avoid optional instances in Loops (#10735) * Avoid optional instances in Loops * More cleanup	2021-11-26 18:00:18 +00:00
Carlos Mocholí	152eb57def	Rename special to standalone (#10779)	2021-11-26 17:13:14 +00:00
thomas chaton	6fe6e9e414	Delete TensorBoardLogger experiment before spawning the processes. (#10777)	2021-11-26 17:07:57 +00:00
thomas chaton	412d507a73	Fault Tolerant: move signal to SIGTERM (#10605)	2021-11-26 13:37:27 +00:00
thomas chaton	3d6262b7a9	Fault Tolerant Manual: Add support for DDP (#10638)	2021-11-25 18:31:53 +01:00
Kaushik B	e0b4bb2ea3	Deprecate `DeviceType` in favor of `_AcceleratorType` (#10503) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-25 16:41:03 +01:00
Carlos Mocholí	f8b2d5b128	Improve error message on `TypeError` during `DataLoader` reconstruction (#10719)	2021-11-24 21:51:11 +00:00
thomas chaton	0066ff0129	Fault Tolerant Manual: Enable the feature (#10707)	2021-11-24 17:36:08 +00:00
Adrian Wälchli	30ec4815cb	Support re-instantiation for custom DataLoader in Lightning (#10680) Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2021-11-24 15:58:51 +01:00
thomas chaton	e51a8ee7a3	Fault Tolerant Manual: utilities cleanup (#10703) Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-24 15:01:55 +01:00
Rohit Gupta	f36b395c4e	Update `LightningDataModule` docs (#10678)	2021-11-24 11:31:03 +00:00
thomas chaton	b28ab34ff5	Fault Tolerant Manual: Add loading to reload the states (#10699) Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-23 17:18:36 +00:00
Adrian Wälchli	dca1776870	LiteDataLoader wrapper improvements (#10297)	2021-11-23 16:35:07 +01:00
thomas chaton	7cf6374bd0	Fault Tolerant Manual: Add support for collecting states across processes (#10639)	2021-11-23 14:27:33 +00:00
thomas chaton	1702036c14	Fault Tolerant Manual: Add stateful dataloader iter (#10674)	2021-11-23 12:30:50 +00:00
Kaushik B	48cf1adfd3	Move Colab setup to ProgressBar (#10542) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-23 06:16:31 +00:00
thomas chaton	2036dfb5df	Fault Tolerant Manual: Add _rotate_worker_indices utility (#10647)	2021-11-22 19:52:04 +00:00
Rohit Gupta	823bfa6f8a	Update `LightningModule` docs (#10637)	2021-11-23 01:02:04 +05:30
thomas chaton	6acfef680f	Fault Tolerant Manual: Add is_obj_stateful utility (#10646)	2021-11-22 18:48:32 +00:00
Andres Algaba	6fc7c54c3a	refactor slurm_job_id (#10622) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2021-11-22 17:41:08 +00:00
Rohit Gupta	d431ce14a1	Raise an error if batch_size cannot be inferred from current batch (#10541) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-22 16:55:19 +00:00
Danielle Pintz	6810c40fc9	Small improvements to `_init_debugging_flags` (#10620)	2021-11-22 11:38:09 -05:00
Carlos Mocholí	a6dedcf492	Fix `move_metrics_to_cpu` with evaluation (#10631)	2021-11-22 15:58:21 +00:00
thomas chaton	991cd895c6	1/n Add `FaultTolerantMode` (#10645)	2021-11-22 14:58:23 +00:00
puhuk	af0bb96f0f	Remove the "_precision" suffix from some precision plugin files (#10052)	2021-11-19 17:37:39 +00:00
Mauricio Villegas	5d748e560b	LightningCLI changes for jsonargparse>=4.0.0 (#10426) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai>	2021-11-19 17:03:14 +00:00
Rohit Gupta	ec27313be2	Fix batch size extraction when set by the user in `LightningModule.log` (#10408) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-19 16:48:26 +00:00
Jaime Ferrando Huertas	721b8413a0	Added boring model as a ipynb so it can be updated (#10521) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-19 16:32:30 +00:00
Biho-Kim	e83e8ae305	Respect the passed dtype with `self.log` (#10076) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-11-19 15:16:33 +00:00
Carlos Mocholí	3d2d0f2536	MANIFEST.in and setup.py clean-up (#7614)	2021-11-19 15:38:42 +01:00
Adrian Wälchli	8950354fe4	Extract dataloader utilities from `TrainerDataLoadingMixin` (#10145)	2021-11-19 12:45:35 +00:00
Adrian Wälchli	085e82f454	Introduce `ClusterEnvironment.detect()` (#10564) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-19 12:24:10 +00:00
Adrian Wälchli	c09c9c7607	Remove redundant fit call from accelerator connector test (#10626)	2021-11-19 12:19:52 +05:30
Kaushik B	137b62d80d	Add `refresh_rate` to RichProgressBar (#10497) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-11-19 05:59:57 +00:00
thomas chaton	7d3ad5b76e	Don't register signal in thread (#10610)	2021-11-19 04:13:35 +01:00
Carlos Mocholí	5788789f01	Move benchmarks into the test directory (#10614)	2021-11-19 03:07:33 +01:00
Carlos Mocholí	0de8ab4f2e	Fix failing master due to an interction between PRs (#10627)	2021-11-19 02:04:53 +00:00
Carlos Mocholí	35f6cbe09f	Use `update_wrapper` in test_hooks.py (#10578)	2021-11-19 01:52:55 +01:00
four4fish	700521c7d3	1/n Move precision plugin into strategy - update reference (#10570) * 1/n move precision plugin into strategy - update reference * update precision plugin reference in tpu_spawn * add missing reference in error message * add back removed license line * update references in tests * update reference in trainer * update return annotation for precision_plugin property on TTP * simplify access to precision plugin reference in sharded plug * add changelog * remove precision property from ttp and add deprecation message * fix make doc and update precision reference * simplify a reference to precision accidentally overridden Adrian's change, now add it back * Update CHANGELOG.md add Adrian's change back * Update accelerator precision Add Adrian's change back * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add none check for precision plugin just to be safe * Update ipu.py * update precision_plugin param deprecation message * Update accelerator.py * Remove deprecated warning Tests will fail after 9940 Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-19 00:39:01 +00:00
Adrian Wälchli	0f6d89422b	Control automatic resubmission on SLURM (#10601)	2021-11-18 17:48:53 +00:00
shabie	6b728713bb	log metrics for correct dataloader only (#10522) Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-18 18:29:13 +01:00
Adrian Wälchli	1ff35ed0f5	Improve code quality in `AcceleratorConnector._configure_slurm_ddp` (#10102)	2021-11-17 23:10:47 +00:00
Carlos Mocholí	0fa07da987	Fail the test when a `DeprecationWarning` is raised (#9940)	2021-11-17 23:41:50 +01:00
Carlos Mocholí	c15b84dae7	Simplify hanging queue test (#10591)	2021-11-17 22:29:48 +00:00
Carlos Mocholí	ba036fdeea	Support special test parametrizations (#10569)	2021-11-17 15:46:14 +00:00
Carlos Mocholí	3b2e164cab	Fix `caplog` with `logger.propagate=False` (#10577)	2021-11-17 16:25:55 +01:00
Adrian Wälchli	d50e1696f9	Fix propagation of device and dtype properties in Lite modules (#10559)	2021-11-16 17:26:46 +00:00
Carlos Mocholí	af4af3d73a	Mock GPU accelerator connector tests (#10554)	2021-11-16 16:13:40 +00:00
Sean Naren	e98ace3adc	[DeepSpeed] Do not fail if batch size could not be inferred for logging (#10438)	2021-11-16 11:42:25 +00:00
Rohit Gupta	de7ef41fea	remove deprecated `reload_dataloaders_every_epoch` from `Trainer` (#10481)	2021-11-16 06:47:43 +00:00
Carlos Mocholí	6dfcb6afc5	Skip strategy=ddp_spawn, accelerator=cpu, python>=3.9 tests (#10550)	2021-11-16 10:06:47 +05:30
Rohit Gupta	60850ef510	fix overfit_batch sampler replacement logic (#10486) Co-authored-by: thomas chaton <thomas@grid.ai>	2021-11-15 22:31:45 +00:00
Carlos Mocholí	dcafc95f2b	Avoid deprecated `progress_bar_refresh_rate` usage (#10520) Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>	2021-11-15 22:04:48 +01:00
thomas chaton	1de3539eac	Resolve instantiation problem with init_meta_context (#10493)	2021-11-15 19:13:01 +00:00
Kaushik B	ae71284627	Remove deprecated `disable_validation` property from Trainer (#10450)	2021-11-15 18:42:00 +00:00
Kaushik B	01cf7a2ac5	Deprecate `DistributedType` in favor of `StrategyType` (#10505)	2021-11-15 17:10:08 +00:00
Shivam Mehta	794c4b08c0	Remove deprecated `is_overridden(model=...)` (#10507) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-15 12:56:30 +00:00
puhuk	8b0cb47cc0	Remove deprecated `hpc_load` in `CheckpointConnector` (#10525) Co-authored-by: Aki Nitta <nitta@akihironitta.com>	2021-11-15 11:54:47 +00:00
thomas chaton	ffb40060c0	shutdown workers on failure (#10463)	2021-11-15 10:03:46 +00:00
Carlos Mocholí	7a9a08c5d3	Drop torch 1.6 testing (#10390) * Drop torch 1.6 support * Drop 1.6 support * Update CHANGELOG * Fixes * Split change * Undo change * 1.7 -> 1.7.1 https://github.com/pytorch/pytorch/issues/47354 * Force trigger nightly * Update .github/workflows/events-nightly.yml Co-authored-by: Aki Nitta <nitta@akihironitta.com> * Revert 1.7.1 change - try wildcard * Update adjust versions and test it * Undo test changes * Revert "Undo test changes" This reverts commit `3a6acadd11`. * Update CHANGELOG.md Co-authored-by: Aki Nitta <nitta@akihironitta.com>	2021-11-13 20:35:03 +00:00
Rohit Gupta	a8c2725ff8	remove deprecated signature for `transfer_batch_to_device` (#10480) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-11-13 19:32:30 +00:00
Kaushik B	fabb364402	Remove deprecated `mode` argument from ModelSummary (#10449) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-12 19:32:43 +00:00
Carlos Mocholí	847e24011a	Squeeze the early stopping monitor (#10461)	2021-11-12 18:03:47 +00:00
Rohit Gupta	fa0ed17f8a	remove deprecated train_loop (#10482) * remove deprecated train_loop * chlog	2021-11-12 12:42:25 +00:00
Raahul Singh	09cf167237	Change attributes of `RichProgressBarTheme` dataclass (#10454) Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-11-11 19:53:40 +00:00
Carlos Mocholí	5ba5b72473	Update tests to avoid the deprecated `weights_summary` (#10446)	2021-11-11 18:15:18 +01:00
Kaushik B	d577f461a4	Remove deprecated `utilities.distributed.rank_zero_{warn,deprecation}` (#10451)	2021-11-10 07:35:48 -08:00
a-gardner1	ce149f6451	Fix support for dataclasses with ClassVar/InitVar in `apply_to_collection` (#9702) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-10 04:42:27 +00:00
Carlos Mocholí	d515bcac96	Remove deprecated profiler import (#10443)	2021-11-09 23:13:02 +01:00
thomas chaton	8d810d6144	Enable distributed training with CombinedDataLoader and max_size_cycle (#10374) * solve combinedloader * update * update changelog * update on comments * resolve iterable dataset support * update test description * update * update on comments * update * Accelerator auto * Address review * Refactor Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-09 20:06:10 +00:00
Carlos Mocholí	c413b69240	Remove deprecated `task_idx` (#10441)	2021-11-09 18:54:38 +00:00
Carlos Mocholí	ebab4be3e4	Remove deprecated `DeviceDtypeModuleMixin` import (#10442)	2021-11-09 18:35:53 +00:00
Ross Johnstone	c2f25d42ab	Make `monitor` required arg of EarlyStopping callback (#10328) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-09 18:08:03 +00:00
Carlos Mocholí	069ec1005a	Do not autodetach extras (#10424) * Do not autodetach extras * Update CHANGELOG * Use foo	2021-11-09 16:07:16 +00:00
thomas chaton	7fb277f260	Resolve workers being forcelly deleted with `persistent_workers=True` (#10434) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-09 14:58:31 +00:00
Carlos Mocholí	edbf27430d	Remove deprecated `self.log` arguments (#10423)	2021-11-09 15:49:55 +01:00
Adrian Wälchli	aaa6aa75e9	Fix converting only float type tensors in Lite (#10429) * fix * less code * add test case * add test cases * update input * add test cases * add type hint * add changelog note Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-11-09 15:21:00 +01:00
Kaushik B	5eeca87e98	Fix deadlocks for distributed training for RichProgressBar (#10428)	2021-11-09 18:30:37 +05:30
Rohit Gupta	21eafafcb0	disable step logging in epoch hooks (#10409) * disable step logging in epoch hooks * chlog * Apply suggestions from code review * chlog	2021-11-09 16:53:27 +05:30
puhuk	f9b9cdb0d1	Remove deprecated accelerator pass through functions in Accelerator (#10403) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-08 17:36:37 +00:00
Adrian Wälchli	a270a79ed9	Rename "master" methods to "main" in ClusterEnvironment plugins (#10103) * rename occurrences of master port, master address, maser node, master process * rename properties * add property decorators * occurrences in docs * update changelog * update changelog * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add lost method * create deprecation * add changelog * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo (but it was already there!!!) * Apply suggestions from code review Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * add todo * update more occurences * add types * add missing import Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2021-11-08 12:32:58 +00:00
Carlos Mocholí	613aa09514	Revert part of #10279 (#10376)	2021-11-08 11:28:58 +00:00
Espen Haugsdal	89e1360e75	Fix pickling error with CSVLogger (#10388) * Don't store csv.Dictwriter in ExperimentWriter * Add test for pickle after .save() * Add entry in changelog	2021-11-08 10:36:35 +00:00
puhuk	c58f84c176	Remove deprecated master_params attributes in PrecisionPlugin (#10372) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-11-08 02:42:03 +00:00
Adrian Wälchli	45f6a3b175	Fix DataLoader inspection and re-instantiation in Lite (#10334) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-05 17:31:45 +00:00
Connor Anderson	1c28f361d4	Remove `every_n_val_epochs` from ModelCheckpoint (#10366) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-05 15:19:33 +00:00
Saurav Maheshkar	a9bd4fbd96	Remove deprecated property `configure_slurm_dpp` from accelerator connector (#10370) * Remove deprecated configure_slurm_ddp * Update CHANGELOG * Remove deprecated tests from test suite	2021-11-05 14:11:30 +00:00
puhuk	9c4112ce1c	Remove deprecated sync_batchnorm and num_nodes attributes in DDP plugins (#10357) * Remove deprecated sync_batchnorm and num_nodes attributes in DDPPlugin Part of #10312 test_v1_6_0_ddp_num_nodes() test_v1_6_0_ddp_sync_batchnorm() * Remove deprecated sync_batchnorm and num_nodes attributes in DDPPlugin Part of #10312 test_v1_6_0_ddp_num_nodes() test_v1_6_0_ddp_sync_batchnorm() * remove deprecation warnings * apply removal to spawn plugin * update changelog * remove num_nodes in deepspeed * remove unused imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-05 10:13:12 +00:00
four4fish	973305c6a5	Add more trainer config tests (#10319) * Add more trainer config tests * Add more trainer config and ttp register tests * Add more trainer config and ttp register tests	2021-11-05 10:42:58 +01:00
Saurav Maheshkar	6b5e185d07	Remove deprecated property `is_slurm_managing_tasks` from accelerator connector (#10353) * Remove deprecated property _slurm_managing_tasks from accelerator connector * Update CHANGELOG * Update Changelog * Removed is_slurm_managing_tasks from AcceleratorConnector * resolve merge conflict * add back accidentally removed lines * remove test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-05 09:38:53 +00:00
Alexandre Mayerowitz	b3c0f121ca	Remove deprecated datamodule lifecycle properties (#10350) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-05 05:03:57 +00:00
Adrian Wälchli	3664659094	Remove deprecated method `ClusterEnvironment.creates_children` (#10339) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-04 17:11:32 +00:00
Peter Dudfield	ce3e63262a	Fix failure when `DataLoader(batch_size=None)` is passed (#10345) * add test, + add change to data loading batch sample method * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor and CHANGELOG Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-04 12:46:57 +01:00
puhuk	412f0a4d24	Remove deprecated dataloader arguments in Trainer methods (#10325) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-04 11:03:39 +01:00
Connor Anderson	6f00ba21c2	Remove deprecated `loaded_optimizer_states_dict` property (#10346) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-04 01:51:46 +01:00
Carlos Mocholí	ba23d91320	Update recommendation on `dataloader_idx` (#10318)	2021-11-04 01:39:55 +01:00
Danielle Pintz	c5d011c3cf	Remove `TrainerModelHooksMixin` (#10322) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-11-03 20:26:59 +00:00
Carlos Mocholí	93caa7cda9	Fix `apply_to_collection(defaultdict)` (#10316)	2021-11-03 11:18:10 +00:00
Ning	f6ed0bd8ca	introduce has_len_all_ranks() to check the length of dataloader across ranks (#9827) * introduce , udpate tests * update CHANGELOG.md * change staticmethod and hook attribute naming * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * remove non-essential comment * fix merge error and comment format * try to fix test_tpu.py failure * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update on comments * chlog * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * chlog * update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try fix * Revert back TPUSpawn changes * Update test Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com>	2021-11-02 13:22:58 -04:00
Kaushik B	34fcb87a2b	Add `leave` argument to RichProgressBar (#10301) * Add display_every_n_epochs argument to RichProgressBar * Add tests * Update test * Update test * Update changelog * use leave argument instead * Update pytorch_lightning/callbacks/progress/rich_progress.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-11-02 13:20:52 -04:00
Adrian Wälchli	373c32e34b	Fix yielding from iterator in LiteDataLoader (#10304) * fix yielding form iterator * update description * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused code Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-02 11:40:35 +01:00
Adrian Wälchli	3cd65b592b	Lightning Lite Examples (#9987) Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com> Co-authored-by: Nicki Skafte Detlefsen <skaftenicki@gmail.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Pietro Lesci <61748653+pietrolesci@users.noreply.github.com>	2021-11-02 08:04:29 +00:00
Rohit Gupta	e4ee6df196	Add warning if multiple batch_sizes are found from ambiguous batch (#10247)	2021-11-01 19:50:30 +00:00
victorjoos	cc0e9f96a8	Add support for empty `gpus` list to run on CPU (#10246) Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2021-11-01 18:37:38 +00:00
thomas chaton	facaff94b8	Add custom dataloader support with Lite (#10279)	2021-11-01 18:33:13 +00:00
Kaushik B	c52d7ba73d	Add `configure_columns` method to RichProgressBar (#10288) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-11-01 17:22:53 +00:00
Rohit Gupta	6609b2e46f	enable `on_load_checkpoint` for `datamodule` for all `trainer_fn` (#10238)	2021-11-01 14:20:46 +00:00
Kaushik B	45c45dc7b0	Deprecate `ProgressBar` and rename it to `TQDMProgressBar` (#10134) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-01 11:42:21 +00:00
Kaushik B	2ee6d9fbc7	Fix `distrib_type` not being set when Plugin instances being passed to Trainer (#10251)	2021-11-01 17:11:57 +05:30
Carlos Mocholí	2b24be2e45	Simplify `LightningOptimizer` (#10224)	2021-10-30 15:56:15 +00:00
Kaushik B	e0f7dbdd1c	Add support for `devices='auto'` (#10264)	2021-10-30 15:05:51 +00:00
Carlos Mocholí	9237106451	Clip before step (#10248)	2021-10-30 11:27:49 +01:00
Adrian Wälchli	9d136a9fc5	Lightning Lite core and tests (#10175)	2021-10-29 21:46:39 +00:00
Kaushik B	cedaebfcbb	Add `auto_device_count` method to `Accelerators` (#10222) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-10-29 22:31:32 +02:00
Gili Tzabari	a967b6eba0	del iterator on_run_end() (#9915)	2021-10-29 16:29:44 +00:00
Carlos Mocholí	e4eb61d812	Raise exception for `strategy=ddp_cpu|tpu_spawn` (#10185)	2021-10-29 16:15:24 +00:00
Carlos Mocholí	81d15c5986	Implement double optimizer closure for hook structure consistency (#10167)	2021-10-29 13:03:04 +00:00
thomas chaton	bd77f65463	Resolve batch_size in ResultCollection not resetted to 1 on epoch end (#10242)	2021-10-29 13:55:11 +01:00
thomas chaton	843bf26297	Fix `log(sync_dist=True, on_epoch=True, on_step=True)` not reducing on step (#10227) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-10-29 12:08:32 +00:00
Carlos Mocholí	4bc73b2b76	Avoid deprecated usage in accelerator connector tests (#10184)	2021-10-29 12:36:21 +01:00
Ning	dbfadedfe7	Revert "Add support for `len(datamodule)` (#9895)" (#10072) This reverts commit `6429de8944`.	2021-10-29 13:33:51 +02:00
Rohit Gupta	6a9adf26f7	Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10240)	2021-10-29 10:36:02 +00:00
thomas chaton	5f4ffdee41	cleanup (#10081)	2021-10-29 08:40:43 +00:00
Adrian Wälchli	3f9dfe4949	Fix iterating over a DummyLogger when `fast_dev_run > 0` (#10232)	2021-10-29 07:22:59 +00:00
Kaushik B	762af9505b	Add missing test for testing custom registered training plugin (#10225)	2021-10-29 04:06:06 +00:00
thomas chaton	255e3edc98	resolve failing test (#10191) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-28 15:27:03 +00:00
Carlos Mocholí	03f01fb5ec	Fix gradient norm tracking and gradient clipping (#9287) * WIP * Progress * Undo test change * Fix plugin closure execution order * Update CHANGELOG * Fix manual optimization on AMP and skipping backward * Fix for deepspeed * Typo * Hook test for manual closure * Add skipping test with AMP * You are hideous, apex * Add deepspeed test * Update CHANGELOG * Fix for broken master * Add RunIf * FIXMEs * Rename * Fix grad norm * add a simple test * update test * update test * update test * fix merge conflicts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Sea of changes * Undo change * Introduce TPUPrecisionPlugin * Undo changes * Undo changes * Resolve FIXME * Undo change * Undo change * Undo change * Fix FIXMEs * Fix FIXME * Correct value * Bad merge * Fix circular imports * WIP * Fixing clipping * Fixes * Bad merge * Move optimizer step and clipping into the `PrecisionPlugin` * Fix AMP * Update CHANGELOG * Fix tests * Underscore * Progress * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove pre_optimizer_step * Missed one * Progress * Progress * Fix test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FIXMEs * Fix test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix test * DeepSpeed warning. mypy * Rename * Finish tests * Update CHANGELOG * Dumb fixes * accelerator=auto * Apply suggestions from code review Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update on comments * Use ClassifModule Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-28 15:23:27 +00:00
Carlos Mocholí	5262b63dff	Pass the scaler as an input to `NativeMixedPrecisionPlugin` (#10055) Co-authored-by: thomas chaton <thomas@grid.ai>	2021-10-28 14:13:53 +00:00
Low Weng Fei	83d74bb385	Fix `reset_seed()` converting the `PL_SEED_WORKERS` environment variable `str` read to `bool` (#10099) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: tchaton <thomas@grid.ai>	2021-10-28 12:57:41 +00:00
Rohit Gupta	9af1dd7443	Deprecate `lr_sch_names` from `LearningRateMonitor` (#10066)	2021-10-28 12:57:04 +00:00
Rohit Gupta	85eb17cde5	initialize poptorch_models based on trainer_fn (#10149)	2021-10-28 11:59:52 +00:00
Carlos Mocholí	dbe1662dc3	Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10157)	2021-10-27 13:38:39 +01:00
Kaushik B	c33df2639f	Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn (#10151)	2021-10-27 01:23:01 +05:30
Carlos Mocholí	48b6292cf0	Move optimizer step and clipping into the `PrecisionPlugin` (#10143)	2021-10-26 17:26:26 +02:00
Carlos Mocholí	a0e45dc071	Some minor CI cleanup (#10088)	2021-10-26 13:58:20 +02:00
twsl	971281d27d	Make sure file and folder exists in Profiler (#10073) Co-authored-by: tchaton <thomas@grid.ai>	2021-10-26 11:13:31 +00:00
Adrian Wälchli	871a96701a	Rename `master_params` to `main_params` (#10105) Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-26 11:17:32 +02:00
Rohit Gupta	34d5980df6	Raise `MisconfigurationException` if `trainer.eval` is missing required methods (#10016)	2021-10-25 23:12:08 -07:00
Danielle Pintz	13d6d7bad1	Remove `optimizer_connector.py` (#10120)	2021-10-26 00:52:43 +00:00
Adrian Wälchli	21a5867dad	Rename `ClusterEnvironment.creates_processes` (#10106) Co-authored-by: tchaton <thomas@grid.ai>	2021-10-25 23:15:41 +00:00
Rajat Goel	47e7a2860f	Fix Enums parsing in generated hparms yaml (#9170) Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-25 21:23:20 +00:00
Eric Wiener	0e20119d24	Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2021-10-25 20:21:33 +00:00
Rohit Gupta	d9dfb2e920	fix tests (#10138 )	2021-10-25 19:37:47 +00:00
Danielle Pintz	1f7bd6650c	Mark accelerator connector as protected (#10032 )	2021-10-25 19:24:54 +00:00
jjenniferdai	6d79184ec5	Unify checkpoint load paths [redo #9693 ] (#10061 )	2021-10-25 19:05:31 +00:00
Adrian Wälchli	76081fb846	Mark SLURM detection methods in `AcceleratorConnector` as protected (#10101 ) Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2021-10-25 17:52:15 +00:00
Carlos Mocholí	2ee3127661	Use `torch.autocast` (#10053 )	2021-10-25 17:33:52 +00:00
Carlos Mocholí	b376799430	Minor fixes related to clipping (#10130 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-25 16:40:22 +00:00
manipopopo	cfb2d87765	Disable quantization aware training observers (#8540 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2021-10-25 15:46:09 +00:00
Adrian Wälchli	7eb2edf421	rename set_random_master_port (#10104 ) Co-authored-by: tchaton <thomas@grid.ai>	2021-10-25 12:09:05 +00:00
Danielle Pintz	e94dcf6936	Mark `trainer.data_connector` as protected (#10031 ) Co-authored-by: tchaton <thomas@grid.ai>	2021-10-25 12:29:09 +01:00
Carlos Mocholí	f95ba20012	Do not use the base version by default in `_compare_version` (#10051 )	2021-10-25 16:41:32 +05:30
thomas chaton	ed9802643c	[CI] Comment flaky tests (#10084 )	2021-10-25 10:31:06 +02:00
Kaushik B	c3614f1c07	Fix: skip importing DistributedOptimizer for Windows (#10071 )	2021-10-21 21:01:56 +00:00
thomas chaton	454e93bace	Add support for init_meta_context, materialize_module (#9920 )	2021-10-21 15:48:31 +01:00
jjenniferdai	2d9db211b5	Revert "Support serialized checkpoint loading (#9605 )" (#10057 ) This reverts commit `f0e6f1b58a`.	2021-10-21 02:51:22 +02:00
Kaushik B	aa1540410f	Add XLACheckpointIO (#9972 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-10-21 02:39:16 +05:30
Rohit Gupta	1599c77d16	Fix `LearningRateMonitor` logging with multiple param groups optimizer with no scheduler (#10044 )	2021-10-20 22:13:00 +05:30
Carlos Mocholí	6aeebf1bd3	Remove unnecessary dependency available checks (#10050 )	2021-10-20 16:21:37 +00:00
Alessio Bonfiglio	2a2fa5a56a	Group all the logged gradients under the same sub-folder (#7756 )	2021-10-20 15:48:36 +00:00
Kaushik B	56bc55db71	Update strategy flag in docs (#10000 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-10-20 21:02:53 +05:30
kingyiusuen	2ed92ecabb	Rerun flaky profiler tests on failure (#10035 )	2021-10-20 18:57:04 +05:30
Carlos Mocholí	f0b3e0f4de	Default to `precision=bf16` on CPU when `precision=16` is passed (#10033 )	2021-10-20 13:25:13 +00:00
Adrian Wälchli	2c16f1d6b9	remove dataloader patching on the LightningModule (#9764 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-10-20 15:23:20 +02:00
jjenniferdai	f0e6f1b58a	Support serialized checkpoint loading (#9605 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-20 09:38:35 +01:00
Carlos Mocholí	53c62f63e8	Constrain IPU precision choices (#10030 )	2021-10-20 00:52:01 +00:00
Carlos Mocholí	ad8d6c83da	[CLI] Shorthand notation to instantiate datamodules (#10011 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-20 00:49:48 +00:00
Carlos Mocholí	e44921ee21	Fix `self.log(on_epoch=True, reduce_fx=sum)` on_batch_start (#9791 )	2021-10-20 01:56:37 +02:00
Carlos Mocholí	d45897d522	Rename `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` (#10026 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-19 21:09:37 +00:00
Ning	0b68f2abf8	Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop (#9671 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2021-10-19 21:45:52 +02:00
Carlos Mocholí	e8beceb631	Add `TPUPrecisionPlugin` (#10020 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-19 17:48:57 +00:00
thomas chaton	1759403c8d	Add check for callable with datamodule len (#10003 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-19 14:51:08 +00:00
Rohit Gupta	0aa220b46b	Remove deprecated `distributed_backend` from `Trainer` (#10017 ) * rm distributed_backend from Trainer * unused * chlog * internal distributed_backend * Docstring Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-10-19 13:54:37 +00:00
Danielle Pintz	203737bfce	Don't raise DeprecationWarning for `LoggerConnector.gpus_metrics` (#9959 )	2021-10-18 22:51:09 +00:00
Adrian Wälchli	a99b7440b5	Add unit tests for `pl.utilities.grads` (#9765 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-10-18 18:58:51 +05:30
Rohit Gupta	4dc32ad7db	Fix logic to check for spawn in worker_check (#9902 ) * fix * update tests * chlog * skip windows	2021-10-18 13:02:46 +00:00
Carlos Mocholí	3f355d0eb7	Remove manual tracking of optimizer steps (#9957 )	2021-10-18 12:43:06 +00:00
Carlos Mocholí	0684e5295f	Remove deprecated `DataModule.dims` usage in tests (#9948 )	2021-10-18 17:35:41 +05:30
Carlos Mocholí	c69a79c86f	Fix `self.log(on_epoch=True)` on_batch_start (#9780 )	2021-10-18 14:02:16 +02:00
Elad Segal	8c76cf5ae1	reset val dataloader for binsearch (#9975 )	2021-10-18 12:54:26 +02:00
Carlos Mocholí	01b304ec57	Update accelerator connector messages after the addition of strategy (#9937 )	2021-10-18 01:10:48 +00:00
Carlos Mocholí	788f6864d9	Fix `LightningOptimizer` step and toggling logic (#9958 )	2021-10-18 00:23:51 +00:00
ronif	7b4df7bf91	Fix issue with no-init dataclass fields in move_to_device (#9963 ) Co-authored-by: ronif <ronif@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-17 07:10:47 +00:00
Carlos Mocholí	e5dfdf34f9	Avoid deprecation warning after #9901 (#9951 )	2021-10-16 17:36:25 +01:00
Kaushik B	5e8829b97d	(1/n) tests: Use strategy flag instead of accelerator for training strategies (#9931 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-10-16 20:40:25 +05:30
Carlos Mocholí	e973bcb76a	Use non-deprecated options in tests (#9949 )	2021-10-15 16:58:07 -07:00
Carlos Mocholí	db4e770004	Validate the precision input earlier (#9763 )	2021-10-15 17:30:00 +00:00
kingyiusuen	6429de8944	Add support for `len(datamodule)` (#9895 ) Co-authored-by: tchaton <thomas@grid.ai>	2021-10-15 14:19:50 +02:00
Danielle Pintz	16213b1635	Deprecate `log_gpu_memory`, `gpu_metrics`, and util funcs in favor of `DeviceStatsMonitor` callback (#9921 ) Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-14 22:45:44 +02:00
Oliver Borchert	afbf703684	Single-process multi-node CPU training (#9603 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai>	2021-10-14 22:21:41 +02:00
Kaushik B	af4a8f1950	Refactor tests for TPU Accelerator (#9718 ) Co-authored-by: tchaton <thomas@grid.ai>	2021-10-14 19:45:15 +00:00
Danielle Pintz	6feda08109	Deprecate `GPUStatsMonitor` and `XLAStatsMonitor` in favor of `DeviceStatsMonitor` (#9924 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Nicki Skafte Detlefsen <skaftenicki@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-14 15:52:45 +00:00
four4fish	a002f872ea	[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator (#9901 ) Co-authored-by: tchaton <thomas@grid.ai>	2021-10-14 17:38:22 +02:00
Viraj Bagal	15698698c4	Log LR using LearningRateMonitor even when LR Scheduler is not defined. (#9786 ) * LR logging works even with no lr scheduler, wrote few extra tests as well * updated changelog * modified code as suggested by DeepSource * added helper functions * opt with no scheduler * rename * chlog * update test Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2021-10-14 13:28:19 +00:00
Danielle Pintz	940b910d27	[2/4] Add DeviceStatsMonitor callback (#9712 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-10-13 18:29:36 +00:00
Rohit Gupta	23e8b59ae7	Add `configure_gradient_clipping` hook in `LightningModule` (#9584 ) * init hook * docs * dep train args * update tests * doc * doc * .gitignore * not dep * add trainer args * add & update tests * fix tests * pre-commit * docs * add docs * add exception * code review * deepspeed * update tests * not * try fix * Apply suggestions from code review * update deepspeed * disable some tests * disable some tests * enable all tests	2021-10-13 20:15:13 +05:30
Kaushik B	05b15e63f0	Add `strategy` argument to Trainer (#8597 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-10-13 12:34:06 +00:00
ananthsub	28fc8d2016	Add `enable_model_summary` flag and deprecate `weights_summary` (#9699 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com>	2021-10-13 17:20:54 +05:30
Rohit Gupta	0f8fd20443	Remove epoch from `trainer.logged_metrics` (#9904 )	2021-10-13 11:30:27 +02:00
ananthsub	4610fddb19	Mark `Trainer.terminate_on_nan` protected and deprecate public property (#9849 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-12 20:23:22 +00:00
Danielle Pintz	dd6d797e0e	Remove type error handling in _configure_checkpoint_callbacks (#9823 ) * remove type error handling in _configure_checkpoint_callbacks * rm test Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-12 20:13:02 +00:00
Adrian Wälchli	b530b7afd2	update tests to not rely on patched dataloaders (#9905 )	2021-10-12 12:45:28 +02:00
Rohit Gupta	98c0a110e0	Update docs for `GradientAccumulationScheduler` (#9891 ) * update docs and add tests * update docs and add tests * Update pytorch_lightning/callbacks/gradient_accumulation_scheduler.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-12 10:37:16 +00:00
Rohit Gupta	f2b0db60f1	Raise a `MisconfigurationException` when trainer functions are called with `ckpt_path="best"` but `checkpoint_callback` isn't configured (#9841 ) * add check * chlog * Apply suggestions from code review Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * Apply suggestions from code review Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai>	2021-10-12 15:35:55 +05:30
Sean Naren	6da5829e53	DeepSpeed support for device IDs (#9847 )	2021-10-12 09:24:46 +00:00
Rohit Gupta	db322f4bbb	Deprecate `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` (#9754 ) * enable_chekpointing * update codebase * chlog * update tests * fix warning * Apply suggestions from code review Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestions from code review Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * Apply suggestions from code review Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-10-12 07:55:07 +00:00
Kaushik B	14fb076a30	Fix deprecation test version for accelerator collective (#9892 )	2021-10-12 11:50:31 +05:30
Sean Naren	83acb8671d	Update DeepSpeed version, fix failing tests (#9898 )	2021-10-11 22:35:33 +00:00
yopknopixx	173f4c8466	Deprecate `terminate_on_nan` Trainer argument in favor of `detect_anomaly` (#9175 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-10-11 17:17:43 +00:00
Adrian Wälchli	6a0c47a014	remove redundant accumulation normalization in manual optimization (#9769 )	2021-10-11 15:26:12 +00:00
Ranuga-Disansa	f915a8a283	Removed a redundant warning with `ModelCheckpoint(monitor=None)` callback (#9875 ) * Update README.md * Update README.md * Create evaluation.py * Update README.md * Update evaluation.py * Create evaluation.py * Create evaluation.py * Update evaluation.py * Create nlp.py * Update evaluation.py * Create evaluation.py * Update nlp.py * Update nlp.py * Update evaluation.py * Create evaluation.py * Update nlp.py * Update nlp.py * Update requirements.txt * Update evaluation.py * Create data_loader.py * Update nlp.py * Update evaluation.py * Update data_loader.py * Update nlp.py * Update data_loader.py * Update requirements.txt * Update model_checkpoint.py * Delete evaluation.py * Delete data_loader.py * Delete nlp.py * Update requirements.txt * Update model_checkpoint.py * Update README.md * Update pytorch_lightning/callbacks/model_checkpoint.py * Update CHANGELOG.md * Update test_model_checkpoint.py * Update model_checkpoint.py * update * update * chlog update Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-11 14:54:07 +00:00
Boris Dayma	2db9ea3500	feat(wandb): support media logging (#9545 )	2021-10-11 10:15:36 +01:00
Rohit Gupta	d71501d97f	Reset `val_dataloader` in `tuner/batch_size_scaling` (#9857 ) * reset val * chlog	2021-10-11 09:13:33 +01:00
kingyiusuen	8740c801bb	Fix typo in _validate_scheduler_optimizer() (#9886 )	2021-10-11 09:16:17 +02:00
ananthsub	5206e52786	Add support for `torch.set_detect_anomaly` (#9848 ) * Add support for `detect_anomaly` * Update CHANGELOG.md	2021-10-07 16:03:56 +00:00
Rohit Gupta	4decbc0d95	Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816 ) * deprecate hooks * dep todo * explicit * Apply suggestions from code review * Apply suggestions from code review * code review * base	2021-10-07 10:18:11 +00:00
Rohit Gupta	8a8ecb8d01	Update the logic to check for accumulation steps with deepspeed (#9826 ) * support_dict * chlog * fix test * epochs	2021-10-06 17:50:10 +01:00
Rohit Gupta	b303b4f895	Fix restoring training state during `trainer.fit` only (#9413 ) * reload state on fit * trainer.state * add test * chlog * revert * review * review * rev and ammend * fix test and logic * update * code review * Apply suggestions from code review * better assertions * better assertions * Apply suggestions from code review * add loop test * Apply suggestions from code review * Split for typing * review comments * review comments * use if_else * code review * code review * code review * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Remove unnecessary pieces from the test * move test Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-10-06 14:57:40 +00:00
Jirka Borovec	b3e9dff32d	rename callback FineTune arg `round` (#9711 ) * rename CB Tune arg round Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-10-06 09:39:36 +01:00
Kaushik B	f94faa9cd3	Enable auto parameters tying for TPUs (#9525 ) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-10-06 10:16:44 +02:00
Elad Segal	86ad941d06	Fix missing arguments when saving hyperparams from parent class only (#9800 ) * Fix missing arguments when saving hyperparams from parent class only * fix antipattern	2021-10-06 08:32:29 +01:00
Danielle Pintz	3392215ef6	Fix broken `test_cpu_amp_precision_context_manager` (#9809 ) * @RunIf(min_gpus=1) * dtype -> fast_dtype	2021-10-04 12:14:13 +00:00
kingyiusuen	6d530373c0	Add warnings regarding unsupported keys in optim config and OneCycleLR (#9666 ) * Add warnings regarding unsupported keys in optim config and OneCycleLR * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix docstring * Update CHANGELOG.md * Split into two parts * Use difference operator to find extra keys Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-10-04 08:25:05 +00:00
thomas chaton	5841ca9782	[Feat] Add auto_restart for fault tolerant training (#9722 )	2021-10-01 16:37:17 +00:00

2433 Commits