Kaushik B
27eb0035ca
Increase TPU Check timeout ( #7706 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-26 01:44:29 +00:00
Carlos Mocholí
d26953c8bc
Add `ModelPruning(prune_on_train_epoch_end)` to choose when to apply pruning ( #7704 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 00:57:56 +02:00
Xinyao(Alvin) Sun
7e2f7e956b
fix: improve UserWarning message ( #7685 )
* fix: improve UserWarning message
when both overfit and training dataloader shuffling are enabled
fixes issue: #7656
* chore: update changelog
* Polish userwarning msg in pytorch_lightning/trainer/data_loading.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* shuffling typo
* Update CHANGELOG.md
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-25 17:35:15 +00:00
Kaushik B
e7057d5898
Add `should_rank_save_checkpoint` property to Training Plugins ( #7684 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-25 23:02:05 +05:30
Carlos Mocholí
a1c40f3207
Remove on epoch guard from the should stop validation check ( #7701 )
* Remove on epoch guard from the should stop validation check
* Formatting
2021-05-25 15:59:42 +01:00
Carlos Mocholí
e2ead9abd7
Refactor some loops code and hook tests ( #7682 )
2021-05-25 13:27:54 +02:00
Carlos Mocholí
8ba6304c73
Increment the total batch idx before the accumulation early exit ( #7692 )
* Increment the total batch idx before the accumulation early exit
* Update CHANGELOG
2021-05-25 10:23:40 +02:00
Carlos Mocholí
8b01497e42
Fix global step update when the epoch is skipped ( #7677 )
* Fix global step update when the epoch is skipped
* Update CHANGELOG
* Move test
2021-05-24 17:36:56 +01:00
Kaushik B
3f460b150a
Move parameter validation specific to TPU Training plugins ( #7415 )
* Move parameter validation specific to TPU Training plugins
* update docstring
2021-05-24 16:02:01 +00:00
ananthsub
fa41c588f4
Remove ProfilerConnector class ( #7654 )
* Remove ProfilerConnector class
* Update trainer.py
* Update CHANGELOG.md
* Update trainer.py
* Update trainer.py
* tests
2021-05-24 08:58:15 -07:00
Gyeongjae Choi
a54bc5dba3
Fix progress bar print error when called before training ( #7674 )
* Check progress bar existence before printing
* Add tests for predict_progress_bar
* Add tests for progress_bar printing without training
* Update changelog
2021-05-24 17:33:28 +02:00
Carlos Mocholí
2103b5efc9
Move sync code from step result to lightning module [6/n] ( #7651 )
2021-05-24 13:13:55 +01:00
Xinyao(Alvin) Sun
0c958c5a1f
Fix dataloaders are not reset when tuning the model ( #7566 )
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-24 10:21:45 +02:00
shuyingsunshine21
299f2c481b
FSDP with full state dict ( #7487 )
* Fix some test errors
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unittest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Xinyao(Alvin) Sun
01109cdf0c
Fix/mismatched toggle optimizer ( #7563 )
* fix: avoid potential mismatched toggling of optimizer
Refs #7405
chore: update CHANGELOG
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
fix: resolve a conflict
chore: update changelog
* feat: add a test that fails in master
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo in tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Polish tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Polish tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fix: change placeholder in optimizer_step from positional args to keyword args
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-23 04:30:28 +02:00
shuyingsunshine21
2242423b75
refactor accelerator teardown -> training type plugin teardown ( #7579 )
2021-05-22 13:19:24 -07:00
Carlos Mocholí
a8d9b5f783
Remove tbptt `self.log` flags and other dead code [5/n] ( #7644 )
2021-05-22 01:13:00 +00:00
Carlos Mocholí
33a1f5271f
[2/N] Define dataclasses for progress tracking ( #7574 )
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-05-22 03:09:08 +02:00
Yifu Wang
8d6e2ff7b2
Improve argument validation for validate(), test(), and predict() ( #7605 )
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
2021-05-21 09:03:16 -07:00
ananthsub
f6d892ac21
[feat] Support custom filesystems in LightningModule.to_torchscript ( #7617 )
* [feat] Support custom filesystems in LightningModule.to_torchscript
* Update CHANGELOG.md
* Update test_torchscript.py
* Update test_torchscript.py
* Update CHANGELOG.md
* Update test_torchscript.py
2021-05-21 11:23:15 +00:00
Carlos Mocholí
e8a46bee15
Remove `Result(minimize)` parameter [4/n] ( #7628 )
2021-05-21 12:58:52 +02:00
Carlos Mocholí
603ef2cf7f
Use `trainer.call_hook` in the evaluation loop ( #7626 )
2021-05-21 11:54:52 +01:00
Carlos Mocholí
3d4dd28bec
Replace `CallbackHookNameValidator` with `FxValidator` [3/n] ( #7627 )
* Refactor FxValidator
* Fix tests
* Fix tests
* Class attribute
* Fix tests
* Better error message
* Fix tests
* Update pytorch_lightning/trainer/connectors/logger_connector/fx_validator.py
2021-05-21 11:54:16 +01:00
i-aki-y
7eafd8eac6
Add run_name argument to the MLFlowLogger constructor ( #7622 )
* Add run_name argument to the MLFlowLogger
* Update CHANGELOG
* Fix unnecessary line
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix style by using yapf
* Fix import error when mlflow is not installed
* Update CHANGELOG.md
* Update tests/loggers/test_mlflow.py
Co-authored-by: akiyuki ishikawa <aki.y.ishikwa@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-21 09:17:32 +01:00
ananthsub
94ef17ce77
Update model_checkpoint.py ( #7625 )
2021-05-20 23:16:18 +02:00
Andrew Tritt
92cf396de2
Override `broadcast_object_list` for `torch<1.8` ( #7592 )
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 08:29:55 +00:00
Yifu Wang
ed271905cf
Clear predict_progress_bar in ProgressBar.__getstate__ ( #7608 )
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 01:38:49 +00:00
ananthsub
8266b141ba
[feat] Support time-based checkpointing during training ( #7515 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:14:13 +00:00
ananthsub
9f5d4955b6
[1/N] Define dataclasses for progress tracking ( #6603 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 21:02:20 +00:00
Carlos Mocholí
901b2bac98
Unify `current_fx_name` and `current_hook_fx_name` [2/n] ( #7594 )
* Minor logger connector cleanup [1/n]
* Missing line
* Address comments
* Rely on validator
* Unify `current_fx_name` and `current_hook_fx_name`
* Fix test
2021-05-19 20:31:06 +00:00
Carlos Mocholí
dbea5bb710
Add typing to `ModelPruning` callback ( #7529 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:01:42 +02:00
Jan-Henrik Lambrechts
608de6abf4
TensorBoardLogger sub_dir parameter for grouping logs ( #6195 )
* fixed a small typo
* cleaning up
* added sub_dir argument to tensorboard and wrote test
* sub dir arg exclusively for tensorboard, linted
* resolving merge conflict
* resolved merge conflict
* resolved merge conflict
* resolved merge conflict
* resolve merge conflict before revert
* resolving merge conflict
* reverted to pre-lint
* added tensorboard sub_dir test
* pep8 formatting
* removed sub_dir arg from test_all function:
* updated feature description
* typo in doc description
* updated CHANGELOG
* Update pytorch_lightning/loggers/tensorboard.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* swapped argument position
* added expandvars tests
* added expandvars
* removed model init
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix tests
* fix failed test
* Revert "fix failed test"
This reverts commit 50b34c66da.
* add env var to test
* fix typo in tests
* fix tests
* for test consistency
* fix typo
* fix typo 2
Co-authored-by: Ubuntu <azureuser@devhenrik.evuifrmjd4lepbj4relcwwu5va.ax.internal.cloudapp.net>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-05-19 19:50:58 +00:00
ananthsub
b4e28e7169
[feat] Add stronger validation for checkpoint_callback argument ( #7539 )
* [feat] Add stronger validation for checkpoint_callback configuration
* chlog
* Update callback_connector.py
* Update test_model_checkpoint.py
* Update pytorch_lightning/trainer/connectors/callback_connector.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/trainer/connectors/callback_connector.py
* Update tests/checkpointing/test_model_checkpoint.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update CHANGELOG.md
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 19:38:08 +00:00
Carlos Mocholí
76ff600898
Minor logger connector cleanup [1/n] ( #7590 )
* Minor logger connector cleanup [1/n]
* Missing line
* Address comments
* Rely on validator
2021-05-19 19:25:32 +00:00
TOKUNAGA Hiroyuki
20f63377f8
Fix the condition for calling update_learning_rates ( #7032 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-17 17:20:42 +02:00
Adrian Wälchli
502adbced3
refactor optimizer loop logic for manual and automatic optimization ( #7526 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-05-17 14:42:01 +02:00
Kaushik B
bf46730d92
Support TPU Pod Training (n/n) ( #7296 )
2021-05-17 11:33:44 +00:00
Nic Eggert
f4f51e0dcf
Add kubeflow cluster environment ( #7300 )
* Add kubeflow cluster environment
* Add KubeflowEnvironment to docs
* Add KubeflowEnvironment to the changelog
* break up a long line
* Add method to detect kubeflow environment
* Select Kubeflow environment when available
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Run pre-commit
* task_idx == 0
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 09:05:24 +01:00
Adrian Wälchli
6e6e29af49
remove trainer hidden state | sanity refactor [2 / n] ( #7507 )
2021-05-17 08:57:15 +01:00
Mauricio Villegas
d0081778f8
Enable fsspec by default for cli config file ( #7521 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 08:53:00 +01:00
Alan Du
6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) ( #7492 )
2021-05-14 20:53:26 +01:00
Rohit Gupta
7ca41734da
Add `dataloader_idx` to batch transfer hooks ( #6241 )
* replace with kwargs
* chlog
* fix
* add test
* fix
* device
* deepspeed
* pep
* optional
* docs
* bc
* comments
* pep
* mypy
* pep
* Apply suggestions from code review
* kwargs
* docs
* .
* .
* 1.3 -> 1.4
* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
Carlos Mocholí
a584196abf
Default `seed_everything(workers=True)` in the `LightningCLI` ( #7504 )
2021-05-13 12:18:03 +02:00
Adrian Wälchli
dd1a17b071
Refactor result handling in training loop ( #7506 )
* refactor results
* rename dic -> dict
* simplify
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix None check
* chlog wording
* move process_closure_result to the end
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 09:30:34 +01:00
Jirka Borovec
298f9e5c2d
Prune deprecated utils modules ( #7503 )
* argparse_utils
* model_utils
* warning_utils
* xla_device_utils
* chlog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 07:24:42 +00:00
Jirka Borovec
946aee0c7b
prune data parallel ( #7510 )
2021-05-13 06:23:02 +01:00
Carlos Mocholí
072ad52b6b
Add `trainer.predict(ckpt_path)` ( #7430 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-13 01:49:58 +02:00
Jirka Borovec
d4ec75164c
Prune deprecated trainer attributes ( #7501 )
* use_single_gpu
* use_horovod
* use_ddp2
* use_ddp
* use_dp
* on_gpu
* use_tpu
* on_tpu
* on_cpu
* cleaning
* chlog
* Apply suggestions from code review
* Apply suggestions from code review
2021-05-12 20:10:15 +00:00
Jirka Borovec
96981091c7
Prune deprecated classif. metrics ( #7499 )
* stat_scores_multiple_classes
* precision_recall
* precision
* recall
* auc
* auroc
* multiclass_auroc
* iou
* clean-up
* chlog
* flake8
* imports
* prune
2021-05-12 18:03:34 +00:00
Jirka Borovec
140b0c727e
Prune deprecated trainer attributes 2 ( #7502 )
* accelerator_backend
* get_model
* clean
* chlog
* flake8
2021-05-12 10:19:30 -07:00
Carlos Mocholí
83283fdb20
Fix yapf-isort conflict ( #7500 )
2021-05-12 15:44:57 +02:00
Federico Simonetta
8cdbd03d02
MLFlow now uses env variable as default tracking uri ( #7457 )
* Clarify logger flag
Clarify behavior of boolean values on the logger flag for Trainer.
* Update docs/source/common/trainer.rst
* doc
* MLFlow now uses env variable as default tracking uri
Solves https://github.com/PyTorchLightning/pytorch-lightning/issues/6894
* Update pytorch_lightning/loggers/mlflow.py
Co-authored-by: thomas chaton <thomas@grid.ai>
* changelog
Co-authored-by: SpontaneousDuck <kennywitham4@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-12 11:26:57 +02:00
Christopher Ehmann
b9a52fa2ef
added stage param to LightningDataModule.setup example ( #7483 )
Co-authored-by: Sileadim <christopher@omnius.com>
2021-05-11 23:43:22 +05:30
shuyingsunshine21
8538c1f61e
Accelerator model state dict ( #7474 )
* Fix some test errors
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* modify model state dict to training type plugin
* remove changes
* add changelog
* fixing isort for pre-commit failure
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Address code review
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
Justus Schock
7b283e3c46
Bugfix/Multiple dataloaders ( #7433 )
* Update supporters.py
* Update apply_func.py
* Update supporters.py
* Update model_train_dataloaders.py
* Update model_train_steps.py
* Update test_dataloaders.py
* Update CHANGELOG.md
* Update model_train_steps.py
* Update test_dataloaders.py
* Update test_dataloaders.py
* Update supporters.py
* Update test_supporters.py
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update tests/trainer/test_dataloaders.py
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
* Apply suggestions from code review
Co-authored-by: Edgar Riba <edgar.riba@gmail.com>
* Update supporters.py
* Update supporters.py
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Edgar Riba <edgar.riba@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-11 16:33:29 +02:00
ananthsub
fdf50a5e4b
Mark certain Trainer APIs as protected ( #7420 )
2021-05-11 11:53:41 +02:00
Adrian Wälchli
ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] ( #7437 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
David Fidalgo
4a1134db64
Log epoch metrics before firing the `on_evaluation_end` hook ( #7272 )
* Log epoch metrics before firing the `on_evaluation_end` hook (addresses #7166 )
* test that epoch metrics are logged before `on_evaluation_end` hook
* update CHANGELOG
* Shorter test
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-11 10:54:31 +02:00
Carlos Mocholí
b65ae79478
Automatically check `DataModule.has_{setup,teardown,prepare_data}` [2/2] ( #7238 )
* Automatically check `DataModule.has_{setup,teardown,prepare_data}`
* Use variable
* Spacing
* Docs
* Update CHANGELOG
* Remove `_DataModuleWrapper`
* Add test
* Update docs/source/extensions/datamodules.rst
* Bad merge
* add test for invalid name
* Remove ValueError
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-11 10:53:00 +02:00
Adrian Wälchli
6bc616d78f
fix display bug ( #7395 )
2021-05-10 11:26:15 +08:00
shuyingsunshine21
987530cd38
Set `num_nodes` and `sync_batchnorm` From Trainer for Manually Passed Training Type Plugin ( #7026 )
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-08 11:25:51 +00:00
Akihiro Nitta
710b144b9b
Restore `trainer.current_epoch` after tuning ( #7434 )
* Add a test
* Save and restore current_epoch
* Update CHANGELOG
* alphabetical order
2021-05-08 07:15:52 +02:00
Ethan Harris
45143fd825
Improve val step logging ( #7351 )
* Fix val step logging
* Add a type
* Fix
* Update CHANGELOG.md
2021-05-07 22:58:03 +00:00
ananthsub
f9e050c5e5
Move DP warning suppression to the DataParallel Plugin ( #7421 )
2021-05-07 23:02:44 +02:00
ananthsub
fecce50355
Deprecate TrainerModelHooksMixin ( #7422 )
* Deprecate TrainerModelHooksMixin
* Update CHANGELOG.md
* Update model_hooks.py
* Update model_hooks.py
2021-05-07 13:19:36 -07:00
Carlos Mocholí
8208c330eb
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU ( #7025 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-07 16:41:39 +00:00
Carlos Mocholí
9ba76ce60c
Unify `configure_optimizers` docs ( #7399 )
2021-05-07 16:10:24 +02:00
Leonard Lausen
98b94b810c
Fix DeepSpeedPlugin with IterableDataset ( #7362 )
* deepspeed add train_micro_batch_size_per_gpu argument
* Update naming and doc
* Modify to use auto naming convention, add test
* Add iterable tests
* Fix tests, attempt by mocking
* Import correct package
* Fix comparison
* Set as special test
* Remove import
* Add Changelog
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-07 10:46:03 +01:00
Jirka Borovec
28103c67c2
show must go on ( #7413 )
* chlog + version
* readme
* .
2021-05-06 19:06:21 -04:00
Jirka Borovec
b181b8c646
release 1.3.0 ( #7404 )
* v1.3.0
* ci event
* chlog
* badge
* formatting
2021-05-06 15:05:35 -04:00
Gyeongjae Choi
d9bdc56b6a
Add _gpus_arg_default in argparse_utils for backward compatibility ( #7402 )
2021-05-06 13:35:12 +00:00
Jirka Borovec
d52e0a8f3e
v0.1.3.0rc3 + changelogs ( #7388 )
* v0.1.3.0rc3
* spaces
* wip
* wip
* wip
* wip
* prune
* wip
* wip
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-06 07:28:10 -04:00
Martin Kristiansen
c3fc0313ef
Updating docs and error message: half precision not available on CPU ( #7384 )
* Updating docs and error message to specify that half precision is not available on CPU
* update messages
Co-authored-by: Martin Kristiansen <martinkristiansen@sixgill.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-06 09:05:50 +00:00
Carlos Mocholí
6ad05d3338
Update `configure_optimizers` docs ( #7390 )
* Update `configure_optimizers` docs
* Update pytorch_lightning/core/lightning.py
2021-05-06 10:39:01 +02:00
ananthsub
651f93a69f
Add documentation for ways to access all batch outputs for on_train_epoch_end hook ( #7389 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 22:18:45 +00:00
ananthsub
7b45bcfedb
[2/2] Remove outputs from evaluation epoch end hooks ( #7338 )
* Remove outputs from on_train_epoch_end
* iterate
* Update callback_hook.py
* update
* early stop?
* fix
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update trainer.py
* update
* Update training_loop.py
* early stop?
* fix
* Remove outputs from evaluation epoch end hooks
* update
* Update test_remove_1-5.py
* fix lints
* Update base.py
* rm-outputs
* Update evaluation_loop.py
* try-save-more-memory
* Update trainer.py
* Update trainer.py
* cache-at-start
* Update evaluation_loop.py
* Update training_loop.py
* Update training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-05-05 19:50:58 +00:00
ananthsub
6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks ( #7339 )
* Remove outputs from on_train_epoch_end
* iterate
* Update callback_hook.py
* update
* Update training_loop.py
* Update test_training_loop.py
* early stop?
* fix
* update tests
* Update test_hooks.py
* Update pytorch_lightning/trainer/callback_hook.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update trainer.py
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
ananthsub
98670c83a9
Deprecate`truncated_bptt_steps` flag on Trainer in favor of same setting on the LightningModule ( #7323 )
* deprecate-tbptt-trainer
* Update CHANGELOG.md
* Update lightning.py
* test
* Update lightning.py
* Update training_loop.py
* Update training_loop.py
* Update lightning.py
* Update training_loop.py
* Update training_loop.py
* update docs
* Update accelerator.py
* Update accelerator.py
* more docs
* tweaks
* chlog
* comments
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
Kaushik B
e21b7a62d7
Add ddp_find_unused_parameters_false to Registry ( #7224 )
2021-05-04 22:40:00 +00:00
Carlos Mocholí
374ff750f5
Pass `current_epoch`/`global_step` as monitor candidates [1/2] ( #7344 )
* Pass `current_epoch`/`global_step` as monitor candidates
* Formatting
* Fix deprecated test
* Update CHANGELOG
2021-05-04 16:05:40 -04:00
Ethan Harris
2a740ebe77
Fix support for dataloader with None batches ( #7342 )
* Fix Dataloader None batch
* Fix Dataloader None batch
* Update CHANGELOG.md
* Fix breaking test
* Address comments
2021-05-04 12:24:03 +00:00
ramonemiliani93
5db832f181
Fix auto scaling mode when calling tune method on trainer. ( #7321 )
* Add test for non-existing mode, the test should fail if something different from `power` or `binsearch` is passed.
* Add newline.
* Apply fix
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update tests/tuner/test_scale_batch_size.py
* Update pytorch_lightning/tuner/batch_size_scaling.py
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-04 12:03:51 +00:00
ananthsub
69cf63e2fd
Update trainer.py ( #7340 )
2021-05-04 11:11:27 +00:00
Carlos Mocholí
8c0ea92af2
`TrainerState` refactor [5/5] ( #7173 )
* `TrainerState` refactor
* flake8
* Update finished check
* Test cleanup
* Fix tests
* Fixes
* Reorder
* flake8
* Update CHANGELOG
* Better docs
* Better docs
* Remove default
* Update tests
* Bad merge
2021-05-04 12:50:56 +02:00
Adrian Wälchli
a6aa1a0f82
make gpus=str in Trainer consistent with command line parsing of string ( #6388 )
* string gpu input
* update docs
* deprecation warning
* Revert "update docs"
This reverts commit c5f3893413.
* deprecation
* add changelog
* update parser
* update warning
* implement v1.5 behavior ahead of time
* formatting
* set accelerator in test to avoid different warning
* add warning
* remove todo warn
* Update pytorch_lightning/utilities/device_parser.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* resolve flake8
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-05-04 09:56:27 +00:00
Boris Dayma
2a20102321
fix(wandb): allow custom init args ( #6989 )
* feat(wandb): allow custom init args
* style: pep8
* fix: get dict args
* refactor: simplify init args
* test: test init args
* style: pep8
* docs: update CHANGELOG
* test: check default resume value
* fix: default value of anonymous
* fix: respect order of parameters
* feat: use look-up table for anonymous
* yapf formatting
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 09:45:36 +00:00
Hemil Desai
82c19e1444
Update LR schedulers only when their corresponding Optimizer is being… ( #4868 )
* Update LR schedulers only when their corresponding Optimizer is being used.
In the case when optimizer frequencies are specified,
the LR scheduler corresponding to a particular optimizer is updated
only when that optimizer is being used in the training loop or epoch.
* pep8speak fixes
* Fix failing tests
* Add docs
* PR Feedback
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* formatting fix
* PR Feedback - part 2
* More PR feedback
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Add typing imports
* Stronger tests and fixes related to that
* Add more tests plus PR feedback
* Make optimizer_freq_cumsum a cached property
@cached_property is only available after Python 3.8 so had to do it manually.
* Fix tests
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Avoid mutable defaults
* Parametrize lr scheduling tests
* PR feedback
* Apply suggestions from code review
* spell
* Apply suggestions from code review
* flake8
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-05-04 09:37:40 +00:00
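The optimizer-frequency behavior described in #4868 above can be sketched as follows. This is a minimal, hypothetical illustration of the cycling logic, not Lightning's actual implementation; the function names are invented for the example.

```python
from itertools import accumulate

def active_optimizer_index(batch_idx, frequencies):
    """Which optimizer is active for this batch, cycling through the
    given per-optimizer frequencies (e.g. [2, 1] -> opt0, opt0, opt1, ...)."""
    pos = batch_idx % sum(frequencies)
    for i, bound in enumerate(accumulate(frequencies)):
        if pos < bound:
            return i

# With frequencies [2, 1]: optimizer 0 runs for 2 batches, optimizer 1 for 1.
assert [active_optimizer_index(b, [2, 1]) for b in range(6)] == [0, 0, 1, 0, 0, 1]

def step_schedulers(batch_idx, frequencies, schedulers):
    """Per the commit: only the scheduler of the currently active optimizer steps."""
    schedulers[active_optimizer_index(batch_idx, frequencies)].step()
```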
Carlos Mocholí
3fdb61ac1b
Replace `_DataModuleWrapper` with `__new__` [1/2] ( #7289 )
...
* Remove `_DataModuleWrapper`
* Update pytorch_lightning/core/datamodule.py
* Update pytorch_lightning/core/datamodule.py
* Replace `__reduce__` with `__getstate__`
2021-05-04 08:00:24 +00:00
Leonard Lausen
597b309f2e
Fix `Trainer.plugins` type declaration ( #7288 )
...
* Fix trainer.plugins type declaration
* Don't ClusterEnvironment(Plugin)
* fix import error, yapf formatter
* Add test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 08:42:57 +02:00
SpontaneousDuck
f135debb6a
Clarify logger flag ( #7190 )
...
* Clarify logger flag
Clarify behavior of boolean values on the logger flag for Trainer.
* Update docs/source/common/trainer.rst
* doc
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-05-04 00:21:28 +00:00
Daniel Mesejo-León
6da747e775
Deprecate `LightningModule.datamodule` reference in favor of the trainer one ( #6929 ) ( #7168 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-04 00:01:41 +00:00
Adrian Wälchli
3e8db4142b
add forgotten test in #7240 ( #7283 )
...
^
2021-05-03 23:56:30 +00:00
Kaushik B
6d7c6d6403
Update Accelerator Connector for Registry ( #7214 )
2021-05-03 21:03:21 +00:00
ananthsub
b7a444883c
Remove model.trainer call inside of dataloading mixin ( #7317 )
...
* Update data_loading.py
* Update data_loading.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-03 13:53:54 -07:00
Mauricio Villegas
78a6fd5588
Example and documentation for LightningCLI linking model and data arguments ( #7299 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-03 20:45:46 +00:00
Adrian Wälchli
bf1394a472
improve early stopping verbose logging ( #6811 )
2021-05-03 20:20:48 +00:00
ananthsub
14c552bb92
[bugfix] Fix dataloading for iterable datasets and limit_train_batches ( #7306 )
...
* bugfix-dataloading
* rm-logs
* Update CHANGELOG.md
* Update test_dataloaders.py
* Update test_dataloaders.py
* Update training_loop.py
* Update test_dataloaders.py
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update test_dataloaders.py
* Update training_loop.py
* Update training_loop.py
* comments
* address comments
* more tests
* Update progress.py
* Update test_dataloaders.py
* Update test_dataloaders.py
* Update training_loop.py
* Update training_loop.py
* test ckpt fix?
* update again
2021-05-03 19:50:26 +01:00
ananthsub
39274273a4
Update accelerator.py ( #7318 )
2021-05-03 11:17:26 -04:00
Carlos Mocholí
badd0bba30
Move trainer functions ( #7295 )
2021-05-03 09:26:38 -04:00
Adrian Wälchli
e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU ( #7277 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
Kaushik B
490cc57809
Device updates for TPU Pod ( #7243 )
2021-04-30 23:14:06 +05:30
thomas chaton
16d6c9828d
[bugfix] Apex never instantiated. ( #7274 )
...
* update
* update
* update apex
* update
* update
* update
* remove test.py
* update
* update
* update on comments
* update changelog
* update
* update
* typo
2021-04-30 13:16:28 -04:00
ananthsub
44fd01734c
Move grad_norm to a dedicated utilities file ( #7292 )
...
* rm-grad-norm-mixin
* Update grads.py
* Update CHANGELOG.md
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docstrings
* Update __init__.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:19:22 -07:00
ananthsub
e407edba36
[fix] Attach train+val dataloaders to trainer in trainer loop ( #7207 )
...
* Update training_loop.py
* Update test_dataloaders.py
* changelog
* delay reload
* go back
* comments
* Update training_loop.py
* Update test_dataloaders.py
* Update tests/trainer/test_dataloaders.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:01:31 -07:00
thomas chaton
80b9ca0e38
[bugfix] Add reloading support using BaseFinetuning ( #7253 )
...
* update
* wip
* update
* update
* update
* update
* resolve bug
* update on comments
* update on comments
* update
* update
* formatting
* add comments
* update on comments
* update
* Update pytorch_lightning/callbacks/base.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* update
* Typing and minor changes
* Refactor
* Fix deprecated test
* Broken commit
* Fix broken commit
* flake8
* Update CHANGELOG
* update on comments
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-30 11:14:43 -04:00
Carlos Mocholí
5af086ab9f
Attach data refactor and tuner bugs [4/n] ( #7258 )
...
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
Adrian Wälchli
ea2287e723
update training type plugin docs regarding result caching ( #7261 )
...
* add docs
* typo
* update
2021-04-30 13:03:10 +00:00
Adrian Wälchli
b9b3fa371f
fix case where an IterableDataset doesn't produce a batch for an epoch ( #7294 )
...
* wip
* fix
* add test
* refactor + test
* rm
* formatting
* update changelog
* doc
* docstring
* remove unused import
* Update CHANGELOG.md
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:55 +00:00
ananthsub
969e857690
Rename `trainer._launch` to `trainer._run` ( #7265 )
...
* rename-run
* fix
2021-04-30 13:39:02 +01:00
Adrian Wälchli
8232de427a
fix save_hyperparameters(container) if container is empty ( #7268 )
...
* fix
* add tests
* changelog
* fix test
2021-04-30 13:38:42 +01:00
Kaushik B
ac92b57e2b
No need of warning when saved callback_states is None ( #7293 )
2021-04-30 10:48:53 +00:00
ananthsub
338f5a3311
Remove exp_save_path on the LightningModule ( #7266 )
...
* deprecate-exp-save-path
* Update lightning.py
* Update CHANGELOG.md
* remove-not-deprecate
2021-04-29 17:44:04 -04:00
Adrian Wälchli
b6706470c1
fix fast_dev_run parsing from cli ( #7240 )
2021-04-30 01:16:20 +05:30
ananthsub
14b8dd479a
[2/2] Remove training loop force calling early stopping callback ( #7069 )
...
* rebase
* doc
* Update training_loop.py
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
2021-04-29 09:14:53 -07:00
Carlos Mocholí
a5ac3f8a16
Code cleaning in preparation for #7258 [3/n] ( #7262 )
2021-04-29 14:40:51 +02:00
thomas chaton
848288c8d8
[warning] Add a warning with missing callback with resume_from_checkpoint ( #7254 )
...
* add a warning
* add changelog
2021-04-29 12:39:45 +00:00
George
e272bea4dc
Updated `ModelCheckpoint` documentation ( #6873 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-28 23:56:58 +00:00
ananthsub
075de9356c
Reset current_fx properties on lightning module in teardown ( #7247 )
...
* Update trainer.py
* cleanup module properties in teardown
* Update test_trainer.py
* Update lightning.py
* Formatting
* flake8
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-28 12:17:20 -07:00
Carlos Mocholí
40f80230fe
Remove `trainer.fit` return value [2/n] ( #7237 )
...
* `_fit_impl` refactor and types
* Fix return
* Remove return docstring
* Fixes
* Fixes
* Remove `trainer.fit` return value
* Update CHANGELOG
* flake8
* Undo results change
* Fix test
* Revert changes for a separate PR
* flake8
2021-04-28 19:11:32 +01:00
Carlos Mocholí
bdc4272e99
`_launch` refactor and types [1/n] ( #7232 )
2021-04-28 17:41:08 +02:00
ananthsub
947d1cb757
[1/2] Add support for early stopping during training epoch end ( #6944 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-28 15:18:56 +02:00
Vaibhav Balloli
ccd87cadfc
Changes resume_from_checkpoint warning to error ( #7075 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-28 15:03:29 +02:00
Ethan Harris
d123aaa6a1
Update fsspec dependency and remove un-needed code ( #7210 )
...
* Update fsspec dep and remove un-needed code
* Remove unused import
2021-04-28 09:10:46 +01:00
Ali Benkassou
cbc6e30b5d
Replace 'step' with 'global_step' ( #7244 )
2021-04-28 06:44:11 +00:00
Kaushik B
94fcaaf5d7
Add `debug` flag to TPU Training Plugins (PT_XLA_DEBUG) ( #7219 )
2021-04-27 20:34:25 +00:00
thomas chaton
e76ebd640e
[feat] Add BasePredictionWriter 3/3 ( #7127 )
...
* wip
* update
* update
* update
* update
* update
* typo
* update on comments
* update
* update
* update
* update
* update changelog
* update
* Fix merge
* Fix merge
* move code
* resolve test
* add extra test
* add an extra test
* update on comments
* add typing
* resolve flake8
* Refactor and Docs
* Fix tests
* Fix tests
* Fix tests
* Duplicate
* Fix tests
* resolve bug
* update
* update on comments
* Update pytorch_lightning/utilities/imports.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/utilities/device_parser.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update
* update
* update
* update on comments
* resolve flake8
* update test
* Apply suggestions from code review
* update on comments
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* update on comments
* update
* update on comment
* Apply suggestions from code review
* update
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-27 20:23:55 +00:00
Kaushik B
c6d9f52cb3
Add a check for TPU Spawn barrier ( #7241 )
2021-04-27 19:45:55 +00:00
thomas chaton
5a113a2f05
[bug/feat] Support parameters_to_ignore in DDP ( #7239 )
...
* update
* update
* update
* update on comments
* update
2021-04-27 17:49:32 +00:00
Seongmin Park
7fe8d18477
Do not `shuffle` in `LightningDataModule.from_datasets` for `IterableDataset` ( #7053 )
...
* Expose shuffle argument in LightningDataModule.from_datasets
* Add test for DataModule initialization with iterable datasets
* Add changelog
* Remove trailing whitespace
* Add more tests for coverage
* Fix sequence dataset coverage
* Fix Sequence dataset tests
* Directly check whether each passed dataset is an IterableDataset
* Expose shuffle argument in LightningDataModule.from_datasets
* Add test for DataModule initialization with iterable datasets
* Add changelog
* Remove trailing whitespace
* Add more tests for coverage
* Fix sequence dataset coverage
* Fix Sequence dataset tests
* Directly check whether each passed dataset is an IterableDataset
* Fix changelog to reflect review direction
* Update CHANGELOG.md
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Fix changelog to reflect review direction (2)
* Add suggested braces
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Reuse isinstance check
* Merged tests with parametrize. Use mocks
Co-authored-by: Seongmin Park <seongmin.park@actionpower.kr>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-27 12:53:49 -04:00
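The check described in #7053 above can be sketched like this. A minimal, hypothetical stand-in (no real torch classes, invented helper name): shuffling is only requested for map-style datasets, since an `IterableDataset` does not support a shuffling sampler.

```python
class IterableDataset:
    """Stand-in for torch.utils.data.IterableDataset."""
    pass

def dataloader_kwargs(dataset, shuffle=True):
    """Only pass shuffle=True through when the dataset is map-style;
    IterableDataset instances cannot be shuffled via a sampler."""
    return {"shuffle": shuffle and not isinstance(dataset, IterableDataset)}

assert dataloader_kwargs([1, 2, 3])["shuffle"] is True
assert dataloader_kwargs(IterableDataset())["shuffle"] is False
```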
ananthsub
bab7225507
[fix] Add barriers before and after setup hook is run ( #7202 )
...
* Update data_connector.py
* move-barrier
* Update trainer.py
* Update ddp.py
* changelog
* Spacing
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-27 17:19:43 +01:00
thomas chaton
f920ba29f2
[bugfix] Metric not logged properly in manual optimization ( #7228 )
...
* resolve bug
* update changelog
* typo
* Update tests/trainer/optimization/test_manual_optimization.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-27 09:16:51 -04:00
thomas chaton
e147127c0e
[feat] Add better support for predict + ddp 2/3 ( #7215 )
...
* wip
* update
* update
* update
* update
* update
* typo
* update on comments
* update
* update
* update
* update
* update changelog
* update
* Fix merge
* Fix merge
* move code
* resolve test
* add extra test
* add an extra test
* update on comments
* add typing
* resolve flake8
* Refactor and Docs
* Fix tests
* Fix tests
* Fix tests
* Duplicate
* Fix tests
* resolve bug
* update
* update on comments
* update
* update changelog
* update
* update
* remove tpu
* resolve flake8
* update on comments
* update on comments
* update on comment
* resolve flake8
* add a cpu test for predict
* add None test
* update
* Update CHANGELOG.md
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve tests
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-27 08:46:45 -04:00
Carlos Mocholí
ca6c87ffbe
Add back `clip_gradients(model)` ( #7231 )
2021-04-27 11:34:02 +00:00
Adrian Wälchli
3b36d81c03
Fixed `num_sanity_val_steps` affecting reproducibility of training data shuffling ( #7014 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-27 09:51:39 +00:00
Kaushik B
5cf9afa176
Add fairscale install msg for Sharded Plugins ( #7213 )
2021-04-27 08:22:44 +00:00
shuyingsunshine21
52a5cee0a7
Set smarter default for DDP sharded for performance optimization ( #6937 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-27 04:01:34 +05:30
ananthsub
dd5ec75e48
Deprecate save_function from model checkpoint callback ( #7201 )
...
* Update model_checkpoint.py
* Update CHANGELOG.md
* fix-tests
* deprecate not remove
* Update model_checkpoint.py
* Update test_remove_1-5.py
2021-04-26 17:55:26 +01:00
Alessio Bonfiglio
ac7d6a35c3
Fix `NeptuneLogger.log_text(step=None)` ( #7194 )
2021-04-26 15:28:55 +02:00
Kaushik B
6be0a859db
Update teardown for TPU acc ( #7211 )
2021-04-26 13:30:46 +01:00
ananthsub
bc3f08b0e3
[fix] Add barrier to accelerator's teardown ( #6814 )
2021-04-26 09:23:29 +00:00
ananthsub
68eac4d948
Enforce Lightning module as source of truth for automatic optimization ( #7130 )
...
* make lightning module source of truth for automatic optimization
* Update configuration_validator.py
* Update model_connector.py
* rm-references
* Update CHANGELOG.md
* Update CHANGELOG.md
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-26 05:36:26 +00:00
Kaushik B
44d775fccf
Update Error message for ProfileConnector ( #7204 )
...
* Update Error message for ProfileConnector
* Update test
2021-04-25 11:37:21 -07:00
ananthsub
31fcd7d0ab
Deprecate write_predictions on the LightningModule ( #7066 )
...
* deprecate-write-predictions
* Update CHANGELOG.md
* Update test_remove_1-5.py
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-25 16:54:56 +00:00
ananthsub
b3fe836656
Move metrics_to_scalars to a dedicated utilities file ( #7180 )
...
* rm-trainer-logging
* Update CHANGELOG.md
* Update metrics.py
* Update logging.py
* Update metrics.py
2021-04-24 10:25:33 +01:00
thomas chaton
f58865aada
Properly set `LightningModule.device` after model replacement ( #7188 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-23 16:36:52 +02:00
Sean Naren
8439aead66
Update FairScale on CI ( #7017 )
...
* Try updating CI to latest fairscale
* Update availability of imports.py
* Remove some of the fairscale custom ci stuff
* Update grad scaler within the new process as reference is incorrect for spawn
* Remove fairscale from mocks
* Install fairscale 0.3.4 into the base container, remove from extra.txt
* Update docs/source/conf.py
* Fix import issues
* Mock fairscale for docs
* Fix DeepSpeed and FairScale to specific versions
* Swap back to greater than
* extras
* Revert "extras"
This reverts commit 7353479f
* ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-23 12:37:00 +01:00
Akihiro Nitta
92af363270
Fix `lr_finder` suggesting too high learning rates ( #7076 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-23 10:59:40 +00:00
Adrian Wälchli
d534e53ec4
add missing predict docs ( #7150 )
...
* update docs
* add datamodule predict
* fix docs
* typo
2021-04-23 10:38:44 +00:00
Tharindu Hasthika
c502e47abf
Fixed setting of _save_dir when run initiated externally ( #7106 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-23 01:14:46 +00:00
Jirka Borovec
f48ac62334
fix pip install ( #7170 )
2021-04-22 16:48:11 -04:00
Jirka Borovec
aa7d3dc6cc
Fix `torchmetrics` compatibility ( #7131 )
...
* get_num_classes
* tmp
* fix one test
* fix deprecated tests
* fix deprecate
* pep8
* deprecate 0.3
* wip
* wip
* HaCK
* branch
* branch
* format
* Apply suggestions from code review
* prune
* rev
* multilabel
* Apply suggestions from code review
* master
* rev
* .
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-04-22 20:45:46 +00:00
Jirka Borovec
ef5feac7ba
fix version + yapf ( #6999 )
2021-04-22 18:25:51 +00:00
Carlos Mocholí
33066f8fd9
Add `on_predict_{batch,epoch}_{start,end}` and `Callback.on_predict_{start,end}` ( #7141 )
...
* Update hooks typing and predict hooks
* Update CHANGELOG
* Progress
* Progress
* Add back `on_predict_{start,end}`
* Typing and fix
* Update tests/trainer/logging_/test_logger_connector.py
* Update tests/callbacks/test_lambda_function.py
2021-04-22 10:05:28 -04:00
ananthsub
3f1a08ab00
Fix mypy checks for double precision plugin ( #7151 )
2021-04-22 11:29:38 +01:00
thomas chaton
99b9dfa883
[bugfix] Remove warning for distributed values ( #7132 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-22 02:14:46 +02:00
Carlos Mocholí
345e9a0245
Fix argparse docs ( #7148 )
2021-04-22 02:13:00 +02:00
Sean Naren
ce14565ed9
[FSDP] Move on save checkpoint outside of zero check ( #7134 )
...
* Move on save checkpoint outside of zero check
* Remove unnecessary override
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-22 01:54:47 +02:00
ananthsub
2f84459d26
Broadcast dirpath for tighter consistency in model checkpoint callback ( #6978 )
...
* Update model_checkpoint.py
* Update model_checkpoint.py
* Update model_checkpoint.py
2021-04-21 10:20:27 -07:00
thomas chaton
013756404b
[bugfix] Add set_default_tensor_type to torch.DoubleTensor with precision=64 ( #7108 )
...
* update
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve tests
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-20 15:25:37 +00:00
thomas chaton
ca21da4f3b
Move save_hyperparameters to its own function ( #7119 )
...
* move hyper_parameters
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/utilities/parsing.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve flake8
* update
* resolve tests
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-20 11:04:35 -04:00
Kaushik B
f168a535ca
Add MpModelWrapper in TPU Spawn ( #7045 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-20 13:05:27 +00:00
Akihiro Nitta
0302b8be32
Disable `lr_scheduler.step()` in manual optimization ( #6825 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-20 13:00:45 +02:00
thomas chaton
9beec26c3e
[bugfix] Add support for CombinedLoader in validation with ddp ( #7102 )
...
* add test
* add changelog
* resolve flake8
* remove print
2021-04-20 08:22:02 +00:00
Adrian Wälchli
67528c4665
Fix attribute error for _gpus_arg_default loading checkpoint prior to 1.2.8 ( #7043 )
2021-04-20 07:34:03 +00:00
Adrian Wälchli
6b15ca95f0
fix logger experiment version in multiple run DDP ( #7077 )
...
* fix
* changelog
2021-04-19 17:12:05 +00:00
Adrian Wälchli
d12c6cf2b3
more early stopping options (convergence and divergence threshold) ( #6868 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-19 16:49:52 +02:00
Adrian Wälchli
60c1c8fe83
Auto-set `DataLoader.worker_init_fn` with `seed_everything` ( #6960 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-19 16:28:37 +02:00
Akihiro Nitta
d1529c28a1
Optimization docs ( #6907 )
...
* .
* .
* Fix link to the section
* Fix link to the section
* Consistent indent
* Update docs
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Add note for optimizer.optimizer
* .
* Update hooks
* Update closure docstring
* Update optimizer methods
* Update optimizer
* Remove manopt + grad clipping (by @flukeskywalker)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-19 10:08:49 -04:00
Adrian Wälchli
2b232d3fbd
fix docs rendering in datamodule ( #7064 )
...
* [docs]: add newline to correctly render Example
* whitespace
Co-authored-by: Matthew Sarmiento <matthewcs@me.com>
2021-04-19 10:08:09 -04:00
Carlos Mocholí
a5e356adb1
Deprecate `@auto_move_data` in favor of `trainer.predict` ( #6993 )
...
* Deprecated `@auto_move_data` in favor of `trainer.predict`
* Update CHANGELOG
2021-04-19 14:53:21 +01:00
Adrian Wälchli
e9fca760ac
Set `DistributedSampler` seed if `seed_everything` was called ( #7024 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-19 14:50:31 +01:00
Nicki Skafte
fbee5a86e7
Correctly reset metric objects in self.log ( #7055 )
...
* reset
* fix tests
* fix tests
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* move logic
* chglog
* pep8
* Add test
* Improve test
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-19 14:48:48 +01:00
mlech26l
e61daff5cc
Typo LightningMoule -> LightningModule ( #7038 )
2021-04-19 13:48:44 +01:00
Carlos Mocholí
898ec8a94a
Create pytorch_lightning/utilities/types.py ( #7048 )
2021-04-19 14:43:16 +02:00
Kaushik B
30b7440e12
TPU Spawn Rank & root device Error ( #7074 )
...
* TPU Spawn Rank Error
* Update tpu spawn
* Fix root device property for tpu spawn
* Update changelog
2021-04-18 23:42:48 +02:00
Kaushik B
97be843226
Better approach to register plugins ( #7063 )
...
* Better approach to register plugins
* Add ddp_with_find_unused_parameters_false
* Remove unnecessary break
* Revert back the ddp commit
* Update register override logic
* Update register override logic
* fix mypy
2021-04-18 11:23:12 +02:00
thomas chaton
7b0b0d2844
update ( #7056 )
2021-04-16 21:22:19 +01:00
ananthsub
8bcd169767
[fix] Fix multi-node DDP launch by using local rank instead of global rank for main process ( #7061 )
...
* Update ddp.py
* Update CHANGELOG.md
2021-04-16 21:18:54 +01:00
Kaushik B
6a7b4cf5d3
Fix mypy for plugins registry ( #7062 )
2021-04-17 01:33:41 +05:30
Adrian Wälchli
3fb8eada34
rc2 ( #7057 )
2021-04-16 20:34:14 +02:00
Kaushik B
832a03af7c
Add Training Type Plugins Registry ( #6982 )
...
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-16 18:01:56 +05:30
Adrian Wälchli
67d21609c9
Add Trainer max_time argument + Callback ( #6823 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-16 13:38:57 +02:00
ananthsub
4c07ab5e99
Use PyTorch API logging for Lightning Trainer ( #6771 )
...
* Update trainer.py
* Update trainer.py
* Update trainer.py
2021-04-16 00:10:34 +02:00
Carlos Mocholí
f29ecbfd90
Typing for accelerators and plugins ( #7022 )
2021-04-15 16:48:16 +00:00
ananthsub
f6f81f0430
[fix] Add a cluster environment teardown to clean up environment state ( #6942 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-15 16:06:54 +00:00
Mauricio Villegas
f852a4f592
Changed basic_examples to use `LightningCLI` ( #6862 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-15 15:01:16 +00:00
Ethan Harris
f645df5e9a
Add typings for evaluation_loop.py and remove some dead code ( #7015 )
2021-04-15 07:36:04 +00:00
Edward Brown
5bd3cd5f71
Bugfix/cuda oom detection and handling ( #6934 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-15 03:22:11 +02:00
Jirka Borovec
895bea1ad3
rename about ( #7002 )
...
* rename about
* .
* ..
2021-04-14 18:56:40 -04:00
Adrian Wälchli
d3f73a0a74
Plugin Docs ( #6952 )
...
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-14 20:53:21 +00:00
SpontaneousDuck
dcff5036a8
Use PickleError base class to detect all pickle errors ( #6917 )
...
* Use PickleError base class to detect all pickle errors
* Update changelog with #6917
* Add pickle test for torch ScriptModule
Co-authored-by: Ken Witham <k.witham@kri.neu.edu>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-14 20:24:32 +00:00
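The idea in #6917 above: `pickle.PickleError` is the common base class of both `PicklingError` and `UnpicklingError`, so catching it covers all pickle failures. A minimal sketch (the helper name is invented for illustration):

```python
import pickle

def is_picklable(obj):
    """Catch the PickleError base class so both PicklingError and
    UnpicklingError subclasses are detected; AttributeError/TypeError
    cover objects pickle rejects before raising its own errors."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PickleError, AttributeError, TypeError):
        return False

assert is_picklable({"a": 1}) is True
assert is_picklable(lambda x: x) is False  # lambdas cannot be pickled
```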
shuyingsunshine21
03a73b37bc
Train End Error Handling Fix ( #6864 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-04-14 20:35:42 +02:00
Nicki Skafte
7c5ad1905d
Bugfix for predict progressbar ( #6884 )
...
* gating
* tests
* pep8
* changelog
2021-04-14 09:50:36 +01:00
CeShine Lee
24d0295ff1
Fix the issue where `gradient_clip_algorithm` had no effect. ( #6928 )
2021-04-14 14:17:06 +05:30
Adrian Wälchli
33cc9fe138
Clean up environment access in plugins ( #6941 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 20:07:40 +02:00
Peng Zhang
89074fa2ad
Fix Multi-GPU join for horovod ( #6954 )
...
* fixjoin
* fix join on cpu
* fix typo
* try to undo horovod skip
* undo
* Try removing skip
* Update CHANGELOG
* add back skip for test_horovod_multi_optimizer
* Add back skip
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-13 17:44:41 +01:00
Carlos Mocholí
15926b462c
Add SWA warning if not running every epoch ( #6987 )
...
* Add SWA warning if not running every epoch
* Typo
2021-04-13 18:34:40 +02:00
Ethan Harris
b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` ( #6969 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
ananthsub
e891ceb836
Remove evaluation loop legacy dict returns for `*_epoch_end` hooks ( #6973 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-13 12:37:54 +01:00
Hinrich B. Winther
b37b58a73e
Fix Checkpoint issue when using Horovod distributed backend (PyTorchLightning#6947) ( #6958 )
...
Co-Authored-By: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-13 09:18:52 +00:00