Commit Graph

824 Commits

Luis Perez 009e05d14f
[bugfix] Minor improvements to `apply_to_collection` and type signature of `log_dict` (#7851)
* minor fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-06-07 09:31:36 +01:00
Adrian Wälchli cfd01d7f8d
move amp checkpoint state management to precision plugin (#7831) 2021-06-07 07:45:01 +00:00
Ruotian(RT) Luo dff1047851
Fix an incorrect CHANGELOG link (#7850) 2021-06-06 23:57:23 +00:00
Sean Naren 7c7182d334
[IPU] Call accelerator hooks regardless if LM hook overridden 1/n (#7826)
* Modify API to ensure hooks defined in the accelerator are called as expected

* handle step_end in dp

* Add changelog

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Add todo and explanation

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-04 16:19:08 +00:00
thomas chaton 51d370f4c2
[doc] Move each profiler to its own file + Add missing PyTorchProfiler to the doc (#7822) 2021-06-04 21:08:29 +05:30
shuyingsunshine21 ca89a7f344
[sharded plugin] Fix check for fp16 precision (#7825)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-06-04 08:34:39 +02:00
Mauricio Villegas f34584001c
Fix support for torch Module type hints in LightningCLI (#7807)
* Fixed support for torch Module type hints in LightningCLI

* - Fix issue with serializing values when type hint is Any.
- Run unit test only on newer torchvision versions in which the base class is Module.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor change

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-04 07:43:43 +02:00
Adrian Wälchli 36770b22fd
validate manual optimization and supported features before running training (#7788)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-03 08:42:37 -07:00
Ethan Harris 03bb389b21
Fix double precision + ddp_spawn (#6924)
* Initial fix

* Initial fix

* Initial fix

* Updates

* Updates

* Update typing and docs

* Undo accidental refactor

* Remove unused imports

* Add DDP double precision test

* Remove unused variable

* Update CHANGELOG.md

* Fix test

* Update tests

* Formatting

* Revert bad change

* Add back changes

* Correct wrapping order

* Improve unwrapping

* Correct wrapping order

* Fix... finally

* Respond to comments

* Drop ddp test

* Simplify ddp spawn test

* Simplify ddp spawn test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-01 15:21:17 +00:00
Carlos Mocholí 195b24ba51
`apply_to_collection` improvements and add `apply_to_collections` (#7769)
* `apply_to_collection` improvements and add `apply_to_collections`

* Update CHANGELOG

* Minor fix

* Minor fix

* Remove attr

* Swap if first is None

* None test

* OrderedDict support

* flake8

* Fix docstring
2021-06-01 12:09:20 +00:00
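
The commit above extends `apply_to_collection` and introduces `apply_to_collections`. As a rough, hedged illustration (assuming the `pytorch_lightning.utilities.apply_func` module path used around this release; this is not code taken from the commit itself):

    import torch
    from pytorch_lightning.utilities.apply_func import apply_to_collection, apply_to_collections

    batch = {"x": torch.ones(2), "meta": {"y": torch.zeros(3)}}

    # Apply a function to every tensor leaf in a (possibly nested) collection.
    doubled = apply_to_collection(batch, dtype=torch.Tensor, function=lambda t: t * 2)

    # Zip two collections with the same structure and combine matching leaves.
    other = {"x": torch.ones(2), "meta": {"y": torch.ones(3)}}
    summed = apply_to_collections(batch, other, dtype=torch.Tensor, function=lambda a, b: a + b)
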
Carlos Mocholí 1dd61e4e35
Extend support for logging a collection (#7771) 2021-06-01 12:51:50 +01:00
Carlos Mocholí 0dd6d3a798
Avoid adding `None` loss values in `training_epoch_end` (#7772) 2021-05-31 19:28:28 +00:00
Adrian Wälchli 7e6010fc93
fix info message when max training time reached (#7780)
* call time_elapsed

* elapsed formatting

* format

* update test

* changelog
2021-05-31 14:50:16 +02:00
Mauricio Villegas f6b5e3df57
Added save_config_filename init argument to LightningCLI (#7741) 2021-05-28 09:30:16 +02:00
Boris Dayma 9097347ea8
feat(wandb): log models as artifacts (#6231)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-27 20:15:02 +02:00
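
The commit above adds model logging to the W&B integration. A minimal sketch, assuming `WandbLogger` exposes a `log_model` argument as described (the project name is a placeholder):

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import WandbLogger

    # log_model=True uploads saved checkpoints to W&B as artifacts at the end of
    # training (behaviour assumed from the feature description).
    logger = WandbLogger(project="my-project", log_model=True)
    trainer = Trainer(logger=logger, max_epochs=1)
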
Carlos Mocholí 9304c0df8f
Rename and move Result (#7736)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-27 12:27:52 +00:00
Carlos Mocholí 906c067b07
Update hooks pseudocode (#7713) 2021-05-27 12:27:26 +02:00
Kaushik B 04dcb1786d
Add `__len__` method to IndexBatchSamplerWrapper (#7681)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-26 18:20:13 +02:00
Carlos Mocholí 311d9fe67e
Always run validation inside the training loop epoch (#7357)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 14:26:48 +02:00
Carlos Mocholí d26953c8bc
Add `ModelPruning(prune_on_train_epoch_end)` to choose when to apply pruning (#7704)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 00:57:56 +02:00
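
The commit above adds a `prune_on_train_epoch_end` flag to `ModelPruning`. A hedged usage sketch (only the flag name comes from the commit title; the other arguments are illustrative):

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import ModelPruning

    # True applies pruning at the end of each training epoch; False defers it
    # to the end of validation (assumed semantics).
    pruning = ModelPruning("l1_unstructured", amount=0.5, prune_on_train_epoch_end=True)
    trainer = Trainer(callbacks=[pruning])
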
Xinyao(Alvin) Sun 7e2f7e956b
fix: improve UserWarning message (#7685)
* fix: improve UserWarning message
when both overfit and training dataloader shuffling are enabled

fixes issue: #7656

* chore: update changelog

* Polish userwarning msg in pytorch_lightning/trainer/data_loading.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* shuffling typo

* Update CHANGELOG.md

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-25 17:35:15 +00:00
Kaushik B e7057d5898
Add `should_rank_save_checkpoint` property to Training Plugins (#7684)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-25 23:02:05 +05:30
Carlos Mocholí 8ba6304c73
Increment the total batch idx before the accumulation early exit (#7692)
* Increment the total batch idx before the accumulation early exit

* Update CHANGELOG
2021-05-25 10:23:40 +02:00
Jirka Borovec ad168fc4c6
chlog for 1.3.2 + legacy test (#7676) 2021-05-24 17:55:02 +00:00
Carlos Mocholí 8b01497e42
Fix global step update when the epoch is skipped (#7677)
* Fix global step update when the epoch is skipped

* Update CHANGELOG

* Move test
2021-05-24 17:36:56 +01:00
ananthsub fa41c588f4
Remove ProfilerConnector class (#7654)
* Remove ProfilerConnector class

* Update trainer.py

* Update CHANGELOG.md

* Update trainer.py

* Update trainer.py

* tests
2021-05-24 08:58:15 -07:00
Gyeongjae Choi a54bc5dba3
Fix progress bar print error when called before training (#7674)
* Check progress bar existence before printing

* Add tests for predict_progress_bar

* Add tests for progress_bar printing without training

* Update changelog
2021-05-24 17:33:28 +02:00
Xinyao(Alvin) Sun 0c958c5a1f
Fix dataloaders are not reset when tuning the model (#7566)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-24 10:21:45 +02:00
shuyingsunshine21 299f2c481b
FSDP with full state dict (#7487)
* Fix some test errors
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* fix version for ddp plugin test

* fix

* fix

* changelog

* Update CHANGELOG.md

* fsdp with full state dict

* fix missing import

* modify unit test

* fix

* fix

* fix typo

* modify test and add changelog

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* limit max_epoch to 1 for testing

* test

* fix

* update

* testing remove special for multi gpu

* assert gpu

* add assertion for gpu

* fix

* Re-enable special test, use ModelCheckpoint

* Fix paths

* Fix path passing

* test

* test

* fix test

* fix

* pre-commit format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Xinyao(Alvin) Sun 01109cdf0c
Fix/mismatched toggle optimizer (#7563)
* fix: avoid potential mismatched toggling of optimizer
Refs #7405

chore: update CHANGELOG

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix: resolve a conflict

chore: update changelog

* feat: add a test that fails in master

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo in tests/trainer/optimization/test_multiple_optimizers.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Polish tests/trainer/optimization/test_multiple_optimizers.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Polish tests/trainer/optimization/test_multiple_optimizers.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix: change placeholder in optimizer_step from positional args to keyword args

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-23 04:30:28 +02:00
shuyingsunshine21 2242423b75
refactor accelerator teardown -> training type plugin teardown (#7579) 2021-05-22 13:19:24 -07:00
Carlos Mocholí a8d9b5f783
Remove tbptt `self.log` flags and other dead code [5/n] (#7644) 2021-05-22 01:13:00 +00:00
Carlos Mocholí 33a1f5271f
[2/N] Define dataclasses for progress tracking (#7574)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-05-22 03:09:08 +02:00
ananthsub f6d892ac21
[feat] Support custom filesystems in LightningModule.to_torchscript (#7617)
* [feat] Support custom filesystems in LightningModule.to_torchscript

* Update CHANGELOG.md

* Update test_torchscript.py

* Update test_torchscript.py

* Update CHANGELOG.md

* Update test_torchscript.py
2021-05-21 11:23:15 +00:00
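
The commit above lets `to_torchscript` write through custom filesystems. A small sketch, assuming fsspec-style paths are accepted for `file_path` (the remote URL mentioned in the comment is hypothetical and would need the matching fsspec plugin, e.g. s3fs):

    import torch
    from pytorch_lightning import LightningModule

    class TinyModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 1)

        def forward(self, x):
            return self.layer(x)

    model = TinyModel()
    # A local path works as before; with fsspec support, file_path could instead
    # point at remote storage such as "s3://my-bucket/model.pt".
    model.to_torchscript(file_path="model.pt", method="script")
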
i-aki-y 7eafd8eac6
Add run_name argument to the MLFlowLogger constructor (#7622)
* Add run_name argument to the MLFlowLogger

* Update CHANGELOG

* Fix unnecessary line

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix style by using yapf

* Fix import error when mlflow is not installed

* Update CHANGELOG.md

* Update tests/loggers/test_mlflow.py

Co-authored-by: akiyuki ishikawa <aki.y.ishikwa@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-21 09:17:32 +01:00
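
The commit above adds a `run_name` argument to `MLFlowLogger`. A minimal sketch, assuming an MLflow backend is available (experiment and run names are placeholders):

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import MLFlowLogger

    logger = MLFlowLogger(experiment_name="my-experiment", run_name="baseline-run")
    trainer = Trainer(logger=logger)
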
Andrew Tritt 92cf396de2
Override `broadcast_object_list` for `torch<1.8` (#7592)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 08:29:55 +00:00
Yifu Wang ed271905cf
Clear predict_progress_bar in ProgressBar.__getstate__ (#7608)
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 01:38:49 +00:00
ananthsub 8266b141ba
[feat] Support time-based checkpointing during training (#7515)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:14:13 +00:00
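
The commit above adds time-based checkpointing. A hedged sketch, assuming the option is exposed through a `train_time_interval` argument on `ModelCheckpoint` (the parameter name is an assumption based on the feature description):

    from datetime import timedelta
    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import ModelCheckpoint

    # Save a checkpoint roughly every 30 minutes of training wall-clock time.
    checkpoint = ModelCheckpoint(train_time_interval=timedelta(minutes=30))
    trainer = Trainer(callbacks=[checkpoint])
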
ananthsub 9f5d4955b6
[1/N] Define dataclasses for progress tracking (#6603)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 21:02:20 +00:00
Jan-Henrik Lambrechts 608de6abf4
TensorBoardLogger sub_dir parameter for grouping logs (#6195)
* fixed a small typo

* cleaning up

* added sub_dir argument to tensorboard and wrote test

* sub dir arg exclusively for tensorboard, linted

* resolving merge conflict

* resolved merge conflict

* resolved merge conflict

* resolved merge conflict

* resolve merge conflict before revert

* resolving merge conflict

* reverted to pre-lint

* added tensorboard sub_dir test

* pep8 formatting

* removed sub_dir arg from test_all function:

* updated feature description

* typo in doc description

* updated CHANGELOG

* Update pytorch_lightning/loggers/tensorboard.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* swapped argument position

* added expandvars tests

* added expandvars

* removed model init

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* fix failed test

* Revert "fix failed test"

This reverts commit 50b34c66da.

* add env var to test

* fix typo in tests

* fix tests

* for test consistency

* fix typo

* fix typo 2

Co-authored-by: Ubuntu <azureuser@devhenrik.evuifrmjd4lepbj4relcwwu5va.ax.internal.cloudapp.net>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-05-19 19:50:58 +00:00
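
The commit above adds a `sub_dir` argument to `TensorBoardLogger`. A minimal sketch (directory names are placeholders):

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import TensorBoardLogger

    # Event files are written under save_dir/name/version/sub_dir, which lets
    # related runs group their logs below the version folder.
    logger = TensorBoardLogger(save_dir="logs", name="my_experiment", sub_dir="train")
    trainer = Trainer(logger=logger)
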
ananthsub b4e28e7169
[feat] Add stronger validation for checkpoint_callback argument (#7539)
* [feat] Add stronger validation for checkpoint_callback configuration

* chlog

* Update callback_connector.py

* Update test_model_checkpoint.py

* Update pytorch_lightning/trainer/connectors/callback_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/trainer/connectors/callback_connector.py

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 19:38:08 +00:00
TOKUNAGA Hiroyuki 20f63377f8
Fix the condition for calling update_learning_rates (#7032)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-17 17:20:42 +02:00
Adrian Wälchli 502adbced3
refactor optimizer loop logic for manual and automatic optimization (#7526)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-05-17 14:42:01 +02:00
Nic Eggert f4f51e0dcf
Add kubeflow cluster environment (#7300)
* Add kubeflow cluster environment

* Add KubeflowEnvironment to docs

* Add KubeflowEnvironment to the changelog

* break up a long line

* Add method to detect kubeflow environment

* Select Kubeflow environment when available

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Run pre-commit

* task_idx == 0

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 09:05:24 +01:00
Adrian Wälchli 6e6e29af49
remove trainer hidden state | sanity refactor [2 / n] (#7507) 2021-05-17 08:57:15 +01:00
Mauricio Villegas d0081778f8
Enable fsspec by default for cli config file (#7521)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 08:53:00 +01:00
Alan Du 6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) (#7492) 2021-05-14 20:53:26 +01:00
Rohit Gupta 7ca41734da
Add `dataloader_idx` to batch transfer hooks (#6241)
* replace with kwargs

* chlog

* fix

* add test

* fix

* device

* deepspeed

* pep

* optional

* docs

* bc

* comments

* pep

* mypy

* pep

* Apply suggestions from code review

* kwargs

* docs

* .

* .

* 1.3 -> 1.4

* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
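
The commit above passes `dataloader_idx` to the batch-transfer hooks. A sketch of the extended signature, assuming the hook arguments described in the commit:

    from pytorch_lightning import LightningModule

    class MultiLoaderModel(LightningModule):
        def transfer_batch_to_device(self, batch, device, dataloader_idx):
            # dataloader_idx identifies which dataloader produced the batch,
            # so per-loader handling becomes possible.
            if dataloader_idx == 0:
                return batch.to(device)
            return super().transfer_batch_to_device(batch, device, dataloader_idx)
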
Carlos Mocholí a584196abf
Default `seed_everything(workers=True)` in the `LightningCLI` (#7504) 2021-05-13 12:18:03 +02:00
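
The commit above makes the CLI seed dataloader workers by default. For reference, a hedged sketch of the equivalent manual call (the seed value is arbitrary):

    from pytorch_lightning import Trainer, seed_everything

    # workers=True also seeds each DataLoader worker process so that
    # augmentations and shuffling are reproducible across runs.
    seed_everything(42, workers=True)
    trainer = Trainer(deterministic=True)
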
Adrian Wälchli dd1a17b071
Refactor result handling in training loop (#7506)
* refactor results

* rename dic -> dict

* simplify

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix None check

* chlog wording

* move process_closure_result to the end

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 09:30:34 +01:00
Jirka Borovec 298f9e5c2d
Prune deprecated utils modules (#7503)
* argparse_utils

* model_utils

* warning_utils

* xla_device_utils

* chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 07:24:42 +00:00
Jirka Borovec 946aee0c7b
prune data parallel (#7510) 2021-05-13 06:23:02 +01:00
Carlos Mocholí 072ad52b6b
Add `trainer.predict(ckpt_path)` (#7430)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-13 01:49:58 +02:00
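
The commit above adds a `ckpt_path` argument to `trainer.predict`. A hedged sketch; `model` and `predict_dl` in the commented call are placeholders for an already-built LightningModule and dataloader:

    from pytorch_lightning import Trainer

    trainer = Trainer()
    # After trainer.fit(model), predictions can be run from a saved checkpoint:
    # ckpt_path="best" loads the best checkpoint tracked by ModelCheckpoint,
    # or an explicit checkpoint path can be passed instead.
    # predictions = trainer.predict(model, dataloaders=predict_dl, ckpt_path="best")
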
Jirka Borovec d4ec75164c
Prune deprecated trainer attributes (#7501)
* use_single_gpu

* use_horovod

* use_ddp2

* use_ddp

* use_dp

* on_gpu

* use_tpu

* on_tpu

* on_cpu

* cleaning

* chlog

* Apply suggestions from code review

* Apply suggestions from code review
2021-05-12 20:10:15 +00:00
Jirka Borovec 96981091c7
Prune deprecated classif. metrics (#7499)
* stat_scores_multiple_classes

* precision_recall

* precision

* recall

* auc

* auroc

* multiclass_auroc

* iou

* clean-up

* chlog

* flake8

* imports

* prune
2021-05-12 18:03:34 +00:00
Jirka Borovec 140b0c727e
Prune deprecated trainer attributes 2 (#7502)
* accelerator_backend

* get_model

* clean

* chlog

* flake8
2021-05-12 10:19:30 -07:00
Federico Simonetta 8cdbd03d02
MLFlow now uses env variable as default tracking uri (#7457)
* Clarify logger flag

Clarify behavior of boolean values on the logger flag for Trainer.

* Update docs/source/common/trainer.rst

* doc

* MLFlow now uses env variable as default tracking uri

Solves https://github.com/PyTorchLightning/pytorch-lightning/issues/6894

* Update pytorch_lightning/loggers/mlflow.py

Co-authored-by: thomas chaton <thomas@grid.ai>

* changelog

Co-authored-by: SpontaneousDuck <kennywitham4@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-12 11:26:57 +02:00
shuyingsunshine21 8538c1f61e
Accelerator model state dict (#7474)
* Fix some test errors
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* modify model state dict to training type plugin

* remove changes

* add changelog

* fixing isort for pre-commit failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
Justus Schock 7b283e3c46
Bugfix/Multiple dataloaders (#7433)
* Update supporters.py

* Update apply_func.py

* Update supporters.py

* Update model_train_dataloaders.py

* Update model_train_steps.py

* Update test_dataloaders.py

* Update CHANGELOG.md

* Update model_train_steps.py

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update supporters.py

* Update test_supporters.py

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Apply suggestions from code review

Co-authored-by: Edgar Riba <edgar.riba@gmail.com>

* Update supporters.py

* Update supporters.py

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Edgar Riba <edgar.riba@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-11 16:33:29 +02:00
Jirka Borovec d7c44cc649
Docs: sync chlog 1.3.1 (#7478) 2021-05-11 12:44:22 +02:00
ananthsub fdf50a5e4b
Mark certain Trainer APIs as protected (#7420) 2021-05-11 11:53:41 +02:00
Adrian Wälchli ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] (#7437)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
David Fidalgo 4a1134db64
Log epoch metrics before firing the `on_evaluation_end` hook (#7272)
* Log epoch metrics before firing the `on_evaluation_end` hook (addresses #7166)

* test that epoch metrics are logged before `on_evaluation_end` hook

* update CHANGELOG

* Shorter test

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-11 10:54:31 +02:00
Carlos Mocholí b65ae79478
Automatically check `DataModule.has_{setup,teardown,prepare_data}` [2/2] (#7238)
* Automatically check `DataModule.has_{setup,teardown,prepare_data}`

* Use variable

* Spacing

* Docs

* Update CHANGELOG

* Remove `_DataModuleWrapper`

* Add test

* Update docs/source/extensions/datamodules.rst

* Bad merge

* add test for invalid name

* Remove ValueError

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-11 10:53:00 +02:00
shuyingsunshine21 987530cd38
Set `num_nodes` and `sync_batchnorm` From Trainer for Manually Passed Training Type Plugin (#7026)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-08 11:25:51 +00:00
Akihiro Nitta 710b144b9b
Restore `trainer.current_epoch` after tuning (#7434)
* Add a test

* Save and restore current_epoch

* Update CHANGELOG

* alphabetical order
2021-05-08 07:15:52 +02:00
Ethan Harris 45143fd825
Improve val step logging (#7351)
* Fix val step logging

* Add a type

* Fix

* Update CHANGELOG.md
2021-05-07 22:58:03 +00:00
ananthsub f9e050c5e5
Move DP warning suppression to the DataParallel Plugin (#7421) 2021-05-07 23:02:44 +02:00
ananthsub fecce50355
Deprecate TrainerModelHooksMixin (#7422)
* Deprecate TrainerModelHooksMixin

* Update CHANGELOG.md

* Update model_hooks.py

* Update model_hooks.py
2021-05-07 13:19:36 -07:00
Carlos Mocholí 8208c330eb
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU (#7025)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-07 16:41:39 +00:00
Leonard Lausen 98b94b810c
Fix DeepSpeedPlugin with IterableDataset (#7362)
* deepspeed add train_micro_batch_size_per_gpu argument

* Update naming and doc

* Modify to use auto naming convention, add test

* Add iterable tests

* Fix tests, attempt by mocking

* Import correct package

* Fix comparison

* Set as special test

* Remove import

* Add Changelog

Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-07 10:46:03 +01:00
Jirka Borovec 28103c67c2
show must go on (#7413)
* chlog + version

* readme

* .
2021-05-06 19:06:21 -04:00
Jirka Borovec fbc8b209f2
update versions (#7409)
* update versions

* chlog

* win

* str
2021-05-06 20:35:39 +00:00
Jirka Borovec b181b8c646
release 1.3.0 (#7404)
* v1.3.0

* ci event

* chlog

* badge

* formatting
2021-05-06 15:05:35 -04:00
Jirka Borovec d52e0a8f3e
v0.1.3.0rc3 + changelogs (#7388)
* v0.1.3.0rc3

* spaces

* wip

* wip

* wip

* wip

* prune

* wip

* wip

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-06 07:28:10 -04:00
ananthsub 7b45bcfedb
[2/2] Remove outputs from evaluation epoch end hooks (#7338)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* early stop?

* fix

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* update

* Update training_loop.py

* early stop?

* fix

* Remove outputs from evaluation epoch end hooks

* update

* Update test_remove_1-5.py

* fix lints

* Update base.py

* rm-outputs

* Update evaluation_loop.py

* try-save-more-memory

* Update trainer.py

* Update trainer.py

* cache-at-start

* Update evaluation_loop.py

* Update training_loop.py

* Update training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-05-05 19:50:58 +00:00
Kaushik B fbcd63aa89
Update changelog for recent releases (#7387) 2021-05-05 15:25:56 -04:00
ananthsub 6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks (#7339)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* Update training_loop.py

* Update test_training_loop.py

* early stop?

* fix

* update tests

* Update test_hooks.py

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
ananthsub 98670c83a9
Deprecate `truncated_bptt_steps` flag on Trainer in favor of the same setting on the LightningModule (#7323)
* deprecate-tbptt-trainer

* Update CHANGELOG.md

* Update lightning.py

* test

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* update docs

* Update accelerator.py

* Update accelerator.py

* more docs

* tweaks

* chlog

* comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
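
The commit above moves truncated BPTT configuration onto the LightningModule. A hedged sketch of the module-level setting it describes:

    from pytorch_lightning import LightningModule

    class TBPTTModel(LightningModule):
        def __init__(self):
            super().__init__()
            # Previously Trainer(truncated_bptt_steps=2); now configured on the module.
            self.truncated_bptt_steps = 2

        def training_step(self, batch, batch_idx, hiddens):
            # With truncated BPTT enabled, training_step also receives the hidden
            # state carried over from the previous split of the sequence.
            ...
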
Christfried Focke 763a9a9495
Fix Namespace loading in PyYAML 5.4.x (#6673)
* Fix Namespace loading in PyYAML 5.4.x

* Remove OmegaConf reference from PyYAML requirements

* Max allowed version for pyyaml

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 22:56:11 +00:00
Carlos Mocholí 374ff750f5
Pass `current_epoch`/`global_step` as monitor candidates [1/2] (#7344)
* Pass `current_epoch`/`global_step` as monitor candidates

* Formatting

* Fix deprecated test

* Update CHANGELOG
2021-05-04 16:05:40 -04:00
Ethan Harris 2a740ebe77
Fix support for dataloader with None batches (#7342)
* Fix Dataloader None batch

* Fix Dataloader None batch

* Update CHANGELOG.md

* Fix breaking test

* Address comments
2021-05-04 12:24:03 +00:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
Adrian Wälchli a6aa1a0f82
make gpus=str in Trainer consistent with command line parsing of string (#6388)
* string gpu input

* update docs

* deprecation warning

* Revert "update docs"

This reverts commit c5f3893413.

* deprecation

* add changelog

* update parser

* update warning

* implement v1.5 behavior ahead of time

* formatting

* set accelerator in test to avoid different warning

* add warning

* remove todo warn

* Update pytorch_lightning/utilities/device_parser.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* resolve flake8

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-05-04 09:56:27 +00:00
Boris Dayma 2a20102321
fix(wandb): allow custom init args (#6989)
* feat(wandb): allow custom init args

* style: pep8

* fix: get dict args

* refactor: simplify init args

* test: test init args

* style: pep8

* docs: update CHANGELOG

* test: check default resume value

* fix: default value of anonymous

* fix: respect order of parameters

* feat: use look-up table for anonymous

* yapf formatting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 09:45:36 +00:00
Hemil Desai 82c19e1444
Update LR schedulers only when their corresponding Optimizer is being used (#4868)
* Update LR schedulers only when their corresponding Optimizer is being used.

In the case when optimizer frequencies are specified,
the LR scheduler corresponding to a particular optimizer is updated
only when that optimizer is being used in the training loop or epoch.

* pep8speak fixes

* Fix failing tests

* Add docs

* PR Feedback

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* formatting fix

* PR Feedback - part 2

* More PR feedback

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Add typing imports

* Stronger tests and fixes related to that

* Add more tests plus PR feedback

* Make optimizer_freq_cumsum a cached property

@cached_property is only available in Python 3.8+, so it had to be done manually.

* Fix tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Avoid mutable defaults

* Parametrize lr scheduling tests

* PR feedback

* Apply suggestions from code review

* spell

* Apply suggestions from code review

* flake8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-05-04 09:37:40 +00:00
Adrian Wälchli b780af51be
update test for resume_from_checkpoint on missing file (#7255) 2021-05-04 09:16:34 +00:00
Daniel Mesejo-León 6da747e775
Deprecate `LightningModule.datamodule` reference in favor of the trainer one (#6929) (#7168)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-04 00:01:41 +00:00
Adrian Wälchli bf1394a472
improve early stopping verbose logging (#6811) 2021-05-03 20:20:48 +00:00
ananthsub 14c552bb92
[bugfix] Fix dataloading for iterable datasets and limit_train_batches (#7306)
* bugfix-dataloading

* rm-logs

* Update CHANGELOG.md

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update training_loop.py

* Update test_dataloaders.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update test_dataloaders.py

* Update training_loop.py

* Update training_loop.py

* comments

* address comments

* more tests

* Update progress.py

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update training_loop.py

* Update training_loop.py

* test ckpt fix?

* update again
2021-05-03 19:50:26 +01:00
Adrian Wälchli e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU (#7277)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
Kaushik B 490cc57809
Device updates for TPU Pod (#7243) 2021-04-30 23:14:06 +05:30
thomas chaton 16d6c9828d
[bugfix] Apex never instantiated. (#7274)
* update

* update

* update apex

* update

* update

* update

* remove test.py

* update

* update

* update on comments

* update changelog

* update

* update

* typo
2021-04-30 13:16:28 -04:00
ananthsub 44fd01734c
Move grad_norm to a dedicated utilities file (#7292)
* rm-grad-norm-mixin

* Update grads.py

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docstrings

* Update __init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:19:22 -07:00
ananthsub e407edba36
[fix] Attach train+val dataloaders to trainer in trainer loop (#7207)
* Update training_loop.py

* Update test_dataloaders.py

* changelog

* delay reload

* go back

* comments

* Update training_loop.py

* Update test_dataloaders.py

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:01:31 -07:00
thomas chaton 80b9ca0e38
[bugfix] Add reloading support using BaseFinetuning (#7253)
* update

* wip

* update

* update

* update

* update

* resolve bug

* update on comments

* update on comments

* update

* update

* formatting

* add comments

* update on comments

* update

* Update pytorch_lightning/callbacks/base.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* update

* Typing and minor changes

* Refactor

* Fix deprecated test

* Broken commit

* Fix broken commit

* flake8

* Update CHANGELOG

* update on comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-30 11:14:43 -04:00
Carlos Mocholí 5af086ab9f
Attach data refactor and tuner bugs [4/n] (#7258)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
Adrian Wälchli b9b3fa371f
fix case where an IterableDataset doesn't produce a batch for an epoch (#7294)
* wip

* fix

* add test

* refactor + test

* rm

* formatting

* update changelog

* doc

* docstring

* remove unused import

* Update CHANGELOG.md

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:55 +00:00
Adrian Wälchli 8232de427a
fix save_hyperparameters(container) if container is empty (#7268)
* fix

* add tests

* changelog

* fix test
2021-04-30 13:38:42 +01:00
ananthsub 338f5a3311
Remove exp_save_path on the LightningModule (#7266)
* deprecate-exp-save-path

* Update lightning.py

* Update CHANGELOG.md

* remove-not-deprecate
2021-04-29 17:44:04 -04:00