lightning

Commit Graph

Author	SHA1	Message	Date
Carlos Mocholí	1dd61e4e35	Extend support for logging a collection (#7771 )	2021-06-01 12:51:50 +01:00
Carlos Mocholí	0dd6d3a798	Avoid adding `None` loss values in `training_epoch_end` (#7772 )	2021-05-31 19:28:28 +00:00
Adrian Wälchli	7e6010fc93	fix info message when max training time reached (#7780 ) * call time_elapsed * elapsed formatting * format * update test * changelog	2021-05-31 14:50:16 +02:00
Carlos Mocholí	d47173bb72	Use typing forward references (#7770 ) * Use typing forward references * Update pytorch_lightning/core/lightning.py	2021-05-31 09:54:28 +02:00
Carlos Mocholí	5f0863e5e5	Organize trainer properties (#7758 ) * Organize trainer properties * Single quote * Double quote	2021-05-30 13:09:01 +02:00
Carlos Mocholí	bc3238be8c	Remove metric tracking from dev debugger (#7759 ) * Remove dev debugger metric tracking * Fix tests * Fix test * Import * Fix tests * Fix test * flake8 * Fix tests	2021-05-30 12:03:42 +02:00
Mauricio Villegas	f6b5e3df57	Added save_config_filename init argument to LightningCLI (#7741 )	2021-05-28 09:30:16 +02:00
Boris Dayma	9097347ea8	feat(wandb): log models as artifacts (#6231 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-05-27 20:15:02 +02:00
Carlos Mocholí	9304c0df8f	Rename and move Result (#7736 ) Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-05-27 12:27:52 +00:00
Kaushik B	04dcb1786d	Add `__len__` method to IndexBatchSamplerWrapper (#7681 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-05-26 18:20:13 +02:00
Carlos Mocholí	311d9fe67e	Always run validation inside the training loop epoch (#7357 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-05-26 14:26:48 +02:00
Kaushik B	27eb0035ca	Increase TPU Check timeout (#7706 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-05-26 01:44:29 +00:00
Carlos Mocholí	d26953c8bc	Add `ModelPruning(prune_on_train_epoch_end)` to choose when to apply pruning (#7704 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-05-26 00:57:56 +02:00
Xinyao(Alvin) Sun	7e2f7e956b	fix: improve UserWarning message (#7685 ) * fix: improve UserWarning message when both overfit and training dtaloader shuffling are enabled fixes issue: #7656 * chore: update changelog * Polish userwarning msg in pytorch_lightning/trainer/data_loading.py Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * shuffling typo * Update CHANGELOG.md Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-05-25 17:35:15 +00:00
Kaushik B	e7057d5898	Add `should_rank_save_checkpoint` property to Training Plugins (#7684 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-05-25 23:02:05 +05:30
Carlos Mocholí	a1c40f3207	Remove on epoch guard from the should stop validation check (#7701 ) * Remove on epoch guard from the should stop validation check * Formatting	2021-05-25 15:59:42 +01:00
Carlos Mocholí	e2ead9abd7	Refactor some loops code and hook tests (#7682 )	2021-05-25 13:27:54 +02:00
Carlos Mocholí	8ba6304c73	Increment the total batch idx before the accumulation early exit (#7692 ) * Increment the total batch idx before the accumulation early exit * Update CHANGELOG	2021-05-25 10:23:40 +02:00
Carlos Mocholí	8b01497e42	Fix global step update when the epoch is skipped (#7677 ) * Fix global step update when the epoch is skipped * Update CHANGELOG * Move test	2021-05-24 17:36:56 +01:00
Kaushik B	3f460b150a	Move parameter validation specific to TPU Training plugins (#7415 ) * Move parameter validation specific to TPU Training plugins * update docstring	2021-05-24 16:02:01 +00:00
ananthsub	fa41c588f4	Remove ProfilerConnector class (#7654 ) * Remove ProfilerConnector class * Update trainer.py * Update CHANGELOG.md * Update trainer.py * Update trainer.py * tests	2021-05-24 08:58:15 -07:00
Gyeongjae Choi	a54bc5dba3	Fix progress bar print error when called before training (#7674 ) * Check progress bar existence before printing * Add tests for predict_progres_bar * Add tests for progress_bar printing without training * Update changelog	2021-05-24 17:33:28 +02:00
Carlos Mocholí	2103b5efc9	Move sync code from step result to lightning module [6/n] (#7651 )	2021-05-24 13:13:55 +01:00
Xinyao(Alvin) Sun	0c958c5a1f	Fix dataloaders are not reset when tuning the model (#7566 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-05-24 10:21:45 +02:00
shuyingsunshine21	299f2c481b	FSDP with full state dict (#7487 ) * Fix some test errors Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * checkpoint consolidation * Update ddp_spawn.py * Update test_metric_result_integration.py * Update test_results.py * Update utils.py * Update utils.py * Update test_all_gather_grad.py * Update test_all_gather_grad.py * Update test_results.py * Revert "Update test_results.py" This reverts commit `9d4a2b891d`. * Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate" This reverts commit `c5053da789`, reversing changes made to `0d23d75bc9`. * Revert "Update test_all_gather_grad.py" This reverts commit `0d23d75bc9`. * Revert "Update utils.py" This reverts commit `70fe5da9c6`. * Revert "Update utils.py" This reverts commit `a9aae99f6e`. * Revert "Update test_results.py" This reverts commit `ea74906878`. * Revert "Update test_metric_result_integration.py" This reverts commit `bf70e431b3`. * Revert "Update ddp_spawn.py" This reverts commit `f17210183b`. * Revert "checkpoint consolidation" This reverts commit `536c1323b0`. * Revert "Revert "checkpoint consolidation"" This reverts commit `3a9fde915a`. * Revert "Revert "Revert "checkpoint consolidation""" This reverts commit `7a369f47e1`. * Revert "Revert "Update ddp_spawn.py"" This reverts commit `8222dc98ea`. * Revert "Revert "Update test_metric_result_integration.py"" This reverts commit `6c095b2370`. * Revert "Revert "Update test_results.py"" This reverts commit `250d0aaaa2`. * Revert "Revert "Update utils.py"" This reverts commit `8651d54d79`. * Revert "Revert "Update test_all_gather_grad.py"" This reverts commit `dcdcd29731`. * modify distributed environment to make test pass * fix version for ddp plugin test * fix * fix * changelog * Update CHANGELOG.md * fsdp with full state dict * fix missing import * modify unitest * fix * fix * fix typo * modify test and add changelog * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * limit max_epoch to 1 for testing * test * fix * update * testing remove special for multi gpu * assert gpu * add assertion for gpu * fix * Re-enable special test, use ModelCheckpoint * Fix paths * Fix path passing * test * test * fix test * fix * pre-commit format * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: SeanNaren <sean@grid.ai>	2021-05-24 08:11:45 +01:00
Xinyao(Alvin) Sun	01109cdf0c	Fix/mismatched toggle optimizer (#7563 ) * fix: avoid potential mismatched toggling of optimzier Refs #7405 chore: update CHANGELOG [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci fix: resolve a confict chore: update changelog * feat: add a test that fails in master * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo in tests/trainer/optimization/test_multiple_optimizers.py Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * Polish tests/trainer/optimization/test_multiple_optimizers.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Polish tests/trainer/optimization/test_multiple_optimizers.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * fix: change placeholder in optimizer_step from positional args to keyword args Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-05-23 04:30:28 +02:00
shuyingsunshine21	2242423b75	refactor accelerator teardown -> training type plugin teardown (#7579 )	2021-05-22 13:19:24 -07:00
Carlos Mocholí	a8d9b5f783	Remove tbptt `self.log` flags and other dead code [5/n] (#7644 )	2021-05-22 01:13:00 +00:00
Carlos Mocholí	33a1f5271f	[2/N] Define dataclasses for progress tracking (#7574 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2021-05-22 03:09:08 +02:00
Yifu Wang	8d6e2ff7b2	Improve argument validation for validate(), test(), and predict() (#7605 ) Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>	2021-05-21 09:03:16 -07:00
ananthsub	f6d892ac21	[feat] Support custom filesystems in LightningModule.to_torchscript (#7617 ) * [feat] Support custom filesystems in LightningModule.to_torchscript * Update CHANGELOG.md * Update test_torchscript.py * Update test_torchscript.py * Update CHANGELOG.md * Update test_torchscript.py	2021-05-21 11:23:15 +00:00
Carlos Mocholí	e8a46bee15	Remove `Result(minimize)` parameter [4/n] (#7628 )	2021-05-21 12:58:52 +02:00
Carlos Mocholí	603ef2cf7f	Use `trainer.call_hook` in the evaluation loop (#7626 )	2021-05-21 11:54:52 +01:00
Carlos Mocholí	3d4dd28bec	Replace `CallbackHookNameValidator` with `FxValidator` [3/n] (#7627 ) * Refactor FxValidator * Fix tests * Fix tests * Class attribute * Fix tests * Better error message * Fix tests * Update pytorch_lightning/trainer/connectors/logger_connector/fx_validator.py	2021-05-21 11:54:16 +01:00
i-aki-y	7eafd8eac6	Add run_name argument to the MLFlowLogger constructor (#7622 ) * Add run_name argument to the MLFlowLogger * Update CHANGELOG * Fix unnecessary line * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix style by using yapf * Fix import error when mlflow is not installed * Update CHANGELOG.md * Update tests/loggers/test_mlflow.py Co-authored-by: akiyuki ishikawa <aki.y.ishikwa@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-05-21 09:17:32 +01:00
ananthsub	94ef17ce77	Update model_checkpoint.py (#7625 )	2021-05-20 23:16:18 +02:00
Andrew Tritt	92cf396de2	Override `broadcast_object_list` for `torch<1.8` (#7592 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-05-20 08:29:55 +00:00
Yifu Wang	ed271905cf	Clear predict_progress_bar in ProgressBar.__getstate__ (#7608 ) Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-05-20 01:38:49 +00:00
ananthsub	8266b141ba	[feat] Support time-based checkpointing during training (#7515 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-05-19 22:14:13 +00:00
ananthsub	9f5d4955b6	[1/N] Define dataclasses for progress tracking (#6603 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-05-19 21:02:20 +00:00
Carlos Mocholí	901b2bac98	Unify `current_fx_name` and `current_hook_fx_name` [2/n] (#7594 ) * Minor loggger connector cleanup [1/n] * Missing line * Address comments * Rely on validator * Unify `current_fx_name` and `current_hook_fx_name` * Fix test	2021-05-19 20:31:06 +00:00
Carlos Mocholí	dbea5bb710	Add typing to `ModelPruning` callback (#7529 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-05-19 22:01:42 +02:00
Jan-Henrik Lambrechts	608de6abf4	TensorBoardLogger sub_dir parameter for grouping logs (#6195 ) * fixed a small typo * cleaning up * added sub_dir argument to tensorboard and wrote test * sub dir arg exclusively for tensorboard, linted * resolving merge conflict * resolved merge conflict * resolved merge conflict * resolved merge conflict * resolve merge conflict before revert * resolving merge conflict * reverted to pre-lint * added tensorboard sub_dir test * pep8 formatting * removed sub_dir arg from test_all function: * updated feature description * typo in doc description * updated CHANGELOG * Update pytorch_lightning/loggers/tensorboard.py Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * swapped argument position * added expandvars tests * added expandvars * removed model init * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests * fix failed test * Revert "fix failed test" This reverts commit `50b34c66da`. * add env var to test * fix typo in tests * fix tests * for test consistency * fix typo * fix typo 2 Co-authored-by: Ubuntu <azureuser@devhenrik.evuifrmjd4lepbj4relcwwu5va.ax.internal.cloudapp.net> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>	2021-05-19 19:50:58 +00:00
ananthsub	b4e28e7169	[feat] Add stronger validation for checkpoint_callback argument (#7539 ) * [feat] Add stronger validation for checkpoint_callback configuration * chlog * Update callback_connector.py * Update test_model_checkpoint.py * Update pytorch_lightning/trainer/connectors/callback_connector.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update pytorch_lightning/trainer/connectors/callback_connector.py * Update tests/checkpointing/test_model_checkpoint.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update CHANGELOG.md Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-05-19 19:38:08 +00:00
Carlos Mocholí	76ff600898	Minor logger connector cleanup [1/n] (#7590 ) * Minor loggger connector cleanup [1/n] * Missing line * Address comments * Rely on validator	2021-05-19 19:25:32 +00:00
TOKUNAGA Hiroyuki	20f63377f8	Fix the condition for calling update_learning_rates (#7032 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-05-17 17:20:42 +02:00
Adrian Wälchli	502adbced3	refactor optimizer loop logic for manual and automatic optimization (#7526 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2021-05-17 14:42:01 +02:00
Kaushik B	bf46730d92	Support TPU Pod Training (n/n) (#7296 )	2021-05-17 11:33:44 +00:00
Nic Eggert	f4f51e0dcf	Add kubeflow cluster environment (#7300 ) * Add kubeflow cluster environment * Add KubeflowEnvironment to docs * Add KubeflowEnvironment to the changelog * break up a long line * Add method to detect kubeflow environment * Select Kubeflow environment when available * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Run pre-commit * task_idx == 0 Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-05-17 09:05:24 +01:00
Adrian Wälchli	6e6e29af49	remove trainer hidden state | sanity refactor [2 / n] (#7507 )	2021-05-17 08:57:15 +01:00

2820 Commits