Commit Graph

1851 Commits

Author SHA1 Message Date
Adrian Wälchli c0bd658354
Remove calls to internal dev debugger in training- and eval loop (#9188) 2021-08-30 17:16:59 +02:00
B. Kerim Tshimanga f79993a705
removing legacy profiler arg (#9178)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-30 09:37:09 +00:00
Ning 1657588f35
deprecate `on_{train/val/test/predict}_dataloader()` from DataHooks (#9098)
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-28 17:27:56 +00:00
B. Kerim Tshimanga c993d0ce33
Make unimplemented dataloader hooks raise `NotImplementedError` (#9161) 2021-08-28 16:07:47 +00:00
Carlos Mocholí 0dfc6a18bd
Call any trainer function from the `LightningCLI` (#7508) 2021-08-28 04:43:14 +00:00
thomas chaton 045c879e08
Fix `self.log(sync_dist=True, reduce_fx={mean,max})` (#9142)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-27 15:40:51 +00:00
ananthsub 86a0cb74a4
Check `max_time` when setting defaults for min/max epochs (#9072)
Co-authored-by: tchaton <thomas@grid.ai>
2021-08-27 15:01:12 +00:00
B. Kerim Tshimanga 811d37b756
Update removal version of argparse_utils.py from 1.4 to 2.0 for backwards compatibility (#9162) 2021-08-27 14:10:28 +02:00
Adrian Wälchli b13749b4ec
add fault-tolerance for global random state in map-style datasets (#8950)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-26 12:13:31 +00:00
Adrian Wälchli 8efdeb2c00
deprecate the TestTubeLogger (#9065)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-26 10:28:14 +00:00
Adrian Wälchli 6592d0e454
generalize closure api in Lightning (#8642)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-08-26 08:36:21 +00:00
Adrian Wälchli 0abd6e94b5
[3 / 3] improvements to saving and loading callback state (#7161)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-26 10:02:49 +02:00
Yi Wang 366fb39d2e
Support post-localSGD in Lightning DDP plugin (#8967)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-26 08:24:49 +01:00
Adrian Wälchli 5ff89a7074
Rename test file from log_dir to test_log_dir (#9105) 2021-08-25 12:48:06 +00:00
Sean Naren bac8b1be81
Add support for CPU AMP autocast (#9084) 2021-08-25 12:18:00 +00:00
Sean Naren e9f4bffe0a
Add validate logic for precision (#9080) 2021-08-24 20:00:09 +00:00
Sean Naren 1bab0a17a9
Fix torch bfloat import version (#9089) 2021-08-24 19:18:12 +00:00
thomas chaton f959b13ab9
3/n inter batch parallelism (#9052)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-24 18:45:54 +00:00
Adrian Wälchli b9443a07b9
[2 / 3] improvements to saving and loading callback state (#7187)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-24 17:35:19 +00:00
Nicki Skafte 81145ca990
Fix logging with log_gpu_memory='min_max' (#9013) 2021-08-24 15:00:59 +02:00
Adrian Wälchli dfae7342cc
sanitize arrays when logging as hyperparameters in TensorBoardLogger (#9031)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-24 13:02:06 +02:00
Sean Naren 1feec8c601
Add bfloat16 support to Lightning Trainer (#9049) 2021-08-24 09:47:21 +00:00
Kaushik B 538e743f17
feat: Add Rich Progress Bar (#8929)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-24 02:40:36 +00:00
ananthsub 1e4d8929fb
Simplify checkpoint connector loading after Checkpoint IO plugin's introduction (#9045)
* Simplify checkpoint connector loading after Checkpoint IO plugins introduction

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-23 13:12:18 -07:00
Yifu Wang 3d5e71a767
Add `ShardedTensor` support in `LightningModule` (#8944)
* Add `ShardedTensor` support in `LightningModule`

Co-authored-by: Yifu Wang <yifuwang2012@gmail.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-08-23 19:59:38 +00:00
thomas chaton 92c7eec966
2/n inter batch parallelism (#9047)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-23 19:30:44 +00:00
Adrian Wälchli 49c52b0d4b
update an outdated error message in DDPPlugin (#9005) 2021-08-23 15:29:07 +00:00
thomas chaton e9ce598f2b
1/n inter batch parallelism (#9020)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-23 13:12:25 +00:00
Ning 2481816490
Deprecate `prepare_data_per_node` flag on Trainer and set it as a property for DataHooks (#8958)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-23 12:43:45 +00:00
ananthsub 930b81f96c
Remove unused rank_zero_deprecation in WandB logger (#9034)
* Remove unused imports in WandB logger and corresponding test
2021-08-22 12:58:48 +01:00
Carlos Mocholí b1a859f312
Remove deprecated `on_{save,load}_checkpoint` signature (#8697)
Co-authored-by: Yifu Wang <yifuwang2012@gmail.com>
2021-08-21 22:48:28 -07:00
Michele Sanna 9ff0c22e43
Handle the case with no queries in `GPUStatsMonitor` (#9014)
Co-authored-by: Michele Sanna <{ID}+{username}@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-21 05:22:33 +02:00
Carlos Mocholí e1442d247e
Always use `trainer.call_hook` (#8498) 2021-08-20 18:22:03 +02:00
Adrian Wälchli ad3f183bc3
convert warning cache usage to rank_zero_only in WandbLogger (#8764) 2021-08-20 10:39:25 +00:00
ananthsub f87b2ef21f
Remove GradInformation module, including from LightningModule hierarchy (#8831)
* Remove GradInformation module from LightningModule hierarchy
2021-08-19 04:19:50 +00:00
Sean Naren c6b6888387
Add DeepSpeed Stage 1 + doc improvements for model parallel (#8974)
* Add stage 1 support + small doc improvements

* Add CHANGELOG.md
2021-08-18 19:40:19 +05:30
Danielle Pintz bd13d392af
Add error handling for all trainer entry points (#8819)
* [lightning] Ensure error handling works across different trainer entry points
2021-08-18 02:04:40 +00:00
Adrian Wälchli 522df2b89b
3/n integrate new LightningDataFetcher into loop (#8953)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 21:42:22 +00:00
thomas chaton 19136ac847
[Feat] 2/n Add Fault Tolerant Training to LightningFetcher (#8891)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 16:32:43 +00:00
Yifu Wang 938a191406
Add a flavor of training_step that takes dataloader_iter as an argument (#8807)
* Add a flavor of training_step that takes dataloader_iter as an argument
2021-08-16 19:01:09 +00:00
thomas chaton f0a105bf3e
[bugfix] Resolve lost reference to meta object in `ResultMetricCollection` (#8932) 2021-08-16 19:21:03 +02:00
thomas chaton 89156b7039
[1/n] Add LightningFetcher (#8890) 2021-08-16 16:02:10 +00:00
Carlos Mocholí d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 (#8911) 2021-08-16 13:34:42 +02:00
Carlos Mocholí 93ab24d1ee
Replace DataLoader sampler once for IPUs (#8858) 2021-08-16 11:28:05 +02:00
Justus Schock 1d2f7e20c4
[Bugfix] Detach Loaders after running entrypoint (#8885)
detach loaders after run
2021-08-16 09:26:38 +02:00
Carlos Mocholí 0aa5cc7b77
Integrate `total_batch_idx` with progress tracking (#8598) 2021-08-14 14:08:34 +02:00
Carlos Mocholí bfeffde8f4
Smart handling of `EarlyStopping.check_on_train_epoch_end` (#8888)
* Smart handling of `EarlyStopping.check_on_train_epoch_end`

* dummy value

* Extra flag
2021-08-14 08:50:39 +02:00
Carlos Mocholí 7d87879350
Fix SWA with a list of learning rates (#8747)
* Fix swa lrs - needs test

* Add test

* Update CHANGELOG
2021-08-14 08:50:08 +02:00
ananthsub 037a86c873
Remove write_predictions from LightningModule (#8850)
* Remove write_predictions from LightningModule
2021-08-14 02:00:23 +00:00
thomas chaton e060547230
[Bug] Add SharedCycleIteratorState (#8889) 2021-08-13 19:06:56 +01:00