Adrian Wälchli
c0bd658354
Remove calls to internal dev debugger in training- and eval loop ( #9188 )
2021-08-30 17:16:59 +02:00
B. Kerim Tshimanga
f79993a705
removing legacy profiler arg ( #9178 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-30 09:37:09 +00:00
Ning
1657588f35
deprecate `on_{train/val/test/predict}_dataloader()` from DataHooks ( #9098 )
...
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-28 17:27:56 +00:00
B. Kerim Tshimanga
c993d0ce33
Make unimplemented dataloader hooks raise `NotImplementedError` ( #9161 )
2021-08-28 16:07:47 +00:00
Carlos Mocholí
0dfc6a18bd
Call any trainer function from the `LightningCLI` ( #7508 )
2021-08-28 04:43:14 +00:00
thomas chaton
045c879e08
Fix `self.log(sync_dist=True, reduce_fx={mean,max})` ( #9142 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-27 15:40:51 +00:00
ananthsub
86a0cb74a4
Check `max_time` when setting defaults for min/max epochs ( #9072 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-08-27 15:01:12 +00:00
B. Kerim Tshimanga
811d37b756
Update removal version of argparse_utils.py from 1.4 to 2.0 for backwards compatibility ( #9162 )
2021-08-27 14:10:28 +02:00
Adrian Wälchli
b13749b4ec
add fault-tolerance for global random state in map-style datasets ( #8950 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-26 12:13:31 +00:00
Adrian Wälchli
8efdeb2c00
deprecate the TestTubeLogger ( #9065 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-26 10:28:14 +00:00
Adrian Wälchli
6592d0e454
generalize closure api in Lightning ( #8642 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-08-26 08:36:21 +00:00
Adrian Wälchli
0abd6e94b5
[3 / 3] improvements to saving and loading callback state ( #7161 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-26 10:02:49 +02:00
Yi Wang
366fb39d2e
Support post-localSGD in Lightning DDP plugin ( #8967 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-26 08:24:49 +01:00
Adrian Wälchli
5ff89a7074
Rename test file from log_dir to test_log_dir ( #9105 )
2021-08-25 12:48:06 +00:00
Sean Naren
bac8b1be81
Add support for CPU AMP autocast ( #9084 )
2021-08-25 12:18:00 +00:00
Sean Naren
e9f4bffe0a
Add validate logic for precision ( #9080 )
2021-08-24 20:00:09 +00:00
Sean Naren
1bab0a17a9
Fix torch bfloat import version ( #9089 )
2021-08-24 19:18:12 +00:00
thomas chaton
f959b13ab9
3/n inter batch parallelism ( #9052 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-24 18:45:54 +00:00
Adrian Wälchli
b9443a07b9
[2 / 3] improvements to saving and loading callback state ( #7187 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-24 17:35:19 +00:00
Nicki Skafte
81145ca990
Fix logging with log_gpu_memory='min_max' ( #9013 )
2021-08-24 15:00:59 +02:00
Adrian Wälchli
dfae7342cc
sanitize arrays when logging as hyperparameters in TensorBoardLogger ( #9031 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-24 13:02:06 +02:00
Sean Naren
1feec8c601
Add bfloat16 support to Lightning Trainer ( #9049 )
2021-08-24 09:47:21 +00:00
Kaushik B
538e743f17
feat: Add Rich Progress Bar ( #8929 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-24 02:40:36 +00:00
ananthsub
1e4d8929fb
Simplify checkpoint connector loading after Checkpoint IO plugin's introduction ( #9045 )
...
* Simplify checkpoint connector loading after Checkpoint IO plugins introduction
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-23 13:12:18 -07:00
Yifu Wang
3d5e71a767
Add `ShardedTensor` support in `LightningModule` ( #8944 )
...
* Add `ShardedTensor` support in `LightningModule`
Co-authored-by: Yifu Wang <yifuwang2012@gmail.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-08-23 19:59:38 +00:00
thomas chaton
92c7eec966
2/n inter batch parallelism ( #9047 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-23 19:30:44 +00:00
Adrian Wälchli
49c52b0d4b
update an outdated error message in DDPPlugin ( #9005 )
2021-08-23 15:29:07 +00:00
thomas chaton
e9ce598f2b
1/n inter batch parallelism ( #9020 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-23 13:12:25 +00:00
Ning
2481816490
Deprecate `prepare_data_per_node` flag on Trainer and set it as a property for DataHooks ( #8958 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-23 12:43:45 +00:00
ananthsub
930b81f96c
Remove unused rank_zero_deprecation in WandB logger ( #9034 )
...
* Remove unused imports in WandB logger and corresponding test
2021-08-22 12:58:48 +01:00
Carlos Mocholí
b1a859f312
Remove deprecated `on_{save,load}_checkpoint` signature ( #8697 )
...
Co-authored-by: Yifu Wang <yifuwang2012@gmail.com>
2021-08-21 22:48:28 -07:00
Michele Sanna
9ff0c22e43
Handle the case with no queries in `GPUStatsMonitor` ( #9014 )
...
Co-authored-by: Michele Sanna <{ID}+{username}@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-21 05:22:33 +02:00
Carlos Mocholí
e1442d247e
Always use `trainer.call_hook` ( #8498 )
2021-08-20 18:22:03 +02:00
Adrian Wälchli
ad3f183bc3
convert warning cache usage to rank_zero_only in WandbLogger ( #8764 )
2021-08-20 10:39:25 +00:00
ananthsub
f87b2ef21f
Remove GradInformation module, including from LightningModule hierarchy ( #8831 )
...
* Remove GradInformation module from LightningModule hierarchy
2021-08-19 04:19:50 +00:00
Sean Naren
c6b6888387
Add DeepSpeed Stage 1 + doc improvements for model parallel ( #8974 )
...
* Add stage 1 support + small doc improvements
* Add CHANGELOG.md
2021-08-18 19:40:19 +05:30
Danielle Pintz
bd13d392af
Add error handling for all trainer entry points ( #8819 )
...
* [lightning] Ensure error handling works across different trainer entry points
2021-08-18 02:04:40 +00:00
Adrian Wälchli
522df2b89b
3/n integrate new LightningDataFetcher into loop ( #8953 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 21:42:22 +00:00
thomas chaton
19136ac847
[Feat] 2/n Add Fault Tolerant Training to LightningFetcher ( #8891 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 16:32:43 +00:00
Yifu Wang
938a191406
Add a flavor of training_step that takes dataloader_iter as an argument ( #8807 )
...
* Add a flavor of training_step that takes dataloader_iter as an argument
2021-08-16 19:01:09 +00:00
thomas chaton
f0a105bf3e
[bugfix] Resolve lost reference to meta object in `ResultMetricCollection` ( #8932 )
2021-08-16 19:21:03 +02:00
thomas chaton
89156b7039
[1/n] Add LightningFetcher ( #8890 )
2021-08-16 16:02:10 +00:00
Carlos Mocholí
d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 ( #8911 )
2021-08-16 13:34:42 +02:00
Carlos Mocholí
93ab24d1ee
Replace DataLoader sampler once for IPUs ( #8858 )
2021-08-16 11:28:05 +02:00
Justus Schock
1d2f7e20c4
[Bugfix] Detach Loaders after running entrypoint ( #8885 )
...
detach loaders after run
2021-08-16 09:26:38 +02:00
Carlos Mocholí
0aa5cc7b77
Integrate `total_batch_idx` with progress tracking ( #8598 )
2021-08-14 14:08:34 +02:00
Carlos Mocholí
bfeffde8f4
Smart handling of `EarlyStopping.check_on_train_epoch_end` ( #8888 )
...
* Smart handling of `EarlyStopping.check_on_train_epoch_end`
* dummy value
* Extra flag
2021-08-14 08:50:39 +02:00
Carlos Mocholí
7d87879350
Fix SWA with a list of learning rates ( #8747 )
...
* Fix swa lrs - needs test
* Add test
* Update CHANGELOG
2021-08-14 08:50:08 +02:00
ananthsub
037a86c873
Remove write_predictions from LightningModule ( #8850 )
...
* Remove write_predictions from LightningModule
2021-08-14 02:00:23 +00:00
thomas chaton
e060547230
[Bug] Add SharedCycleIteratorState ( #8889 )
2021-08-13 19:06:56 +01:00