Commit Graph

1818 Commits

Author SHA1 Message Date
Adrian Wälchli ad3f183bc3
convert warning cache usage to rank_zero_only in WandbLogger (#8764) 2021-08-20 10:39:25 +00:00
ananthsub f87b2ef21f
Remove GradInformation module, including from LightningModule hierarchy (#8831)
* Remove GradInformation module from LightningModule hierarchy
2021-08-19 04:19:50 +00:00
Sean Naren c6b6888387
Add DeepSpeed Stage 1 + doc improvements for model parallel (#8974)
* Add stage 1 support + small doc improvements

* Add CHANGELOG.md
2021-08-18 19:40:19 +05:30
Danielle Pintz bd13d392af
Add error handling for all trainer entry points (#8819)
* [lightning] Ensure error handling works different trainer entry points
2021-08-18 02:04:40 +00:00
Adrian Wälchli 522df2b89b
3/n integrate new LightningDataFetcher into loop (#8953)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 21:42:22 +00:00
thomas chaton 19136ac847
[Feat] 2/n Add Fault Tolerant Training to LightningFetcher (#8891)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 16:32:43 +00:00
Yifu Wang 938a191406
Add a flavor of training_step that takes dataloader_iter as an argument (#8807)
* Add a flavor of training_step that takes dataloader_iter as an argument
2021-08-16 19:01:09 +00:00
thomas chaton f0a105bf3e
[bugfix] Resolve lost reference to meta object in `ResultMetricCollection` (#8932) 2021-08-16 19:21:03 +02:00
thomas chaton 89156b7039
[1/n] Add LightningFetcher (#8890) 2021-08-16 16:02:10 +00:00
Carlos Mocholí d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 (#8911) 2021-08-16 13:34:42 +02:00
Carlos Mocholí 93ab24d1ee
Replace DataLoader sampler once for IPUs (#8858) 2021-08-16 11:28:05 +02:00
Justus Schock 1d2f7e20c4
[Bugfix] Detach Loaders after running entrypoint (#8885)
detach loaders after run
2021-08-16 09:26:38 +02:00
Carlos Mocholí 0aa5cc7b77
Integrate `total_batch_idx` with progress tracking (#8598) 2021-08-14 14:08:34 +02:00
Carlos Mocholí bfeffde8f4
Smart handling of `EarlyStopping.check_on_train_epoch_end` (#8888)
* Smart handling of `EarlyStopping.check_on_train_epoch_end`

* dummy value

* Extra flag
2021-08-14 08:50:39 +02:00
Carlos Mocholí 7d87879350
Fix SWA with a list of learning rates (#8747)
* Fix swa lrs - needs test

* Add test

* Update CHANGELOG
2021-08-14 08:50:08 +02:00
ananthsub 037a86c873
Remove write_predictions from LightningModule (#8850)
* Remove write_predictions from LightningModule
2021-08-14 02:00:23 +00:00
thomas chaton e060547230
[Bug] Add SharedCycleIteratorState (#8889) 2021-08-13 19:06:56 +01:00
Sean Naren b2973a035e
Introduce CheckpointIO Plugin (#8743) 2021-08-13 17:35:31 +01:00
Carlos Mocholí a1264a6850
Automatic string fixes (#8886) 2021-08-13 14:28:14 +00:00
Adrian Wälchli 5b143d0264
simplify grad clip tests (#8883)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-13 16:40:20 +05:30
Adrian Wälchli 4b6aaeeae3
fix plateau scheduler stepping on incomplete epoch (#8861) 2021-08-13 01:35:52 +00:00
Adrian Wälchli 5fd157eb21
Reduce flakiness of memory test (#8651)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-12 12:50:45 +00:00
B. Kerim Tshimanga 24f0124ddd
Deprecate DataModule properties: train_transforms, val_transforms, test_transforms, dims, and size (#8851)
* Deprecate DataModule properties: train_transforms, val_transforms, test_transforms, dims, and size
2021-08-11 08:52:27 -07:00
ananthsub b47e3ab7ce
Remove truncated_bptt_steps from Trainer constructor (#8825)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-11 03:26:01 +00:00
Carlos Mocholí cb2a8ed1b8
Add `LightningCLI(run=False|True)` (#8751)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-10 15:01:36 +02:00
Jirka Borovec 3096ab88eb
Tests: fix deprecated TM mape (#8830) 2021-08-10 09:26:05 +00:00
Adrian Wälchli 3ef8cd654d
Add warning when `wandb.run` already exists (#8714)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-10 10:14:48 +02:00
Jirka Borovec 0778ffbe2e
Legacy: simple classif training (#8535)
* simple_classif_training
* fix test
* pt1.6
* automate

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-10 08:13:31 +00:00
Rio H a7d207ce0f
Add tests for functions in utilities/data.py (#8785)
* Add tests for utilities/data.py: test_has_iterable_dataset, test_has_len, test_get_len
2021-08-10 07:39:00 +01:00
ananthsub c4a1c8ba20
Fix truncated backprop through time when set on LightningModule and not Trainer (#8804)
* Fix truncated backprop through time set on LightningModule and not Trainer
2021-08-09 21:23:05 -07:00
Carlos Mocholí f1cc6e3470
Restructure parsing flow in the `LightningCLI` (#8721) 2021-08-09 17:26:53 +02:00
Adrian Wälchli 346cef2c3c
Fix tests for new tensorboard >= 2.6 (#8789) 2021-08-09 15:12:38 +02:00
Adrian Wälchli 87093a3339
remove deprecated sync step argument from WandbLogger (#8763)
* remove deprecated sync step

* update chlog
2021-08-09 09:45:25 +02:00
Adrian Wälchli e541803636
remove deprecation of gpu string parsing behavior (#8770) 2021-08-06 15:41:03 +00:00
edward-io 8473cf44ec
Remove rank 0 restrictions from logger (#8608) 2021-08-06 04:13:56 +00:00
Carlos Mocholí 4928dc5579
Improve SWA docs (#8717) 2021-08-05 16:07:50 +00:00
Carlos Mocholí 299e289980
Remove deprecated `on_save_checkpoint` argument (#8688) 2021-08-05 16:16:30 +01:00
Binh Tang efec3d461c
Move logger and profiler finalization to trainer's teardown (#8685)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-05 10:09:43 +02:00
Adrian Wälchli 963c267646
fix recursive call for `apply_to_collection(include_none=False)` (#8719) 2021-08-04 20:31:35 +02:00
Carlos Mocholí ed13040729
Connect the model to the training type plugin at the start of run (#8536) 2021-08-04 17:43:34 +02:00
Thien Tran 052aefc342
WandbLogger to log model topology by default (#8662)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-04 10:36:57 +00:00
Sean Naren 560a5c3fc5
Add functions to collate deepspeed zero 3 checkpoints (#8701) 2021-08-04 09:39:02 +00:00
samlurye f90849cc95
Deprecate LightningModule.summarize() in favor of pl.utilities.model_summary.summarize() (#8513)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-03 22:08:51 +00:00
Jirka Borovec 0e6ee9c39d
CI: add mdformat (#8673)
* add mdformat
* exclude chlog
* fix ***

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-03 18:19:09 +00:00
Isaac 8274183bf2
Add check for unique device ids (#8666)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-03 08:18:51 +00:00
Sean Naren e5d9e21dea
Fix save/load/resume from checkpoint for DeepSpeed Plugin (#8397) 2021-08-02 22:31:05 +00:00
Kaushik B d01d8334b5
Fix `ddp` accelerator choice for cpu (#8645)
* Fix ddp accelerator choice for cpu
2021-08-02 21:24:07 +00:00
thomas chaton dd8216a6b8
Save the `ResultCollection` in the loops state dict (#8641)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-02 20:52:24 +00:00
thomas chaton 567e905ead
update logic to inject FastForwardSampler / CaptureIterableDataset 2/n (#8366)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-02 20:52:06 +00:00
thomas chaton 15fb32037d
Test `metric_attribute` for different children module structures (#8675)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-02 20:51:15 +01:00