3146 Commits

Author SHA1 Message Date
thomas chaton 19136ac847
[Feat] 2/n Add Fault Tolerant Training to LightningFetcher (#8891)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 16:32:43 +00:00
Yifu Wang 14f1475c25
Ensure the existence of `DDPPlugin._sync_dir` in `reconciliate_processes` (#8939)
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
2021-08-17 13:47:33 +05:30
Yifu Wang 938a191406
Add a flavor of training_step that takes dataloader_iter as an argument (#8807)
2021-08-16 19:01:09 +00:00
thomas chaton f0a105bf3e
[bugfix] Resolve lost reference to meta object in `ResultMetricCollection` (#8932) 2021-08-16 19:21:03 +02:00
thomas chaton 89156b7039
[1/n] Add LightningFetcher (#8890) 2021-08-16 16:02:10 +00:00
Carlos Mocholí d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 (#8911) 2021-08-16 13:34:42 +02:00
Carlos Mocholí 93ab24d1ee
Replace DataLoader sampler once for IPUs (#8858) 2021-08-16 11:28:05 +02:00
Justus Schock 1d2f7e20c4
[Bugfix] Detach Loaders after running entrypoint (#8885)
detach loaders after run
2021-08-16 09:26:38 +02:00
Carlos Mocholí 0aa5cc7b77
Integrate `total_batch_idx` with progress tracking (#8598) 2021-08-14 14:08:34 +02:00
Carlos Mocholí bfeffde8f4
Smart handling of `EarlyStopping.check_on_train_epoch_end` (#8888)
* Smart handling of `EarlyStopping.check_on_train_epoch_end`

* dummy value

* Extra flag
2021-08-14 08:50:39 +02:00
Carlos Mocholí 7d87879350
Fix SWA with a list of learning rates (#8747)
* Fix swa lrs - needs test

* Add test

* Update CHANGELOG
2021-08-14 08:50:08 +02:00
ananthsub 037a86c873
Remove write_predictions from LightningModule (#8850)
2021-08-14 02:00:23 +00:00
thomas chaton e060547230
[Bug] Add SharedCycleIteratorState (#8889) 2021-08-13 19:06:56 +01:00
Sean Naren b2973a035e
Introduce CheckpointIO Plugin (#8743) 2021-08-13 17:35:31 +01:00
Carlos Mocholí a1264a6850
Automatic string fixes (#8886) 2021-08-13 14:28:14 +00:00
christopherfish 0749c1e7d8
Remove call to deprecated fit_loop (#8873) 2021-08-13 10:06:36 +02:00
Adrian Wälchli 4b6aaeeae3
fix plateau scheduler stepping on incomplete epoch (#8861) 2021-08-13 01:35:52 +00:00
ananthsub fec4f283bc
Update DataModule docs following property deprecations (#8864) 2021-08-12 10:02:26 -07:00
Stefan Wijnja c77cd518b5
Fix on_train_batch_end signature and call in ProgressBarBase example (#8836) 2021-08-12 12:24:12 +00:00
B. Kerim Tshimanga 24f0124ddd
Deprecate DataModule properties: train_transforms, val_transforms, test_transforms, dims, and size (#8851)
2021-08-11 08:52:27 -07:00
ananthsub b47e3ab7ce
Remove truncated_bptt_steps from Trainer constructor (#8825)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-11 03:26:01 +00:00
Carlos Mocholí cb2a8ed1b8
Add `LightningCLI(run=False|True)` (#8751)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-10 15:01:36 +02:00
Adrian Wälchli 3ef8cd654d
Add warning when `wandb.run` already exists (#8714)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-10 10:14:48 +02:00
Rio H a7d207ce0f
Add tests for functions in utilities/data.py (#8785)
* Add tests for utilities/data.py: test_has_iterable_dataset, test_has_len, test_get_len
2021-08-10 07:39:00 +01:00
ananthsub c4a1c8ba20
Fix truncated backprop through time when set on LightningModule and not Trainer (#8804)
* Fix truncated backprop through time set on LightningModule and not Trainer
2021-08-09 21:23:05 -07:00
ananthsub 15f6eca31a
Update callback_connector.py (#8805) 2021-08-10 04:16:20 +00:00
Carlos Mocholí f1cc6e3470
Restructure parsing flow in the `LightningCLI` (#8721) 2021-08-09 17:26:53 +02:00
Adrian Wälchli d41de6c0c2
is-instance check to determine the type of a plugin for teardown decision (#8741) 2021-08-09 16:31:53 +02:00
Adrian Wälchli 4d18708e41
rename on_expection -> on_exception (#8750) 2021-08-09 16:20:05 +02:00
Adrian Wälchli 87093a3339
remove deprecated sync step argument from WandbLogger (#8763)
* remove deprecated sync step

* update chlog
2021-08-09 09:45:25 +02:00
edward-io f3442db3f0
Fix comments for metrics_to_scalars (#8782)
metrics_to_scalars can return non-float values, such as int or complex, depending on the dtype of the tensor.
2021-08-07 15:33:36 +05:30
Adrian Wälchli e541803636
remove deprecation of gpu string parsing behavior (#8770) 2021-08-06 15:41:03 +00:00
Daniel Stancl d063059d03
Fix mypy for `utilities.xla_device` (#8755)
* Fix mypy for utilities.xla_device

* Add explicit type hint for _XLA_AVAILABLE in utilities.imports
2021-08-06 17:01:21 +02:00
Daniel Stancl 0b7f6d9200
Fix mypy for `utilities.memory` (#8744)
* Fix the majority of mypy issues

* Apply @carmocca's suggestion

* Handle the exception when nvidia-smi not found

* Update get_gpu_memory_map's docstring

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-06 11:21:09 +02:00
Kaushik B 6e781e9055
Minor code health fix for Trainer (#8761) 2021-08-06 11:14:24 +02:00
edward-io 8473cf44ec
Remove rank 0 restrictions from logger (#8608) 2021-08-06 04:13:56 +00:00
Carlos Mocholí 4928dc5579
Improve SWA docs (#8717) 2021-08-05 16:07:50 +00:00
Carlos Mocholí 299e289980
Remove deprecated `on_save_checkpoint` argument (#8688) 2021-08-05 16:16:30 +01:00
Jongseob Jeon 1c0786ebb8
fix typo error in docstring of LightningLoggerBase.after_save_checkpoint (#8737) 2021-08-05 15:05:12 +00:00
Daniel Stancl 69cd927fc5
Fix mypy typing for `utilities.debugging` (#8672) 2021-08-05 13:47:58 +02:00
Daniel Stancl aacd131414
Fix mypy in `utilities.distributed` (#8201) 2021-08-05 09:51:09 +00:00
Binh Tang efec3d461c
Move logger and profiler finalization to trainer's teardown (#8685)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-05 10:09:43 +02:00
Adrian Wälchli 963c267646
fix recursive call for `apply_to_collection(include_none=False)` (#8719) 2021-08-04 20:31:35 +02:00
Carlos Mocholí ed13040729
Connect the model to the training type plugin at the start of run (#8536) 2021-08-04 17:43:34 +02:00
Thien Tran 052aefc342
WandbLogger to log model topology by default (#8662)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-04 10:36:57 +00:00
Sean Naren 560a5c3fc5
Add functions to collate deepspeed zero 3 checkpoints (#8701) 2021-08-04 09:39:02 +00:00
Caleb Robinson 9ca02f58ae
Fix an import deprecation warning (#8687)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-03 22:17:28 +00:00
samlurye f90849cc95
Deprecate LightningModule.summarize() in favor of pl.utilities.model_summary.summarize() (#8513)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-03 22:08:51 +00:00
Elad Segal 08fba96b6c
Add `batch_size`, `rank_zero_only` arguments for `log_dict` to match `log` (#8628) 2021-08-03 22:05:34 +00:00
Sean Naren 49d03f87fe
[docs] Update deepspeed docs, add some more information and link to streamlit (#8691) 2021-08-03 16:12:36 +00:00