thomas chaton
19136ac847
[Feat] 2/n Add Fault Tolerant Training to LightningFetcher ( #8891 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 16:32:43 +00:00
Yifu Wang
14f1475c25
Ensure the existence of `DDPPlugin._sync_dir` in `reconciliate_processes` ( #8939 )
...
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
2021-08-17 13:47:33 +05:30
Yifu Wang
938a191406
Add a flavor of training_step that takes dataloader_iter as an argument ( #8807 )
2021-08-16 19:01:09 +00:00
thomas chaton
f0a105bf3e
[bugfix] Resolve lost reference to meta object in `ResultMetricCollection` ( #8932 )
2021-08-16 19:21:03 +02:00
thomas chaton
89156b7039
[1/n] Add LightningFetcher ( #8890 )
2021-08-16 16:02:10 +00:00
Carlos Mocholí
d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 ( #8911 )
2021-08-16 13:34:42 +02:00
Carlos Mocholí
93ab24d1ee
Replace DataLoader sampler once for IPUs ( #8858 )
2021-08-16 11:28:05 +02:00
Justus Schock
1d2f7e20c4
[Bugfix] Detach Loaders after running entrypoint ( #8885 )
...
detach loaders after run
2021-08-16 09:26:38 +02:00
Carlos Mocholí
0aa5cc7b77
Integrate `total_batch_idx` with progress tracking ( #8598 )
2021-08-14 14:08:34 +02:00
Carlos Mocholí
bfeffde8f4
Smart handling of `EarlyStopping.check_on_train_epoch_end` ( #8888 )
...
* Smart handling of `EarlyStopping.check_on_train_epoch_end`
* dummy value
* Extra flag
2021-08-14 08:50:39 +02:00
Carlos Mocholí
7d87879350
Fix SWA with a list of learning rates ( #8747 )
...
* Fix swa lrs - needs test
* Add test
* Update CHANGELOG
2021-08-14 08:50:08 +02:00
ananthsub
037a86c873
Remove write_predictions from LightningModule ( #8850 )
2021-08-14 02:00:23 +00:00
thomas chaton
e060547230
[Bug] Add SharedCycleIteratorState ( #8889 )
2021-08-13 19:06:56 +01:00
Sean Naren
b2973a035e
Introduce CheckpointIO Plugin ( #8743 )
2021-08-13 17:35:31 +01:00
Carlos Mocholí
a1264a6850
Automatic string fixes ( #8886 )
2021-08-13 14:28:14 +00:00
christopherfish
0749c1e7d8
Remove call to deprecated fit_loop ( #8873 )
2021-08-13 10:06:36 +02:00
Adrian Wälchli
4b6aaeeae3
fix plateau scheduler stepping on incomplete epoch ( #8861 )
2021-08-13 01:35:52 +00:00
ananthsub
fec4f283bc
Update DataModule docs following property deprecations ( #8864 )
2021-08-12 10:02:26 -07:00
Stefan Wijnja
c77cd518b5
Fix on_train_batch_end signature and call in ProgressBarBase example ( #8836 )
2021-08-12 12:24:12 +00:00
B. Kerim Tshimanga
24f0124ddd
Deprecate DataModule properties: train_transforms, val_transforms, test_transforms, dims, and size ( #8851 )
2021-08-11 08:52:27 -07:00
ananthsub
b47e3ab7ce
Remove truncated_bptt_steps from Trainer constructor ( #8825 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-11 03:26:01 +00:00
Carlos Mocholí
cb2a8ed1b8
Add `LightningCLI(run=False|True)` ( #8751 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-10 15:01:36 +02:00
Adrian Wälchli
3ef8cd654d
Add warning when `wandb.run` already exists ( #8714 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-10 10:14:48 +02:00
Rio H
a7d207ce0f
Add tests for functions in utilities/data.py ( #8785 )
...
* Add tests for utilities/data.py: test_has_iterable_dataset, test_has_len, test_get_len
2021-08-10 07:39:00 +01:00
ananthsub
c4a1c8ba20
Fix truncated backprop through time when set on LightningModule and not Trainer ( #8804 )
...
* Fix truncated backprop through time set on LightningModule and not Trainer
2021-08-09 21:23:05 -07:00
ananthsub
15f6eca31a
Update callback_connector.py ( #8805 )
2021-08-10 04:16:20 +00:00
Carlos Mocholí
f1cc6e3470
Restructure parsing flow in the `LightningCLI` ( #8721 )
2021-08-09 17:26:53 +02:00
Adrian Wälchli
d41de6c0c2
is-instance check to determine the type of a plugin for teardown decision ( #8741 )
2021-08-09 16:31:53 +02:00
Adrian Wälchli
4d18708e41
rename on_expection -> on_exception ( #8750 )
2021-08-09 16:20:05 +02:00
Adrian Wälchli
87093a3339
remove deprecated sync step argument from WandbLogger ( #8763 )
...
* remove deprecated sync step
* update chlog
2021-08-09 09:45:25 +02:00
edward-io
f3442db3f0
Fix comments for metrics_to_scalars ( #8782 )
...
metrics_to_scalars can return non-float values, such as int or complex, depending on the dtype of the tensor.
2021-08-07 15:33:36 +05:30
Adrian Wälchli
e541803636
remove deprecation of gpu string parsing behavior ( #8770 )
2021-08-06 15:41:03 +00:00
Daniel Stancl
d063059d03
Fix mypy for `utilities.xla_device` ( #8755 )
...
* Fix mypy for utilities.xla_device
* Add explicit type hint for _XLA_AVAILABLE in utilities.imports
2021-08-06 17:01:21 +02:00
Daniel Stancl
0b7f6d9200
Fix mypy for `utilities.memory` ( #8744 )
...
* Fix the majority of mypy issues
* Apply @carmocca's suggestion
* Handle the exception when nvidia-smi not found
* Update get_gpu_memory_map's docstring
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-06 11:21:09 +02:00
Kaushik B
6e781e9055
Minor code health fix for Trainer ( #8761 )
2021-08-06 11:14:24 +02:00
edward-io
8473cf44ec
Remove rank 0 restrictions from logger ( #8608 )
2021-08-06 04:13:56 +00:00
Carlos Mocholí
4928dc5579
Improve SWA docs ( #8717 )
2021-08-05 16:07:50 +00:00
Carlos Mocholí
299e289980
Remove deprecated `on_save_checkpoint` argument ( #8688 )
2021-08-05 16:16:30 +01:00
Jongseob Jeon
1c0786ebb8
fix typo error in docstring of LightningLoggerBase.after_save_checkpoint ( #8737 )
2021-08-05 15:05:12 +00:00
Daniel Stancl
69cd927fc5
Fix mypy typing for `utilities.debugging` ( #8672 )
2021-08-05 13:47:58 +02:00
Daniel Stancl
aacd131414
Fix mypy in `utilities.distributed` ( #8201 )
2021-08-05 09:51:09 +00:00
Binh Tang
efec3d461c
Move logger and profiler finalization to trainer's teardown ( #8685 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-05 10:09:43 +02:00
Adrian Wälchli
963c267646
fix recursive call for `apply_to_collection(include_none=False)` ( #8719 )
2021-08-04 20:31:35 +02:00
Carlos Mocholí
ed13040729
Connect the model to the training type plugin at the start of run ( #8536 )
2021-08-04 17:43:34 +02:00
Thien Tran
052aefc342
WandbLogger to log model topology by default ( #8662 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-04 10:36:57 +00:00
Sean Naren
560a5c3fc5
Add functions to collate deepspeed zero 3 checkpoints ( #8701 )
2021-08-04 09:39:02 +00:00
Caleb Robinson
9ca02f58ae
Fix an import deprecation warning ( #8687 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-03 22:17:28 +00:00
samlurye
f90849cc95
Deprecate LightningModule.summarize() in favor of pl.utilities.model_summary.summarize() ( #8513 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-03 22:08:51 +00:00
Elad Segal
08fba96b6c
Add `batch_size`, `rank_zero_only` arguments for `log_dict` to match `log` ( #8628 )
2021-08-03 22:05:34 +00:00
Sean Naren
49d03f87fe
[docs] Update deepspeed docs, add some more information and link to streamlit ( #8691 )
2021-08-03 16:12:36 +00:00