5384 Commits

Author SHA1 Message Date
Sean Naren 49df107bdd
[docs] Update FSDP instructions and add DeepSpeed evaluate/predict example (#8713) 2021-08-04 15:21:30 +00:00
Thien Tran 052aefc342
WandbLogger to log model topology by default (#8662)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-04 10:36:57 +00:00
Sean Naren 560a5c3fc5
Add functions to collate deepspeed zero 3 checkpoints (#8701) 2021-08-04 09:39:02 +00:00
Caleb Robinson 9ca02f58ae
Fix an import deprecation warning (#8687)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-03 22:17:28 +00:00
samlurye f90849cc95
Deprecate LightningModule.summarize() in favor of pl.utilities.model_summary.summarize() (#8513)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-03 22:08:51 +00:00
Elad Segal 08fba96b6c
Add `batch_size`, `rank_zero_only` arguments for `log_dict` to match `log` (#8628) 2021-08-03 22:05:34 +00:00
Sean Naren 98319f83bf
Reduce title length (#8709) 2021-08-03 23:17:10 +02:00
Jirka Borovec 0e6ee9c39d
CI: add mdformat (#8673)
* add mdformat
* exclude chlog
* fix ***

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-03 18:19:09 +00:00
Sean Naren 49d03f87fe
[docs] Update deepspeed docs, add some more information and link to streamlit (#8691) 2021-08-03 16:12:36 +00:00
Sean Naren a1be6217ce
Expand the use cases, move them up for discoverability (#8692) 2021-08-03 11:47:20 +00:00
Daniel Stancl 08ac079c2f
Fix mypy typing for `utilities.cloud_io.py` (#8671)
Co-authored-by: tchaton <thomas@grid.ai>
2021-08-03 11:56:28 +02:00
Isaac 8274183bf2
Add check for unique device ids (#8666)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-03 08:18:51 +00:00
Sean Naren e5d9e21dea
Fix save/load/resume from checkpoint for DeepSpeed Plugin (#8397) 2021-08-02 22:31:05 +00:00
Kaushik B d01d8334b5
Fix `ddp` accelerator choice for cpu (#8645)
* Fix ddp accelerator choice for cpu
2021-08-02 21:24:07 +00:00
thomas chaton dd8216a6b8
Save the `ResultCollection` in the loops state dict (#8641)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-02 20:52:24 +00:00
thomas chaton 567e905ead
update logic to inject FastForwardSampler / CaptureIterableDataset 2/n (#8366)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-02 20:52:06 +00:00
thomas chaton 15fb32037d
Test `metric_attribute` for different children module structures (#8675)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-02 20:51:15 +01:00
thomas chaton 9e61de2063
Torch Elastic DDP DeadLock bug fix (#8655)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-02 21:48:43 +02:00
Carlos Mocholí d83dd7969d
Disable recurrent events on forks (#8668) 2021-08-02 18:12:13 +00:00
Jirka Borovec 661522e173
black: magic trailing comma (#8560) 2021-08-02 20:02:36 +02:00
Carlos Mocholí ca96b2d23e
Delete deprecated save function (#8680) 2021-08-02 19:28:31 +02:00
Jirka Borovec f67892ea96
CI: yesqa (#8564)
* add yesqa
* fix flake8

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-02 16:05:56 +00:00
Jirka Borovec 66cc505339
update NGC (#8652)
* update NGC

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-02 16:05:36 +00:00
Carlos Mocholí cf0d362658
Delete deprecated `TrainerTrainingTricksMixin` (#8679) 2021-08-02 18:00:32 +02:00
Carlos Mocholí d187008e84
Un-skip some Horovod tests (#8676) 2021-08-02 17:54:05 +02:00
Kaushik B 850416f0a0
Fix distributed types support for CPUs (#8667) 2021-08-02 16:42:28 +05:30
thomas chaton 85bba06529
update (#8674) 2021-08-02 11:56:09 +02:00
Sean Naren 7a1e97203e
Add property to skip restoring optimizers and schedulers via plugin (#8644) 2021-07-31 10:08:10 +02:00
Daniel Stancl 1f01db8b30
Fix mypy in utilities.argparse (#8124)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-30 16:36:55 +00:00
Adrian Wälchli 16392a7de7
Update links for `zero_grad` to PyTorch docs (#8618) 2021-07-30 16:09:36 +02:00
Wei Ji a78709751a
Reverse width, height to height, width in docs (#8612)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-30 13:56:17 +00:00
Rio H ba8053492f
Deprecate LightningModule.model_size (#8495)
Co-authored-by: Caleb Robinson <calebrob6@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-30 13:53:40 +00:00
Adrian Wälchli 529c42f848
fix collecting training_step outputs (#8613) 2021-07-30 13:03:15 +00:00
Carlos Mocholí 5789e9f5e4
Fix reference issues during epoch end result collection (#8621)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-30 12:16:47 +00:00
Carlos Mocholí 93784da2c3
Fix pre-commit blacken-docs failures (#8624) 2021-07-30 12:10:15 +00:00
Adrian Wälchli 1bc052c290
Remove dead code in eval loop output tracking (#8625) 2021-07-30 14:04:51 +02:00
Carlos Mocholí bb4887368c
Docs improvements around hparams (#8577)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-07-30 11:06:03 +00:00
Carlos Mocholí 9720e264f5
Fix references for `ResultCollection.extra` and improve `str` and `repr` (#8622) 2021-07-30 12:47:34 +02:00
Sean Naren 07b7dc9c17
[Fix] Add delay property for checkpointing, refactor loading checkpoint (DeepSpeed Checkpointing Fix 1/n) (#8627)
* Add property to delay checkpointing, move loading checkpoint file into the run function to allow deepspeed engine to be loaded

* Add a small test

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Address review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-30 11:31:08 +01:00
Adrian Wälchli b6ea6373dd
exclude mpi run from auto-detection of horovod (#8610) 2021-07-30 12:01:00 +02:00
Carlos Mocholí c99e2fe0d2
Test `Callback.on_load_checkpoint` order (#8588) 2021-07-29 12:28:29 +02:00
Adrian Wälchli 7901d297d3
remove support for optimizer_idx in the training_step for manual optimization (#8576) 2021-07-29 08:30:45 +00:00
Kaushik B 9c80727b8c
Add ddp_cpu to DistributedType Enum (#8596) 2021-07-29 10:02:32 +02:00
Carlos Mocholí c2199fbbee
Fix `trainer.fit_loop.split_idx` reference (#8601)
* Fix split idx reference

* Update CHANGELOG

* Add comment
2021-07-29 08:00:04 +00:00
Carlos Mocholí 0dc0472e1f
Use class name in SWA info message (#8602) 2021-07-29 09:39:46 +02:00
Carlos Mocholí ebd2e87752
Delete deprecated `TrainerLoggingMixin` (#8609)
* Delete deprecated `TrainerLoggingMixin`
* Update CHANGELOG
* Delete from Trainer
2021-07-29 08:39:16 +02:00
Adrian Wälchli 8c27fa71fa
[1 / 3] improvements to saving and loading callback state (#6886)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-29 00:12:32 +02:00
Jirka Borovec 0c0b24c031
Prune deprecated metrics (#8586)
* drop metrics

* drop tests

* fix imports

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-28 16:57:31 +00:00
Carlos Mocholí 47c47faeae
Remove `outputs` in `on_train_epoch_end` hooks (#8587) 2021-07-28 18:27:54 +02:00
Jirka Borovec 470842f5c8
CI: validate JSON & fix benchmark (#8567)
* CI: validate JSON

* as GHA

* PT1.8

* 32g

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-28 18:09:15 +02:00