Commit Graph

803 Commits

Author SHA1 Message Date
Sean Naren c6b6888387
Add DeepSpeed Stage 1 + doc improvements for model parallel (#8974)
* Add stage 1 support + small doc improvements

* Add CHANGELOG.md
2021-08-18 19:40:19 +05:30
ananthsub 6992db524b
Update governance.rst (#8956) 2021-08-17 11:18:43 -07:00
Sean Naren 32c7cced54
Fix CheckpointIO doc annotations (#8931) 2021-08-16 10:02:05 +00:00
Swaroop 6dd3a6c564
A minor syntax correction (#8925)
Removed an extra quote - "
2021-08-16 02:25:26 +05:30
ananthsub 037a86c873
Remove write_predictions from LightningModule (#8850)
* Remove write_predictions from LightningModule
2021-08-14 02:00:23 +00:00
Sean Naren b2973a035e
Introduce CheckpointIO Plugin (#8743) 2021-08-13 17:35:31 +01:00
ananthsub fec4f283bc
Update DataModule docs following property deprecations (#8864) 2021-08-12 10:02:26 -07:00
ananthsub b47e3ab7ce
Remove truncated_bptt_steps from Trainer constructor (#8825)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-11 03:26:01 +00:00
Carlos Mocholí cb2a8ed1b8
Add `LightningCLI(run=False|True)` (#8751)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-10 15:01:36 +02:00
Adrian Wälchli e541803636
remove deprecation of gpu string parsing behavior (#8770) 2021-08-06 15:41:03 +00:00
Carlos Mocholí 4928dc5579
Improve SWA docs (#8717) 2021-08-05 16:07:50 +00:00
Sean Naren 49df107bdd
[docs] Update FSDP instructions and add DeepSpeed evaluate/predict example (#8713) 2021-08-04 15:21:30 +00:00
Sean Naren 98319f83bf
Reduce title length (#8709) 2021-08-03 23:17:10 +02:00
Sean Naren 49d03f87fe
[docs] Update deepspeed docs, add some more information and link to streamlit (#8691) 2021-08-03 16:12:36 +00:00
Sean Naren a1be6217ce
Expand the use cases, move them up for discoverability (#8692) 2021-08-03 11:47:20 +00:00
Jirka Borovec f67892ea96
CI: yesqa (#8564)
* add yesqa
* fix flake8

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-02 16:05:56 +00:00
Adrian Wälchli 16392a7de7
Update links for `zero_grad` to PyTorch docs (#8618) 2021-07-30 16:09:36 +02:00
Wei Ji a78709751a
Reverse width, height to height, width in docs (#8612)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-30 13:56:17 +00:00
Carlos Mocholí 93784da2c3
Fix pre-commit blacken-docs failures (#8624) 2021-07-30 12:10:15 +00:00
Carlos Mocholí bb4887368c
Docs improvements around hparams (#8577)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-07-30 11:06:03 +00:00
Carlos Mocholí 47c47faeae
Remove `outputs` in `on_train_epoch_end` hooks (#8587) 2021-07-28 18:27:54 +02:00
Jirka Borovec 0a71fe2859
CI: black docs (#8566)
* black docs

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-28 18:08:31 +02:00
Sean Naren aadd2a9d9c
Load ckpt path when model provided in validate/test/predict (#8352)
* Change trainer loading behaviour for validate/test/predict

* Fix

* Fix/add tests

* remove

* Cleanups

* Space

* cleanups

* Add CHANGELOG.md

* Move after setup

* Cleanups on logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove

* fix test

* feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Feedback

* Same fix

* Same fix

* Add test for behaviour, modify based on feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Wording

* Apply suggestions from code review

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Cleanup docs

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* feedback

* Fixes to test API

* Add carlos description

* Move logic further

* Move checkpoint connector logic

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-28 10:12:46 +00:00
Kaushik B 39de7fefeb
Lightning Release v1.4 (#8579)
* Update Lightning version to v1.4

* update notebooks

* Update release date in Changelog

* docs

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-27 14:00:13 +00:00
edenlightning c7e5743d54
Update cloud docs (#8569)
* amp

* amp

* docs

* add guides

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* amp

* amp

* docs

* add guides

* speed guides

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete ds.txt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update conf.py

* Update docs.txt

* remove 16 bit

* remove finetune from speed guide

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* remove early stopping from speed guide

* remove early stopping from speed guide

* remove early stopping from speed guide

* fix label

* fix sync

* reviews

* Update trainer.rst

* Update trainer.rst

* Update speed.rst

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* managing data

* managing data

* amp

* amp

* docs

* sync

* sync

* amp

* amp

* add data guide

* from review

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* from review

* from review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add data guide

* add data guide

* add data guide

* sync issues

* from review

* Update docs/source/guides/data.rst

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* add info if import fails

* fix cross referencing

* Add Datamodule motivation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* grid docs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cloud_training.rst

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-27 16:22:52 +03:00
Aslı Sabancı 4605e8a4a5
Add missing highlighting for Python snippets (#8411)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-27 13:22:05 +02:00
Max d90cb7fceb
Bugfix: Scheduler monitor for manual optimization (#7643)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-27 16:04:14 +05:30
Adrian Wälchli eaa16c7480
docs: explain how Lightning uses closures for automatic optimization (#8551) 2021-07-26 15:40:16 +00:00
Carlos Mocholí e63968ab88
Add `pyupgrade` to `pre-commit` (#8557)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Adrian Wälchli c519fce6fe
docs: clarify closure usage in gan example (#8521)
* clarify closure usage in gan example

* Update docs/source/common/optimizers.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove empty line

* Update docs/source/common/optimizers.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* do not capitalize if not a sentence

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-26 09:51:21 +02:00
Kaushik B ef7d41692c
Add `ddp_*_find_unused_parameters_false` to Plugins Registry. (#8483)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-24 04:02:54 +00:00
William Falcon 54cb009dd3
Update governance.rst 2021-07-23 07:41:18 -04:00
edenlightning 20fc8cf063
[Docs revamp 2/N] New doc for managing data (#8034)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-07-22 01:42:08 +00:00
Caleb Robinson 0ac9e2ae60
Adding appends to some of the pseudocode blocks (#8427) 2021-07-20 10:09:31 +02:00
Ethan Harris 7c07452615
Add LSFEnvironment to API reference (#8423) 2021-07-15 17:40:22 +03:00
Jamie 7f19930fe5
Tidy up IPU documentation (#8401) 2021-07-14 20:12:42 +05:30
Kaushik B b069493b15
Add troubleshooting section for tpus (#8277)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-12 13:29:02 +00:00
Andrew Tritt 3102922647
Add LSF support (#5102)
* add ClusterEnvironment for LSF systems

* update init file

* add available cluster environments

* clean up LSFEnvironment

* add ddp_hpc as a distributed backend

* clean up SLURMEnvironment

* remove extra blank line

* init device for DDPHPCAccelerator

We need to do this so we don't send the model to the same device from multiple ranks

* committing current state

* add additional methods to ClusterEnvironments

* add NVIDIA mixin for setting up CUDA envars

* remove troubleshooting prints

* cleanup SLURMEnvironment

* fix docstring

* cleanup TorchElasticEnvironment and add documentation

* PEP8 puts a cork in it

* add set_ranks_to_trainer

* remove unused import

* move to new location

* update LSF environment

* remove mixin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changelog

* reset slurm env

* add tests

* add licence

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test node_rank

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lsf env to docs

* add auto detection for lsf environment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix is_using_lsf() and test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-09 16:14:26 +02:00
Dusan Drevicky 1b06edf2f2
Add the `on_before_optimizer_step` hook (#8048)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-09 13:30:52 +02:00
Sean Naren 31fca1658d
[docs] Add NCCL environment variable docs (#8345)
* Add nccl env variable docs

* Wording

* Update docs/source/guides/speed.rst

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-09 11:27:18 +00:00
thomas chaton 1c825a2a9c
Add the `on_before_backward` hook (#7865)
* Add callback to hook tests and add predict test

* Fix lambda callback test

* Simplify lambda call test

* Use LambdaCallback

* Dynamically append to called for the model

* Remove print

* Consistency

* Consistency

* Prepare args/kwargs testing

* yapf doesn't like dict literals

* Add arguments for fit no val test

* Add arguments for fit no val test

* add before_backward_hook

* add test

* resolve flake8

* resolve tests

* update changelog

* add on_before_backward to LightningModule

* update on comments

* Test arguments

* Datamodule refactor

* Fix eval test

* remove extra file

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to hooks

* update

* resolve flake8

* update on comments

* Update full fit + val test

* Update test

* Remove FIXME

* Remove FIXME

* Undo change

* Fix

* Parametrize fit hook test

* Comment

* Parametrize fit hook test with different precision plugins

* Fix tests

* Parametrize fit hook test with manual optimization

* Unnecessary parenthesis

* WIP

* Comments

* Fix message

* Test CI error

* Revert "Test CI error"

This reverts commit 39c4a85a83.

* Add ddp training type teardown

* Update CHANGELOG

* Adrian's fix

* Use destructor

* Update CHANGELOG.md

* RPC destructor

* Update pytorch_lightning/plugins/training_type/ddp.py

* Why do you not work :(

* Missing condition

* Fix deepspeed test

* GC collect in conftest

* Do not show warnings for special tests

* Needs to run on 1.8

To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8"

* Run torch 1.8

* Skip test due to 'Python bus error'

* Debug NCCL

* shm size

* Disable warnings for special tests

* Remove NCCL_DEBUG statement

* Try smaller shm size

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* README and adjust versions

* Avoid self.on_gpu call

* empty cache cleanup

* More garbage collection

* Unroll parametrizations

* Do not reuse mock

* Undo changes

* Undo notebooks modification

* resolve test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete file

* Undo

* Fix test

* Revert "WIP"

This reverts commit f5828a8c42.

* Rename

* Remove optimizers

* Fix bug with LightningOptimizer

* Add optimizers

* update

* update

* Update CHANGELOG

* On after backward refactor

* Do not call super

* Fixes

* Remove should_accumulate

* pre/post backward refactor

* Call the LM backward hook

* Update tests

* Remove dev debug patch

* Fix test

* Remove optimizer arguments and typing

* Docs fixes

* Fix comment

* Undo changes

* Split manual and auto

* Undo change

* Deepsource

* Remove optimizers

* Undo changes

* Call the hook

* Docs

* Docs

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-09 06:15:57 +00:00
Mauricio Villegas 7d3452a000
LightningCLI documentation improvements (#8303)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-08 12:35:26 +05:30
Ethan Harris 56697dd894
Add logo_light.svg (#8327) 2021-07-07 17:24:31 +00:00
Sean Naren 01f594baf4
Add quick docs for deepspeed infinity (#8323) 2021-07-07 15:58:27 +02:00
Sean Naren fc12fe721f
Remove RC candidate install (#8322) 2021-07-07 12:21:12 +00:00
Sidhant Sundrani 20df24d2a2
Enables reload of dataloaders on every n epochs from every epoch (#5043)
* edit arg to reload_dataloaders_every_n_epoch

* init reload_dataloaders_every_n_epoch

* edit logic to reload dl

* update arg to test datamodule

* update arg test dataloader

* edit reload dl logic in eval loop

* fix var name in reset_train_val_dataloaders

* fix error, use current_epoch attribute

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* assert reload_dataloaders_every_n_epochs positive

* assert reload_dataloaders_every_n_epochs positive

* add trainer property should reload dl

* update should reload dl in train loop

* condition on should reload dl in eval loop

* pep8

* fix update should reload dl in train loop

* add test case

* replace assertion with misconfig exception

* remove unused variable

* remove unnecessary checks

* replace to BoringModel

* remove unrequired comment

* deprecate _every_epoch

* add deprecated argument to trainer

* test case for deprecated arg

* remove unrequired assertion in train loop

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* modify misconfig exception for int

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* conv bool to int of deprecated _every_epoch

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update description of deprecated param

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update deprecation warning

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* modify argument to int only

* fix deprecated test function name

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* merge tests for reload dls

* add property should reload dl

* removed and added to trainer property

* use property in train loop

* remove deprecated test

* add deprecated test to new file

* test case for exception

* update test datamodule every_n_epochs

* update trainer docs

* update hooks with every_n_epochs

* edit format if statement

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* typo in exception

* pytest check only misconfig exception

* remove unnecessary code in test

* remove unnecessary code in deprec test

* added match in test

* typo in comment

* revert to prev, keep only req in context manager

* Apply suggestions from code review

* docs

* rebase

* Apply suggestions from code review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix import: model_helpers instead of model_utils

* fix, add reload_dataloaders_every_n_epochs argument to data connector

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add required imports

* move deprecated log

* add missing import rank_zero_warn

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update varname in should_reload_dl_epoch

suggestion from code review

* Fix CHANGELOG. Update deprecation versions

* Minor change

* change property name, mark protected

* update property name

* update property name

* Remove deprecated *_loop.py files

* Rename test func

* Update CHANGELOG.md

* use rank_zero_deprecation

* update deprecation message in trainer api docs

* test deprecation with real arg name in message

* fix typo in trainer docs

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-07 13:10:08 +02:00
William Falcon e148a1339a
Update governance.rst 2021-07-07 11:39:19 +02:00
Kaushik B 365a9bae33
Update Torch Elastic documentation (#8248)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-02 03:47:58 +05:30
Guillaume Tauzin baa7de2d9e
Fix truncated_bptt_steps hiddens detach() and improve docs (#8145)
* Fix truncated_bptt_steps hiddens detach()
* Improve truncated_bptt_docs
* Add missing import
* Improve documentation wordings
* pep8
* detach typo
* Update test
* Implement comments
* parametrize test
* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

* Remove import

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-01 22:16:14 +01:00