Commit Graph

974 Commits

Author SHA1 Message Date
Aslı Sabancı 4605e8a4a5
Add missing highlighting for Python snippets (#8411)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-27 13:22:05 +02:00
Max d90cb7fceb
Bugfix: Scheduler monitor for manual optimization (#7643)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-27 16:04:14 +05:30
Adrian Wälchli eaa16c7480
docs: explain how Lightning uses closures for automatic optimization (#8551)
2021-07-26 15:40:16 +00:00
Carlos Mocholí e63968ab88
Add `pyupgrade` to `pre-commit` (#8557)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Adrian Wälchli c519fce6fe
docs: clarify closure usage in gan example (#8521)
* clarify closure usage in gan example

* Update docs/source/common/optimizers.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove empty line

* Update docs/source/common/optimizers.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* do not capitalize if not a sentence

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-26 09:51:21 +02:00
Kaushik B ef7d41692c
Add `ddp_*_find_unused_parameters_false` to Plugins Registry. (#8483)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-24 04:02:54 +00:00
William Falcon 54cb009dd3
Update governance.rst
2021-07-23 07:41:18 -04:00
edenlightning 20fc8cf063
[Docs revamp 2/N] New doc for managing data (#8034)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-07-22 01:42:08 +00:00
Caleb Robinson 0ac9e2ae60
Adding appends to some of the pseudocode blocks (#8427)
2021-07-20 10:09:31 +02:00
Ethan Harris 7c07452615
Add LSFEnvironment to API reference (#8423)
2021-07-15 17:40:22 +03:00
Jamie 7f19930fe5
Tidy up IPU documentation (#8401)
2021-07-14 20:12:42 +05:30
Kaushik B b069493b15
Add troubleshooting section for tpus (#8277)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-12 13:29:02 +00:00
Andrew Tritt 3102922647
Add LSF support (#5102)
* add ClusterEnvironment for LSF systems

* update init file

* add available cluster environments

* clean up LSFEnvironment

* add ddp_hpc as a distributed backend

* clean up SLURMEnvironment

* remove extra blank line

* init device for DDPHPCAccelerator

We need to do this so we don't send the model to the same device from multiple ranks

* committing current state

* add additional methods to ClusterEnvironments

* add NVIDIA mixin for setting up CUDA envars

* remove troubleshooting prints

* cleanup SLURMEnvironment

* fix docstring

* cleanup TorchElasticEnvironment and add documentation

* PEP8 puts a cork in it

* add set_ranks_to_trainer

* remove unused import

* move to new location

* update LSF environment

* remove mixin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changelog

* reset slurm env

* add tests

* add licence

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test node_rank

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lsf env to docs

* add auto detection for lsf environment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix is_using_lsf() and test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-09 16:14:26 +02:00
Dusan Drevicky 1b06edf2f2
Add the `on_before_optimizer_step` hook (#8048)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-09 13:30:52 +02:00
Sean Naren 31fca1658d
[docs] Add NCCL environment variable docs (#8345)
* Add nccl env variable docs

* Wording

* Update docs/source/guides/speed.rst

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-09 11:27:18 +00:00
thomas chaton 1c825a2a9c
Add the `on_before_backward` hook (#7865)
* Add callback to hook tests and add predict test

* Fix lambda callback test

* Simplify lambda call test

* Use LambdaCallback

* Dynamically append to called for the model

* Remove print

* Consistency

* Consistency

* Prepare args/kwargs testing

* yapf doesn't like dict literals

* Add arguments for fit no val test

* Add arguments for fit no val test

* add before_backward_hook

* add test

* resolve flake8

* resolve tests

* update changelog

* add on_before_backward to LightningModule

* update on comments

* Test arguments

* Datamodule refactor

* Fix eval test

* remove extra file

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to hooks

* update

* resolve flake8

* update on comments

* Update full fit + val test

* Update test

* Remove FIXME

* Remove FIXME

* Undo change

* Fix

* Parametrize fit hook test

* Comment

* Parametrize fit hook test with different precision plugins

* Fix tests

* Parametrize fit hook test with manual optimization

* Unnecessary parenthesis

* WIP

* Comments

* Fix message

* Test CI error

* Revert "Test CI error"

This reverts commit 39c4a85a83.

* Add ddp training type teardown

* Update CHANGELOG

* Adrian's fix

* Use destructor

* Update CHANGELOG.md

* RPC destructor

* Update pytorch_lightning/plugins/training_type/ddp.py

* Why do you not work :(

* Missing condition

* Fix deepspeed test

* GC collect in conftest

* Do not show warnings for special tests

* Needs to run on 1.8

To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8"

* Run torch 1.8

* Skip test due to 'Python bus error'

* Debug NCCL

* shm size

* Disable warnings for special tests

* Remove NCCL_DEBUG statement

* Try smaller shm size

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* README and adjust versions

* Avoid self.on_gpu call

* empty cache cleanup

* More garbage collection

* Unroll parametrizations

* Do not reuse mock

* Undo changes

* Undo notebooks modification

* resolve test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete file

* Undo

* Fix test

* Revert "WIP"

This reverts commit f5828a8c42.

* Rename

* Remove optimizers

* Fix bug with LightningOptimizer

* Add optimizers

* update

* update

* Update CHANGELOG

* On after backward refactor

* Do not call super

* Fixes

* Remove should_accumulate

* pre/post backward refactor

* Call the LM backward hook

* Update tests

* Remove dev debug patch

* Fix test

* Remove optimizer arguments and typing

* Docs fixes

* Fix comment

* Undo changes

* Split manual and auto

* Undo change

* Deepsource

* Remove optimizers

* Undo changes

* Call the hook

* Docs

* Docs

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-09 06:15:57 +00:00
Mauricio Villegas 7d3452a000
LightningCLI documentation improvements (#8303)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-08 12:35:26 +05:30
Ethan Harris 56697dd894
Add logo_light.svg (#8327)
2021-07-07 17:24:31 +00:00
Sean Naren 01f594baf4
Add quick docs for deepspeed infinity (#8323)
2021-07-07 15:58:27 +02:00
Sean Naren fc12fe721f
Remove RC candidate install (#8322)
2021-07-07 12:21:12 +00:00
Sidhant Sundrani 20df24d2a2
Enables reload of dataloaders on every n epochs from every epoch (#5043)
* edit arg to reload_dataloaders_every_n_epoch

* init reload_dataloaders_every_n_epoch

* edit logic to reload dl

* update arg to test datamodule

* update arg test dataloader

* edit reload dl logic in eval loop

* fix var name in reset_train_val_dataloaders

* fix error, use current_epoch attribute

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* assert reload_dataloaders_every_n_epochs positive

* assert reload_dataloaders_every_n_epochs positive

* add trainer property should reload dl

* update should reload dl in train loop

* condition on should reload dl in eval loop

* pep8

* fix update should reload dl in train loop

* add test case

* replace assertion with misconfig exception

* remove unused variable

* remove unnecessary checks

* replace to BoringModel

* remove unrequired comment

* deprecate _every_epoch

* add deprecated argument to trainer

* test case for deprecated arg

* remove unrequired assertion in train loop

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* modify misconfig exception for int

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* conv bool to int of deprecated _every_epoch

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update description of deprecated param

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update deprecation warning

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* modify argument to int only

* fix deprecated test function name

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* merge tests for reload dls

* add property should reload dl

* removed and added to trainer property

* use property in train loop

* remove deprecated test

* add deprecated test to new file

* test case for exception

* update test datamodule every_n_epochs

* update trainer docs

* update hooks with every_n_epochs

* edit format if statement

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* typo in exception

* pytest check only misconfig exception

* remove unnecessary code in test

* remove unnecessary code in deprec test

* added match in test

* typo in comment

* revert to prev, keep only req in context manager

* Apply suggestions from code review

* docs

* rebase

* Apply suggestions from code review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix import: model_helpers instead of model_utils

* fix, add reload_dataloaders_every_n_epochs argument to data connector

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add required imports

* move deprecated log

* add missing import rank_zero_warn

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update varname in should_reload_dl_epoch

suggestion from code review

* Fix CHANGELOG. Update deprecation versions

* Minor change

* change property name, mark protected

* update property name

* update property name

* Remove deprecated *_loop.py files

* Rename test func

* Update CHANGELOG.md

* use rank_zero_deprecation

* update deprecation message in trainer api docs

* test deprecation with real arg name in message

* fix typo in trainer docs

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-07 13:10:08 +02:00
William Falcon e148a1339a
Update governance.rst
2021-07-07 11:39:19 +02:00
Kaushik B 365a9bae33
Update Torch Elastic documentation (#8248)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-02 03:47:58 +05:30
Guillaume Tauzin baa7de2d9e
Fix truncated_bptt_steps hiddens detach() and improve docs (#8145)
* Fix truncated_bptt_steps hiddens detach()
* Improve truncated_bptt_docs
* Add missing import
* Improve documentation wordings
* pep8
* detach typo
* Update test
* Implement comments
* parametrize test
* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

* Remove import

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-01 22:16:14 +01:00
Ethan Harris c0caeb3ea9
Update docs for new template (#8232)
* Update docs for new template

* Fixes

* Fixes

* Drop links
2021-07-01 16:19:09 +01:00
Mauricio Villegas 3c74502919
Add support for optimizers and learning rate schedulers to LightningCLI (#8093)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-01 12:04:11 +02:00
SATISH J 4af8eff0a1
fix: training_step_end doesn't work as stated in docs (#8188)
2021-06-30 00:24:06 +00:00
thomas chaton 24db914093
Support state restoration of logged results 2/2 (#7966)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-25 19:16:11 +00:00
edenlightning d4d5418cc4
Fix notebook links (#8089)
* Fix notebook links

* update

* BERT

* docs

* Update README.md

* Apply suggestions from code review

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-23 21:36:31 +00:00
Carlos Mocholí 4d9b72b8a9
Nuke RPC (#8101)
2021-06-23 18:31:13 +00:00
Edgar Riba b378806b6c
Add `add_to_queue`/`get_from_queue` for DDP spawn (#7916)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-23 03:19:37 +02:00
edenlightning 599d6db10f
Fix Grid run commands (#8021)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-06-18 01:24:15 +00:00
Jirka Borovec 7978a5376d
Ipynb update (#8004)
* git submodule update --remote

* update notebooks in docs

* prune

* _notebooks

* docs

* path

* path

* ignore

* head
2021-06-17 16:46:05 +02:00
edenlightning 5647087f03
New speed documentation (#7665)
* amp

* amp

* docs

* add guides

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* amp

* amp

* docs

* add guides

* speed guides

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete ds.txt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update conf.py

* Update docs.txt

* remove 16 bit

* remove finetune from speed guide

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* remove early stopping from speed guide

* remove early stopping from speed guide

* remove early stopping from speed guide

* fix label

* fix sync

* reviews

* Update trainer.rst

* Update trainer.rst

* Update speed.rst

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-06-16 21:28:51 +00:00
thomas chaton 917cf83638
[doc] Add more reference around predict_step (#7997)
* add predict examples

* update on comments
2021-06-16 12:23:27 +01:00
Mauricio Villegas 0004216f2f
Easier configurability of callbacks that should always be present in LightningCLI (#7964)
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-16 02:03:37 +02:00
Carlos Mocholí 560b1970af
Standardize positional datamodule and argument names (#7431)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-15 11:50:13 +00:00
Sean Naren 0974d66c6c
Add docs for IPUs (#7923)
* Added base docs for IPUs

* Fix

* Add details around poptorch profiler and model parallelism

* more description

* Add image

* Clearer messaging

* Cleanup

* Better name

* Add note

* Add some details around device iterations and model parallelism

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add a small install comment

* Add clip gradients not supported

* Update docs/source/advanced/ipu.rst

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Add note

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-06-15 10:16:47 +00:00
Eugene Huang 898fb56b16
added on_test_start() documentation (#7962)
Co-authored-by: ehuang68 <>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-14 14:19:48 +00:00
Adrian Wälchli 20a5e09e33
fix myst-parser warning blocking docs ci (#7967)
2021-06-14 11:17:53 +00:00
Mauricio Villegas cdd01f32da
LightningCLI support for argument links applied on instantiation (#7895)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-12 13:13:14 +02:00
Carlos Mocholí ec4f8856af
Enable logger connector re-design (#7891)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-09 14:24:45 +00:00
Jirka Borovec 0fda862274
Refactor notebooks (#7752)
* drop notebooks

* add submodule

* copy notebooks

* docs include ipynb

* fix headers

* CI

* readthedocs

* manifest

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* req

* workdir

* pandoc

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pandoc

* manifest

* Apply suggestions from code review

* fix versions

* checkout

* `git submodule update --init --recursive --remote`

* notebooks @docs

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-06-08 16:30:13 +00:00
thomas chaton ea71cf4a5f
[Test] Add extra test for val_check_interval in distributed scenario (#7863)
* add extra test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add computation

* Update docs/source/common/trainer.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update docs/source/common/trainer.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* use tmpdir

* update on comments

* update

* Update tests/callbacks/test_progress_bar.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-07 10:37:32 +00:00
Adrian Wälchli acd38dd406
update docs example with sharded eval step (#7748)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-06-07 09:49:41 +01:00
Guillaume Tauzin 1da1898d41
[docs] Fix truncated_bptt_steps docs (#7846)
2021-06-06 18:31:14 +00:00
thomas chaton 51d370f4c2
[doc] Move each profiler to its own file + Add missing PyTorchProfiler to the doc (#7822)
2021-06-04 21:08:29 +05:30
Sean Naren 0a72fd2284
Add FSDP docs (#7791)
* Add FSDP docs

* Address reviews

* Add note about how FSDP can replace pipe parallelism

* Add import

* Remove sentence
2021-06-02 09:52:48 +00:00
Kaushik B e4ba06c70f
Replace deprecated distributed_backend by acc in examples (#7795)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-06-02 07:43:24 +02:00