5243 Commits

Author SHA1 Message Date
Carlos Mocholí c5a120ed9d
Update to Mypy>0.9 (#8386) 2021-07-13 08:23:36 +02:00
Carlos Mocholí 733cdbb9ad
`every_n_val_epochs` -> `every_n_epochs` (#8383) 2021-07-13 01:20:20 +02:00
Carlos Mocholí f3e828426a
Clean code formatting CI job (#8378) 2021-07-12 20:28:35 +02:00
Daniel Stancl 91d98c8345
Fix mypy in utilities.device_dtype_mixin (#8127) 2021-07-12 18:56:06 +02:00
Kaushik B 4f1e7be5ec
Remove Vulture (#8381) 2021-07-12 13:39:36 +00:00
Kaushik B b069493b15
Add troubleshooting section for tpus (#8277)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-12 13:29:02 +00:00
Kaushik B 652eae684b
Exclude `lightning.py` for vulture (#8379) 2021-07-12 12:29:16 +00:00
thomas chaton 370fa67004
[Refactor] Improve loops API 1/n (#8334)
* resolve issues

* update

* update

* update

* add more exceptions

* resolve bug

* update

* update

* update changelog

* resolve bug

* resolve comments

* update

* update

* update changelog

* update

* update

* remove space

* update

* re-order protected trainer attr

* move public method up

* add docs to state dict methods

* combine __load with load_state_dict

* rename shadowed variable

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move changelog entry to refactor section

* refactor loop_progress property for test helper function

* update trainer setter docstring

* Update CHANGELOG.md

* Update pytorch_lightning/loops/base.py

* remove trainer check

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-07-12 11:13:50 +00:00
Ethan Harris e5732a1158
Temporarily pin sphinx version (#8377) 2021-07-12 09:35:44 +00:00
Kaushik B 825c5dbe8c
Add support for (accelerator='cpu'|'gpu'|'tpu'|'ipu'|'auto') (#7808)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-07-09 15:28:54 +00:00
Tilman Krokotsch 09ff295177
Hyperparameters for datamodule (#3792)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Tilman Krokotsch <tilman.krokotsch@iav.de>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-07-09 15:10:00 +00:00
Andrew Tritt 3102922647
Add LSF support (#5102)
* add ClusterEnvironment for LSF systems

* update init file

* add available cluster environments

* clean up LSFEnvironment

* add ddp_hpc as a distributed backend

* clean up SLURMEnvironment

* remove extra blank line

* init device for DDPHPCAccelerator

We need to do this so we don't send the model to the same device from multiple ranks

* committing current state

* add additional methods to ClusterEnvironments

* add NVIDIA mixin for setting up CUDA envars

* remove troubleshooting prints

* cleanup SLURMEnvironment

* fix docstring

* cleanup TorchElasticEnvironment and add documentation

* PEP8 puts a cork in it

* add set_ranks_to_trainer

* remove unused import

* move to new location

* update LSF environment

* remove mixin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changelog

* reset slurm env

* add tests

* add licence

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test node_rank

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lsf env to docs

* add auto detection for lsf environment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix is_using_lsf() and test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-09 16:14:26 +02:00
Dusan Drevicky 1b06edf2f2
Add the `on_before_optimizer_step` hook (#8048)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-09 13:30:52 +02:00
Sean Naren 31fca1658d
[docs] Add NCCL environment variable docs (#8345)
* Add nccl env variable docs

* Wording

* Update docs/source/guides/speed.rst

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-09 11:27:18 +00:00
Carlos Mocholí 0dfc265e2f
Parametrize fit hook test with manual optimization (#8071) 2021-07-09 10:36:09 +00:00
thomas chaton 1c825a2a9c
Add the `on_before_backward` hook (#7865)
* Add callback to hook tests and add predict test

* Fix lambda callback test

* Simplify lambda call test

* Use LambdaCallback

* Dynamically append to called for the model

* Remove print

* Consistency

* Consistency

* Prepare args/kwargs testing

* yapf doesn't like dict literals

* Add arguments for fit no val test

* Add arguments for fit no val test

* add before_backward_hook

* add test

* resolve flake8

* resolve tests

* update changelog

* add on_before_backward to LightningModule

* update on comments

* Test arguments

* Datamodule refactor

* Fix eval test

* remove extra file

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to hooks

* update

* resolve flake8

* update on comments

* Update full fit + val test

* Update test

* Remove FIXME

* Remove FIXME

* Undo change

* Fix

* Parametrize fit hook test

* Comment

* Parametrize fit hook test with different precision plugins

* Fix tests

* Parametrize fit hook test with manual optimization

* Unnecessary parenthesis

* WIP

* Comments

* Fix message

* Test CI error

* Revert "Test CI error"

This reverts commit 39c4a85a83.

* Add ddp training type teardown

* Update CHANGELOG

* Adrian's fix

* Use destructor

* Update CHANGELOG.md

* RPC destructor

* Update pytorch_lightning/plugins/training_type/ddp.py

* Why do you not work :(

* Missing condition

* Fix deepspeed test

* GC collect in conftest

* Do not show warnings for special tests

* Needs to run on 1.8

To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8"

* Run torch 1.8

* Skip test due to 'Python bus error'

* Debug NCCL

* shm size

* Disable warnings for special tests

* Remove NCCL_DEBUG statement

* Try smaller shm size

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* README and adjust versions

* Avoid self.on_gpu call

* empty cache cleanup

* More garbage collection

* Unroll parametrizations

* Do not reuse mock

* Undo changes

* Undo notebooks modification

* resolve test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete file

* Undo

* Fix test

* Revert "WIP"

This reverts commit f5828a8c42.

* Rename

* Remove optimizers

* Fix bug with LightningOptimizer

* Add optimizers

* update

* update

* Update CHANGELOG

* On after backward refactor

* Do not call super

* Fixes

* Remove should_accumulate

* pre/post backward refactor

* Call the LM backward hook

* Update tests

* Remove dev debug patch

* Fix test

* Remove optimizer arguments and typing

* Docs fixes

* Fix comment

* Undo changes

* Split manual and auto

* Undo change

* Deepsource

* Remove optimizers

* Undo changes

* Call the hook

* Docs

* Docs

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-09 06:15:57 +00:00
Jirka Borovec 1ad1a89c09
Unpin `PyYAML<=5.4.1` (#8329) 2021-07-08 22:04:47 +02:00
Carlos Mocholí eb6d991218
Refactor plugins backward (#8328) 2021-07-08 16:02:09 +02:00
Carlos Mocholí e9d0fe867f
Unpin Pillow after the 8.3.1 release (#8324) 2021-07-08 12:36:02 +05:30
Mauricio Villegas 7d3452a000
LightningCLI documentation improvements (#8303)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-08 12:35:26 +05:30
Kaushik B 3c637355a8
Use accelerator instead of dist backend for missing horovod warning (#8319) 2021-07-07 18:00:34 -07:00
Daniel Stancl 667def8d89
Fix mypy in `utilities.parsing` (#8132) 2021-07-07 23:32:12 +00:00
Jaime Ferrando Huertas 9bbca402ff
Add auto_insert_metric_name to ModelCheckpoint docstring. (#8310)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-07 23:15:21 +00:00
Carlos Mocholí b6108f1477
Fix broadcast for Windows minimal (#8331) 2021-07-07 22:01:34 +00:00
thomas chaton 7956c6bd4b
[Feat] Add FastForwardSampler 2/n - Fault Tolerant Training (#8307)
* wip

* update

* resolve bug

* wip

* wip

* wip

* resolved tests

* update on comments

* update

* update

* Update pytorch_lightning/utilities/auto_restart.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* update on comments

* Update pytorch_lightning/utilities/auto_restart.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* resolve bug

* update

* move properties to top

* update docs for fast forward sampler

* move public attribute to top

* add missing super call

* update docs for state_dict

* fix merge conflict

* add missing super() call

* move property to top

* update on comments

* update

* resolve bug

* update

* update on comments

* activate coverage for CaptureIterableDataset

* update on comments

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-07 20:21:21 +00:00
Carlos Mocholí c4353ea702
Remove `dev_debugger.call_count` (#8317) 2021-07-07 19:59:59 +02:00
Carlos Mocholí 368ac1c622
[CLI] Drop `ArgumentParser` when pickling and save before spawning (#8017)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-07 17:56:13 +00:00
Carlos Mocholí 07d7c37a79
Remove magic monitor support for `ModelCheckpoint` (#8293) 2021-07-07 18:36:19 +01:00
Ethan Harris 56697dd894
Add logo_light.svg (#8327) 2021-07-07 17:24:31 +00:00
Carlos Mocholí 398eed508f
Fix `self.optimizers()` not returning a single `LightningOptimizer` (#8326) 2021-07-07 18:57:45 +02:00
Carlos Mocholí 0cd406d4f1
Delete `checkpoint_connector.has_trained` (#8292) 2021-07-07 17:47:35 +01:00
Sean Naren 01f594baf4
Add quick docs for deepspeed infinity (#8323) 2021-07-07 15:58:27 +02:00
Sean Naren fc12fe721f
Remove RC candidate install (#8322) 2021-07-07 12:21:12 +00:00
Carlos Mocholí 9877265887
Simplify logger connector access (#8318) 2021-07-07 14:13:30 +02:00
Adrian Wälchli d73c32ab51
move `torch.cuda.set_device()` to enable collective calls earlier in setup (#8312) 2021-07-07 13:15:41 +02:00
Sidhant Sundrani 20df24d2a2
Enables reload of dataloaders on every n epochs from every epoch (#5043)
* edit arg to reload_dataloaders_every_n_epoch

* init reload_dataloaders_every_n_epoch

* edit logic to reload dl

* update arg to test datamodule

* update arg test dataloader

* edit reload dl logic in eval loop

* fix var name in reset_train_val_dataloaders

* fix error, use current_epoch attribute

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* edit every_n_epoch to every_n_epochs

* assert reload_dataloaders_every_n_epochs positive

* assert reload_dataloaders_every_n_epochs positive

* add trainer property should reload dl

* update should reload dl in train loop

* condition on should reload dl in eval loop

* pep8

* fix update should reload dl in train loop

* add test case

* replace assertion with misconfig exception

* remove unused variable

* remove unnecessary checks

* replace to BoringModel

* remove unrequired comment

* deprecate _every_epoch

* add deprecated argument to trainer

* test case for deprecated arg

* remove unrequired assertion in train loop

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* modify misconfig exception for int

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* conv bool to int of depreciated _every_epoch

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update description of deprecated param

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update deprecation warning

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* modify argument to int only

* fix deprecated test function name

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* merge tests for reload dls

* add propery should reload dl

* removed and added to trainer property

* use property in train loop

* remove deprecated test

* add deprecated test to new file

* test case for exception

* update test datamodule every_n_epochs

* update trainer docs

* update hooks with every_n_epochs

* edit format if statement

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* typo in exception

* pytest check only misconfig exception

* remove unnecessary code in test

* remove unnecessary code in deprec test

* added match in test

* typo in comment

* revert to prev, keep only req in context manager

* Apply suggestions from code review

* docs

* rebase

* Apply suggestions from code review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix import: model_helpers instead of model_utils

* fix, add reload_dataloaders_every_n_epochs argument to data connector

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add required imports

* move deprecated log

* add missing import rank_zero_warn

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update varname in should_reload_dl_epoch

suggestion from code review

* Fix CHANGELOG. Update deprecation versions

* Minor change

* change property name, mark protected

* update property name

* update property name

* Remove deprecated *_loop.py files

* Rename test func

* Update CHANGELOG.md

* use rank_zero_deprecation

* update deprecation message in trainer api docs

* test deprecation with real arg name in message

* fix typo in trainer docs

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-07 13:10:08 +02:00
William Falcon e148a1339a
Update governance.rst 2021-07-07 11:39:19 +02:00
Kaushik B 2b6edae205
Decouple device parsing logic from Acc connector to Trainer (#8180) 2021-07-07 15:05:26 +05:30
thomas chaton bca5adf6de
Update contributing templates (#8256)
* update

* add a link

* update

* remove star

* update

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update .github/PULL_REQUEST_TEMPLATE.md

* update

* update

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-07-06 17:34:31 +00:00
gaoteng-git f796cb435f
Fix `PyTorchProfiler` prefix typo (#8308) 2021-07-06 17:12:07 +02:00
Carlos Mocholí 4184d7e738
Refactor GPU examples tests (#8294) 2021-07-06 13:14:04 +01:00
Stephen McGroarty f6a5bb2eee
Bump IPU version (#8290) 2021-07-06 10:27:34 +00:00
Adrian Wälchli 1e1d1821d0
fix best score on wrong device in EarlyStopping callback (#8295) 2021-07-06 10:59:33 +02:00
Carlos Mocholí 8fead58273
Add `functools.wraps` support for `is_overridden` (#8296) 2021-07-06 10:40:54 +02:00
Daniel Stancl 34efadd5b8
Fix mypy in `utilities.device_parser` (#8136)
* Fix mypy for utilities.device_parser

* Fix remaining mypy issues + disable ignoring mypy errors

* Return one Optional type annotation back

* Fix annotation for the parse_tpu_cores method

* Remove unused import

* Include carmocca's suggestion and fix mypy issue

* include carmocca's suggestion

* add `else` statement to `parse_gpu_ids` to inform mypy `gpus` is a type of `List[int]`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-06 10:39:57 +02:00
Adrian Wälchli f1341a555e
Remove deprecated optimizer argument from `manual_backward` (#8287)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-06 08:18:08 +00:00
Adrian Wälchli 9eda520bee
clean up unused attributes in LightningModule (#8259) 2021-07-06 10:13:09 +02:00
gaoteng-git a7e21bd5ad
move profiler.step from training_step_and_backward to optimizer_step_… (#8224)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-06 13:22:10 +05:30
Carlos Mocholí 7ddcdb26d8
Deprecate `trainer.disable_validation` (#8291) 2021-07-05 16:52:49 +02:00
Carlos Mocholí 441e16f61c
Default `EarlyStopping.check_on_train_epoch_end=True` (#8286) 2021-07-05 15:45:23 +02:00