Commit Graph

48 Commits

Author SHA1 Message Date
Ivan Švogor 25b771ca08
Create the loss accumulator directly on the device (#12430)
Co-authored-by: Ivan Svogor <ivan.svogor@iarai.ac.at>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-25 12:46:17 +01:00
edward-io 87bd54aedf
fix typos (#11937) 2022-02-16 17:27:51 -08:00
Carlos Mocholí 963adc7857
Small cleanup when dataloader states are saved (#11843) 2022-02-16 20:57:21 +00:00
Rohit Gupta 59ef66c06b
Fix support for `CombinedLoader` while checking for warning raised with eval dataloaders (#10994) 2021-12-14 20:43:23 +05:30
Carlos Mocholí 0061619e0a
Improve typing for loops (#10780) 2021-11-30 20:28:55 +00:00
thomas chaton 3d6262b7a9
Fault Tolerant Manual: Add support for DDP (#10638) 2021-11-25 18:31:53 +01:00
thomas chaton b28ab34ff5
Fault Tolerant Manual: Add loading to reload the states (#10699)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 17:18:36 +00:00
thomas chaton 8d810d6144
Enable distributed training with CombinedDataLoader and max_size_cycle (#10374)
* solve combinedloader

* update

* update changelog

* update on comments

* resolve iterable dataset support

* update test description

* update

* update on comments

* update

* Accelerator auto

* Address review

* Refactor

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 20:06:10 +00:00
Gili Tzabari a967b6eba0
del iterator on_run_end() (#9915) 2021-10-29 16:29:44 +00:00
thomas chaton 5841ca9782
[Feat] Add auto_restart for fault tolerant training (#9722) 2021-10-01 16:37:17 +00:00
thomas chaton 9148a13de0
Enable DataLoader state restoration for the evaluation loop (#9563)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-24 16:21:00 +00:00
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
Adrian Wälchli a5e2f2b432
fix state extraction from batch when fault-tolerant training (#9281) 2021-09-02 11:57:40 -07:00
Adrian Wälchli b13749b4ec
add fault-tolerance for global random state in map-style datasets (#8950)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-26 12:13:31 +00:00
Adrian Wälchli 38ceb8943e
add docs (#8952) 2021-08-18 10:33:42 +00:00
Adrian Wälchli 522df2b89b
3/n integrate new LightningDataFetcher into loop (#8953)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-08-17 21:42:22 +00:00
ananthsub 037a86c873
Remove write_predictions from LightningModule (#8850)
* Remove write_predictions from LightningModule
2021-08-14 02:00:23 +00:00
thomas chaton e060547230
[Bug] Add SharedCycleIteratorState (#8889) 2021-08-13 19:06:56 +01:00
Carlos Mocholí e63968ab88
Add `pyupgrade` to `pre-commit` (#8557)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Carlos Mocholí f7027a8701
Remove `torch >= 1.6` checks (#8523) 2021-07-23 04:03:20 +00:00
thomas chaton 374fae59ef
[Feat] Add utilities for CombinedLoader state dict and dataloader state dict 1/n (#8364)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-19 09:56:57 +00:00
deepsource-autofix[bot] 03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
deepsource-autofix[bot] e11fe19673
Remove unnecessary use of comprehension (#8149)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 10:00:02 +01:00
Justus Schock 7b283e3c46
Bugfix/Multiple dataloaders (#7433)
* Update supporters.py

* Update apply_func.py

* Update supporters.py

* Update model_train_dataloaders.py

* Update model_train_steps.py

* Update test_dataloaders.py

* Update CHANGELOG.md

* Update model_train_steps.py

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update supporters.py

* Update test_supporters.py

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Apply suggestions from code review

Co-authored-by: Edgar Riba <edgar.riba@gmail.com>

* Update supporters.py

* Update supporters.py

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Edgar Riba <edgar.riba@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-11 16:33:29 +02:00
Adrian Wälchli b9b3fa371f
fix case where an IterableDataset doesn't produce a batch for an epoch (#7294)
* wip

* fix

* add test

* refactor + test

* rm

* formatting

* update changelog

* doc

* docstring

* remove unused import

* Update CHANGELOG.md

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:55 +00:00
thomas chaton 9beec26c3e
[bugfix] Add support for CombinedLoader in validation with ddp (#7102)
* add test

* add changelog

* resolve flake8

* remove print
2021-04-20 08:22:02 +00:00
Carlos Mocholí f0c5479de9
Remove legacy `Result` parameters (#6016) 2021-03-28 11:55:08 +02:00
Jirka Borovec aba212341a
formatting 4/n: Trainer (#5720)
* yapf trainer

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* .

* fix

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-02-03 09:25:42 +00:00
Jirka Borovec c3587d39da
prune deprecated EvalResult (#5633)
* prune EvalResult

* drop tests

* drop usage

* drop class

* prune
2021-01-26 03:09:39 -05:00
Jirka Borovec 2846322f60
fix docs render (#5610) 2021-01-25 20:21:00 -05:00
Justus Schock ef7345dc4e
add possibility for nested loaders (#5404)
* add possibility for nested loaders

* pep8: newline
2021-01-24 07:32:02 -05:00
Arnaud Gelas a9d9f33a86
Fix isort failures in trainer (#5529)
Remove from skipped module in pyproject.toml and fix failures on:
- pytorch_lightning/trainer/*.py
2021-01-18 13:42:50 -05:00
Loi Ly 1d13943605 Fix reset TensorRunningAccum (#5106)
* Fix reset TensorRunningAccum

* add test for TensorRunningAccum's reset method

* fix CI failed due to PEP8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-01-05 09:58:36 +01:00
Jirka Borovec c72880f109
hotfix: dataloaders - add unimplemented methods (#5352)
* add unimplemented methods

* test

* test

* flake8
2021-01-05 03:41:20 -05:00
Justus Schock d88cf4a652
Add Support for multiple train loaders (#1959)
* add support for wrong dtype in apply_func

* apply loader resetting to possible collection of loaders

* add combined loader iter class

* integrate combined loader iter to training loop

* fix imports

* fix imports

* finish supporters

* add tests for supporters

* add test for model with multiple loaders

* fix trainer integration

* fix instance check

* Train loaders (#4032)

* patch for issues discussed in #1959, encapsulating underlying datastructures returned from train_dataloader

* update data_loading.py to it uses patch discussed in #1959

* rename class

* Separate CombinedLoaderIterator into two classes, and update related tests. (#4606)

* Fix the bugs after rebasing.

* Add custom get_len for apply_to_collection

* Refactor MultiIterator to be as CombinedLoaderIterator

* To get the right num_training_batches. Call the wrapper for multi trainloader in data_loading.py, instead of training_loop.py

* Reload _loader_iters when calling __iter__

* Don't transform DataLoader to CombinedLoaderIterator when it's along

* Updates test_fit_multiple_train_loaders for testing num_training_batches

* Seperate CombinedLoaderIterator into CombinedLoaderIterator and CombinedDataLoader. Add CombinedDataset for unified DataLoader format.

* Initialize CombinedDataLoader before calculating num_training_batches. Also updating self._worker_check for multiple loaders

* Update tests for supporters

* Update tests for multiple trainloaders. Add tests about few_workers for multiple loaders.

* Fix pep8 issues

* Add tests for train_loader_patch.py

* Add descriptions to multiple_trainloader_mode

* Remove unused variables

* Add docstrings and typing

* Add more tests for better converage

* Remove unused commented codes

* Add sampler property

* Remove extract_dataset

* Update typing

* pep8

* Update train_loader_patch.py

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/supporters.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* reviewer comments

* fix stupid import

* add docs

* add back line separator

* fix line sep

* pep8

* Apply suggestions from code review

* fix

* fix

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* flake8

Co-authored-by: Justus Schock <justusschock@justuss-mbp.fritz.box>
Co-authored-by: Christofer Fransson <christofer_fransson@yahoo.com>
Co-authored-by: YI-LIN SUNG <r06942076@ntu.edu.tw>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-01-04 19:57:53 +00:00
Jirka Borovec 0f36525e8f
fix/enable - check F401 (#5201)
* refactor - check F401

* missed

* fix
2020-12-21 10:15:04 +01:00
Justus Schock ebbf256bf5
Create memory dynamically (#4938)
* create window size dynamically.

* pep8

Co-authored-by: chaton <thomas@grid.ai>
2020-12-02 01:05:12 +05:30
chaton 958aa1aee7
[test] Accumulated gradient optimization tests (#4477)
* adding tests

* wip

* update

* Update tests/trainer/test_trainer.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 23:44:11 +00:00
ananthsub d3f40d6a9e
Update to_disk to use fsspec for remote file support (#3930)
* Update supporters.py

* Update CHANGELOG.md

* Update supporters.py

* Update supporters.py

* Update supporters.py

* Update supporters.py

* Update supporters.py

* Update supporters.py

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-10-07 07:28:23 -04:00
William Falcon f43028f3ae
added copyright notices (#3062) 2020-08-19 22:03:22 -04:00
Nathan Raw b9695237f1
Save test predictions on multiple GPUs (#2926)
* Save test predictions on multiple GPUs
2020-08-14 17:52:43 -04:00
William Falcon 6d10ac2ac8
Structured results (train loop only. val loop separate PR) (PR 2/5) (#2615)
* r

* r

* r

* patched optimizer closure with sr

* patched optimizer closure with sr

* patched optimizer closure with sr

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added train step structured result

* added autoreduce for train step

* added auto reduce on train

* added auto reduce on train

* added auto reduce on train

* added auto reduce on train

* added auto reduce on train

* added auto reduce on train

* added hooks

* added hooks

* added hooks

* added hooks

* added hooks

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* cache

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* Update pytorch_lightning/callbacks/early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Update pytorch_lightning/core/step_result.py

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* simple

* finished tests for structured results on train epoch

* simple

* simple

* revert

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* Update tests/base/deterministic_model.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* finished tests for structured results on train epoch

* docstring typos

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* Update pytorch_lightning/core/step_result.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/overrides/data_parallel.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2020-07-20 19:00:20 -04:00
Udit Arora 08573d0f7e
Fix some pyright member access errors in training module (#2121)
* Fix pyright member access errors in training module

* Fix Trainer instantiation error due to inheritence order

* Add GH workflow for pyright

* Fix more pyright errors in trainer module

* Add pyrightconfig and setup python environment in type-check workflow

* Exclude pyrightconfig.json

* suggestions

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-06-12 17:23:18 +02:00
Alexey Karnachev 4c34d16a34
Fixed configure optimizer from dict without "scheduler" key (#1443)
* `configure_optimizer` from dict with only "optimizer" key. bug fixed

* autopep8

* pep8speaks suggested fixes

* CHANGELOG.md upd
2020-04-10 11:43:06 -04:00
Alexey Karnachev ddbf7de6dc
Added accumulation of loggers' metrics for the same steps (#1278)
* `add_argparse_args` method fixed (argument types added)

* autopep8 fixes

* --gpus=0 removed from test (for ci tests)

* Update pytorch_lightning/trainer/trainer.py

Co-Authored-By: Joe Davison <joe@huggingface.co>

* test_with_accumulate_grad_batches added

* agg_and_log_metrics logic added to the base logger class

* small format fix

* agg metrics strategies removed (not to complicate stuff)

* agg metrics: handle zero step

* autopep8

* changelog upd

* flake fix

* metrics aggregators factored out, metrics_agg.py added + tests

* metrics agg default value added

* Update pytorch_lightning/loggers/metrics_agg.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* metrics aggregators factored out, metrics_agg.py added + tests

* metrics agg default value added

* Update pytorch_lightning/loggers/metrics_agg.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* remove .item which causes sync issues (#1254)

* remove .item which causes sync issues

* fixed gradient acc sched

* fixed gradient acc sched

* test_metrics_agg.py removed (all tested in doctrings), agg metrics refactored

* test_metrics_agg.py removed (all tested in doctrings), agg metrics refactored

* autopep8

* loggers base.py types fixed

* test

* test

* metrics aggregation for loggers: each key now has a specific function (or default one)

* metrics aggregation for loggers: each key now has a specific function (or default one)

* docstrings upd

* manual typehints removed from docstrings

* batch_size decreased for test `test_with_accumulate_grad_batches`

* extend running accum

* refactor

* fix tests

* fix tests

* allowed_types generator scoped

* trainer.py distutils was imported twice, fixed

* TensorRunningAccum refactored

* TensorRunningAccum added to change log (Changed)

* change log pull link added

Co-authored-by: Joe Davison <joe@huggingface.co>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-08 08:35:47 -04:00
Paweł Rzepiński b8ff9bc1d2
Fix unimplemented type() on TPU (#1396)
* Fix unimplemented type() on TPU

* Add changelog entry

* Add quotation marks
2020-04-06 20:29:55 -04:00
Jirka Borovec 31017120fd
fix incomplete RunningMean (#1309)
* fix RunningMean

* changelog

* fix none

* Update supporters.py

just needed to multiply by zero for init

* Revert "Update supporters.py"

This reverts commit 7e0da6c6

* fix NaN

* formatting

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-03-30 18:28:31 -04:00