Commit Graph

5124 Commits

Author SHA1 Message Date
Adrian Wälchli 55a90af7fc
`pytorch_lightning.loops` file structure: group by dataloader, epoch, and batch loop (#8077) 2021-06-24 23:40:46 +02:00
Carlos Mocholí 2c43bfc5ef
GPU CI - run torch 1.8 (LTS) (#8116) 2021-06-24 16:56:43 +00:00
edenlightning d4d5418cc4
Fix notebook links (#8089)
* Fix notebook links

* update

* BERT

* docs

* Update README.md

* Apply suggestions from code review

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-23 21:36:31 +00:00
Carlos Mocholí 4d9b72b8a9
Nuke RPC (#8101) 2021-06-23 18:31:13 +00:00
Sean Naren 8bd7b1bdd7
Add torchelastic check when sanitizing GPUs (#8095)
* Add torchelastic check

* Add changelog

* Address review

* fix
2021-06-23 14:09:53 +02:00
Adrian Wälchli 4dc08e4035
Loop Refactor 6/N - Remove Old Predict Loop (#8094) 2021-06-23 14:05:06 +02:00
Adrian Wälchli fe48203111
restrict public interface of training loop (#8024)
* active optimizers

* check checkpoint callback

* epoch loop properties

* epoch loop methods

* training_batch_loop

* changelog

* update chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

* yapf

* backward

* fix missing string reference

* is_last_batch remains public

* remove dead code

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-23 10:25:29 +00:00
Adrian Wälchli a45ab00b30
Loop Refactor 5/N - Prediction Loop (#7700)
* integrate d180bb2

* Minor changes

* Refactor loop logic into logger connector

* Refactor test

* Tighter fx validator

* Add back split idx

* Typing

* update

* Conflict

* Fix tests

* resolve grad_norm

* update

* move to train loop

* Bye grad_norm_dict parameter

* Fix sync test

* update

* Fix bug when validation is run mid epoch

* fix grad_norm_dict test

* Fix fx_validator test

* fix grad_norm_dict test

* Fix order bug

* Detach tensors in test

* resolve some tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove pdb

* resolve flake8

* Update test

* more tests

* Revert last thomas' changes

* resolve 1 test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor context restoration

* integrate latest changes from logger connector refactor poc

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* integrate latest changes from logger connector refactor poc

* Minor changes

* update changelog

* Remove unused argument

* Update CHANGELOG

* Copy call_hook changes

* Docs

* Fix ref

* move to cpu

* Bad merge

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove pdb

* remove pdb

* Refactor to

* Avoid partial

* trigger ci

* Bad merge

* integrate latest logger connector changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove grad norm dicts list

* Diff

* properties first

* Bad merge

* Reuse metrics_to_scalars

* Use active loop

* Move to device

* resolve test

* integrate latest changes from logger connector poc

* define union

* define union

* Update logger connector

* Update result

* Update imports

* Update after rename

* Refactor reduce_fx and op

* Fix test after rename

* mypy

* integrate latest logger connector refactor poc changes

* Fix test

* Refactor test

* Deprecate `self.log(sync_dist_op)` in favor of `self.log(reduce_fx)`

* Undo field

* add redundant return

* rename

rename files and classes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename

* Replace code

* Fix names and imports

* Remove metric_attribute

* imports

* loop hygiene

* yapf on loops

* protected new loop trigger

* rename NEW LOOP guard

* integrate latest logger connector changes

* integrate latest logger connector changes (eval loop)

* resolve todo dataloading reset

* re-add notebooks

* add missing init

* bad merge

* remove NEW_LOOP guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* flake8

* exclude coverage


coverage

* integrate #7917, remove teardown from training loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update "accumulated_batches_reached" condition

based on whether the iter count was updated or not

* remove public loop properties

* make skip backward protected again

* typing base loop

* typing fit loop

* typing training_batch_loop

* typing evaluation loop

* typing prediction loop

* typing training epoch loop

* dataloader_loop

* evaluation_dataloader_loop

* prediction_dataloader_loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* integrate train loop changes from master

* integrate eval loop changes from master

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tpipes moving model to cpu and leaving it there.

* don't reset fit loop


don't reset fit loop

* fix test iteration count <-> batch_idx reset

* replace torch.Tensor -> Tensor

* fix attribute error to block_ddp_sync_behaviour

* fix flake8 and yapf conflict

* remove redundant override

* add classes

Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* trainer changes

* connect

* clean up

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update test renaming

* rename evaluation loop to evaluation epoch loop

* minor docstring improvements

* update chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try ci fix

* update code owners for pl/loops

* update mock path

* re-order

* simplify dataloader reset

* simplify get_dataloaders()

* save predictions on_run_end()

* improve skip condition re-routing

* re-order

* remove unused type import

* check which assert is failing

* pig

* hobbit

* teardown for evaluation

* Revert "hobbit"

This reverts commit e81b0dbee3.

* Revert "pig"

This reverts commit 33d89e0720.

* Revert "check which assert is failing"

This reverts commit b7483b425c.

* free memory in fit loop teardown

* update docstring

* period

* remove dead code

* else carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/evaluation_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update chlog

* unused imp

* move default construction in run_evaluation

* add something for lawyer to read

* switch typehint for eval loop trainer property

* add missing imports

* remove a todo that needs more discussion

* combine _get_num_dataloaders with the property

* Update pytorch_lightning/loops/dataloader/dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* black + yapf

* avoid coverage on old unused eval loop

* empty space in docstring

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve todo for args forwarding

* weekproxy trainer

* fix check for num dataloaders kwargs

* clean up num prediction dataloaders property

* free memory

* rm notebooks folder

* rm old file

* revert changes to old eval loop

* bad merge

* undo teardown

* setup signature

* remove file for notes

* free memory

* chlog

* Revert "weekproxy trainer"

This reverts commit d4e6969170.

* connect trainer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up max batches and dataloaders

* max batches handling

* no grad handling

* unused argument

* protected attrs

* unused imports

* undo unintentional rename

* consistent naming

* capitalization in docstring

* list all args

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/prediction_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/prediction_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-06-23 10:17:04 +01:00
Kaushik B 58b47dabce
Add `log_device_info` to Trainer (#8079) 2021-06-23 09:34:56 +02:00
Carlos Mocholí 54ac4e03cb
Update fit with no validation hook test (#7738)
* Add callback to hook tests and add predict test

* Fix lambda callback test

* Simplify lambda call test

* Use LambdaCallback

* Dynamically append to called for the model

* Remove print

* Consistency

* Consistency

* Prepare args/kwargs testing

* yapf doesn't like dict literals

* Add arguments for fit no val test

* Add arguments for fit no val test

* Test arguments

* Datamodule refactor

* Fix eval test

* Update full fit + val test

* Update test

* Update resume test

* Remove changes

* Fix
2021-06-23 09:34:00 +02:00
nisheethlahoti 06f8349291
Support calling fit and test scripts using "python -m" module syntax with DDP (#8073)
Co-authored-by: Nisheeth Lahoti <nisheeth@rephrase.ai>
2021-06-23 02:42:04 +00:00
Edgar Riba b378806b6c
Add `add_to_queue`/`get_from_queue` for DDP spawn (#7916)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-23 03:19:37 +02:00
Carlos Mocholí 6dd7797c97
Deprecate moved warning functions (#8085) 2021-06-23 00:09:42 +02:00
Kaushik B 8bce39431e
Use XLA utility API to move data to CPU (Single TPU core) (#8078) 2021-06-22 23:39:23 +02:00
thomas chaton f79f0f9de1
[Refactor] Remove _run_evaluation + 3 EvaluationLoop (#8065)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-22 16:10:07 +02:00
Gabriele Picco 8cc646f05b
Specify packaging version to be more than 17.0 (#8030) 2021-06-22 12:15:22 +00:00
Théo Dumont 5d44e61efc
Add forgotten colon (#8076)
Very small typo correction: add forgotten `:` in finetuning callbacks docs.
2021-06-22 10:37:39 +00:00
Adrian Wälchli 9a64e534c7
Loop Refactor 4/N - Remove Old Evaluation Loop (#8056) 2021-06-22 11:57:37 +02:00
Adrian Wälchli 4becd1cf31
rename old `Trainer.train_loop` -> `Trainer.fit_loop` (#8025) 2021-06-22 11:49:32 +02:00
Adrian Wälchli 61e6e14ae2
update changelog after 1.3.7 (#8075) 2021-06-22 15:14:52 +05:30
pre-commit-ci[bot] 7828814810
[pre-commit.ci] pre-commit autoupdate (#8067)
updates:
- [github.com/PyCQA/isort: 5.8.0 → 5.9.1](https://github.com/PyCQA/isort/compare/5.8.0...5.9.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-06-21 22:50:12 +00:00
Carlos Mocholí f1fa4c4727
Update fit with val hook test (#8060) 2021-06-21 17:27:37 +00:00
Carlos Mocholí dd340a6598
Actually show deprecation warnings and their line level [2/2] (#8002)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-21 18:51:53 +02:00
Carlos Mocholí d9bf9759fb
Add `LightningCLI(save_config_overwrite=False|True)` (#8059) 2021-06-21 17:58:02 +02:00
simran2905 d1efae2e47
Fix checkpointed state for lr_schedulers with step interval (#7877)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-21 15:08:07 +00:00
Kaushik B 2303f9ced8
Fix(Early Stopping): move best score to device (#7959) 2021-06-21 15:41:41 +05:30
Wang Ran (汪然) 92a78d58c3
fix formatting typo in seed_everything docs (#8052) 2021-06-21 10:16:24 +02:00
Adrian Wälchli c7eaf76dbe
prevent memory test failing due to earlier test leaking memory (#8029) 2021-06-18 19:39:45 +01:00
thomas chaton 651a0fbdeb
[bugfix] Properly name PyTorchProfiler traces (#8009)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-18 18:16:36 +00:00
Carlos Mocholí e55f01e665
Update evaluation hook tests (#8013) 2021-06-18 16:41:27 +00:00
Adrian Wälchli 0d6dfd42d8
Merge pull request #7990 from PyTorchLightning/refactor/loops/loops_everywhere_eval
Loop Refactor 3/N - Evaluation Loop
2021-06-18 08:54:59 -04:00
Andrew Tritt e808f9fb28
Use DistributedSampler when running with custom accelerator (#7814)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-18 14:34:05 +02:00
Carlos Mocholí a23a69965e
Deprecate returning extras with grads (#7994)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-18 13:05:37 +01:00
Kaushik B f447839d16
Add `warning_cache.deprecation` and set warning stacklevel [1/2] (#8005)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-18 11:50:24 +00:00
edenlightning 599d6db10f
Fix Grid run commands (#8021)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-06-18 01:24:15 +00:00
Carlos Mocholí cdcc483e9b
CHANGELOG update after v1.3.6 release (#7988) 2021-06-17 15:59:40 +00:00
Jirka Borovec 7978a5376d
Ipynb update (#8004)
* git submodule update --remote

* update notebooks in docs

* prune

* _notebooks

* docs

* path

* path

* ignore

* head
2021-06-17 16:46:05 +02:00
David Chan c6e02e481e
[feat] Allow overriding optimizer_zero_grad and/or optimizer_step when using accumulate_grad_batches (#7980) 2021-06-17 12:50:37 +02:00
Adrian Wälchli eebdc910dd
progressive restoring of trainer state (#7652) 2021-06-17 08:13:53 +00:00
thomas chaton 3fece17ffb
[feat] Add `{,load_}state_dict` to `ResultCollection` 1/n (#7948)
* add metric reload

* add tests

* update changelog

* update

* remove print

* remove attribute_name

* update

* update

* resolve test

* update on comments

* bypass typing bug

* update on comments

* Update CHANGELOG

* Update tests

* Update code

* Check if TODO persists

* Remove unrelated changes

* Fixes

* Revert "Check if TODO persists"

This reverts commit 68dac4ae69.

* Do not serialize dataclasses

* Avoid reconstructing meta twice

* Keep previous sync_fn

* Move to device and map_location

* Fix bug

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-17 08:08:22 +01:00
Austin Basye 906de2a7fa
[feat] Named Parameter Groups in `LearningRateMonitor` (#7987)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-17 03:13:54 +02:00
edenlightning 5647087f03
New speed documentation (#7665)
* amp

* amp

* docs

* add guides

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* amp

* amp

* docs

* add guides

* speed guides

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete ds.txt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update conf.py

* Update docs.txt

* remove 16 bit

* remove finetune from speed guide

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* speed

* remove early stopping from speed guide

* remove early stopping from speed guide

* remove early stopping from speed guide

* fix label

* fix sync

* reviews

* Update trainer.rst

* Update trainer.rst

* Update speed.rst

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-06-16 21:28:51 +00:00
Sean Naren 55494e8745
Fix Special Tests (#7841)
* Remove port setting

* Drop one of the params to see what happens

* Split tests into two

* Try using port setting
2021-06-16 19:39:03 +02:00
Carlos Mocholí bc2c2db2bf
Do not override the logged epoch in `logged_metrics` (#7982) 2021-06-16 13:36:58 +00:00
Carlos Mocholí 2134216546
Change `WarningCache` to subclass `set` (#7995) 2021-06-16 14:09:52 +01:00
Carlos Mocholí 4ffba600c9
Add predict hook test (#7973) 2021-06-16 15:09:24 +02:00
thomas chaton 917cf83638
[doc] Add more reference around predict_step (#7997)
* add predict examples

* update on comments
2021-06-16 12:23:27 +01:00
thomas chaton d2983c7c51
[fix] Enable manual optimization DeepSpeed (#7970)
* resolve manual optimization

* resolve manual optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Simplify message

* Move from deprecated

* Split model parallel/manual model

* Use property

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-06-16 09:25:41 +00:00
Adrian Wälchli b093a9e66d
Support `save_hyperparameters()` in LightningModule dataclass (#7992)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-16 10:30:58 +02:00
Adrian Wälchli 341adad819
Loop Refactor 2/N - Remove Old Training Loop (#7985)
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-16 09:00:33 +01:00