Commit Graph

32 Commits

Author SHA1 Message Date
Nicki Skafte 68fd3086f1
Prevent flickering progress bar (#6009)
* add padding

* fix

* fix

* Update pytorch_lightning/callbacks/progress.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* updated based on suggestion

* changelog

* add test

* fix pep8

* resolve test

* fix code format

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-02-17 19:01:51 +00:00
chaton e982800b81
Add PredictLoop (#5752)
* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* add predict_loop

* manual optimization

* clean predictloop

* update optimizer routing

* add predict loop on new accelerator

* resolve a bug

* add rank to torchelastic

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* fix memory mixed precision

* update

* setstate on trainer for pickling in ddp spawn

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* resolve tests

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* remove sanetize

* rename train to run_train

* remove useless hooks

* add misconfigurationException

* remove wrong naming

* resolve some legacy

* udpate docstring

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resovle a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* updat utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* resolve flake8

* update code

* update

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix merge

* fix merge

* reset legacy accelerator

* add missing rename dispatch

* rename post traning

* update code

* resolved comments

* typo

* typo

* add flow description

* resolve comments

* update on comments

* update flow

* add backticks

* resolve tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 17:11:56 -05:00
Jirka Borovec 79d42d83e7
formatting 3/n: PL modules (#5716)
* cb

* log

* prof

* tune

* flake8
2021-02-08 14:28:38 -05:00
Adrian Wälchli 7b42494d0e Fix visual progress bar bug / properly reset progress bar (#4579)
* reset

* fix reset

* changelog

* update chlog

* typing

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-03 19:41:44 +01:00
chaton 3da28fd634
[feat] 1/2 Add trainer.predict (#5579)
* start adding predict

* add predict

* resolve test

* add predict

* remove limit_predict

* update

* add test for predict

* typo

* update on comments

* remove predict_step

* update ddp_shareded

* check ddp_sharded

* resolve on comments

* resolve isort

* update dp

* add test dp 1 gpu

* made default forward

* resolve path

* resolve bug

* update on comments

* resolve doc

* resolve bug

* update

* resolve bug

* update on comments

* resolve pep8

* update test doc

* update on comments

* solve special tests

* resolve bug

* resolve flake8

* Update pytorch_lightning/callbacks/progress.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add predict to LightningModule

* missing predict

* typo

* rename is_prediction to _predicting

* add

* update

* update

* update doc

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-27 11:38:14 -05:00
Jirka Borovec dee5553b2b
move to Pages dir (#4869)
* folders

* common / advanced / extensions

* paths

* flake8

* isort

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-26 15:07:07 -05:00
Rohit Gupta 9cfbf8d609 Disable checkpointing, earlystopping and logging with fast_dev_run (#5277)
* Disable checkpointing, earlystopping and logger with fast_dev_run

* docs

* chlog

* disable callbacks and enable DummyLogger

* add log

* use dummy logger method

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit f740245521)
2021-01-06 12:57:24 +01:00
Adrian Wälchli 89e8796e2a
fix incomplete progress bar when refresh_rate > num batches (#4577)
* fix progress bar overshoot

* fix updates for partially incomplete main  progress bar when val loop starts

* add tests

* chlog
2020-11-24 00:01:33 +01:00
William Falcon 4c0d063c86
outputs in __batch_end hooks (#3966)
* train_batch_end outputs

* added tests for the output hooks
2020-10-07 21:48:38 -04:00
William Falcon 65b6a6a497
0.10.0 (#3965) 2020-10-07 20:41:56 -04:00
Rohit Gupta a628d181ee
Fix val_progress_bar total with num_sanity_val_steps (#3751)
* Fix val_progress_bar total with num_sanity_val_steps

* chlog

* Fix val_progress_bar total with num_sanity_val_steps

* move test

* replaced with sanity flag and suggestions
2020-10-04 08:32:18 -04:00
Jeff Yang 951048a81e
docs: use ref for anchor links, fix a few typo (#3486) 2020-09-13 21:04:21 -04:00
William Falcon bda1400225
ref: restore on_eval_start hook (#3183)
* restore eval loop hook
2020-08-26 00:45:43 -04:00
William Falcon 2f6d82e0e6
ref: remove on_eval_start hook (#3176)
* remove on_eval_start hook

* remove on_eval_start hook
2020-08-25 22:28:00 -04:00
Rohit Gupta 7cca3859a7
Fix num_sanity_val_steps is clipped to limit_val_batches (#2917)
* Fix num_sanity_val_steps according to limit_val_steps

* fix test

* add num_sanity_batches

* pep

* update docstring in test

* add more test

* chlog

* update comments and docstring in test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
2020-08-21 20:11:31 +02:00
Ananya Harsh Jha 10150fccb0
make progress bar match internal epoch counter (#3061)
* fix 3018, 3032

* changed progress bar for 3032
2020-08-20 06:27:48 -04:00
William Falcon f43028f3ae
added copyright notices (#3062) 2020-08-19 22:03:22 -04:00
William Falcon f82d7feb6c
updated hooks (#2850)
* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks
2020-08-07 09:29:57 -04:00
William Falcon b507c42c47
clarify batch hooks (#2842)
* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook
2020-08-05 20:01:30 -04:00
Rohit Gupta 84c507c4df
Fix max_batches with fast_dev_run. (#2581)
* Fix fast_dev_run to run for all val_dataloaders

* fast_dev_run check

* changelog

* explicit

* limit_batches with fast_dev_run in init

* add test

* whitespace and comment fix

* comment and assertion

* added tests

* Fix fast_dev_run to run for all val_dataloaders

* fast_dev_run check

* changelog

* explicit

* limit_batches with fast_dev_run in init

* add test

* whitespace and comment fix

* comment and assertion

* added tests

* added tests

* added tests

* added tests

* update rtol

* Revert "update rtol"

This reverts commit 4320329540.

* added tests

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-27 17:56:55 -04:00
Adrian Wälchli 1e68968ed7
support num_sanity_val_steps=-1 (#2246)
* support sanity_val_step=-1

* fix list size

* simplification

* simplify

* add test for num_sanity_val_steps=-1

* update test

* update docs

* extend tests to multiple dataloaders

* changelog

* Update tests/trainer/test_trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* improve test

* refactor the sanity check decision

* fix merge

* Update trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-23 07:07:03 -04:00
Oliver Neumann 1a54ed6ad9
Checking ipywidgets is installed for ensure tqdm working (#2417)
* Adding importing ipywidgets before importing tqdm.auto to make sure ipywidgets is installed.

* Updated CHANGELOG.md

* Updated ipywidgets importing checks to @awaelchli comments.

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-30 16:59:35 -04:00
Jirka Borovec f278ac42c8
Revert/Fix: epoch indexing from 1, to be from 0 (#2289)
* Revert "deprecated: epoch indexing from 1 (#2206)"

This reverts commit f94b919b

* chlog

* grad index

* Apply suggestions from code review

* tests

* fix

* test
2020-06-19 23:39:53 -04:00
Paweł Biernat 3256fe4e5a
Update progress.py (#2268)
Fixes a minor bug introduced in #2213
2020-06-19 15:47:39 -04:00
William Falcon 04c794ca72
[WIP] Rename overfit_pct to overfit_batches (and fix) and val_percent_check and test_percent_check (and fix) (#2213)
* fixed percent check for val/test

* fixed percent check for val/test

* fixed percent check for val/test

* fixed percent check for val/test

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* overfit_pct now uses train loaders for val and test and does not shuffle

* add on fit_start on fit_end hooks

* add on fit_start on fit_end hooks

* add on fit_start on fit_end hooks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-17 08:03:28 -04:00
Jirka Borovec f94b919b96
deprecated: epoch indexing from 1 (#2206)
* epoch indexing from 1

* chlog

* fix tests

* fix tests

* self.min_epochs
2020-06-16 06:33:41 -04:00
Ashraful Islam e0a5aee3a3
fix porgressbar postfix order (#1874) 2020-05-18 20:33:51 -04:00
William Falcon eeb411144f
enable fast_dev_run without a validation loop (#1779)
* fix val dataloader

* Update evaluation_loop.py
2020-05-11 11:30:22 -04:00
William Falcon 88c086bbd2
Update progress.py 2020-05-11 09:48:15 -04:00
William Falcon 15c11fc848
fixes no val loader 2020-05-11 09:47:33 -04:00
Adrian Wälchli d06d5e68b6
Fix typo in progress bar docs (#1680)
* fix typo

* Typo

* typo Borda

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-05-02 09:08:21 -04:00
Adrian Wälchli 3e8f2d99a9
Progress bar callback (#1450)
* squash and rebase

sanity check hooks


sanity check callback hook finish


moved core progress bar functionality into callback


wip


remove duplicate merge


clean up


imports


docs


sanity check progress bar main


sanity


move callback calls


init progrss bar callback


configuration and docs


changelog


rate decorator


pass process_position


disable on rank > 0


position index


is_enabled


remove decorator


refactor init tqdm bars


callback method ordering 


cannot reset when disabled


sequence -> list


default values


fix has no attr _time() 


move on_val_end to proper place


fix the pickle issue


update warning


properties


check for None


remove old comment


switch order


pull out non-tqdm functionality into base class


documentation for the base class


docs


fix refresh rate issue in validation


restrict type hint of trainer arg


more docs


update trainer docs


rst docs


fix lines too long


fix test


add missing type hints


fix typo


move docstring to __init__ solves doctest failures


remove doctest :(( can't fix the pickle error


fix example


simplify by saving trainer reference


fix docs errors


move docstring


initial value


multiple val checks per epoch


simpler handling of inf dataset sizes


update inf docs


renamed training_tqdm_dict


rename get_tqdm_dict


rename occurences of tqdm 


update changelog


fix doctest


fix formatting errors


added callback tests


progress bar on off test


more tests for progress bar


weird test fix?


add ignored property


disable default progress bar in LR finder


change enable/disable behavior


trying doctest in CI again


undo doctest pickle error


undo doctest pickle error :((


remove progress_bar_callback Trainer arg and fix tests


restore progress bar after auto lr find


update docs


fix rebase


fix wrong negation

* fix fast dev run total

* more thorough testing

* remove old args

* fix merge

* fix merge

* separate tests

* type hint total batches

* reduce if

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* is_disabled

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* is_enabled

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* rename enabled/disabled

* move deprecated api

* remove duplicated test from merge

* fix rename is_disabled

* newline

* test also testprogress for fast dev run

Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-04-23 20:46:18 -04:00