* Fix pyright member access errors in training module
* Fix Trainer instantiation error due to inheritance order
* Add GH workflow for pyright
* Fix more pyright errors in trainer module
* Add pyrightconfig and setup python environment in type-check workflow
* Exclude pyrightconfig.json
* suggestions
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* refactor and added hook
variant a
variant b
add test
revert rename
add changelog
docs
* resolve merge duplication
* overridden typo
* fix test
* tpu id
* raise if TPU not available
* re-use apply_to_collection function for parsing collections
* comment
* make utility function available to user
* documentation
* move changelog entry to top
* fix tpu transfer call
* fix call
* remove hardcoded string
* improve test
* call model hook by default
* Apply suggestions from code review
* rename utility function
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* replace ddp spawn with subprocess
* hot fix
* Join Horovod workers at the end of trainer.fit() to prevent race conditions following training
* flake8
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
* Fix Horovod backend to disable progress bar on all ranks except 0
* Add join barriers
* Added changelog
* Make protected and add verbosity
* Refactor to disable progress bar callback in train
* Removed verbose setting
* Add cache check for Horovod
* Test run again
* Updated comment
* Always skip cache for Horovod
* Only reinstall when necessary
* Added separate step
* Fixed spacing
* Skip Python 3.8
* params
* drop acc
* Fix Horovod distributed backend to set the root_gpu
* Fixed test
* Fixed tests
* Fixed lint
* Set root_gpu during initialization
* chlog
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
* squash and rebase
sanity check hooks
sanity check callback hook finish
moved core progress bar functionality into callback
wip
remove duplicate merge
clean up
imports
docs
sanity check progress bar main
sanity
move callback calls
init progress bar callback
configuration and docs
changelog
rate decorator
pass process_position
disable on rank > 0
position index
is_enabled
remove decorator
refactor init tqdm bars
callback method ordering
cannot reset when disabled
sequence -> list
default values
fix has no attr _time()
move on_val_end to proper place
fix the pickle issue
update warning
properties
check for None
remove old comment
switch order
pull out non-tqdm functionality into base class
documentation for the base class
docs
fix refresh rate issue in validation
restrict type hint of trainer arg
more docs
update trainer docs
rst docs
fix lines too long
fix test
add missing type hints
fix typo
move docstring to __init__ solves doctest failures
remove doctest :(( can't fix the pickle error
fix example
simplify by saving trainer reference
fix docs errors
move docstring
initial value
multiple val checks per epoch
simpler handling of inf dataset sizes
update inf docs
renamed training_tqdm_dict
rename get_tqdm_dict
rename occurrences of tqdm
update changelog
fix doctest
fix formatting errors
added callback tests
progress bar on off test
more tests for progress bar
weird test fix?
add ignored property
disable default progress bar in LR finder
change enable/disable behavior
trying doctest in CI again
undo doctest pickle error
undo doctest pickle error :((
remove progress_bar_callback Trainer arg and fix tests
restore progress bar after auto lr find
update docs
fix rebase
fix wrong negation
* fix fast dev run total
* more thorough testing
* remove old args
* fix merge
* fix merge
* separate tests
* type hint total batches
* reduce if
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* is_disabled
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* is_enabled
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* rename enabled/disabled
* move deprecated api
* remove duplicated test from merge
* fix rename is_disabled
* newline
* test also testprogress for fast dev run
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Add automatic GPU choice to trainer
This commit adds the `gpu_choice` parameter to Trainer. By default,
this parameter is set to "manual", which causes no observable
difference in behavior.
When `gpu_choice` is set to "auto" and `gpus` is an int, the
trainer automatically allocates the first available GPU.
This is especially useful when GPUs are configured in "exclusive
mode", which means that only one process at a time can use them.
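The selection idea described above can be sketched roughly as follows. This is only an illustration, not the Trainer's actual code: `pick_first_available`, `try_claim`, and `fake_claim` are hypothetical names, and the sketch assumes that claiming a busy device raises `RuntimeError`.

```python
from typing import Callable, Iterable, Optional


def pick_first_available(
    device_ids: Iterable[int],
    try_claim: Callable[[int], None],
) -> Optional[int]:
    """Return the first device id that try_claim accepts, or None if all fail."""
    for device_id in device_ids:
        try:
            # Hypothetical probe: in practice this could be a tiny allocation
            # on the device; here it is any callable that raises when busy.
            try_claim(device_id)
        except RuntimeError:
            continue  # device is busy (e.g. held by another process)
        return device_id
    return None


def fake_claim(device_id: int) -> None:
    """Stand-in probe for the example: devices 0 and 1 are already in use."""
    if device_id in {0, 1}:
        raise RuntimeError("device busy")


print(pick_first_available(range(4), fake_claim))
```

Passing the probe in as a callable keeps the sketch independent of any particular GPU library.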
* Rename gpu_choice -> auto_select_gpus
* Set precision=16 when use_amp is passed as True
* Update CHANGELOG.md
* add use_amp to deprecated API
* Update trainer.py
* Update trainer.py
* move the use_amp attribute to deprecated API
* move use_amp deprecation back to Trainer's __init__
* drop unused
* drop deprecated
* reorder imports
* typing
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
* SA: for #958: set torch cuda device when finding root
* SA: for #958: removing root gpu hack in trainer/evaluation_loop
* SA: setting torch cuda device
* comment line too long
* check if root gpu exists or available
* Incorporating suggestions on #1094
* since root gpu returns none instead of -1 for cpu
* undo changes
* fixed dp memory thing
Co-authored-by: Shubham Agarwal <shubhamagarwal92@gmail.com>
* show progress bar dependent on refresh_rate
* test progress_bar_refresh control show bar
* remove show_progress_bar from other tests
* borda fixes
* flake8 fix
* changelog update prog bar refresh rate
* move show_progress_bar to deprecated 0.9 api
* rm show_progress_bar references, test deprecated
* Update pytorch_lightning/trainer/__init__.py
* fix test
* changelog
* minor CHANGELOG.md format
* Update pytorch_lightning/trainer/__init__.py
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Gerard Bentley <gbkh2015@mymail.pomona.edu>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
* added tpu docs
* added tpu flags
* add tpu docs + init training call
* amp
* optimizer step
* added auto data transfer to TPU
* fix test pkg create (#873)
* added test return and print
* Update pytorch_lightning/trainer/trainer.py
Co-Authored-By: Luis Capelo <luiscape@gmail.com>
* Fix segmentation example (#876)
* removed torchvision model and added custom model
* minor fix
* Fixed relative imports issue
* Fix/typo (#880)
* Update greetings.yml
* Update greetings.yml
* Changelog (#869)
* Create CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update PULL_REQUEST_TEMPLATE.md
* Update PULL_REQUEST_TEMPLATE.md
* Add PR links to Version 0.6.0 in CHANGELOG.md
* Add PR links for Unreleased in CHANGELOG.md
* Update PULL_REQUEST_TEMPLATE.md
* Fixing Function Signatures (#871)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Luis Capelo <luiscape@gmail.com>
Co-authored-by: Akshay Kulkarni <akshayk.vnit@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Shikhar Chauhan <xssChauhan@users.noreply.github.com>
* remove unnecessary pass statements
* use isinstance for type checks
* remove unnecessary else/elif after return
* remove unnecessary return statements
* move doc string to top
* merge isinstance calls
* remove unnecessary else/elif after raise
* use list comprehension
* do not use len without comparison
* add missing shebang
* revert isinstance check back to type
broke tests, because bool is actually a subclass of int
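A minimal sketch of why the two checks differ. The helper names `is_int_like` and `is_plain_int` are made up for illustration and do not come from the code base:

```python
def is_int_like(value):
    """isinstance accepts subclasses, so booleans pass this check."""
    return isinstance(value, int)


def is_plain_int(value):
    """An exact type comparison rejects bool even though it subclasses int."""
    return type(value) is int


print(issubclass(bool, int))  # True
print(is_int_like(True))      # True
print(is_plain_int(True))     # False
print(is_plain_int(3))        # True
```

Code that must treat `True` differently from `1` therefore needs the exact type comparison, or an explicit `bool` check before the `isinstance` call.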
* add missing period to doc string
* Fix default ckpt path when logger exists (#771)
* rename logging -> loggers (#767)
* move logging >> loggers
* add warning
* fix tests
* logging alias
* formatting
* add more detail to tbptt example (#755)
* add more detail to tbptt example
* warn user about new arg in training_step
Co-authored-by: Vadim Bereznyuk <kuynzereb@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>