* squash and rebase
sanity check hooks
sanity check callback hook finish
moved core progress bar functionality into callback
wip
remove duplicate merge
clean up
imports
docs
sanity check progress bar main
sanity
move callback calls
init progress bar callback
configuration and docs
changelog
rate decorator
pass process_position
disable on rank > 0
position index
is_enabled
remove decorator
refactor init tqdm bars
callback method ordering
cannot reset when disabled
sequence -> list
default values
fix has no attr _time()
move on_val_end to proper place
fix the pickle issue
update warning
properties
check for None
remove old comment
switch order
pull out non-tqdm functionality into base class
documentation for the base class
docs
fix refresh rate issue in validation
restrict type hint of trainer arg
more docs
update trainer docs
rst docs
fix lines too long
fix test
add missing type hints
fix typo
move docstring to __init__ solves doctest failures
remove doctest :(( can't fix the pickle error
fix example
simplify by saving trainer reference
fix docs errors
move docstring
initial value
multiple val checks per epoch
simpler handling of inf dataset sizes
update inf docs
renamed training_tqdm_dict
rename get_tqdm_dict
rename occurrences of tqdm
update changelog
fix doctest
fix formatting errors
added callback tests
progress bar on off test
more tests for progress bar
weird test fix?
add ignored property
disable default progress bar in LR finder
change enable/disable behavior
trying doctest in CI again
undo doctest pickle error
undo doctest pickle error :((
remove progress_bar_callback Trainer arg and fix tests
restore progress bar after auto lr find
update docs
fix rebase
fix wrong negation
* fix fast dev run total
* more thorough testing
* remove old args
* fix merge
* separate tests
* type hint total batches
* reduce if
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* is_disabled
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* is_enabled
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* rename enabled/disabled
* move deprecated api
* remove duplicated test from merge
* fix rename is_disabled
* newline
* test also testprogress for fast dev run
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* The epoch was being logged to metrics, which isn't read, rather than to current_metrics.
* Updated the tests to account for the epoch arriving at the logger.
* Add tests for distributed backend config
* Refactor set_distributed_mode
* Use gloo backend on cpu
* Use 127.0.0.1 instead of 127.0.0.2
Not totally clear on why this is necessary, but it seems to work
* Update LightningDDP so that it works with CPU
* Add ddp_cpu backend and num_processes Trainer arg
* PEP8
* Fix test skipping. Inequalities are hard :/
* Skip ddp_cpu test on Windows
* Make a few more cases fall back to ddp_cpu
* New function name
* Flake8
* Don't test distributed on MacOS with torch < 1.3
Support for distributed in MacOS was added in Torch 1.3.0
* Add ddp_cpu and num_processes to docs
* Parametrize trainer config tests
* Tweak warning
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Remove redundant test
* Replace pass branches with comments
* Add missing warnings import
* save_path -> root_dir
* Use new rank_zero_warn
* Whitespace
* Apply suggestions from code review
* formatting
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
* remove error when test dataloader used in test
* fix lost model reference
* moved optimizer types
* added tests for warning
* refactoring
* fix imports
* fix tests
* fix mnist
* flake8
* review
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
* Add automatic GPU choice to trainer
This commit adds the `gpu_choice` parameter to Trainer. By default,
this parameter is set to 'manual' which causes no observable
difference in behavior.
When `gpu_choice` is set to "auto" and `gpus` is an int, then the
trainer will automatically allocate the first available GPU.
This is especially useful when GPUs are configured to be in "exclusive
mode", which means that only one process at a time can use them.
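The selection rule described above can be sketched roughly as follows. This is an illustration only, not the Trainer's code: `pick_free_gpus` and the `try_reserve` callback are hypothetical names, and the probe (for example, attempting a small allocation on the device) is assumed to raise `RuntimeError` when a GPU in exclusive mode is already taken.

```python
from typing import Callable, Iterable, List


def pick_free_gpus(
    num_gpus: int,
    candidates: Iterable[int],
    try_reserve: Callable[[int], None],
) -> List[int]:
    """Return the first `num_gpus` device indices that can be reserved.

    `try_reserve` is a hypothetical probe supplied by the caller; it is
    assumed to raise RuntimeError when the device is busy.
    """
    picked: List[int] = []
    for index in candidates:
        if len(picked) == num_gpus:
            break
        try:
            try_reserve(index)
        except RuntimeError:
            continue  # device is busy (e.g. held by another process)
        picked.append(index)
    if len(picked) < num_gpus:
        raise RuntimeError(
            f"only {len(picked)} of {num_gpus} requested GPUs are free"
        )
    return picked
```

With a probe that reports device 0 as busy, `pick_free_gpus(1, range(4), probe)` would skip it and return the next index that can be reserved.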
* Rename gpu_choice -> auto_select_gpus
* Allow reinits in sub procs
* Dont create an experiment on pickle, name, or project
* Comments consistency
* Fix test
* Apply suggestions from code review
Co-authored-by: Chris Van Pelt <vanpelt@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Make training_epoch_end behave like validation_epoch_end + minor fixes in docstrings.
* Minor fixes (Borda's comments).
* Detach tensors in batch_output (to avoid possible memory leak) + doc fix.
Co-authored-by: Jean-Baptiste SCHIRATTI <jean-baptisteschiratti@MacBook-Pro-de-Jean-Baptiste.local>
* show progress bar dependent on refresh_rate
* test progress_bar_refresh control show bar
* remove show_progress_bar from other tests
* borda fixes
* flake8 fix
* changelog update prog bar refresh rate
* move show_progress_bar to deprecated 0.9 api
* rm show_progress_bar references, test deprecated
* Update pytorch_lightning/trainer/__init__.py
* fix test
* changelog
* minor CHANGELOG.md format
* Update pytorch_lightning/trainer/__init__.py
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Gerard Bentley <gbkh2015@mymail.pomona.edu>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
* fix RunningMean
* changelog
* fix none
* Update supporters.py
just needed to multiply by zero for init
* Revert "Update supporters.py"
This reverts commit 7e0da6c6
* fix NaN
* formatting
Co-authored-by: William Falcon <waf2107@columbia.edu>
* added custom mnist without torchvision dep
* move files so it does not conflict with mnist gitignore
* mock torchvision for tests
* fix line too long
* fix line too long
* fix "module level import not at top of file" warning
* move mock imports to __init__.py
* simplify MNIST a lot and download directly the .pt files
* further simplify and clean up mnist
* revert import overrides
* make as before
* drop PIL requirement
* move mnist.py to datasets subfolder
* use logging instead of print
* choose same name as in torchvision
* remove torchvision and pillow also from yml file
* refactor if train
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* capitalized class attr
* moved mnist to models
* re-added datasets ignore
* better name for file variable
* Update mnist.py
* move dataset classes to datasets.py
* new line
* update
* fix automerge
* move to base folder
* adapt testingmnist to new mnist base class
* remove temporal fix
* fix datatype
* remove old testingmnist
* readable
* fix import
* fix whitespace
* docstring
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/base/datasets.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* changelog
* added types
* Update CHANGELOG.md
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* exist->isfile
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* index -> idx
* temporary fix for trains error
* better changelog message
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* `add_argparse_args` method fixed (argument types added)
* CHANGELOG.md upd
* autopep8 fixes
* --gpus=0 removed from test (for ci tests)
* typo fixed
* reduce on plateau scheduler fixed
* Trainer cli related tests moved to test_trainer_cli.py
* refactored: get_init_arguments_and_types is a public classmethod of the Trainer now
* test_get_init_arguments_and_types added
* autopep8 fixes
* Apply suggestions from code review
* cosmetics
* cosmetics
* Update pytorch_lightning/trainer/trainer.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* `Trainer.get_init_arguments_and_types` now returns arg types wrapped in tuples (not in sets)
* deprecated args are now ignored in argparser
* get_deprecated_arg_names small refactor
* get_deprecated_arg_names bug fixed
* Update pytorch_lightning/trainer/trainer.py
Co-Authored-By: Joe Davison <joe@huggingface.co>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Joe Davison <joe@huggingface.co>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* pylint
* model API
* update test
* formatting
* disable logger
* fix checking overwrite
* fix test
* typo
* deprecated model
* fix for DDP
* drop Flake8 in GH actions
* Update pytorch_lightning/trainer/evaluation_loop.py
* fix imports
Co-authored-by: Nic Eggert <nic@eggert.io>
* check for nan values
* test nan detection on loss
* sys.exit
* whitespace
* detect nan and inf values in loss and params
* update
* added documentation
* moved detect nan to training loop, remove flag for print
* blank line
* test
* rename
* deprecate print_nan_grads
* deprecated print_nan_grads
* remove unused imports
* update changelog
* fix line too long
* correct deprecated version
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* raise exception instead of sysexit
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/training_tricks.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* fix test
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
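A minimal sketch of the NaN/inf check this series describes, raising an exception rather than exiting the process. It is not the code from `training_tricks.py`: `assert_finite` is a hypothetical name, and plain floats stand in for tensors so the example runs without torch.

```python
import math
from typing import Mapping, Sequence


def assert_finite(loss: float, params: Mapping[str, Sequence[float]]) -> None:
    """Raise ValueError if the loss or any parameter value is NaN or inf."""
    if not math.isfinite(loss):
        raise ValueError(f"loss is {loss}")
    for name, values in params.items():
        if not all(math.isfinite(v) for v in values):
            raise ValueError(f"parameter {name!r} contains NaN or inf")
```

Raising lets the caller decide how to handle the failure, which `sys.exit` does not.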
* removed project and experiment from getstate
* added tests for closing experiment, updated token in example to user neptuner
* updated token
* Update neptune.py
added a link to example experiment
* added example experiment link
* dropped duplication
* flake fixes
* merged with master, added changes information to CHANGELOG
* Added support for non-primitive types to tensorboard logger
* added EOF newline
* PEP8
* Updated CHANGELOG for PR #1130. Moved _sanitize_params to base logger. Cleaned up _sanitize_params
* changed convert_params to static method
* PEP8
* Cleanup Doctest for _sanitize_params
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Removed OrderedDict import
* Updated import order to conventions
Co-authored-by: Manbir Gulati <manbirgulati@Manbirs-MBP.hsd1.md.comcast.net>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* enabled early stopping/checkpoint even without val step
* name formatting
* version
* testing
* add test
* fix test
* Update model_checkpoint.py
* doctests
* pylint
* tests
* debug
* debug
* enabled early stopping/checkpoint even without val step
* fix MNIST download (#1044)
* fix MNIST download
* simple
* name formatting
* version
* testing
* add test
* fix test
* doctests
* tests
* debug
* debug
* rebased 1041
* tests
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* consolidate callbacks and hooks
* ensure callbacks receive proper arg types
* remove model from init callback events
* clean up early stopping event
* update changelog
* remove on_fit_start and on_fit_end
* fix args for on_init_start and on_init_end
* handle case where early stopping is not used
* show all callback methods
* wrap checkpoint callback logic into proper class
* fix check for main process in checkpoint callback
* move callbacks test to separate file
* refactor arg checks
* get model and call hook on same line
* define trainer_options dict in one call
* add more asserts to callback test
* Add callback system + associated test
* Add trainer and pl_module args to callback methods
* typing
* typo in docstring
* Switch to on_.*_start()
* fix on_test_start
* fix the mess after rebasing
* added get dataloaders directly using a getter
* deleted decorator
* added prepare_data hook
* refactored dataloader init
* added dataloader reset flag and main loop
* made changes
* fixed bad loaders
* fixed error in .fit with loaders
* fixes #909
* bug fix
* Fixes #902
* Properly restore current epoch and global step on resume
* Add test
* Move increment to saving rather than loading
* Fix other tests that refer to current epoch
* Formatting
* Add warning for mid-epoch resuming
* Formatting
* Fix warning check for accumulated batches
* Add variable to init
* Formatting
* Add check for 0 training steps
* Make check more readable
* added file that contains information on the minimal versions needed for the supported loggers
* copied minimal version, combined files, deleted duplicates
* sorted functions in tests/test_loggers.py to be consistent
* expanded wandb logging test; added minimal versions for requirements-extra.txt; increased the amount of training data that is used for tests
* formatting
* added requirements-extra.txt to MANIFEST.in
* reverted wandb test; ensured minimal version for dependencies in requirements-extra.txt in ci-testing.yml
* new way of passing dataloaders
* fixed docs
* fixed codestyle to follow flake8
* allow val/test be list of dataloaders and smarter checking
* added test
* fix flake error
* fix linking to new test model
* split into multiple test
* fix naming and typo
* minor documentation changes
* remove random file
* Update trainer.py
* better error/warning message
* final adjustments
* update CHANGELOG.md
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Added max number of steps in Trainer
* Added docstring
* Fix flake8 errors
* Clarified docstrings
* Fixed flake8 error
* Added min_steps to Trainer
* Added steps and epochs test
* flake8
* minor fix
* fix steps test in test_trainer
* Split steps test into 2 tests
* Refactor steps test
* Update test_trainer.py
* Minor in test_trainer.py
* Update test_trainer.py
* Address PR comments
* Minor
Co-authored-by: William Falcon <waf2107@columbia.edu>
* added tpu docs
* added tpu flags
* add tpu docs + init training call
* amp
* optimizer step
* added auto data transfer to TPU
* fix test pkg create (#873)
* added test return and print
* Update pytorch_lightning/trainer/trainer.py
Co-Authored-By: Luis Capelo <luiscape@gmail.com>
* Fix segmentation example (#876)
* removed torchvision model and added custom model
* minor fix
* Fixed relative imports issue
* Fix/typo (#880)
* Update greetings.yml
* Update greetings.yml
* Changelog (#869)
* Create CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update PULL_REQUEST_TEMPLATE.md
* Update PULL_REQUEST_TEMPLATE.md
* Add PR links to Version 0.6.0 in CHANGELOG.md
* Add PR links for Unreleased in CHANGELOG.md
* Update PULL_REQUEST_TEMPLATE.md
* Fixing Function Signatures (#871)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Luis Capelo <luiscape@gmail.com>
Co-authored-by: Akshay Kulkarni <akshayk.vnit@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Shikhar Chauhan <xssChauhan@users.noreply.github.com>
* Allow experiment versions to be overridden by passing a string value.
Allow experiment names to be empty, in which case no per-experiment subdirectory will be created and checkpoints will be saved in the directory given by the save_dir parameter.
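The directory rule described here can be sketched as below. `experiment_dir` is a hypothetical helper, not the logger's API, and the `version_` prefix for integer versions is an assumption made for the illustration.

```python
import os
from typing import Union


def experiment_dir(save_dir: str, name: str, version: Union[int, str]) -> str:
    """Build a log directory (hypothetical helper, not the logger's API)."""
    # A string version is used verbatim; an integer gets a "version_" prefix
    # (the prefix is an assumption made for this sketch).
    version_dir = version if isinstance(version, str) else f"version_{version}"
    # os.path.join adds no extra directory level when `name` is empty.
    return os.path.join(save_dir, name, version_dir)
```

With an empty `name`, the result sits directly under `save_dir`, which matches the "no per-experiment subdirectory" behavior described above.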
* Document tensorboard api changes
* Review comment fixes plus fixed test failure for minimum requirements build
* More format fixes from review
* initial implementation
* formatting, pass through profiler, docstring
* call profiler during training
* add initial tests
* report stats when training is done
* fix formatting
* error handling, bugfix in passthroughprofiler
* finish documenting profiler arg in Trainer
* relax required precision for profiling tests
* option to dump cProfiler results to text file
* use logging, format with black
* include profiler in docs
* improved logging and better docs
* appease the linter
* better summaries, wrapper for iterables
* fix typo
* allow profiler=True creation
* more documentation
* add tests for advanced profiler
* Update trainer.py
* make profilers accessible in pl.utilities
* reorg profiler files
* change import for profiler tests
Co-authored-by: William Falcon <waf2107@columbia.edu>
* remove unnecessary pass statements
* use isinstance for type checks
* remove unnecessary else/elif after return
* remove unnecessary return statements
* move doc string to top
* merge isinstance calls
* remove unnecessary else/elif after raise
* use list comprehension
* do not use len without comparison
* add missing shebang
* revert isinstance check back to type
broke tests, because bool is actually subclass of int
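The reason for the revert can be shown in a few lines of plain Python. The two helper names are made up for the illustration: an `isinstance` check against `int` also accepts `True` and `False`, while an exact `type` comparison does not.

```python
def is_int_isinstance(value) -> bool:
    # Accepts bool too, because bool is a subclass of int.
    return isinstance(value, int)


def is_int_exact(value) -> bool:
    # Exact type comparison rejects bool.
    return type(value) is int
```

Code that must treat `True` differently from `1` therefore needs the exact check, or an explicit `isinstance(value, bool)` test before the `int` test.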
* add missing period to doc string
* Fix default ckpt path when logger exists (#771)
* rename logging -> loggers (#767)
* move logging >> loggers
* add warning
* fix tests
* logging alias
* formatting
* formatting
* use isinstance for type checks
* revert isinstance check back to type
broke tests, because bool is actually subclass of int
* add more detail to tbptt example (#755)
* add more detail to tbptt example
* warn user about new arg in training_step
Co-authored-by: Vadim Bereznyuk <kuynzereb@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* Fix distributed_backend=None test
We now throw a warning instead of an exception. Update test
to reflect this.
* Fix test_tube logger close when debug=True
* updated gitignore
* Update README.md
* updated gitignore
* updated links in ninja file
* updated docs
* Update README.md
* Update README.md
* finished callbacks
* fixed left menu
* added callbacks to menu
* added direct links to docs
* fixing TensorBoard (#687)
* flake8
* fix typo
* fix tensorboardlogger
drop test_tube dependence
* formatting
* fix tensorboard & tests
* upgrade Tensorboard
* test formatting separately
* try to fix JIT issue
* add tests for 1.4
* finished rebase
* making private members
* working on trainer docs
* set auto dp if no backend
* fixed lightning import
* cleared spaces
* finished lightning module
* added callbacks
* added loggers
* flake 8
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Basic wandb support
* refactor(wandb): remove unused variables and document logger
* docs(wandb): explain how to use WandbLogger
* test(wandb): add tests for WandbLogger
* feat(wandb): add save_dir
* fix(wandb): allow pickle of logger
* fix(wandb): save logs in custom directory
* test(wandb): test import
* docs(wandb): simplify docstring and use doctest
* test: increase number of epochs for satisfactory accuracy
* test(test_load_model_from_checkpoint): ensure we load last checkpoint
Co-authored-by: Chris Van Pelt <vanpelt@wandb.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* added neptune integration
* added tests for NeptuneLogger, added neptune to docs
* updated link to neptune support
* fixed docstrings, fixed try/except in tests, changed append_tags input
* fixed docstrings line length
* bumped epoch nr in model restore tests
* added tags support for single strings
* fixed passing neptune token to backend
* fixed project name in offline mode
* added save_top_k=-1 to checkpoint callback
* reformatted initialization of neptune in online mode
* bumped epoch nr to 4 in test_load_model_from_checkpoint
* bumped epoch nr to 5
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Run AMP tests in their own process
With opt_level="O1" (the default), AMP patches many
torch functions, which breaks any tests that run afterwards.
This patch introduces a pytest extension that lets
tests be marked with @pytest.mark.spawn so that they
are run in their own process using torch.multiprocessing.spawn
so that the main python interpreter stays un-patched.
Note that tests using DDP already run AMP in its own process,
so they don't need this annotation.
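The isolation idea can be illustrated without AMP or pytest. The sketch below swaps in the standard-library `subprocess` module for `torch.multiprocessing.spawn` and patches `math.sqrt` as a stand-in for the torch functions AMP patches; `run_isolated` is a hypothetical helper, not the pytest extension from this patch.

```python
import math
import subprocess
import sys


def run_isolated(source: str) -> str:
    """Run Python source in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.strip()


# The child replaces math.sqrt; the parent's math module is unaffected.
child_output = run_isolated(
    "import math; math.sqrt = lambda x: -1; print(math.sqrt(4))"
)
```

The patched function exists only in the child process, so anything that runs afterwards in the parent interpreter still sees the original.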
* Fix AMP tests
Since AMP defaults to O1 now, DP tests no longer throw exceptions.
Since AMP patches torch functions, CPU inference no longer works.
Skip prediction step for AMP tests.
* typo
* Renamed `on_sanity_check_start` to `on_train_start` and added `on_train_end` to `ModelHooks`
* changed tests to use `on_train_start` instead of `on_sanity_check_start`
* Fixing comet ml bug and adding functionality
* Updating documents
* Fixing code style issues in comet_logger
* Changing comet_logger experiment to execute lazily
* Adding tests for comet_logger and addressing comments from @Borda
* Setting step_num to optional keyword argument in log_metrics() to comply to other loggers
* Adding offline logging mode for comet_ml, updating tests and docs
* Switching to MisconfigurationException
* Make name and version properties required
* Warn before deleting files in checkpoint directory
* Get default checkpoint path from any logger
* Fix typos
* Uncomment logger tests
* Whitespace
* Update callback_config_mixin.py
Checkpoint and version file names would otherwise just be a number; with version_ prepended it's easy to tell what you're looking at.
* Address comments
* Fix broken tests
* #452 Fix ValueError
* #452 Use subprocess.run
* #452 Simplify code for gpu_memory_map
* #452 Simplify code for min max memory
* #452 Add test for get_memory_profile
* #452 Use os.sep
* #452 Use os.linesep
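A rough sketch of the parsing side of the #452 changes, assuming the text comes from a query such as `nvidia-smi --query-gpu=memory.used --format=csv,nounits,noheader` run through `subprocess.run`, with one integer per line. The function names and dictionary keys are hypothetical, and `splitlines()` is used here in place of splitting on `os.linesep` so that either line ending is handled.

```python
from typing import Dict


def parse_gpu_memory(output: str) -> Dict[str, int]:
    """Map 'gpu_<index>' to used memory, one value per output line."""
    values = [int(line) for line in output.strip().splitlines() if line.strip()]
    return {f"gpu_{index}": used for index, used in enumerate(values)}


def min_max_memory(memory_map: Dict[str, int]) -> Dict[str, int]:
    """Reduce a per-GPU map to its smallest and largest value."""
    values = list(memory_map.values())
    # Key names are placeholders chosen for this sketch.
    return {"min_gpu_mem": min(values), "max_gpu_mem": max(values)}
```

Keeping the parsing separate from the `nvidia-smi` call means it can be tested on machines without a GPU.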