* added file that contains information on the minimal versions needed for the supported loggers
* copied minimal version, combined files, deleted duplicates
* sorted functions in tests/test_loggers.py to be consistent
* expanded wandb logging test; added minimal versions for requirements-extra.txt; increased the amount of training data that is used for tests
* formatting
* added requirements-extra.txt to MANIFEST.in
* reverted wandb test; ensured minimal version for dependencies in requirements-extra.txt in ci-testing.yml
* new way of passing dataloaders
* fixed docs
* fixed codestyle to follow flake8
* allow val/test to be a list of dataloaders, with smarter checking
* added test
* fix flake error
* fix linking to new test model
* split into multiple tests
* fix naming and typo
* minor documentation changes
* remove random file
* Update trainer.py
* better error/warning message
* final adjustments
* update CHANGELOG.md
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Added max number of steps in Trainer
* Added docstring
* Fix flake8 errors
* Clarified docstrings
* Fixed flake8 error
* Added min_steps to Trainer
* Added steps and epochs test
* flake8
* minor fix
* fix steps test in test_trainer
* Split steps test into 2 tests
* Refactor steps test
* Update test_trainer.py
* Minor in test_trainer.py
* Update test_trainer.py
* Address PR comments
* Minor
Co-authored-by: William Falcon <waf2107@columbia.edu>
* added tpu docs
* added tpu flags
* add tpu docs + init training call
* amp
* optimizer step
* added auto data transfer to TPU
* fix test pkg create (#873)
* added test return and print
* Update pytorch_lightning/trainer/trainer.py
Co-Authored-By: Luis Capelo <luiscape@gmail.com>
* Fix segmentation example (#876)
* removed torchvision model and added custom model
* minor fix
* Fixed relative imports issue
* Fix/typo (#880)
* Update greetings.yml
* Changelog (#869)
* Create CHANGELOG.md
* Update CHANGELOG.md
* Update PULL_REQUEST_TEMPLATE.md
* Add PR links to Version 0.6.0 in CHANGELOG.md
* Add PR links for Unreleased in CHANGELOG.md
* Update PULL_REQUEST_TEMPLATE.md
* Fixing Function Signatures (#871)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Luis Capelo <luiscape@gmail.com>
Co-authored-by: Akshay Kulkarni <akshayk.vnit@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Shikhar Chauhan <xssChauhan@users.noreply.github.com>
* Allow experiment versions to be overridden by passing a string value.
Allow experiment names to be empty, in which case no per-experiment subdirectory will be created and checkpoints will be saved in the directory given by the save_dir parameter.
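The two rules above can be sketched roughly as follows. This is a minimal illustration, not the logger's actual code: `version_label` and `experiment_root` are hypothetical helper names, and the `version_` prefix for integer versions is an assumption based on the naming mentioned elsewhere in this log.

```python
import os
from typing import Union


def version_label(version: Union[int, str]) -> str:
    """Hypothetical helper: turn a version into a directory label."""
    # Assumption: a string passed by the user is used verbatim, while an
    # integer version gets a "version_" prefix.
    if isinstance(version, str):
        return version
    return f"version_{version}"


def experiment_root(save_dir: str, name: str) -> str:
    """Hypothetical helper: directory that holds an experiment's output."""
    # An empty name means no per-experiment subdirectory is created,
    # so everything lands directly in save_dir.
    if not name:
        return save_dir
    return os.path.join(save_dir, name)
```

For instance, `experiment_root("logs", "")` gives back `"logs"` unchanged, and `version_label("baseline")` gives back `"baseline"`.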
* Document tensorboard api changes
* Review comment fixes plus fixed test failure for minimum requirements build
* More format fixes from review
* initial implementation
* formatting, pass through profiler, docstring
* call profiler during training
* add initial tests
* report stats when training is done
* fix formatting
* error handling, bugfix in passthroughprofiler
* finish documenting profiler arg in Trainer
* relax required precision for profiling tests
* option to dump cProfiler results to text file
* use logging, format with black
* include profiler in docs
* improved logging and better docs
* appease the linter
* better summaries, wrapper for iterables
* fix typo
* allow profiler=True creation
* more documentation
* add tests for advanced profiler
* Update trainer.py
* make profilers accessible in pl.utilities
* reorg profiler files
* change import for profiler tests
Co-authored-by: William Falcon <waf2107@columbia.edu>
* remove unnecessary pass statements
* use isinstance for type checks
* remove unnecessary else/elif after return
* remove unnecessary return statements
* move doc string to top
* merge isinstance calls
* remove unnecessary else/elif after raise
* use list comprehension
* do not use len without comparison
* add missing shebang
* revert isinstance check back to type
broke tests, because bool is actually a subclass of int
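The reason for the revert can be reproduced in a few lines of plain Python. The two helper names below are illustrative only and do not come from the code base.

```python
def is_int_like(value: object) -> bool:
    """True for ints and also for bools, because bool subclasses int."""
    return isinstance(value, int)


def is_plain_int(value: object) -> bool:
    """True only for real ints; type() compares the exact class."""
    return type(value) is int


print(issubclass(bool, int))  # True
print(is_int_like(True))      # True
print(is_plain_int(True))     # False
print(is_plain_int(1))        # True
```

Any code path that must treat `True` differently from `1` therefore needs the exact `type(...)` comparison, or an explicit `bool` check placed before the `int` check.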
* add missing period to doc string
* Fix default ckpt path when logger exists (#771)
* rename logging -> loggers (#767)
* move logging >> loggers
* add warning
* fix tests
* logging alias
* formatting
* add more detail to tbptt example (#755)
* add more detail to tbptt example
* warn user about new arg in training_step
Co-authored-by: Vadim Bereznyuk <kuynzereb@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* Fix distributed_backend=None test
We now throw a warning instead of an exception. Update test
to reflect this.
* Fix test_tube logger close when debug=True
* updated gitignore
* Update README.md
* updated gitignore
* updated links in ninja file
* updated docs
* Update README.md
* finished callbacks
* fixed left menu
* added callbacks to menu
* added direct links to docs
* fixing TensorBoard (#687)
* flake8
* fix typo
* fix tensorboardlogger
drop test_tube dependence
* formatting
* fix tensorboard & tests
* upgrade Tensorboard
* test formatting separately
* try to fix JIT issue
* add tests for 1.4
* finished rebase
* making private members
* working on trainer docs
* set auto dp if no backend
* fixed lightning import
* cleared spaces
* finished lightning module
* added callbacks
* added loggers
* flake8
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Basic wandb support
* refactor(wandb): remove unused variables and document logger
* docs(wandb): explain how to use WandbLogger
* test(wandb): add tests for WandbLogger
* feat(wandb): add save_dir
* fix(wandb): allow pickle of logger
* fix(wandb): save logs in custom directory
* test(wandb): test import
* docs(wandb): simplify docstring and use doctest
* test: increase number of epochs for satisfactory accuracy
* test(test_load_model_from_checkpoint): ensure we load last checkpoint
Co-authored-by: Chris Van Pelt <vanpelt@wandb.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* added neptune integration
* added tests for NeptuneLogger, added neptune to docs
* updated link to neptune support
* fixed docstrings, fixed try/except in tests, changed append_tags input
* fixed docstrings line length
* bumped epoch nr in model restore tests
* added tags support for single strings
* fixed passing neptune token to backend
* fixed project name in offline mode
* added save_top_k=-1 to checkpoint callback
* reformatted initialization of neptune in online mode
* bumped epoch nr to 4 in test_load_model_from_checkpoint
* bumped epoch nr to 5
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Run AMP tests in their own process
With opt_level="O1" (the default), AMP patches many
torch functions, which breaks any tests that run afterwards.
This patch introduces a pytest extension that lets
tests be marked with @pytest.mark.spawn so that they
are run in their own process using torch.multiprocessing.spawn
so that the main python interpreter stays un-patched.
Note that tests using DDP already run AMP in its own process,
so they don't need this annotation.
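The isolation idea can be illustrated with the standard library alone. This is only a sketch of the principle: it uses `subprocess` with a fresh interpreter as a stand-in for `torch.multiprocessing.spawn`, patches `math.sqrt` as a stand-in for AMP patching torch functions, and the helper name `run_in_fresh_interpreter` is hypothetical. It does not show the pytest extension itself.

```python
import math
import subprocess
import sys

# Code executed by the child: it monkey-patches a global function,
# which is the kind of side effect that must not leak into later tests.
CHILD_CODE = """
import math
math.sqrt = lambda x: -1.0
assert math.sqrt(4.0) == -1.0
"""


def run_in_fresh_interpreter(code: str) -> int:
    """Run `code` in a separate Python process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


exit_code = run_in_fresh_interpreter(CHILD_CODE)

# The patch lived and died in the child; the parent still has the real sqrt.
print(exit_code, math.sqrt(4.0))
```

Because the patch is applied in a different process, tests that run afterwards in the main interpreter see the original functions.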
* Fix AMP tests
Since AMP defaults to O1 now, DP tests no longer throw exceptions.
Since AMP patches torch functions, CPU inference no longer works.
Skip prediction step for AMP tests.
* typo
* Renamed `on_sanity_check_start` to `on_train_start` and added `on_train_end` to `ModelHooks`
* changed tests to use `on_train_start` instead of `on_sanity_check_start`
* Fixing comet ml bug and adding functionality
* Updating documents
* Fixing code style issues in comet_logger
* Changing comet_logger experiment to execute lazily
* Adding tests for comet_logger and addressing comments from @Borda
* Setting step_num to optional keyword argument in log_metrics() to comply with other loggers
* Adding offline logging mode for comet_ml, updating tests and docs
* Switching to MisconfigurationException
* Make name and version properties required
* Warn before deleting files in checkpoint directory
* Get default checkpoint path from any logger
* Fix typos
* Uncomment logger tests
* Whitespace
* Update callback_config_mixin.py
Checkpoint and version file names previously contained only a number. Prepending version_ makes it easy to tell what you're looking at.
* Address comments
* Fix broken tests
* #452 Fix ValueError
* #452 Use subprocess.run
* #452 Simplify code for gpu_memory_map
* #452 Simplify code for min max memory
* #452 Add test for get_memory_profile
* #452 Use os.sep
* #452 Use os.linesep
* hpc restore takes priority over non hpc weights
* Unit tests for num_gpu property as proxy for __parse_gpu_ids.
* Refactoring __parse_gpu_ids
* Moved the function outside the class as it is
a utility function and did not depend on the class in any way.
* Added unit tests for it.
* Mocked torch.cuda.device_count function in tests.
This allows the tests to be run on machines that do not have gpus.
* Fixed the parse_gpu_ids function to handle -1 case.
Function now handles -1 the same way as it does for '-1'.
* Unit tests for root_gpu added.
Added backend as a parameter because, depending on whether a backend
is set, the code currently fails with an exception in certain
circumstances before it can give a wrong answer.
* Moved __set_root_gpu function out of the class.
This function does not depend on the class and can be tested
more easily this way.
Also added unit tests for this function. They simply reuse
data for the root_gpu property.
* determine_root_gpu_device passes unit tests.
* num_gpus passes unit tests.
Also added a None test for this function.
* parse_gpu_ids tests changed to reflect desired state after refactoring.
Planning to refactor parse_gpu_ids to always return a list of ints.
This will simplify code that uses the output of this function.
* parse_gpu_ids always returns lists
* parse_gpu_ids checks given ids against available ids
* parse_gpu_ids raises an exception for nonexistent ids
* parse_gpu_ids returns None when no gpus are available
* cleaned up determine_root_gpu_device
* cleaned up num_gpus property
* Updated unit tests to reflect changes in the functions
* Flake8 fixes
* Moved fixture code up before where it is used.
* Updated documentation.
* Changed tests to match the API:
* gpus=-1 or gpus='-1' should use all available gpu devices
* gpus=N
* N=0: no gpus should be used.
* N>0: N gpus should be used
* gpus=list of ints or a comma separated string of numbers:
Use the gpus indicated by the list or the string.
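The rules listed above could be sketched as follows. This is a hedged illustration rather than the actual `parse_gpu_ids` implementation: the function name `parse_gpus_sketch`, the explicit `available` argument (standing in for a device query), the use of `ValueError`, and returning `None` for `gpus=0` are all assumptions made for the example.

```python
from typing import List, Optional, Union


def parse_gpus_sketch(
    gpus: Union[int, str, List[int], None],
    available: List[int],
) -> Optional[List[int]]:
    """Hypothetical sketch: normalise `gpus` to a list of device ids or None."""
    if gpus is None or not available:
        return None  # nothing requested, or no devices on this machine

    if isinstance(gpus, str):
        text = gpus.strip()
        if text == "-1":
            gpus = -1  # treat '-1' exactly like -1
        else:
            # Comma-separated string such as "0,2" names explicit device ids.
            gpus = [int(part) for part in text.split(",") if part.strip()]

    if isinstance(gpus, int):
        if gpus == -1:
            return list(available)  # use every available device
        if gpus == 0:
            return None  # assumption: 0 means "no GPUs"
        if gpus < 0 or gpus > len(available):
            raise ValueError(f"cannot select {gpus} of {len(available)} GPUs")
        return list(available[:gpus])  # first N devices

    # Explicit list of ids: every id must exist on this machine.
    missing = [i for i in gpus if i not in available]
    if missing:
        raise ValueError(f"requested GPUs not available: {missing}")
    return list(gpus)
```

Passing the available ids in explicitly keeps the sketch testable on machines without GPUs, which is the same motivation given above for mocking `torch.cuda.device_count` in the tests.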
* Fixed code to pass all the changed tests for parsing gpus param.
* Refactoring parse_gpu_ids function.
* flake8 fixes.
* Updating documentation.
* flake8 fixes.
* Update trainer.py
* Update dp_mixin.py
* Make reduce_distributed_output a stand alone function.
Fix imports.
Fix flake8.
* Add comet_ml dependency to tests requirements.txt
* Revert "Make reduce_distributed_output a stand alone function. Fix imports. Fix flake8."
This reverts commit eac0338
* Merge with master.
* moved dp, ddp outside of trainer
* added main mixins
* finished major mixin refactor
* flake8
* changes to seed for tests
* fix test
* no warnings always
* cleaning up demos
* cleaning up docs
* cleaned up test_tube logger
* added lbfgs support
* Fixes #280 (#309)
* added test seeds (#306)
* added test seeds
* updated docs
* early stopping callback is not default
* added a default logger
* added default checkpoint callback
* added default checkpoint/loggers
* updated docs
* cleaned demos
* clean up docs around loggers
* Create underlying loggers lazily
This avoids creating duplicate experiments or runs in multi-node DDP.
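Lazy creation can be sketched with a property that builds the backend object on first access. The class below is a hypothetical stand-in, not one of the real loggers: the dictionary takes the place of an actual experiment object, and the `created` counter exists only to make the behaviour visible.

```python
class LazyLoggerSketch:
    """Hypothetical logger that defers creating its backend experiment."""

    def __init__(self, save_dir: str) -> None:
        self.save_dir = save_dir
        self._experiment = None  # nothing is created at construction time
        self.created = 0         # counts how often the backend was built

    @property
    def experiment(self) -> dict:
        # Build the backend object only when something actually logs.
        if self._experiment is None:
            self._experiment = {"save_dir": self.save_dir}
            self.created += 1
        return self._experiment


logger = LazyLoggerSketch("logs")
print(logger.created)   # 0: constructing the logger creates no experiment
logger.experiment
logger.experiment
print(logger.created)   # 1: repeated access reuses the same object
```

With this pattern, a process that constructs the logger but never logs through it never opens an experiment.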
* Save hyperparameters automatically
* Update docs for snapshotting hyperparams
* Fix test tube
* Fix test tube pickling
* always calls the lr scheduler with epoch nb
* added docs for cluster grid search
* undo test changes
* added load on CPU first
* added print logs
* changed close order
* enable single gpu per node
* added nvidia flag set
* added simple cluster template
* sets correct backend for possible combinations of gpu inputs
* cleaned up progbar
* updated base files
* flake8
* added tests
* added single gpu data transfer recursive
* made validation step optional
* added no val model
* val_step can be implemented but not validation_end
* added no val end model
* added tests
* remove class
* updated docs
* updated test
* fix pep8