Commit Graph

505 Commits

Author SHA1 Message Date
Aljoscha Steffens 9eb1907151
separate requirements for logger dependencies (#792)
* added file that contains information on the minimal versions needed for the supported loggers

* copied minimal version, combined files, deleted duplicates

* sorted functions in tests/test_loggers.py to be consistent

* expanded wandb logging test; added minimal versions for requirements-extra.txt; increased the amount of training data that is used for tests

* formatting

* added requirements-extra.txt to MANIFEST.in

* reverted wandb test; ensured minimal version for dependencies in requirements-extra.txt in ci-testing.yml
2020-02-21 13:30:27 -05:00
Jeremy Jordan ea8878bc14
clean up tests/test_profiler.py (#867)
* cleanup docstrings, _get_total_cprofile_duration in module

* relax profiler overhead tolerance
2020-02-19 07:09:28 -05:00
Nicki Skafte ffd6e693de
new way of passing dataloaders (#759)
* new way of passing dataloaders

* fixed docs

* fixed codestyle to follow flake8

* allow val/test be list of dataloaders and smarter checking

* added test

* fix flake error

* fix linking to new test model

* split into multiple test

* fix naming and typo

* minor documentation changes

* remove random file

* Update trainer.py

* better error/warning message

* final adjustments

* update CHANGELOG.md

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-02-19 06:00:08 -05:00
Peter Izsak 054a35312d
Added max number of steps in Trainer (#728)
* Added max number of steps in Trainer

* Added docstring

* Fix flake8 errors

* Clarified docstrings

* Fixed flake8 error

* Added min_steps to Trainer

* Added steps and epochs test

* flake8

* minor fix

* fix steps test in test_trainer

* Split steps test into 2 tests

* Refactor steps test

* Update test_trainer.py

* Minor in test_trainer.py

* Update test_trainer.py

* Address PR comments

* Minor

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-02-18 11:23:22 -05:00
William Falcon d4a31f02e0
Enable TPU support (#868)
* added tpu docs

* added tpu flags

* add tpu docs + init training call

* amp

* optimizer step

* added auto data transfer to TPU

* fix test pkg create (#873)

* added auto data transfer to TPU

* added test return and print

* Update pytorch_lightning/trainer/trainer.py

Co-Authored-By: Luis Capelo <luiscape@gmail.com>

* Fix segmentation example (#876)

* removed torchvision model and added custom model

* minor fix

* Fixed relative imports issue

* Fix/typo (#880)

* Update greetings.yml

* Update greetings.yml

* Changelog (#869)

* Create CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

* Add PR links to Version 0.6.0 in CHANGELOG.md

* Add PR links for Unreleased in CHANGELOG.md

* Update PULL_REQUEST_TEMPLATE.md

* Fixing Function Signatures (#871)

* added tpu docs

* added tpu flags

* add tpu docs + init training call

* amp

* optimizer step

* added auto data transfer to TPU

* added test return and print

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Luis Capelo <luiscape@gmail.com>
Co-authored-by: Akshay Kulkarni <akshayk.vnit@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Shikhar Chauhan <xssChauhan@users.noreply.github.com>
2020-02-17 16:01:20 -05:00
Vadim Bereznyuk edd4a87fb0
Refactor callbacks (#776)
* Refactor callbacks

* flake8

* Update docstrings

* Simplified callback, protected trainer

* .set_trainer() check

* update docs

* missed super().__init__()

* Updated tests

* Use uppercase

* refine checkpoint callback tests

* Added test_begin() and test_end()
2020-02-16 00:03:05 -05:00
Jeremy Jordan 4ae31cd1d5
advanced profiler describe + cleaned up tests (#837)
* add py36 compatibility

* add test case to capture previous bug

* clean up tests

* clean up tests
2020-02-15 23:43:43 -05:00
Dmitry Lipin 06ca6428b6
Allow user to specify 'step' key while logging metrics (#808)
* allow to specify 'step' key

* add test

* docs to log_metrics

* fix test

* rename

* also rename
2020-02-15 23:35:23 -05:00
Jirka Borovec 9f939447f2
add autopep8 to Contributions guide (#852)
* add autopep8 to Contrib.

* simplify cmd

* update GH templates

* add pytest-flake8

* update GH template
2020-02-15 20:24:38 -05:00
Jirka Borovec af44583050
drop torchvision, tests only (#797)
* drop torchvision, tests only

* manifest

* move test utils
2020-02-10 22:47:18 -05:00
Bob Kemp 8fa802e35b
Tensorboard path generalisation (#804)
* Allow experiment versions to be overridden by passing a string value.
Allow experiment names to be empty, in which case no per-experiment subdirectory will be created and checkpoints will be saved in the directory given by the save_dir parameter.

* Document tensorboard api changes

* Review comment fixes plus fixed test failure for minimum requirements build

* More format fixes from review
2020-02-10 09:07:17 -05:00
Jirka Borovec fc0ad03008 fix test for profiler (#800)
* fix test for profiler

* use allclose

* use relative tol
2020-02-09 17:48:37 -05:00
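The `use allclose` and relative-tolerance bullets in the commit above describe a common pattern for timing tests: compare measured durations against expected ones with a tolerance that scales with the value, so that short and long steps are held to the same proportional standard. A minimal sketch of that idea follows. It uses the standard library's `math.isclose` rather than an array-based `allclose`, and the helper name `durations_match` and the 20% tolerance are assumptions made for illustration, not values taken from the repository.

```python
import math


def durations_match(measured, expected, rel_tol=0.2):
    """Return True if every measured duration is within rel_tol of its target.

    Hypothetical helper: the name and the default tolerance are illustrative.
    """
    if len(measured) != len(expected):
        return False
    # math.isclose scales the allowed error by the larger of the two values,
    # so a 5 s step may be off by more absolute time than a 1 s step.
    return all(
        math.isclose(m, e, rel_tol=rel_tol) for m, e in zip(measured, expected)
    )


print(durations_match([1.1, 2.3, 5.2], [1.0, 2.0, 5.0]))  # within 20%
print(durations_match([1.5], [1.0]))  # 50% off, outside the tolerance
```

A purely absolute tolerance would be either too strict for long-running steps or too loose for short ones, which is one plausible reason a profiler test would move to a relative check.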
Jeremy Jordan 1cf430f7bc
new feature for profiling training runs (#782)
* initial implementation

* formatting, pass through profiler, docstring

* call profiler during training

* add initial tests

* report stats when training is done

* fix formatting

* error handling, bugfix in passthroughprofiler

* finish documenting profiler arg in Trainer

* relax required precision for profiling tests

* option to dump cProfiler results to text file

* use logging, format with black

* include profiler in docs

* improved logging and better docs

* appease the linter

* better summaries, wrapper for iterables

* fix typo

* allow profiler=True creation

* more documentation

* add tests for advanced profiler

* Update trainer.py

* make profilers accessible in pl.utilities

* reorg profiler files

* change import for profiler tests

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-02-06 22:01:21 -05:00
Adrian Wälchli 472f394788
Resolve some codefactor issues (#756)
* remove unnecessary pass statements

* use isinstance for type checks

* remove unnecessary else/elif after return

* remove unnecessary return statements

* move doc string to top

* merge isinstance calls

* remove unnecessary else/elif after raise

* use list comprehension

* do not use len without comparison

* add missing shebang

* revert isinstance check back to type

broke tests, because bool is actually subclass of int

* add missing period to doc string

* Fix default ckpt path when logger exists (#771)

* rename logging -> loggers (#767)

* move logging >> loggers

* add warning

* fix tests

* logging alias

* formatting

* formatting

* use isinstance for type checks

* revert isinstance check back to type

broke tests, because bool is actually subclass of int

* add more detail to tbptt example (#755)

* add more detail to tbptt example

* warn user about new arg in training_step

Co-authored-by: Vadim Bereznyuk <kuynzereb@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
2020-02-01 18:44:05 -05:00
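The `revert isinstance check back to type` entry in the commit above rests on a Python detail: `bool` is a subclass of `int`, so `isinstance(x, int)` also accepts `True` and `False`, while an exact `type(x) is int` check does not. The sketch below illustrates the difference. `is_plain_int` is a hypothetical helper written for this note and is not code from the repository.

```python
def is_plain_int(value):
    """Return True only for genuine ints, rejecting bools (hypothetical helper)."""
    # An exact type comparison ignores subclasses, so True/False are excluded.
    return type(value) is int


print(issubclass(bool, int))   # True
print(isinstance(True, int))   # True: the isinstance check lets bools through
print(is_plain_int(True))      # False: the exact type check does not
print(is_plain_int(3))         # True
```

This matters wherever an argument accepts both a flag and a count, since an `isinstance` check cannot tell `True` from `1`.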
Jirka Borovec 76a1c67d87
rename logging -> loggers (#767)
* move logging >> loggers

* add warning

* fix tests

* logging alias

* formatting

* formatting
2020-02-01 15:47:58 -05:00
Vadim Bereznyuk 50881c0b31 Check early stopping metric in the beginning of the training (#542)
* Early stopping fix

* Update trainer.py

* Don't force validation sanity check

* fix tests

* update

* Added early_stopping check_metrics

* Updated docs

* Update docs

* Do not call early stopping when validation is disabled

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-01-23 11:12:51 -05:00
Nic Eggert dfb6d3626e Fix failing GPU tests (#722)
* Fix distributed_backend=None test

We now throw a warning instead of an exception. Update test
to reflect this.

* Fix test_tube logger close when debug=True
2020-01-21 14:26:43 -05:00
William Falcon 9e654c4ec8
Update requirements.txt 2020-01-21 08:11:22 -05:00
Jirka Borovec ea59a99426 update org paths & convert logos (#685)
* fix typos

* update org paths

* update links from README to docs

* add svg logo

* add svg logo-text

* update logos

* testing temp paths

* prune links from readme

* optimize imports

* update logo

* update paths in README

* missing imports
2020-01-20 14:50:31 -05:00
Z ZH de2ccc03a8 add version_ prefix to log_dir (#706)
* add version_ prefix to log_dir

* add version_ prefix
2020-01-18 07:17:53 -05:00
William Falcon bc67689068
clean v2 docs (#691)
* updated gitignore

* Update README.md

* updated gitignore

* updated links in ninja file

* updated docs

* Update README.md

* Update README.md

* finished callbacks

* fixed left menu

* added callbacks to menu

* added direct links to docs

* fixing TensorBoard (#687)

* flake8

* fix typo

* fix tensorboardlogger
drop test_tube dependence

* formatting

* fix tensorboard & tests

* upgrade Tensorboard

* test formatting separately

* try to fix JIT issue

* add tests for 1.4

* added direct links to docs

* updated gitignore

* updated links in ninja file

* updated docs

* finished callbacks

* fixed left menu

* added callbacks to menu

* added direct links to docs

* finished rebase

* making private members

* working on trainer docs

* set auto dp if no backend

* working on trainer docs

* fixed lightning import

* cleared spaces

* finished lightning module

* added callbacks

* added loggers

* set auto dp if no backend

* added loggers

* flake 8

* flake 8

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-01-17 06:03:31 -05:00
Jirka Borovec bde549cb36 unify model test acc (#696) 2020-01-17 05:50:26 -05:00
Jirka Borovec f72e354ee6 fixing TensorBoard (#687)
* flake8

* fix typo

* fix tensorboardlogger
drop test_tube dependence

* formatting

* fix tensorboard & tests

* upgrade Tensorboard

* test formatting separately

* try to fix JIT issue

* add tests for 1.4
2020-01-16 07:22:29 -05:00
Boris Dayma ec7fc97857 Feature: wandb logger (#627)
* Basic wandb support

* refactor(wandb): remove unused variables and document logger

* docs(wandb): explain how to use WandbLogger

* test(wandb): add tests for WandbLogger

* feat(wandb): add save_dir

* fix(wandb): allow pickle of logger

* fix(wandb): save logs in custom directory

* test(wandb): test import

* docs(wandb): simplify docstring and use doctest

* test: increase number of epochs for satisfactory accuracy

* test(test_load_model_from_checkpoint): ensure we load last checkpoint

Co-authored-by: Chris Van Pelt <vanpelt@wandb.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-01-13 22:25:27 -05:00
Jirka Borovec f7db44e750 fix deprecated tng and abstract lightning (#644) 2020-01-13 22:20:38 -05:00
Jakub 8dc8a8bfd3 Neptune integration (#648)
* added neptune integration

* added tests for NeptuneLogger, added neptune to docs

* updated link to neptune support

* fixed docstrings, fixed try/except in tests, changed append_tags input

* fixed docstrings line length

* bumped epoch nr in model restore tests

* added tags support for single strings

* fixed passing neptune token to backend

* fixed project name in offline mode

* added save_top_k=-1 to checkpoint callback

* reformatted initialization of neptune in online mode

* bumped epoch nr to 4 in test_load_model_from_checkpoint

* bumped epoch nr to 5

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-01-13 22:20:01 -05:00
Jirka Borovec db6b404748 CI pass (#671)
* fix pillow in test

* test acc

* update version in deprecated msg
2020-01-13 22:09:47 -05:00
Vadim Bereznyuk 12edc3099c Fix the number of training batches used in the training loop (#653)
* Fix the number of processed training batches

* Fix tests

* fix tests

* fix tests

* One more attempt

* Fix another test
2020-01-05 14:37:09 -05:00
Nic Eggert 019f612204 Fix amp tests (#661)
* Run AMP tests in their own process

With opt_level="O1" (the default), AMP patches many
torch functions, which breaks any tests that run afterwards.
This patch introduces a pytest extension that lets
tests be marked with @pytest.mark.spawn so that they
are run in their own process using torch.multiprocessing.spawn
so that the main python interpreter stays un-patched.

Note that tests using DDP already run AMP in its own process,
so they don't need this annotation.

* Fix AMP tests

Since AMP defaults to O1 now, DP tests no longer throw exceptions.

Since AMP patches torch functions, CPU inference no longer works.
Skip prediction step for AMP tests.

* typo
2020-01-05 14:34:25 -05:00
Jirka Borovec 5d00e62047 Fix logger, tensorboard (#610)
* fix logger tests

* fix missing flush

* fix tensorboard

* fix namespace

* fix flush

* fix add_hparams
2019-12-08 07:59:25 -08:00
Nic Eggert 5329c72cb0 Implement TensorboardLogger (#607)
* Implement TensorboardLogger

* Pass default_save_path to trainers

* Update tensorboard.py
2019-12-07 23:25:37 -05:00
Jirka Borovec 4970624f8b fix Logger tests for Win (#605)
* fix mlflow test

* fix mlflow test

* update logger / mlflow

* flake8

* fix appveyor
2019-12-07 19:25:12 -05:00
schwobr 2f01c03b38 Additional hooks (#598)
* Renamed `on_sanity_check_start` to `on_train_start` and added `on_train_end` to `ModelHooks`

* changed tests to use `on_train_start` instead of `on_sanity_check_start`
2019-12-07 08:52:06 -05:00
Elliot Waite 1051c189e1 Simplify variables: step, epoch, max_epochs, min_epochs (#589) 2019-12-07 08:50:21 -05:00
Adrian Wälchli f7e1040236 Docs and Tests for "gpus" Trainer Argument (#593)
* add table for gpus argument

* fix typo in error message

* tests for supported values

* tests for unsupported values

* fix typo

* fix typo list->str

* fix travis warning "line too long"
2019-12-07 08:48:45 -05:00
Nic Eggert 0489e31b02 Fix CometML tests (#585)
* monkeypatch atexit.register to fix problem with cometml logging

* Use experiment id for version in cometml
2019-12-07 00:24:59 -05:00
Jirka Borovec 1d4b6be17b rename trainer modules, drop `_mixin` (#571)
* rename trainer modules, drop _mixin

* fix imports
2019-12-04 11:39:14 -05:00
Jirka Borovec 3a58937d8b rename variables nb -> num (#567)
* rename nb -> num

* flake8

* batch_nb, epoch_nb, gpu_nb, split_nb

* add _num deprecations
2019-12-04 06:57:10 -05:00
Jirka Borovec 63717e8fda prune tests (#564)
* format docstring in tests

* prune unused vars

* optimize imports

* drop duplicated var
2019-12-04 06:48:53 -05:00
Nic Eggert 62f6f92fdf Use pytest tmpdir fixture (#482)
* Use pytest tmpdir

* Switch to tmpdir fixtures

* Switch to tmpdir fixture

* tmpdir fixture

* Fix more conflicts
2019-12-03 08:01:04 -05:00
Jirka Borovec 47659daa5f speed-up testing (#504)
* extend CI timeout

* add short MNIST

* lower dataset and stop thr

* refactor imports

* formatting

* early stop

* play params

* play params

* minor refactoring

# Conflicts:
#	pytorch_lightning/testing/__init__.py
#	pytorch_lightning/testing/lm_test_module.py
#	pytorch_lightning/testing/lm_test_module_base.py
#	pytorch_lightning/testing/lm_test_module_mixins.py
#	pytorch_lightning/testing/model.py
#	pytorch_lightning/testing/model_base.py
#	pytorch_lightning/testing/model_mixins.py
#	pytorch_lightning/testing/test_module.py
#	pytorch_lightning/testing/test_module_base.py
#	pytorch_lightning/testing/test_module_mixins.py

* typo

Co-Authored-By: Ir1dXD <sirius.caffrey@gmail.com>

* Revert "refactor imports"

This reverts commit b86aee92

* update imports
2019-11-28 12:06:05 -05:00
Jirka Borovec 9785a3e78e Refactor: name modules (#548)
* refactor: rename some modules

* add deprecation warnings

* fix paths
2019-11-26 22:39:18 -05:00
Ir1dXD 7324dd902b change Checkpoint callback's `save_best_only` to `save_top_k` (#128)
* docs: enable syntax highlight

* feat: change Checkpoint callback's `save_best_only` to `save_top_k`

fix #70

* docs: update docs for save_top_k

* revert other files

* style: lint for travis-ci

* fix typo

* make flake8 happy

* update according to review

* add tests

* rename func to private

* add doc on `save_top_k == 0`

* make flake8 happy

* update according to PR comments

* change some f-strings

* Update pt_callbacks.py

* Update test_models.py

* update options

* create folders

* Update test_models.py

* change epoch num

* support calling multiple times, add docs and tests

* update docs

* roll back changes in earlystopping

* clean test files

* make flake8 happy

* fix epoch number

* update tests about epoch numbers

* clean debugging code

* fix testing utils codes

* change save_dir to tests/tests according to previous lines

* remove unused overwrite option

* make flake8 happy

* change var name as per review

* make flake8 happy

* update property name to work on master

* elaborate in the docs

* update docs as per review

* revert previous commit

accidentally pressed wrong button when solving conflicts
2019-11-19 15:43:34 -08:00
rwesterman d1b6b011c3 Comet fix (#481)
* Fixing comet ml bug and adding functionality

* Updating documents

* Fixing code style issues in comet_logger

* Changing comet_logger experiment to execute lazily

* Adding tests for comet_logger and addressing comments from @Borda

* Setting step_num to optional keyword argument in log_metrics() to comply to other loggers

* Adding offline logging mode for comet_ml, updating tests and docs

* Switching to MisconfigurationException
2019-11-11 23:00:31 -05:00
Jirka Borovec 1fd1e42aa6 Fix setup-doc for pypi (#472)
* add Twine to CI

* freeze Twine

* freeze Twine

* minor refactoring

* try another

* fix req.

* update README

* fix __doc__

* fix multiple req. test-tube
2019-11-09 00:59:14 -05:00
Nic Eggert 9fa2806605 Fix ModelCheckpoint default paths (#413)
* Make name and version properties required

* Warn before deleting files in checkpoint directory

* Get default checkpoint path from any logger

* Fix typos

* Uncomment logger tests

* Whitespace

* Update callback_config_mixin.py

checkpoints and version file names would just have a number. it's easy to tell what you're looking at with version_ prepended

* Address comments

* Fix broken tests
2019-11-05 10:41:59 -05:00
Yongrae Jo 32dd803b1e Fix min_max gpu memory logging bug (#453)
* #452 Fix ValueError

* #452 Use subprocess.run

* #452 Simplify code for gpu_memory_map

* #452 Simplify code for min max memory

* #452 Add test for get_memory_profile

* #452 Use os.sep

* #452 Use os.linesep
2019-11-05 08:55:44 -05:00
Ir1dXD 5a9afb11cc change print to logging (#457)
* change print to logging

* always use logging.info

* use f-strings

* update code style

* set logging configs

* remove unused code
2019-11-05 08:43:21 -05:00
William Falcon 37729f0a17
fixing test (#451) 2019-11-03 08:52:22 -05:00
Tullie Murrell 248495b1d1 Add tbptt (#429)
* Add truncated bptt

* Fix rebase error

* AutoPep8

* Address comments, incl default bptt_split impl

* Add tbptt test

* Add default split for lists/tuples

* Add tbptt docs

* Fix trainer spacing

* Update RequiredTrainerInterface.md
2019-10-31 06:45:28 -04:00
William Falcon 5db90e32eb
hpc restore takes priority over non hpc weights (#419)
* hpc restore takes priority over non hpc weights
2019-10-23 20:18:26 -04:00
William Falcon c6244594a6
clear memory cache before train starts (#418)
* clear memory cache before train starts

* clear memory cache before train starts
2019-10-23 11:41:00 -04:00
William Falcon d955baa235
Update README.md 2019-10-23 06:13:31 -04:00
William Falcon b47b881f78
Update README.md 2019-10-23 06:13:00 -04:00
William Falcon 5afae59715
refactored tests (#417)
* refactored tests
2019-10-23 06:10:13 -04:00
Vismantas 2aba70e228 parse_gpu_ids fix (#382)
* Unit tests for num_gpu property as proxy for __parse_gpu_ids.

* Refactoring __parse_gpu_ids

* Moved the function outside the class as it is
a utility function and did not depend on class in any way.
* Added unit tests for it.

* Mocked torch.cuda.device_count function in tests.

This allows the tests to be run on machines that do not have gpus.

* Fixed the parse_gpu_ids function to handle -1 case.

Function now handles -1 the same way as it does for '-1'.

* Unit tests for root_gpu added.

Added backend as a parameter as currently depending on backend set
or not, code fails with exception in certain circumstances, before
giving a wrong answer.

* Moved __set_root_gpu function out of the class.

This function does not depend on the class and can be tested
more easily this way.
Also added unit tests for this function. They simply reuse
data for the root_gpu property.

* determine_root_gpu_device passes unit tests.

* num_gpus passes unit tests.

Also added a None test for this function.

* parse_gpu_ids tests changed to reflect desired state after refactoring.

Planning to refactor parse_gpu_ids to always return list of ints.
This will simplify code that uses the output of this function.

* * parse_gpu_ids always returns lists
* parse_gpu_ids checks given ids against available ids
* parse_gpu_ids raises exception for nonexistent ids
* parse_gpu_ids returns None when no gpus are available
* cleaned up determine_root_gpu_device
* cleaned up num_gpus property
* Updated unit tests to reflect changes in the functions

* Flake8 fixes

* Moved fixture code up before where it is used.

* Updated documentation.

* Changed tests to match the API:
* gpus=-1 or gpus='-1' should use all available gpu devices
* gpus=N
    * N=0: no gpus should be used.
    * N>0: N gpus should be used
* gpus=list of ints or a comma separated string of numbers:
    Use the gpus indicated by the list or the string.

* Fixed code to pass all the changed tests for parsing gpus param.

* Refactoring parse_gpu_ids function.

* flake8 fixes.

* Updating documentation.

* flake8 fixes.

* flake8 fixes.

* flake8 fixes

* Update trainer.py

* Update dp_mixin.py

* Make reduce_distributed_output a stand alone function.
Fix imports.
Fix flake8.

* Add comet_ml dependency to tests requirements.txt

* Revert "Make reduce_distributed_output a stand alone function. Fix imports. Fix flake8."

This reverts commit eac0338

* Merge with master.
2019-10-23 05:05:09 -04:00
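The commit above spells out the intended behaviour of the `gpus` argument: `-1` or `'-1'` selects every available device, `0` selects none, a positive `N` selects `N` devices, and a list or comma-separated string names specific ids. The result is always a list, or `None` when no GPUs are to be used, and unknown ids raise an exception. The following is a minimal sketch of those rules, not the repository's implementation. The function name `parse_gpu_ids_sketch`, the `num_available` parameter (standing in for a device-count query so the code runs without a GPU), the choice of `ValueError`, and the reading of "N gpus" as the first N device ids are all assumptions.

```python
def parse_gpu_ids_sketch(gpus, num_available):
    """Normalize a `gpus` argument to a list of device ids, or None.

    Hypothetical stand-in for illustration: `num_available` replaces a
    runtime device-count query so the sketch needs no GPU.
    """
    if gpus is None:
        return None
    all_ids = list(range(num_available))

    if isinstance(gpus, str):
        text = gpus.strip()
        if text == "-1":
            ids = all_ids
        else:
            # Comma-separated string of explicit device ids, e.g. "1, 3".
            ids = [int(part) for part in text.split(",") if part.strip()]
    elif isinstance(gpus, int):
        if gpus == -1:
            ids = all_ids
        elif gpus >= 0:
            # Assumption: an integer N means "the first N devices".
            ids = list(range(gpus))
        else:
            raise ValueError(f"unsupported gpus value: {gpus}")
    else:
        # Any other iterable is treated as a list of explicit ids.
        ids = [int(i) for i in gpus]

    if not ids:
        return None  # no GPUs requested or none available

    missing = [i for i in ids if i not in all_ids]
    if missing:
        raise ValueError(f"requested GPU ids {missing} are not available")
    return ids


print(parse_gpu_ids_sketch(-1, 4))      # [0, 1, 2, 3]
print(parse_gpu_ids_sketch("1, 3", 4))  # [1, 3]
print(parse_gpu_ids_sketch(0, 4))       # None
```

Passing the device count in as a parameter mirrors the commit's note about mocking `torch.cuda.device_count` so that the tests can run on machines without GPUs.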
Nic Eggert 05cea3ff8b Save / Load Hyperparameters with checkpoint (#415)
* Save and load hparams from checkpoints

* Update docs

* Add warning when not saving hparams

* Missing import

* Update .run_local_tests.sh

* Update lm_test_module_mixins.py

* Update lightning_module_template.py
2019-10-23 04:48:24 -04:00
Jirka Borovec f18aee30a5 Minor imports cleaning (#402)
* code cleaning

* drop unused imports

* optimize imports
2019-10-22 11:32:40 +03:00
William Falcon e6e325c853 added comet testing dep 2019-10-22 10:36:48 +03:00
William Falcon ad3c6acca3 flake8 2019-10-22 10:34:00 +03:00
William Falcon 1424157731
Refactor (#407)
* moved dp, ddp outside of trainer

* added main mixins

* finished major mixin refactor

* flake8

* finished major mixin refactor
2019-10-22 04:16:51 +03:00
William Falcon b0281395bf changes examples to pl_examples for name conflict 2019-10-19 00:41:17 +02:00
William Falcon 699bd2cb50
removed mlflow and custom logger tests (#389)
* changes to seed for tests
2019-10-18 23:03:28 +02:00
William Falcon e04dfb37fd changes to seed for tests 2019-10-18 15:54:11 +02:00
William Falcon c6dde49296 changed lbfgs test min acc 2019-10-18 09:51:33 +02:00
William Falcon d29a693590 changed lbfgs test 2019-10-18 02:15:04 +02:00
William Falcon 65a2cf6104 changed lbfgs test 2019-10-18 01:31:45 +02:00
William Falcon d8920169ac dp tests 2019-10-18 01:06:50 +02:00
William Falcon 2044126821
fixing tests (#372)
* fixing tests

* fixed tests
2019-10-16 07:28:47 -04:00
William Falcon e2cabb03ba
fix val logging (#362)
* fix test

* no warnings always
2019-10-15 12:44:20 -04:00
William Falcon a94e9d8e12
Update test_models.py 2019-10-10 15:17:19 -04:00
William Falcon 46322b906b
fixed ckpt tests (#352)
* fixed ckpt tests
2019-10-10 15:16:19 -04:00
William Falcon ec10119e97
Fixed tests (#340)
* removed hparam calls

* Update test_models.py
2019-10-09 10:37:10 -04:00
Nic Eggert 8088052825 Finalize logger (#337)
* Ensure logger.finalize is called

* Call logger.finalize

* Update mlflow_logger.py

* Update test_logging.py

* Update trainer.py
2019-10-08 17:33:33 -04:00
William Falcon 49e04de5ac
Ports (#338)
* remove os.exit from early stopping

* remove os.exit from early stopping

* fixed weight summary
2019-10-08 17:11:47 -04:00
William Falcon ac6d0154c2
Fixes lack of logging in logger (#319)
* changed rank 0

* models wait to restore weights

* models wait to restore weights
2019-10-06 17:57:23 -04:00
William Falcon 491100abdd
Docs (#315)
* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up demos

* cleaning up docs

* cleaned up test_tube logger

* cleaned up test_tube logger

* cleaned up test_tube logger
2019-10-05 23:52:32 -04:00
William Falcon 6cc3f1757f
decouple returns from each step (#307)
* decoupled training metrics from logging metrics

* decoupled validation metrics from log metrics

* updated docs

* updated docs

* updated docs

* Fixed test

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master

* merged master
2019-10-05 13:35:20 -04:00
William Falcon 8f5a06bfb8
Gpu mem (#308)
* Fixes #289

* Fixes #289

* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* added test seeds

* updated docs

* added lbfgs support (#310)

* added lbfgs support

* added lbfgs support

* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* added test seeds

* updated docs

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* Fixes #289

* Fixes #289

* merged master

* merged master
2019-10-05 11:29:34 -04:00
William Falcon 75fd89106f
added lbfgs support (#310)
* added lbfgs support

* added lbfgs support

* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* added test seeds

* updated docs

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support

* added lbfgs support
2019-10-05 11:10:21 -04:00
William Falcon c9786cdef1
added test seeds (#306)
* added test seeds

* added test seeds

* updated docs
2019-10-05 10:56:52 -04:00
William Falcon 967957e55c added lbfgs support 2019-10-05 10:47:18 -04:00
William Falcon bf09060fef
Fixes #292 (#303)
* early stopping callback is not default

* added a default logger

* added default checkpoint callback

* added default checkpoint/loggers

* added default checkpoint/loggers

* updated docs

* cleaned demos

* cleaned demos

* cleaned demos

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers

* clean up docs around loggers
2019-10-04 19:48:57 -04:00
William Falcon 033be9e9b4 tests fix 2019-10-04 17:32:52 -04:00
William Falcon 3a3ac73963 Merge branch 'master' of https://github.com/williamFalcon/pytorch-lightning 2019-10-04 16:56:05 -04:00
William Falcon cf07c153e9 tests fix 2019-10-04 16:55:51 -04:00
William Falcon a60a24d11b
disable auto gpu loading when restoring weights to avoid OOM (#242)
* Update root_module.py

* Update root_module.py

* Update root_module.py

* tests fix

* tests fix
2019-10-04 16:18:43 -04:00
William Falcon 73a7cf3c99
Mem crash (#299)
* fixes memory crash

* fixes memory crash
2019-10-04 15:53:44 -04:00
Hendrik Schröter 36f0b5bbd0 Use getter instead of python property for the dataloaders (#275)
* Use getter instead of python property for the dataloaders

* Fix lint

* Update trainer.py
2019-10-04 15:35:02 -04:00
William Falcon 32e74b8f36
Ddp2 (#261)
* adds ddp2 option where on each node a single process uses all gpus

* added ddp2 test

* added ddp2 docs

* Update Distributed training.md

* delete ref to old update_training_log_metrics

* delete ref to old update_training_log_metrics

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* banana pancakes

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* cheesecake
2019-10-04 15:07:54 -04:00
Nic Eggert 614cb3c03b Initialize loggers only once (#270)
* Create underlying loggers lazily

This avoids creating duplicate experiments or run in multi-node DDP.

* Save hyperparameters automatically

* Update docs for snapshotting hyperparams

* Fix test tube

* Fix test tube pickling
2019-10-02 11:10:40 -04:00
Nic Eggert 480eed5cb6 Enable any ML experiment tracking framework (#223)
* Implement generic loggers for experiment tracking

* Add tests for loggers

* Get model tests passing

* Test and fix logger pickling

* Expand pickle test and fix bug

* Missed exp -> logger conversion

* Remove commented code

* Add docstrings

* Update logging docs

* Add mlflow to test requirements

* Make linter happy

* Fix mlflow timestamp

* Update Logging.md

* Update test_models.py

* Update test_models.py

* Update test_models.py

* Update properties.md

* Fix tests

* Line length
2019-09-27 12:05:29 -04:00
William Falcon 481aa24974
always calls the lr scheduler with epoch nb. Fixes #98 (#252)
* always calls the lr scheduler with epoch nb

* added docs for cluster grid search

* added docs for cluster grid search

* undo test changes

* undo test changes
2019-09-26 16:36:41 -04:00
William Falcon cf04ff73e9 undo test changes 2019-09-26 16:10:51 -04:00
William Falcon de9fc0587b added docs for cluster grid search 2019-09-26 16:10:16 -04:00
William Falcon 25d2f93256
enables samplers which don't need set epoch (or when ppl don't need a sampler) (#254)
* enables samplers which dont need set epoch

* added docs for single gpu ddp

* added docs for single gpu ddp

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search

* added docs for cluster grid search
2019-09-26 14:39:04 -04:00
Alok Singh b0a0a47a0b Rename variables (#124)
-   data_batch → batch
-   batch_i → batch_idx
-   dataloader_i → dataloader_idx
-   tng → training
-   training_dataloader → train_dataloader
-   add_log_row_interval → row_log_interval
-   gradient_clip → gradient_clip_val
-   prog → progress
-   tqdm_dic → tqdm_dict
2019-09-25 19:05:06 -04:00
William Falcon 55e7322747
Metrics load (#228)
* load from metrics defaults to CPU

* load from metrics defaults to CPU

* load from metrics defaults to CPU
2019-09-16 10:47:19 -04:00
William Falcon 9576dd28b2
added load on CPU first (#221)
* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added load on CPU first

* added print logs

* added print logs

* changed close order

* changed close order
2019-09-11 07:52:36 -04:00
William Falcon 506d5da68b
enable single gpu per node (#218)
* enable single gpu per node

* enable single gpu per node

* enable single gpu per node

* enable single gpu per node

* enable single gpu per node

* enable single gpu per node
2019-09-09 07:37:20 -04:00
William Falcon 10d190e045
Simplified gpu api. No NVIDIA flag managing by lightning for cluster (#213)
* added nvidia flag set

* added nvidia flag set

* added nvidia flag set

* added nvidia flag set

* added nvidia flag set

* added nvidia flag set

* added nvidia flag set

* added nvidia flag set

* added simple cluster template

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs

* sets correct backend for possible combinations of gpu inputs
2019-09-08 15:36:58 -04:00
William Falcon 7099f8dbfb
split trainer mixins (#209)
* split trainer mixins

* Update multi_node_cluster_template.py

* Update single_cpu_template.py

* Update single_gpu_node_16bit_template.py

* Update single_gpu_node_ddp_template.py

* Update single_gpu_node_dp_template.py

* Update trainer_cpu_template.py

* Update trainer_io.py

* split trainer mixins

* Update multi_node_cluster_template.py

* deconflicted

* deconflicted

* deconflicted
2019-09-06 14:11:07 -04:00
William Falcon 60633eaa32
Moves hpc auto-resubmit to trainer from test-tube (#207)
* added slurm signal handler

* added restore weight functions

* set slurm signal handling inside process

* added resubmit docs

* added resubmit docs

* fixed missing param

* Update trainer.py

* fixed missing param

* fixed missing param

* debugging tests

* debugging tests

* debugging tests

* debugging tests

* debugging tests

* debugging tests

* debugging tests
2019-09-06 11:54:51 -04:00
Nic Eggert 1733dba735 Pass outputs from all dataloaders to test_end and validation_end (#203)
* Pass outputs from all dataloaders to test_end and validation_end

* Update tests

* Update docs

* Update trainer.py

* Update test_models.py
2019-09-06 07:37:25 -04:00
Nic Eggert 64688e1e15 Refactor test modules (#180)
* Expectopatronum implement #89 (#182)

* rename validate -> evaluate; implement test logic; allow multiple test_loaders

* add test_step and test_end to LightningModule

* add in_test_mode to pretraining to implement case 2 (test pretrained model)

* fix code style issues

* LightningTestModel: add optional second test set, implement test_step and test_end

* implemented test for multiple test_dataloaders; fixed typo

* add two test cases for #89

* add documentation for test_step, test_end; fix computation of loss in validation_step example

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Added proper dp ddp routing calls for test mode

* Update trainer.py

* Update test_models.py

* Update trainer.py

* Update trainer.py

* Update override_data_parallel.py

* Update test_models.py

* Update test_models.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update test_models.py

* Update test_models.py

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* Update trainer.py

* Update override_data_parallel.py

* Update debug.py

* Update lm_test_module.py

* Update test_models.py

* release v0.4.8

* Update README.md

* add training loop docs

* testing loop docs

* testing loop docs

* Convert __dataloader to _dataloader

This will let inherited classes use it

* Factor common test model setup into base class

* Specialized test modules inherit from LightningTestModelBase

* Fix __is_overriden so that it works with more complicated inheritance

* Use mixins to add functionality to test models

* Fix test with no val_dataloader

* Remove unused imports

* Get rid of wild card import

* Update trainer.py

* Update lm_test_module.py
2019-09-02 15:46:16 -04:00
Verena Haunschmid 25d5b25792 Expectopatronum implement #89 (#182)
* rename validate -> evaluate; implement test logic; allow multiple test_loaders

* add test_step and test_end to LightningModule

* add in_test_mode to pretraining to implement case 2 (test pretrained model)

* fix code style issues

* LightningTestModel: add optional second test set, implement test_step and test_end

* implemented test for multiple test_dataloaders; fixed typo

* add two test cases for #89

* add documentation for test_step, test_end; fix computation of loss in validation_step example

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Added proper dp ddp routing calls for test mode

* Update trainer.py

* Update test_models.py

* Update trainer.py

* Update trainer.py

* Update override_data_parallel.py

* Update test_models.py

* Update test_models.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update test_models.py

* Update test_models.py

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* debug

* Update trainer.py

* Update override_data_parallel.py

* Update debug.py

* Update lm_test_module.py

* Update test_models.py
2019-09-02 07:15:27 -04:00
Stanislav 73cf47112e Gradient accumulation callback (#150)
* Gradient accumulation callback

* little test case

* typo

* import fix

* method name fix

* fix epochs indexing from 1

* better code style

* code style fix v2 :/

* change interface

* fix Trainre new api in tests

* trainer api bug fix

* new raising error, new update method

* extentions tests

* a little better tests

* typo fix

* flack8 better

* using scheduler for int and dict

* typo

* firs epoch bug fix

* test update

* empty dict exception

* floats check

* codestyle fix

* grad counting test

* someday, i will install normal linter

* add more checks

* Update test_models.py

* Update test_models.py

* Update test_models.py

* Update test_models.py

* Update test_models.py

* Update test_models.py

* Update test_models.py
2019-08-30 10:56:14 -04:00
William Falcon 4104a0fc47
cleaned up progbar (#165)
* cleaned up progbar

* cleaned up progbar

* cleaned up progbar

* cleaned up progbar

* cleaned up progbar

* cleaned up progbar

* cleaned up progbar

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* updated base files

* flake 8
2019-08-23 21:23:27 -04:00
eqs 4a0b56755c bug fix for #157 (#158)
* Separate condition list/tuple case into separated cases

* Add test for tuple of tensor list and list of tensor dict

* Update test_models.py
2019-08-21 10:22:51 -04:00
William Falcon a27fb5d54c
enhanced optimizer return options (#120)
* added smarter optimizer options

* added smarter optimizer options

* added smarter optimizer options tests

* added smarter optimizer options tests

* added smarter optimizer options tests

* added smarter optimizer options tests

* added smarter optimizer options tests

* added smarter optimizer options tests

* added smarter optimizer options tests

* added single gpu data transfer recursive

* added single gpu data transfer recursive

* added single gpu data transfer recursive

* added single gpu data transfer recursive

* added single gpu data transfer recursive
2019-08-15 11:31:56 -04:00
William Falcon db9254acbe
enable recursive parsing for single gpu inputs (#121)
* added tests

* added single gpu data transfer recursive

* added single gpu data transfer recursive

* added single gpu data transfer recursive

* added single gpu data transfer recursive

* added single gpu data transfer recursive

* added single gpu data transfer recursive
2019-08-15 09:39:09 -04:00
William Falcon 7f53e7bfb3
Val idx optional in validation_step (#108)
* made dataset_i only available with multiple datasets

* updated interface signature

* updated tests
2019-08-13 11:37:37 -04:00
Sidhanth Holalkere 511f7ecb9a Support for multiple val_dataloaders (#97)
* Added support for multiple validation dataloaders

* Fix typo in README.md

* Update trainer.py

* Add support for multiple dataloaders

* Rename dataloader_index to dataloader_i

* Added warning to check val_dataloaders

Added a warning to ensure that all val_dataloaders were DistributedSamplers if ddp is enabled

* Updated DistributedSampler warning

* Fixed typo

* Added multiple val_dataloaders

* Multiple val_dataloader test

* Update lightning_module_template.py

Added dataloader_i to validation_step parameters

* Update trainer.py

* Reverted template changes

* Create multi_val_module.py

* Update no_val_end_module.py

* New MultiValModel

* Rename MultiValModel to MultiValTestModel

* Revert to LightningTestModel

* Update test_models.py

* Update trainer.py

* Update test_models.py

* multiple val_dataloaders in test template

* Fixed flake8 warnings

* Update trainer.py

* Fix flake errors

* Fixed Flake8 errors

* Update lm_test_module.py

keep this test model with a single dataset for val

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update test_models.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update RequiredTrainerInterface.md

* Update RequiredTrainerInterface.md

* Update test_models.py

* Update trainer.py

dont need the else clause, val_dataloader is either a list or none because of get_dataloaders()

* Update trainer.py

fixed flake errors

* Update trainer.py
2019-08-12 15:23:11 -04:00
William Falcon e5805bf8ff
val and test are optional now (#95)
* made validation step optional

* added no val model

* val_step can be implemented but not validation_end

* added no val end model

* added tests

* added tests

* remove class

* remove class

* remove class

* remove class

* remove class

* remove class

* remove class

* remove class

* remove class

* remove class

* remove class

* updated docs

* updated docs

* updated test

* updated test

* updated test

* updated test

* updated test

* updated test

* updated test

* updated test

* updated test

* fix pep8
2019-08-11 10:01:57 -04:00
Nic Eggert 996b1f9a6d When running DDP without DistributedSampler, throw warning instead of exception (#91) 2019-08-10 15:58:12 -04:00
William Falcon 3d23a56ed2
make experiment param in trainer optional (#77)
* removed forced exp

* modified test to also run without exp
2019-08-08 10:59:16 -04:00
William Falcon aa7245d9db
Load fix (#74)
* skip weight load without callback

* added simple cpu test

* fixed pep
2019-08-08 06:00:04 -04:00
William Falcon c7e8436083 added single gpu train test 2019-08-07 13:40:51 -04:00
William Falcon 16e5093805 added test model to do also 2019-08-07 11:47:05 -04:00
williamFalcon 8a58c6f8f0 added badge 2019-08-07 08:02:03 -07:00
williamFalcon f32b9064ac added badge 2019-08-07 08:01:33 -07:00
William Falcon a2d6d514d5 cleaned up pep8 issues 2019-08-07 10:19:03 -04:00
William Falcon 549d0f66df
Merge pull request #52 from alok/ptl-pl
Rename `ptl` to `pl`
2019-08-07 09:09:15 -04:00
William Falcon 35f23bbc82
Merge pull request #55 from williamFalcon/continue
add training restore
2019-08-07 09:02:16 -04:00
Jiri BOROVEC 86a90bfefd update codecov 2019-08-07 14:32:32 +02:00
William Falcon 0b92fe6cea updated test 2019-08-07 08:07:59 -04:00
William Falcon 0527a1dad1 debug 2019-08-07 08:03:40 -04:00
William Falcon 2018380598 debug 2019-08-07 08:01:42 -04:00
William Falcon 1e17bf76aa debug 2019-08-07 08:01:33 -04:00
William Falcon cdbcbad352 added hook on_sanity_check_start 2019-08-07 07:51:55 -04:00
William Falcon 8e4fe2002b fixed restore location 2019-08-07 07:45:57 -04:00
William Falcon 9713c41bf4 removed bad hook call 2019-08-07 07:33:08 -04:00
William Falcon 2575b157a4 removed bad hook call 2019-08-07 07:32:33 -04:00
William Falcon 47a691f158 updated tests and docs 2019-08-07 07:09:37 -04:00
Alok Singh 8b9f021ee6 Rename `ptl` to `pl`
Closes #46.
2019-08-06 23:02:55 -07:00
Jiri BOROVEC d9bfe964f9 update by flake8 2019-08-06 22:45:46 +02:00
Jiri BOROVEC 4e0b9c50e7 add CircleCI 2019-08-06 22:45:46 +02:00
Jiri BOROVEC 632d07b490 fix prints for py3.5 2019-08-06 22:45:46 +02:00
Jiri BOROVEC c44966a8bf apply PEP8 2019-08-06 22:45:27 +02:00
Jiri BOROVEC 50cca25d6f add missing req. 2019-08-06 22:45:27 +02:00
Jiri BOROVEC 627ac0be32 fix tests req. 2019-08-06 22:44:13 +02:00
William Falcon a79de1ec8e
Update README.md 2019-08-06 06:57:31 -04:00
Jiri BOROVEC 469941a528 pkg relative imports
* split requirements.txt
* pytest verbose
2019-08-05 10:52:09 +02:00
Jiri BOROVEC 92f8c57ff5 cutout examples 2019-08-05 09:51:47 +02:00
William Falcon 598e1accb5 updated docs 2019-08-01 10:11:26 -04:00
williamFalcon b9e0d841dc fixed lr scheduler tests 2019-07-28 06:21:41 -07:00
William Falcon 587c195298 added clean slurm save load test 2019-07-26 23:04:41 -04:00
William Falcon 64586f271d added clean slurm save load test 2019-07-26 23:02:18 -04:00
William Falcon 53b781709e added clean slurm save load test 2019-07-26 22:57:49 -04:00
William Falcon f183ac2a1c added clean slurm save load test 2019-07-26 22:51:33 -04:00
William Falcon 61c82611eb added clean slurm save load test 2019-07-26 22:40:07 -04:00
William Falcon 3224365190 added clean slurm save load test 2019-07-26 22:39:44 -04:00
William Falcon 2a4081e537 added clean slurm save load test 2019-07-26 22:33:31 -04:00
William Falcon 8e3a0443c7 added clean slurm save load test 2019-07-26 22:33:00 -04:00
William Falcon b5419fcd8b added clean slurm save load test 2019-07-26 22:24:01 -04:00
William Falcon c61e13f0ff fixed hpc save, load. cleaned apu 2019-07-26 22:13:41 -04:00
William Falcon a6ae97ac09 fixed hpc save, load. cleaned apu 2019-07-26 22:13:06 -04:00
William Falcon 4148c36abd added model save load test 2019-07-26 21:55:01 -04:00
William Falcon 84edf35f33 added saving tests to cpu 2019-07-26 12:35:28 -04:00
William Falcon a374a7ea00 added saving tests to cpu 2019-07-26 12:33:35 -04:00
William Falcon fbc1bbd161 added saving tests to cpu 2019-07-26 12:31:26 -04:00
William Falcon 84f03a1335 added saving tests to cpu 2019-07-26 12:29:19 -04:00
William Falcon 1a835969a6 added saving tests to cpu 2019-07-26 12:14:58 -04:00
William Falcon 2ee8f157ce added checkpoint test on cpu 2019-07-26 11:51:25 -04:00
William Falcon 51a5cc36e3 added checkpoint test on cpu 2019-07-26 11:50:02 -04:00
William Falcon b0d38d532d updated docs 2019-07-25 12:01:52 -04:00
William Falcon 88ac4a0849 testing multiple calles 2019-07-25 11:19:58 -04:00
William Falcon 383746b87a testing multiple calles 2019-07-25 11:19:20 -04:00
William Falcon fffc09830f switched cpu amp order 2019-07-25 11:11:14 -04:00
William Falcon aadf8e16aa switched cpu amp order 2019-07-25 11:10:21 -04:00
William Falcon 735df77862 added init to test folder 2019-07-24 21:35:38 -04:00
William Falcon 9856520c0c added init to test folder 2019-07-24 21:32:31 -04:00
William Falcon a186cf12dc added instructions to test 2019-07-24 21:31:43 -04:00
William Falcon 104b4dc1ff removed deps 2019-07-24 21:28:34 -04:00
William Falcon 23e7521300 added dp reduce out test 2019-07-24 20:22:54 -04:00
William Falcon d6e7994922 added dp reduce out test 2019-07-24 20:21:57 -04:00
William Falcon 37a26741cc testing map location 2019-07-24 20:08:17 -04:00
William Falcon a4bb80b936 dp doesnt support amp with any setting 2019-07-24 19:43:38 -04:00
William Falcon 5a1b3d17d2 pt dpp some ignores 2019-07-24 19:39:18 -04:00
William Falcon a3ad0e0ac1 ignoring dist parallel forward 2019-07-24 19:23:11 -04:00
William Falcon 9be15aa29f added cpu + amp error 2019-07-24 19:17:08 -04:00
William Falcon a5756d91be added cpu + amp error 2019-07-24 19:12:03 -04:00
William Falcon fcda19aa25 added cpu + amp error 2019-07-24 19:07:53 -04:00
William Falcon efbd1a1c18 added cpu 16 bit 2019-07-24 19:05:46 -04:00
William Falcon ed9d977c4a added cpu 16 bit 2019-07-24 19:05:20 -04:00
William Falcon 65ce10c255 testing -1 gpu option 2019-07-24 19:02:19 -04:00
William Falcon f58c83b399 made root note address individually testable 2019-07-24 18:57:42 -04:00
William Falcon 750fefac0c made root note address individually testable 2019-07-24 18:55:38 -04:00
William Falcon 53a0b9f365 moved slurm flag resolution to init 2019-07-24 18:46:21 -04:00
William Falcon 18ce3e5a23 moved slurm flag resolution to init 2019-07-24 18:40:54 -04:00
William Falcon 982f0d4b3a running ddp tests 2019-07-24 18:33:54 -04:00
William Falcon 3451a62650 running ddp tests 2019-07-24 18:27:40 -04:00
William Falcon 1313a7f397 fixed correct module on hpc save 2019-07-24 18:22:49 -04:00
William Falcon 6e2bf991f0 fixed correct module on hpc save 2019-07-24 18:21:22 -04:00
William Falcon a0e2b5ee54 fixed correct module on hpc save 2019-07-24 18:20:56 -04:00
William Falcon 8f0d9af168 fixed correct module on hpc save 2019-07-24 18:18:58 -04:00
William Falcon 7fa759ffed fixed correct module on hpc save 2019-07-24 18:16:31 -04:00
William Falcon 3600535bc5 fixed correct module on hpc save 2019-07-24 18:16:22 -04:00
William Falcon d7be0aae1c fixed correct module on hpc save 2019-07-24 18:16:02 -04:00
William Falcon 7217ecdb18 fixed correct module on hpc save 2019-07-24 18:12:46 -04:00
William Falcon 2e0fde7da7 fixed correct module on hpc save 2019-07-24 18:11:29 -04:00
William Falcon 10330f1991 fixed correct module on hpc save 2019-07-24 18:10:30 -04:00
William Falcon 549a158ec0 fixed correct module on hpc save 2019-07-24 18:09:04 -04:00
William Falcon 97980355e3 testing hpc save load 2019-07-24 17:58:00 -04:00
William Falcon 17f56c83b5 testing hpc save load 2019-07-24 17:57:15 -04:00
William Falcon 8191f268ec test memory printing 2019-07-24 17:56:47 -04:00
William Falcon 436e929458 test memory printing 2019-07-24 17:47:51 -04:00
William Falcon 7f420c0cc2 test memory printing 2019-07-24 17:41:08 -04:00
William Falcon ffdf11b7ed test memory printing 2019-07-24 17:35:39 -04:00
William Falcon 66abd0d382 test memory printing 2019-07-24 17:31:56 -04:00
William Falcon 0aa91c7fdc added multiple outputs to LightningTestModel 2019-07-24 17:19:31 -04:00
William Falcon 9101a70024 refactor tests 2019-07-24 17:12:12 -04:00
William Falcon b30fbf80d0 added test for no dist sampler 2019-07-24 17:11:25 -04:00
William Falcon d1d33e8db6 added test for no dist sampler 2019-07-24 17:10:14 -04:00
William Falcon 164751c918 added test for no dist sampler 2019-07-24 17:09:14 -04:00
William Falcon 096132b389 added test for no dist sampler 2019-07-24 17:04:12 -04:00
William Falcon 9e5dd7a7ea added test for no dist sampler 2019-07-24 17:02:39 -04:00
William Falcon 1e0bae14da added test for no dist sampler 2019-07-24 17:01:25 -04:00
William Falcon 8064a77aa7 added test for no dist sampler 2019-07-24 16:57:21 -04:00
William Falcon 5c21683566 added model for tests 2019-07-24 16:45:59 -04:00
William Falcon 383b4cdac7 added sample input for summary 2019-07-24 16:35:32 -04:00
William Falcon 9b792bf4d4 removed dead code in model save 2019-07-24 15:48:41 -04:00
William Falcon d4d0f54a37 removed dead code in model save 2019-07-24 15:48:35 -04:00
William Falcon 79c0054c38 removed forkedpdb 2019-07-24 15:28:23 -04:00
William Falcon bc40be3490 removed old files 2019-07-24 15:23:52 -04:00
William Falcon 1f67fbdb80 removed old files 2019-07-24 15:23:38 -04:00
William Falcon b1e16c2e7b added auto port find 2019-07-24 15:11:29 -04:00
William Falcon b5c67d91e5 added auto port find 2019-07-24 15:00:14 -04:00
William Falcon 90ff418017 added auto port find 2019-07-24 14:59:51 -04:00
William Falcon 98be54de80 added auto port find 2019-07-24 14:59:40 -04:00
William Falcon e3f01388df added auto port find 2019-07-24 14:57:54 -04:00
William Falcon afa25a26d9 added auto port find 2019-07-24 14:57:17 -04:00
William Falcon 46886f0c3c added auto port find 2019-07-24 14:57:09 -04:00
William Falcon 9a3f373d16 added auto port find 2019-07-24 14:56:35 -04:00
William Falcon 8651173920 added auto port find 2019-07-24 14:55:26 -04:00
William Falcon e52190e22b added auto port find 2019-07-24 14:55:00 -04:00
William Falcon 0c239da17c added auto port find 2019-07-24 14:54:20 -04:00
William Falcon 01c0d9a2d4 added auto port find 2019-07-24 14:48:56 -04:00
William Falcon b20a122e9c fixed amp bug 2019-07-24 14:23:52 -04:00
William Falcon 9e187574de fixed amp bug 2019-07-24 14:17:36 -04:00
William Falcon 5fe833ae01 fixed amp bug 2019-07-24 14:16:05 -04:00
William Falcon ca1835e063 fixed amp bug 2019-07-24 14:14:36 -04:00
William Falcon 4d559d9e3b fixed amp bug 2019-07-24 14:12:41 -04:00
William Falcon 1b273a32ee fixed amp bug 2019-07-24 14:11:05 -04:00
William Falcon dd4f8899c8 refactored model tests 2019-07-24 14:06:35 -04:00
William Falcon c26d200c41 refactored model tests 2019-07-24 13:57:34 -04:00
William Falcon ef843d5f96 refactored model tests 2019-07-24 13:56:21 -04:00
William Falcon 8a43f4307e refactored model tests 2019-07-24 13:42:42 -04:00
William Falcon b90841dc3d refactored model tests 2019-07-24 13:41:28 -04:00
William Falcon 24ceafa05c refactored model tests 2019-07-24 12:14:26 -04:00