Commit Graph

72 Commits

Author SHA1 Message Date
Vadim Bereznyuk 446a1b5d45 Split progress bar (#449)
* Split progress bars

* Iterable dataset total batches fix

* Use dynamic ncols and use batch as units

* Count epochs from 1 in progress bar

* Fix for disabled progress bar

* Code simplifications
2019-11-03 05:42:53 -05:00
Tullie Murrell 248495b1d1 Add tbptt (#429)
* Add truncated bptt

* Fix rebase error

* AutoPep8

* Address comments, incl default bptt_split impl

* Add tbptt test

* Add default split for lists/tuples

* Add tbptt docs

* Fix trainer spacing

* Update RequiredTrainerInterface.md
2019-10-31 06:45:28 -04:00
Vadim Bereznyuk f79bdf2327 Set total number of batches in progress bar while testing (#425) 2019-10-30 12:14:28 -04:00
Vadim Bereznyuk 9f8ab7c29e Fixed total number of batches (#439)
* Fixed total number of batches

* Fixed flake8 warning

* Update train_loop_mixin.py
2019-10-30 12:13:40 -04:00
William Falcon 8347a6c87e mem clear (#440)
* mem clear
2019-10-30 12:11:21 -04:00
William Falcon b86d223889 makes checkpoint process safe (#431) 2019-10-25 08:57:05 -04:00
William Falcon d5ca464cc6 Back hook (#424)
* Fixes #356
2019-10-24 07:56:56 -04:00
William Falcon a4b43ce095 Loaders (#422)
* refactor dataloading
2019-10-24 06:43:35 -04:00
William Falcon 5db90e32eb hpc restore takes priority over non hpc weights (#419)
* hpc restore takes priority over non hpc weights
2019-10-23 20:18:26 -04:00
William Falcon c6244594a6 clear memory cache before train starts (#418)
* clear memory cache before train starts
2019-10-23 11:41:00 -04:00
David Kossnick 56fa2075a5 Move `global_step` incrementing (#412)
* Move global_step incrementing to the end of a batch loop, per https://github.com/williamFalcon/pytorch-lightning/issues/411

* Move met_batch_limit condition to the end

* cleanup whitespace

* Update train_loop_mixin.py
2019-10-23 06:11:18 -04:00
Vismantas 2aba70e228 parse_gpu_ids fix (#382)
* Unit tests for num_gpu property as proxy for __parse_gpu_ids.

* Refactoring __parse_gpu_ids

* Moved the function outside the class as it is
a utility function and does not depend on the class in any way.
* Added unit tests for it.

* Mocked torch.cuda.device_count function in tests.

This allows the tests to be run on machines that do not have gpus.

* Fixed the parse_gpu_ids function to handle -1 case.

Function now handles -1 the same way as it does for '-1'.

* Unit tests for root_gpu added.

Added backend as a parameter as currently depending on backend set
or not, code fails with exception in certain circumstances, before
giving a wrong answer.

* Moved __set_root_gpu function out of the class.

This function does not depend on the class and can be tested
more easily this way.
Also added unit tests for this function. They simply reuse
data for the root_gpu property.

* determine_root_gpu_device passes unit tests.

* num_gpus passes unit tests.

Also added a None test for this function.

* parse_gpu_ids tests changed to reflect desired state after refactoring.

Planning to refactor parse_gpu_ids to always return list of ints.
This will simplify code that uses the output of this function.

* * parse_gpu_ids always returns lists
* parse_gpu_ids checks given ids against available ids
* parse_gpu_ids raises exception for nonexistent ids
* parse_gpu_ids returns None when no gpus are available
* cleaned up determine_root_gpu_device
* cleaned up num_gpus property
* Updated unit tests to reflect changes in the functions

* Flake8 fixes

* Moved fixture code up before where it is used.

* Updated documentation.

* Changed tests to match the API:
* gpus=-1 or gpus='-1' should use all available gpu devices
* gpus=N
    * N=0: no gpus should be used.
    * N>0: N gpus should be used
* gpus=list of ints or a comma separated string of numbers:
    Use the gpus indicated by the list or the string.

* Fixed code to pass all the changed tests for parsing gpus param.

* Refactoring parse_gpu_ids function.

* flake8 fixes.

* Updating documentation.

* flake8 fixes.

* Update trainer.py

* Update dp_mixin.py

* Make reduce_distributed_output a stand alone function.
Fix imports.
Fix flake8.

* Add comet_ml dependency to tests requirements.txt

* Revert "Make reduce_distributed_output a stand alone function. Fix imports. Fix flake8."

This reverts commit eac0338

* Merge with master.
2019-10-23 05:05:09 -04:00
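
The `gpus` rules listed in the parse_gpu_ids commit above can be sketched roughly as follows. This is only an illustration of the behaviour the commit message describes, not the code from the pull request: the name `parse_gpu_ids_sketch`, the explicit `available` argument (standing in for a `torch.cuda.device_count()` lookup), the use of `ValueError`, and the reading of a single-number string such as '1' as device id 1 are all assumptions.

```python
from typing import List, Optional, Union


def parse_gpu_ids_sketch(
    gpus: Union[int, str, List[int], None],
    available: List[int],
) -> Optional[List[int]]:
    """Normalize a `gpus` argument into a list of device ids, or None.

    Hypothetical helper: `available` is passed in explicitly so the
    sketch runs on machines without GPUs.
    """
    # Nothing requested, or nothing to pick from: no GPUs are used.
    if gpus is None or not available:
        return None

    # '-1' means "all devices"; any other string is read as a
    # comma-separated list of device ids (assumption: '1' means id 1).
    if isinstance(gpus, str):
        if gpus.strip() == "-1":
            gpus = -1
        else:
            gpus = [int(part) for part in gpus.split(",") if part.strip()]

    if isinstance(gpus, int):
        if gpus == -1:
            return list(available)      # use every available device
        if gpus == 0:
            return None                 # N=0: no GPUs
        if gpus < -1:
            raise ValueError(f"invalid gpus value: {gpus}")
        requested = list(range(gpus))   # N>0: the first N device ids
    else:
        requested = list(gpus)          # explicit list of ids

    if not requested:
        return None

    # Reject ids that do not exist on this machine.
    missing = [i for i in requested if i not in available]
    if missing:
        raise ValueError(f"requested GPU ids not available: {missing}")
    return requested
```

Under these assumptions, `parse_gpu_ids_sketch("0, 2", [0, 1, 2])` yields the explicit ids, while `parse_gpu_ids_sketch(5, [0, 1])` raises because ids 2 to 4 do not exist.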
Nic Eggert 05cea3ff8b Save / Load Hyperparameters with checkpoint (#415)
* Save and load hparams from checkpoints

* Update docs

* Add warning when not saving hparams

* Missing import

* Update .run_local_tests.sh

* Update lm_test_module_mixins.py

* Update lightning_module_template.py
2019-10-23 04:48:24 -04:00
Hata Ryosuke e7c12d936e fixed bug with callback=False or None in trainer_io.py (#409) 2019-10-22 13:07:48 -04:00
Jirka Borovec f18aee30a5 Minor imports cleaning (#402)
* code cleaning

* drop unused imports

* optimize imports
2019-10-22 11:32:40 +03:00
William Falcon 792ad00ff9 Fixed val interval (#405)
* added fixed frequency val batch check

* Finished IterableDataset support

* flake8
2019-10-22 05:10:00 +03:00
William Falcon 1424157731 Refactor (#407)
* moved dp, ddp outside of trainer

* added main mixins

* finished major mixin refactor

* flake8
2019-10-22 04:16:51 +03:00
tamyiuchau 4103a5ca73 Provide backward compatibility for #124 (#400)
* Provide backward compatibility for e681253

* typo fix
2019-10-21 08:16:55 +02:00
William Falcon 6111edaf82 Test fx (#390)
* changes to test fx
2019-10-19 00:39:30 +02:00
William Falcon 699bd2cb50 removed mlflow and custom logger tests (#389)
* changes to seed for tests
2019-10-18 23:03:28 +02:00
William Falcon 3dfcef6994 Loss keys (#387)
* any key in logs or progress bar is a candidate for callback metric
2019-10-18 15:28:13 +02:00
Hiroyuki Vincent Yamazaki 0fac2d64cf Fix off-by-one epoch length (#377) 2019-10-18 10:18:05 +02:00
William Falcon e5050700ce docs 2019-10-18 00:17:27 +02:00
William Falcon 2044126821 fixing tests (#372)
* fixing tests

* fixed tests
2019-10-16 07:28:47 -04:00
William Falcon e2cabb03ba fix val logging (#362)
* fix test

* no warnings always
2019-10-15 12:44:20 -04:00
Nic Eggert 19c2b8fc9e Allow disabling default logger, checkpoint_callback, and early_stop_callback (#360)
* Allow disabling logger, early stopping, and checkpoints

* Typo

* Get tests passing

* Update trainer.py
2019-10-12 06:00:24 -04:00
Yasser Souri 792ba59b78 Pad experiment version with zero for easier listing (#355) 2019-10-10 19:39:26 -04:00
William Falcon 426bb19846 Update trainer.py 2019-10-10 18:17:26 -04:00
William Falcon 46322b906b fixed ckpt tests (#352)
* fixed ckpt tests
2019-10-10 15:16:19 -04:00
William Falcon 96c2a2de50 fixes Flake8 2019-10-09 17:49:29 -04:00
William Falcon 453568179b Logger default (#351)
* weights go into default logger folder

* ckpt callback in pretrain routine so exp already has version
2019-10-09 17:46:27 -04:00
William Falcon d95e693598 Logger default (#350)
* weights go into default logger folder
2019-10-09 16:25:04 -04:00
William Falcon 6e0a562ecb fixed callback metrics ddp bug 2019-10-09 12:53:33 -04:00
William Falcon 5f1f3f6acc removed pdb 2019-10-09 10:45:06 -04:00
William Falcon 608a90a490 fixes non python type callback metrics and fast_dev_run (#345)
* fixes non python type callback metrics

* fixed fast dev run
2019-10-09 10:23:08 -04:00
Nic Eggert 8088052825 Finalize logger (#337)
* Ensure logger.finalize is called

* Call logger.finalize

* Update mlflow_logger.py

* Update test_logging.py

* Update trainer.py
2019-10-08 17:33:33 -04:00
William Falcon 49e04de5ac Ports (#338)
* remove os.exit from early stopping

* fixed weight summary
2019-10-08 17:11:47 -04:00
William Falcon dcaba55251 Early stopping (#332)
* callbacks use all other keys in return dict

* remove os.exit from early stopping
2019-10-08 16:21:00 -04:00
Adrian Wälchli 6e3e740a7f Param printing (#336)
* print thousands as K, M, B, T, ...

* add option to print top-level modules only

* added doc string and added spacing

* do not print summary if neither "full" nor "top"

* updated docs showing summary print options

* fix line length for travis
2019-10-08 15:30:06 -04:00
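
The "print thousands as K, M, B, T" behaviour named in the Param printing commit above could look something like the sketch below. It only illustrates the idea; the function name `human_readable_count`, the one-decimal format, and the space before the suffix are assumptions rather than details taken from the commit.

```python
def human_readable_count(number: int) -> str:
    """Abbreviate a parameter count with K, M, B, T suffixes (hypothetical helper)."""
    labels = ["", "K", "M", "B", "T"]
    value = float(number)
    index = 0
    # Divide by 1000 until the value fits, or the largest suffix is reached.
    while abs(value) >= 1000 and index < len(labels) - 1:
        value /= 1000.0
        index += 1
    if index == 0:
        return str(int(number))         # small counts are printed unchanged
    return f"{value:.1f} {labels[index]}"
```

Counts larger than the last suffix simply keep growing in front of "T" instead of raising an error.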
William Falcon ff2a21a08a default to O1 (#334) 2019-10-08 09:09:57 -04:00
Jon Tamir 1cf2e228ba fix CONTRIBUTING link and silence checkpoint callback message (#325) 2019-10-08 07:40:14 -04:00
William Falcon ac6d0154c2 Fixes lack of logging in logger (#319)
* changed rank 0

* models wait to restore weights
2019-10-06 17:57:23 -04:00
William Falcon 491100abdd Docs (#315)
* cleaning up demos

* cleaning up docs

* cleaned up test_tube logger
2019-10-05 23:52:32 -04:00
William Falcon ef98931d18 flake8 2019-10-05 16:56:24 -04:00
William Falcon 07c5d22ae3 cleaning up demos (#313)
* cleaning up demos
2019-10-05 16:39:05 -04:00
William Falcon cdfcb01073 Fixes #234 (#311)
* Fixes #234

* default logger version is now slurm job id
2019-10-05 14:45:37 -04:00
William Falcon 6cc3f1757f decouple returns from each step (#307)
* decoupled training metrics from logging metrics

* decoupled validation metrics from log metrics

* updated docs

* Fixed test

* merged master
2019-10-05 13:35:20 -04:00
William Falcon 8f5a06bfb8 Gpu mem (#308)
* Fixes #289

* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* updated docs

* added lbfgs support (#310)

* merged master
2019-10-05 11:29:34 -04:00
William Falcon 75fd89106f added lbfgs support (#310)
* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* updated docs
2019-10-05 11:10:21 -04:00
William Falcon 2ac9f1aea7 Fixes #280 (#309) 2019-10-05 10:55:50 -04:00