124 Commits

Author SHA1 Message Date
Adrian Wälchli 6e3e740a7f Param printing (#336)
* print thousands as K, M, B, T, ...

* add option to print top-level modules only

* added doc string and added spacing

* do not print summary if neither "full" nor "top"

* updated docs showing summary print options

* fix line length for travis
2019-10-08 15:30:06 -04:00
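The first bullet of the commit above (printing thousands as K, M, B, T) can be illustrated with a minimal sketch. The function name `human_readable_count` and the one-decimal output format are assumptions made for illustration; the log does not show the code that was actually merged.

```python
def human_readable_count(number: int) -> str:
    """Abbreviate a non-negative integer with K, M, B, T suffixes.

    Hypothetical helper, not the project's own implementation.
    """
    labels = ["", "K", "M", "B", "T"]
    value = float(number)
    index = 0
    # Divide by 1000 until the value fits or the largest suffix is reached.
    while value >= 1000 and index < len(labels) - 1:
        value /= 1000.0
        index += 1
    if index == 0:
        # Small counts are printed unchanged.
        return str(number)
    return f"{value:.1f} {labels[index]}"


if __name__ == "__main__":
    for count in (123, 1234, 2_000_000, 3_000_000_000):
        print(count, "->", human_readable_count(count))
```

Counts beyond the trillions keep the largest suffix in this sketch rather than introducing further labels.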
William Falcon 07c5d22ae3
cleaning up demos (#313)
* cleaning up demos
2019-10-05 16:39:05 -04:00
William Falcon 6cc3f1757f
decouple returns from each step (#307)
* decoupled training metrics from logging metrics

* decoupled validation metrics from log metrics

* updated docs

* Fixed test

* merged master
2019-10-05 13:35:20 -04:00
William Falcon 8f5a06bfb8
Gpu mem (#308)
* Fixes #289

* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* updated docs

* added lbfgs support (#310)

* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* updated docs

* added lbfgs support

* Fixes #289

* merged master
2019-10-05 11:29:34 -04:00
William Falcon 967957e55c added lbfgs support 2019-10-05 10:47:18 -04:00
William Falcon bf09060fef
Fixes #292 (#303)
* early stopping callback is not default

* added a default logger

* added default checkpoint callback

* added default checkpoint/loggers

* updated docs

* cleaned demos

* clean up docs around loggers
2019-10-04 19:48:57 -04:00
William Falcon a578de511d
clean up docs around loggers (#304) 2019-10-04 18:53:38 -04:00
William Falcon a60a24d11b
disable auto gpu loading when restoring weights to avoid OOM (#242)
* Update root_module.py

* tests fix
2019-10-04 16:18:43 -04:00
William Falcon 73a7cf3c99
Mem crash (#299)
* fixes memory crash
2019-10-04 15:53:44 -04:00
Hendrik Schröter 36f0b5bbd0 Use getter instead of python property for the dataloaders (#275)
* Use getter instead of python property for the dataloaders

* Fix lint

* Update trainer.py
2019-10-04 15:35:02 -04:00
William Falcon 32e74b8f36
Ddp2 (#261)
* adds ddp2 option where on each node a single process uses all gpus

* added ddp2 test

* added ddp2 docs

* Update Distributed training.md

* delete ref to old update_training_log_metrics

* debug

* banana pancakes

* debug

* cheesecake
2019-10-04 15:07:54 -04:00
Alok Singh b0a0a47a0b Rename variables (#124)
-   data_batch → batch
-   batch_i → batch_idx
-   dataloader_i → dataloader_idx
-   tng → training
-   training_dataloader → train_dataloader
-   add_log_row_interval → row_log_interval
-   gradient_clip → gradient_clip_val
-   prog → progress
-   tqdm_dic → tqdm_dict
2019-09-25 19:05:06 -04:00
William Falcon 55e7322747
Metrics load (#228)
* load from metrics defaults to CPU
2019-09-16 10:47:19 -04:00
William Falcon 7099f8dbfb
split trainer mixins (#209)
* split trainer mixins

* Update multi_node_cluster_template.py

* Update single_cpu_template.py

* Update single_gpu_node_16bit_template.py

* Update single_gpu_node_ddp_template.py

* Update single_gpu_node_dp_template.py

* Update trainer_cpu_template.py

* Update trainer_io.py

* split trainer mixins

* Update multi_node_cluster_template.py

* deconflicted
2019-09-06 14:11:07 -04:00
William Falcon 60633eaa32
Moves hpc auto-resubmit to trainer from test-tube (#207)
* added slurm signal handler

* added restore weight functions

* set slurm signal handling inside process

* added resubmit docs

* fixed missing param

* Update trainer.py

* fixed missing param

* debugging tests
2019-09-06 11:54:51 -04:00
Verena Haunschmid 25d5b25792 Expectopatronum implement #89 (#182)
* rename validate -> evaluate; implement test logic; allow multiple test_loaders

* add test_step and test_end to LightningModule

* add in_test_mode to pretraining to implement case 2 (test pretrained model)

* fix code style issues

* LightningTestModel: add optional second test set, implement test_step and test_end

* implemented test for multiple test_dataloaders; fixed typo

* add two test cases for #89

* add documentation for test_step, test_end; fix computation of loss in validation_step example

* Update trainer.py

* Added proper dp ddp routing calls for test mode

* Update trainer.py

* Update test_models.py

* Update trainer.py

* Update override_data_parallel.py

* Update test_models.py

* Update trainer.py

* Update test_models.py

* debug

* Update trainer.py

* Update override_data_parallel.py

* Update debug.py

* Update lm_test_module.py

* Update test_models.py
2019-09-02 07:15:27 -04:00
William Falcon 4104a0fc47
cleaned up progbar (#165)
* cleaned up progbar

* updated base files

* flake 8
2019-08-23 21:23:27 -04:00
Sebastian Præsius b31539f62e Guard against AttributeError in dataloaders. (#161)
A solution for https://github.com/williamFalcon/pytorch-lightning/issues/142.
Since hasattr works by "calling getattr(object, name) and seeing whether it raises an AttributeError or not", I replaced it with a single call to getattr.
See also https://stackoverflow.com/questions/24971061/python-hasattr-vs-getattr
2019-08-23 08:21:39 -04:00
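The reasoning in the commit above (hasattr is itself a getattr call, so one getattr is enough) can be sketched as follows. The `Model` class, its call counter, and both helper functions are hypothetical and exist only to make the difference observable; the actual change in #161 is not shown in this log.

```python
class Model:
    """Hypothetical model whose dataloader property counts its evaluations."""

    def __init__(self):
        self.calls = 0

    @property
    def train_dataloader(self):
        # Each evaluation of the property is recorded.
        self.calls += 1
        return [1, 2, 3]


def load_with_hasattr(model):
    # hasattr evaluates the property once, the attribute access a second time.
    if hasattr(model, "train_dataloader"):
        return model.train_dataloader
    return None


def load_with_getattr(model):
    # A single getattr with a default evaluates the property only once.
    return getattr(model, "train_dataloader", None)


first = Model()
load_with_hasattr(first)

second = Model()
load_with_getattr(second)

print(first.calls, second.calls)
```

Both helpers return None for an object without the attribute. Note that hasattr and getattr with a default also both turn an AttributeError raised inside the property into "attribute missing", so the single call removes the double evaluation but does not by itself surface such errors.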
William Falcon 7f53e7bfb3
Val idx optional in validation_step (#108)
* made dataset_i only available with multiple datasets

* updated interface signature

* updated tests
2019-08-13 11:37:37 -04:00
William Falcon 905a2e5a12
allow user to control optimizer step for every optimizer
* added custom hook for user defined optimizer step

* refactored to allow multiple optimizers different training_step

* pep8
2019-08-13 09:32:45 -04:00
William Falcon e5805bf8ff
val and test are optional now (#95)
* made validation step optional

* added no val model

* val_step can be implemented but not validation_end

* added no val end model

* added tests

* remove class

* updated docs

* updated test

* fix pep8
2019-08-11 10:01:57 -04:00
William Falcon 10e4b18452 made imports absolute 2019-08-07 10:14:59 -04:00
William Falcon 35f23bbc82
Merge pull request #55 from williamFalcon/continue
add training restore
2019-08-07 09:02:16 -04:00
William Falcon cdbcbad352 added hook on_sanity_check_start 2019-08-07 07:51:55 -04:00
William Falcon 5c398d7a4e removed bad hook call 2019-08-07 07:39:41 -04:00
William Falcon a931ded310 removed bad hook call 2019-08-07 07:35:02 -04:00
William Falcon 95ec072d1e removed bad hook call 2019-08-07 07:30:02 -04:00
William Falcon d3f19c8321 added auto restore 2019-08-07 06:55:05 -04:00
Jiri BOROVEC d9bfe964f9 update by flake8 2019-08-06 22:45:46 +02:00
Jiri BOROVEC 632d07b490 fix prints for py3.5 2019-08-06 22:45:46 +02:00
Jiri BOROVEC c44966a8bf apply PEP8 2019-08-06 22:45:27 +02:00
Jiri BOROVEC 469941a528 pkg relative imports
* split requirements.txt
* pytest verbose
2019-08-05 10:52:09 +02:00
William Falcon 019b4d16d0 formatting 2019-08-04 13:08:14 -05:00
William Falcon f2ef367f7d removing unused imports 2019-08-04 13:07:50 -05:00
William Falcon ef6d5a412c proc 0 only for save hpc. all procs for hpc load 2019-08-01 16:19:04 -04:00
williamFalcon 27660b8a96 running tests 2019-07-28 05:57:37 -07:00
williamFalcon 5db28899aa merged 2019-07-28 05:39:25 -07:00
William Falcon f5a01edfb8 added clean slurm save load test 2019-07-26 22:32:34 -04:00
William Falcon f1de62671d added clean slurm save load test 2019-07-26 22:32:27 -04:00
William Falcon 57edb08bd8 added clean slurm save load test 2019-07-26 22:28:09 -04:00
William Falcon ffa7a0dbab added clean slurm save load test 2019-07-26 22:26:55 -04:00
William Falcon 348223a702 fixed hpc save, load. cleaned apu 2019-07-26 22:09:35 -04:00
William Falcon 64de447545 fixed hpc save, load. cleaned apu 2019-07-26 22:07:02 -04:00
William Falcon 265411572f fixed hpc save, load. cleaned apu 2019-07-26 22:04:27 -04:00
William Falcon 4148c36abd added model save load test 2019-07-26 21:55:01 -04:00
William Falcon aacf1947ea auto state-dict and remove the way the model is loaded during hpc 2019-07-26 21:38:06 -04:00
William Falcon e2c7fa44b7 auto state-dict and remove the way the model is loaded during hpc 2019-07-26 21:37:06 -04:00
William Falcon 1a835969a6 added saving tests to cpu 2019-07-26 12:14:58 -04:00
Phuc Le 7d97e3e6e4 Support any lr_scheduler 2019-07-26 11:03:44 +07:00
William Falcon 7e728d97e7 removed save model logging 2019-07-25 14:36:22 -04:00