1416 Commits

Author SHA1 Message Date
William Falcon 9576dd28b2
added load on CPU first (#221)
* added load on CPU first

* added print logs

* changed close order
2019-09-11 07:52:36 -04:00
William Falcon 90353ac54e changed examples scripts 2019-09-11 07:05:15 -04:00
William Falcon cf7dbf6d7c changed examples scripts 2019-09-11 07:03:31 -04:00
William Falcon 30b25c8146
Sai prasanna master (#219)
* Fix incorrect warning for DistributedSampler.

Check whether `dataloader.sampler` is an instance of DistributedSampler instead of checking the `dataloader`.

* Update trainer.py

* merged
2019-09-09 11:36:24 -04:00
William Falcon ac0111c196
Update multi_node_cluster_auto_slurm.py 2019-09-09 10:55:47 -04:00
William Falcon cbc619afa1
Update multi_node_own_slurm_script.py 2019-09-09 10:54:43 -04:00
William Falcon 3393086cb6
Update multi_node_cluster_auto_slurm.py 2019-09-09 10:53:47 -04:00
William Falcon 506d5da68b
enable single gpu per node (#218)
* enable single gpu per node
2019-09-09 07:37:20 -04:00
William Falcon a6fe6f0917
Update README.md 2019-09-08 18:21:05 -04:00
William Falcon 8f289f9fa8
Update README.md 2019-09-08 18:19:00 -04:00
William Falcon 6c947f4e0d
Update README.md 2019-09-08 18:18:21 -04:00
William Falcon 396047ffa0
Updated distributed Demos (#215)
* added simple cluster template

* sets correct backend for possible combinations of gpu inputs

* simple slurm example
2019-09-08 18:17:33 -04:00
William Falcon 83b756f77b
Update tox.ini 2019-09-08 15:46:30 -04:00
William Falcon 10d190e045
Simplified gpu api. No NVIDIA flag managing by lightning for cluster (#213)
* added nvidia flag set

* added simple cluster template

* sets correct backend for possible combinations of gpu inputs
2019-09-08 15:36:58 -04:00
William Falcon b3434943c7
Update multi_node_cluster_template.py 2019-09-07 10:31:20 -04:00
Alok Singh 81df2259ef Make print_nan_grads print grad (#208)
This seems more useful for debugging.
2019-09-07 01:08:09 -04:00
williamFalcon 9f9d38673e fixed demo 2019-09-06 16:26:46 -07:00
William Falcon 0c7fbc7178
Weights path (#211)
* added docs. removed options. added weights_save option

* removed old restore

* cleaned up save path

* flake8
2019-09-06 17:01:03 -04:00
William Falcon 3e74ea15d8
Fixes #120 (#210) 2019-09-06 14:27:24 -04:00
William Falcon 7099f8dbfb
split trainer mixins (#209)
* split trainer mixins

* Update multi_node_cluster_template.py

* Update single_cpu_template.py

* Update single_gpu_node_16bit_template.py

* Update single_gpu_node_ddp_template.py

* Update single_gpu_node_dp_template.py

* Update trainer_cpu_template.py

* Update trainer_io.py

* split trainer mixins

* Update multi_node_cluster_template.py

* deconflicted
2019-09-06 14:11:07 -04:00
William Falcon 60633eaa32
Moves hpc auto-resubmit to trainer from test-tube (#207)
* added slurm signal handler

* added restore weight functions

* set slurm signal handling inside process

* added resubmit docs

* fixed missing param

* Update trainer.py

* fixed missing param

* debugging tests
2019-09-06 11:54:51 -04:00
Jirka Borovec 7ed928dfac add PR template (#204)
* add PR template

* Update PULL_REQUEST_TEMPLATE.md
2019-09-06 10:12:06 -04:00
Nic Eggert 1733dba735 Pass outputs from all dataloaders to test_end and validation_end (#203)
* Pass outputs from all dataloaders to test_end and validation_end

* Update tests

* Update docs

* Update trainer.py

* Update test_models.py
2019-09-06 07:37:25 -04:00
Jirka Borovec 447ed30716 extend pip install info (#194)
* extend pip install info

* Update README.md
2019-09-06 07:30:51 -04:00
William Falcon 7e0ac3149c
refactored init (#206) 2019-09-06 00:29:38 -04:00
Thomas J Fan bd50d9a2b4 DOC Adds reference to test-tube (#205) 2019-09-05 21:13:49 -04:00
Jirka Borovec 5ef6fa5608 add osx to Travis (#202)
* add CI macOS

* add CI Windows

* update CI

* drop Win

* update CI
2019-09-05 15:08:19 -04:00
Anton Konstantinov 34b824a9d3 Implement correct transfer to GPU for batches (#200) 2019-09-05 07:13:06 -04:00
Thomas J Fan 62252cee58 STY Minor flake8 fix (#197) 2019-09-04 17:46:56 -04:00
Max Horn dac41030d4 Allow to deactivate GPU memory logging in Trainer (#190)
* Allow to deactivate GPU memory logging in Trainer

Adds the flag `log_gpu_memory` to Trainer to deactivate logging of GPU
memory utilization. On some servers logging the GPU memory usage can
significantly slow down training.

* Update Logging.md

* Update trainer.py
2019-09-04 10:43:46 -04:00
Verena Haunschmid 0872c32151 fix import in Tensorboard example (#193) 2019-09-04 10:20:59 -04:00
Thomas J Fan c766167773 DOC Minor import fix (#192) 2019-09-04 06:17:54 -04:00
Nic Eggert 64688e1e15 Refactor test modules (#180)
* Expectopatronum implement #89 (#182)

* rename validate -> evaluate; implement test logic; allow multiple test_loaders

* add test_step and test_end to LightningModule

* add in_test_mode to pretraining to implement case 2 (test pretrained model)

* fix code style issues

* LightningTestModel: add optional second test set, implement test_step and test_end

* implemented test for multiple test_dataloaders; fixed typo

* add two test cases for #89

* add documentation for test_step, test_end; fix computation of loss in validation_step example

* Update trainer.py

* Added proper dp ddp routing calls for test mode

* Update trainer.py

* Update test_models.py

* Update trainer.py

* Update override_data_parallel.py

* Update test_models.py

* Update trainer.py

* Update test_models.py

* debug

* Update trainer.py

* Update override_data_parallel.py

* Update debug.py

* Update lm_test_module.py

* Update test_models.py

* release v0.4.8

* Update README.md

* add training loop docs

* testing loop docs

* Convert __dataloader to _dataloader

This will let inherited classes use it

* Factor common test model setup into base class

* Specialized test modules inherit from LightningTestModelBase

* Fix __is_overriden so that it works with more complicated inheritance

* Use mixins to add functionality to test models

* Fix test with no val_dataloader

* Remove unused imports

* Get rid of wild card import

* Update trainer.py

* Update lm_test_module.py
2019-09-02 15:46:16 -04:00
William Falcon c4ce347f3e testing loop docs 2019-09-02 07:15:45 -04:00
William Falcon 8d6648e51d Update README.md 2019-09-02 07:15:45 -04:00
William Falcon 9e6ce3b0d6 testing loop docs 2019-09-02 07:15:45 -04:00
William Falcon a327596b79 add training loop docs 2019-09-02 07:15:45 -04:00
William Falcon 08a1ae8069 release v0.4.8 2019-09-02 07:15:45 -04:00
Verena Haunschmid 25d5b25792 Expectopatronum implement #89 (#182)
* rename validate -> evaluate; implement test logic; allow multiple test_loaders

* add test_step and test_end to LightningModule

* add in_test_mode to pretraining to implement case 2 (test pretrained model)

* fix code style issues

* LightningTestModel: add optional second test set, implement test_step and test_end

* implemented test for multiple test_dataloaders; fixed typo

* add two test cases for #89

* add documentation for test_step, test_end; fix computation of loss in validation_step example

* Update trainer.py

* Added proper dp ddp routing calls for test mode

* Update trainer.py

* Update test_models.py

* Update trainer.py

* Update override_data_parallel.py

* Update test_models.py

* Update trainer.py

* Update test_models.py

* debug

* Update trainer.py

* Update override_data_parallel.py

* Update debug.py

* Update lm_test_module.py

* Update test_models.py
2019-09-02 07:15:27 -04:00
Stanislav 73cf47112e Gradient accumulation callback (#150)
* Gradient accumulation callback

* little test case

* typo

* import fix

* method name fix

* fix epochs indexing from 1

* better code style

* code style fix v2 :/

* change interface

* fix Trainre new api in tests

* trainer api bug fix

* new raising error, new update method

* extentions tests

* a little better tests

* typo fix

* flack8 better

* using scheduler for int and dict

* typo

* firs epoch bug fix

* test update

* empty dict exception

* floats check

* codestyle fix

* grad counting test

* someday, i will install normal linter

* add more checks

* Update test_models.py
2019-08-30 10:56:14 -04:00
Ir1dXD c2247350bb feat(val_sanity): enable skipping validation sanity (#176)
* feat(val_sanity): enable skipping validation sanity when self.nb_sanity_val_steps is 0

* docs: elaborate on skipping
2019-08-28 06:41:31 -04:00
William Falcon 67c314272b
Update setup.py (#174) 2019-08-27 18:07:33 -04:00
Ir1dXD da4c1e3409 docs: add repo_name in the upright corner (#171) 2019-08-27 16:46:18 -04:00
Jirka Borovec cd89b4ef43 move GH docs (#168) 2019-08-27 07:10:26 -04:00
Ir1dXD 6eb6daa278 enable highlight (#170) 2019-08-27 07:09:46 -04:00
William Falcon c24599f5e5 release v 2019-08-24 08:13:54 -04:00
Ryan McCormick b22e5918a9 fix python syntax in code blocks to be consistent (#166)
A couple code blocks used "{.python}" instead of just "python" for the syntax highlighting, which doesn't render properly in GitHub markdown.
2019-08-23 21:24:18 -04:00
William Falcon 4104a0fc47
cleaned up progbar (#165)
* cleaned up progbar

* updated base files

* flake 8
2019-08-23 21:23:27 -04:00
William Falcon 2ad9a9708b
Update README.md 2019-08-23 16:10:45 -04:00
William Falcon ecce22f4de
Update README.md 2019-08-23 16:10:24 -04:00