William Falcon
9576dd28b2
added load on CPU first ( #221 )
...
* added load on CPU first
* added print logs
* changed close order
2019-09-11 07:52:36 -04:00
William Falcon
90353ac54e
changed examples scripts
2019-09-11 07:05:15 -04:00
William Falcon
cf7dbf6d7c
changed examples scripts
2019-09-11 07:03:31 -04:00
William Falcon
30b25c8146
Sai prasanna master ( #219 )
...
* Fix incorrect warning for DistributedSampler.
Check whether `dataloader.sampler` is an instance of DistributedSampler instead of checking the `dataloader`.
* Update trainer.py
* merged
2019-09-09 11:36:24 -04:00
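Note on the DistributedSampler fix above: a minimal sketch of the corrected check, not the actual trainer.py code. `DistributedSampler` and `DataLoader` here are small stand-in classes (assumptions, so the snippet runs without PyTorch), and `needs_sampler_warning` is a hypothetical helper name.

```python
class DistributedSampler:
    """Stand-in for torch.utils.data.distributed.DistributedSampler (assumption)."""


class DataLoader:
    """Stand-in for torch.utils.data.DataLoader; it only stores the sampler."""

    def __init__(self, sampler=None):
        self.sampler = sampler


def needs_sampler_warning(dataloader):
    # Hypothetical helper. A loader object is never itself a DistributedSampler,
    # so testing `dataloader` directly would warn every time. The check has to
    # look at the loader's `sampler` attribute instead.
    return not isinstance(dataloader.sampler, DistributedSampler)


distributed_loader = DataLoader(sampler=DistributedSampler())
plain_loader = DataLoader()

print(needs_sampler_warning(distributed_loader))  # False: no warning needed
print(needs_sampler_warning(plain_loader))        # True: warning is justified
```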
William Falcon
ac0111c196
Update multi_node_cluster_auto_slurm.py
2019-09-09 10:55:47 -04:00
William Falcon
cbc619afa1
Update multi_node_own_slurm_script.py
2019-09-09 10:54:43 -04:00
William Falcon
3393086cb6
Update multi_node_cluster_auto_slurm.py
2019-09-09 10:53:47 -04:00
William Falcon
506d5da68b
enable single gpu per node ( #218 )
...
* enable single gpu per node
2019-09-09 07:37:20 -04:00
William Falcon
a6fe6f0917
Update README.md
2019-09-08 18:21:05 -04:00
William Falcon
8f289f9fa8
Update README.md
2019-09-08 18:19:00 -04:00
William Falcon
6c947f4e0d
Update README.md
2019-09-08 18:18:21 -04:00
William Falcon
396047ffa0
Updated distributed Demos ( #215 )
...
* added simple cluster template
* sets correct backend for possible combinations of gpu inputs
* simple slurm example
2019-09-08 18:17:33 -04:00
William Falcon
83b756f77b
Update tox.ini
2019-09-08 15:46:30 -04:00
William Falcon
10d190e045
Simplified gpu api. No NVIDIA flag managing by lightning for cluster ( #213 )
...
* added nvidia flag set
* added simple cluster template
* sets correct backend for possible combinations of gpu inputs
2019-09-08 15:36:58 -04:00
William Falcon
b3434943c7
Update multi_node_cluster_template.py
2019-09-07 10:31:20 -04:00
Alok Singh
81df2259ef
Make print_nan_grads print grad ( #208 )
...
This seems more useful for debugging.
2019-09-07 01:08:09 -04:00
williamFalcon
9f9d38673e
fixed demo
2019-09-06 16:26:46 -07:00
William Falcon
0c7fbc7178
Weights path ( #211 )
...
* added docs. removed options. added weights_save option
* removed old restore
* cleaned up save path
* cleaned up save path
* flake8
2019-09-06 17:01:03 -04:00
William Falcon
3e74ea15d8
Fixes #120 ( #210 )
2019-09-06 14:27:24 -04:00
William Falcon
7099f8dbfb
split trainer mixins ( #209 )
...
* split trainer mixins
* Update multi_node_cluster_template.py
* Update single_cpu_template.py
* Update single_gpu_node_16bit_template.py
* Update single_gpu_node_ddp_template.py
* Update single_gpu_node_dp_template.py
* Update trainer_cpu_template.py
* Update trainer_io.py
* split trainer mixins
* Update multi_node_cluster_template.py
* deconflicted
* deconflicted
* deconflicted
2019-09-06 14:11:07 -04:00
William Falcon
60633eaa32
Moves hpc auto-resubmit to trainer from test-tube ( #207 )
...
* added slurm signal handler
* added restore weight functions
* set slurm signal handling inside process
* added resubmit docs
* added resubmit docs
* fixed missing param
* Update trainer.py
* fixed missing param
* fixed missing param
* debugging tests
2019-09-06 11:54:51 -04:00
Jirka Borovec
7ed928dfac
add PR template ( #204 )
...
* add PR template
* Update PULL_REQUEST_TEMPLATE.md
2019-09-06 10:12:06 -04:00
Nic Eggert
1733dba735
Pass outputs from all dataloaders to test_end and validation_end ( #203 )
...
* Pass outputs from all dataloaders to test_end and validation_end
* Update tests
* Update docs
* Update trainer.py
* Update test_models.py
2019-09-06 07:37:25 -04:00
Jirka Borovec
447ed30716
extend pip install info ( #194 )
...
* extend pip install info
* Update README.md
* Update README.md
2019-09-06 07:30:51 -04:00
William Falcon
7e0ac3149c
refactored init ( #206 )
2019-09-06 00:29:38 -04:00
Thomas J Fan
bd50d9a2b4
DOC Adds reference to test-tube ( #205 )
2019-09-05 21:13:49 -04:00
Jirka Borovec
5ef6fa5608
add osx to Travis ( #202 )
...
* add CI macOS
* add CI Windows
* update CI
* drop Win
* update CI
* update CI
2019-09-05 15:08:19 -04:00
Anton Konstantinov
34b824a9d3
Implement correct transfer to GPU for batches ( #200 )
2019-09-05 07:13:06 -04:00
Thomas J Fan
62252cee58
STY Minor flake8 fix ( #197 )
2019-09-04 17:46:56 -04:00
Max Horn
dac41030d4
Allow to deactivate GPU memory logging in Trainer ( #190 )
...
* Allow to deactivate GPU memory logging in Trainer
Adds the flag `log_gpu_memory` to Trainer to deactivate logging of GPU
memory utilization. On some servers logging the GPU memory usage can
significantly slow down training.
* Update Logging.md
* Update trainer.py
2019-09-04 10:43:46 -04:00
Verena Haunschmid
0872c32151
fix import in Tensorboard example ( #193 )
2019-09-04 10:20:59 -04:00
Thomas J Fan
c766167773
DOC Minor import fix ( #192 )
2019-09-04 06:17:54 -04:00
Nic Eggert
64688e1e15
Refactor test modules ( #180 )
...
* Expectopatronum implement #89 (#182 )
* rename validate -> evaluate; implement test logic; allow multiple test_loaders
* add test_step and test_end to LightningModule
* add in_test_mode to pretraining to implement case 2 (test pretrained model)
* fix code style issues
* LightningTestModel: add optional second test set, implement test_step and test_end
* implemented test for multiple test_dataloaders; fixed typo
* add two test cases for #89
* add documentation for test_step, test_end; fix computation of loss in validation_step example
* Update trainer.py
* Added proper dp ddp routing calls for test mode
* Update trainer.py
* Update test_models.py
* Update trainer.py
* Update trainer.py
* Update override_data_parallel.py
* Update test_models.py
* Update test_models.py
* Update trainer.py
* Update trainer.py
* Update trainer.py
* Update test_models.py
* Update test_models.py
* debug
* Update trainer.py
* Update override_data_parallel.py
* Update debug.py
* Update lm_test_module.py
* Update test_models.py
* release v0.4.8
* Update README.md
* add training loop docs
* testing loop docs
* testing loop docs
* Convert __dataloader to _dataloader
This will let inherited classes use it
* Factor common test model setup into base class
* Specialized test modules inherit from LightningTestModelBase
* Fix __is_overriden so that it works with more complicated inheritance
* Use mixins to add functionality to test models
* Fix test with no val_dataloader
* Remove unused imports
* Get rid of wild card import
* Update trainer.py
* Update lm_test_module.py
2019-09-02 15:46:16 -04:00
William Falcon
c4ce347f3e
testing loop docs
2019-09-02 07:15:45 -04:00
William Falcon
8d6648e51d
Update README.md
2019-09-02 07:15:45 -04:00
William Falcon
9e6ce3b0d6
testing loop docs
2019-09-02 07:15:45 -04:00
William Falcon
a327596b79
add training loop docs
2019-09-02 07:15:45 -04:00
William Falcon
08a1ae8069
release v0.4.8
2019-09-02 07:15:45 -04:00
Verena Haunschmid
25d5b25792
Expectopatronum implement #89 ( #182 )
...
* rename validate -> evaluate; implement test logic; allow multiple test_loaders
* add test_step and test_end to LightningModule
* add in_test_mode to pretraining to implement case 2 (test pretrained model)
* fix code style issues
* LightningTestModel: add optional second test set, implement test_step and test_end
* implemented test for multiple test_dataloaders; fixed typo
* add two test cases for #89
* add documentation for test_step, test_end; fix computation of loss in validation_step example
* Update trainer.py
* Added proper dp ddp routing calls for test mode
* Update trainer.py
* Update test_models.py
* Update trainer.py
* Update trainer.py
* Update override_data_parallel.py
* Update test_models.py
* Update test_models.py
* Update trainer.py
* Update trainer.py
* Update trainer.py
* Update test_models.py
* Update test_models.py
* debug
* Update trainer.py
* Update override_data_parallel.py
* Update debug.py
* Update lm_test_module.py
* Update test_models.py
2019-09-02 07:15:27 -04:00
Stanislav
73cf47112e
Gradient accumulation callback ( #150 )
...
* Gradient accumulation callback
* little test case
* typo
* import fix
* method name fix
* fix epochs indexing from 1
* better code style
* code style fix v2 :/
* change interface
* fix Trainer new api in tests
* trainer api bug fix
* new raising error, new update method
* extensions tests
* a little better tests
* typo fix
* flake8 better
* using scheduler for int and dict
* typo
* first epoch bug fix
* test update
* empty dict exception
* floats check
* codestyle fix
* grad counting test
* someday, i will install normal linter
* add more checks
* Update test_models.py
2019-08-30 10:56:14 -04:00
Ir1dXD
c2247350bb
feat(val_sanity): enable skipping validation sanity ( #176 )
...
* feat(val_sanity): enable skipping validation sanity when self.nb_sanity_val_steps is 0
* docs: elaborate on skipping
2019-08-28 06:41:31 -04:00
William Falcon
67c314272b
Update setup.py ( #174 )
2019-08-27 18:07:33 -04:00
Ir1dXD
da4c1e3409
docs: add repo_name in the upright corner ( #171 )
2019-08-27 16:46:18 -04:00
Jirka Borovec
cd89b4ef43
move GH docs ( #168 )
2019-08-27 07:10:26 -04:00
Ir1dXD
6eb6daa278
enable highlight ( #170 )
2019-08-27 07:09:46 -04:00
William Falcon
c24599f5e5
release v
2019-08-24 08:13:54 -04:00
Ryan McCormick
b22e5918a9
fix python syntax in code blocks to be consistent ( #166 )
...
A couple code blocks used "{.python}" instead of just "python" for the syntax highlighting, which doesn't render properly in GitHub markdown.
2019-08-23 21:24:18 -04:00
William Falcon
4104a0fc47
cleaned up progbar ( #165 )
...
* cleaned up progbar
* updated base files
* flake 8
2019-08-23 21:23:27 -04:00
William Falcon
2ad9a9708b
Update README.md
2019-08-23 16:10:45 -04:00
William Falcon
ecce22f4de
Update README.md
2019-08-23 16:10:24 -04:00