Commit Graph

182 Commits

Author SHA1 Message Date
William Falcon 4c7c933326 fix demo 2019-10-16 09:25:34 -04:00
William Falcon f14700a16a fix demo 2019-10-16 09:25:26 -04:00
William Falcon 5395383910 fix demo 2019-10-16 09:24:02 -04:00
Yiming Lin b8666bf354 fix domain_templates (#365) 2019-10-14 06:56:33 -04:00
William Falcon fbc1272796
Update lightning_module_template.py 2019-10-07 20:08:54 -04:00
William Falcon 46b55d9aaa
Update lightning_module_template.py 2019-10-07 17:23:25 -04:00
William Falcon 7288014e47
Update README.md 2019-10-05 17:37:17 -04:00
William Falcon 07c5d22ae3
cleaning up demos (#313)
* cleaning up demos
2019-10-05 16:39:05 -04:00
William Falcon f7d762416c
cleaning up demos (#312)
* cleaning up demos

* Update job_submit.sh

* Update README.md
2019-10-05 14:48:22 -04:00
William Falcon ed86bf96c5 cleaned up demos 2019-10-05 14:30:12 -04:00
William Falcon 8c2adf6250 cleaned up demos 2019-10-05 14:28:08 -04:00
William Falcon e739c79819 cleaned up demos 2019-10-05 14:21:12 -04:00
William Falcon 94f89e8e10 cleaned up demos 2019-10-05 14:15:09 -04:00
William Falcon 4d3a8c25d2 cleaned up demos 2019-10-05 14:13:55 -04:00
William Falcon 9fc01e3fd3 cleaned up demos 2019-10-05 14:13:32 -04:00
William Falcon c86524b0cc
Update single_gpu_node_ddp_template.py 2019-10-05 13:55:05 -04:00
William Falcon 0e2b0e39b5
Update single_gpu_node_16bit_template.py 2019-10-05 13:54:07 -04:00
William Falcon d03d7a2440
Update single_cpu_template.py 2019-10-05 13:52:25 -04:00
William Falcon 6cc3f1757f
decouple returns from each step (#307)
* decoupled training metrics from logging metrics

* decoupled validation metrics from log metrics

* updated docs

* Fixed test

* merged master
2019-10-05 13:35:20 -04:00
William Falcon bf09060fef
Fixes #292 (#303)
* early stopping callback is not default

* added a default logger

* added default checkpoint callback

* added default checkpoint/loggers

* updated docs

* cleaned demos

* clean up docs around loggers
2019-10-04 19:48:57 -04:00
William Falcon a578de511d
clean up docs around loggers (#304) 2019-10-04 18:53:38 -04:00
William Falcon 32e74b8f36
Ddp2 (#261)
* adds ddp2 option where on each node a single process uses all gpus

* added ddp2 test

* added ddp2 docs

* Update Distributed training.md

* delete ref to old update_training_log_metrics

* debug

* banana pancakes

* debug

* cheesecake
2019-10-04 15:07:54 -04:00
William Falcon 2d335c664c
Update multi_node_cluster_auto_slurm.py 2019-10-04 13:07:22 -04:00
William Falcon 5fdfad5766
Update multi_node_cluster_auto_slurm.py 2019-10-04 13:05:52 -04:00
Alok Singh b0a0a47a0b Rename variables (#124)
-   data_batch → batch
-   batch_i → batch_idx
-   dataloader_i → dataloader_idx
-   tng → training
-   training_dataloader → train_dataloader
-   add_log_row_interval → row_log_interval
-   gradient_clip → gradient_clip_val
-   prog → progress
-   tqdm_dic → tqdm_dict
2019-09-25 19:05:06 -04:00
Oscar A. Rangel eb268c4184 Added missing parameters (#237)
* Added missing parameters

added missing distributed_backend parameter and added the parameter to step 4 Init Trainer.

* Update single_gpu_node_dp_template.py
2019-09-21 09:45:12 -04:00
Oscar A. Rangel 6803018a49 changed hard coded parameter, and moved it to parent_parser (#238)
* changed hard coded parameter, and moved it to parent_parser

```python
    # ------------------------
    # 4 INIT TRAINER
    # ------------------------
    trainer = Trainer(
        experiment=exp,
        checkpoint_callback=checkpoint,
        early_stop_callback=early_stop,
        gpus=hparams.gpus,
        distributed_backend=hparams.dist_bak_end
    )

    parent_parser.add_argument('--dist_bak_end', type=str, default='ddp',
                               help='When using multiple GPUs set Trainer(distributed_backend=dp) (or ddp)')
```

* Update single_gpu_node_ddp_template.py
2019-09-21 09:44:08 -04:00
William Falcon e339799a0a
Update README.md 2019-09-14 09:55:42 -04:00
William Falcon 50f5e4bec8
Update single_cpu_template.py 2019-09-14 02:23:49 -04:00
William Falcon f3221a5014
Update multi_node_cluster_auto_slurm.py 2019-09-14 02:14:08 -04:00
William Falcon fe17d14ade
Update multi_node_cluster_auto_slurm.py 2019-09-13 17:05:49 -04:00
William Falcon 90353ac54e changed examples scripts 2019-09-11 07:05:15 -04:00
William Falcon cf7dbf6d7c changed examples scripts 2019-09-11 07:03:31 -04:00
William Falcon ac0111c196
Update multi_node_cluster_auto_slurm.py 2019-09-09 10:55:47 -04:00
William Falcon cbc619afa1
Update multi_node_own_slurm_script.py 2019-09-09 10:54:43 -04:00
William Falcon 3393086cb6
Update multi_node_cluster_auto_slurm.py 2019-09-09 10:53:47 -04:00
William Falcon 8f289f9fa8
Update README.md 2019-09-08 18:19:00 -04:00
William Falcon 6c947f4e0d
Update README.md 2019-09-08 18:18:21 -04:00
William Falcon 396047ffa0
Updated distributed Demos (#215)
* added simple cluster template

* sets correct backend for possible combinations of gpu inputs

* simple slurm example
2019-09-08 18:17:33 -04:00
William Falcon b3434943c7
Update multi_node_cluster_template.py 2019-09-07 10:31:20 -04:00
williamFalcon 9f9d38673e fixed demo 2019-09-06 16:26:46 -07:00
William Falcon 7099f8dbfb
split trainer mixins (#209)
* split trainer mixins

* Update multi_node_cluster_template.py

* Update single_cpu_template.py

* Update single_gpu_node_16bit_template.py

* Update single_gpu_node_ddp_template.py

* Update single_gpu_node_dp_template.py

* Update trainer_cpu_template.py

* Update trainer_io.py

* split trainer mixins

* Update multi_node_cluster_template.py

* deconflicted
2019-09-06 14:11:07 -04:00
William Falcon 4104a0fc47
cleaned up progbar (#165)
* cleaned up progbar

* updated base files

* flake 8
2019-08-23 21:23:27 -04:00
William Falcon c30f69f60d
Update lightning_module_template.py 2019-08-23 02:42:40 -04:00
William Falcon d5d47eab0d
Update lightning_module_template.py 2019-08-23 02:39:05 -04:00
William Falcon 9aa9a1a796
Update lightning_module_template.py 2019-08-17 11:11:07 -04:00
William Falcon 1a31782272 fixed str crash err 2019-08-17 10:20:58 -04:00
William Falcon 590282f2b0
Update gan.py 2019-08-14 09:29:02 -04:00
William Falcon c9117f74b2
Update gan.py 2019-08-14 08:43:50 -04:00
William Falcon 13f2d1ab1c
Update gan.py 2019-08-14 08:41:32 -04:00
William Falcon 0d5da5f29b
added gan template (#115)
* added gan template

* omit templates folder
2019-08-14 08:38:49 -04:00
William Falcon 7f53e7bfb3
Val idx optional in validation_step (#108)
* made dataset_i only available with multiple datasets

* updated interface signature

* updated tests
2019-08-13 11:37:37 -04:00
Sidhanth Holalkere 511f7ecb9a Support for multiple val_dataloaders (#97)
* Added support for multiple validation dataloaders

* Fix typo in README.md

* Update trainer.py

* Add support for multiple dataloaders

* Rename dataloader_index to dataloader_i

* Added warning to check val_dataloaders

Added a warning to ensure that all val_dataloaders were DistributedSamplers if ddp is enabled

* Updated DistributedSampler warning

* Fixed typo

* Added multiple val_dataloaders

* Multiple val_dataloader test

* Update lightning_module_template.py

Added dataloader_i to validation_step parameters

* Update trainer.py

* Reverted template changes

* Create multi_val_module.py

* Update no_val_end_module.py

* New MultiValModel

* Rename MultiValModel to MultiValTestModel

* Revert to LightningTestModel

* Update test_models.py

* Update trainer.py

* Update test_models.py

* multiple val_dataloaders in test template

* Fixed flake8 warnings

* Update trainer.py

* Fix flake errors

* Fixed Flake8 errors

* Update lm_test_module.py

keep this test model with a single dataset for val

* Update trainer.py

* Update test_models.py

* Update trainer.py

* Update RequiredTrainerInterface.md

* Update test_models.py

* Update trainer.py

dont need the else clause, val_dataloader is either a list or none because of get_dataloaders()

* Update trainer.py

fixed flake errors

* Update trainer.py
2019-08-12 15:23:11 -04:00
William Falcon 8cd764a151
removed reduce on non-loss outputs from dp (#78)
* removed reduce on non-loss outputs from dp

* fixed val reduce
2019-08-08 12:06:29 -04:00
William Falcon 549d0f66df
Merge pull request #52 from alok/ptl-pl
Rename `ptl` to `pl`
2019-08-07 09:09:15 -04:00
Alok Singh 8b9f021ee6 Rename `ptl` to `pl`
Closes #46.
2019-08-06 23:02:55 -07:00
Jiri BOROVEC f8a79b3082 fix imports in examples 2019-08-06 22:45:46 +02:00
Jiri BOROVEC d9bfe964f9 update by flake8 2019-08-06 22:45:46 +02:00
Jiri BOROVEC 632d07b490 fix prints for py3.5 2019-08-06 22:45:46 +02:00
Jiri BOROVEC c44966a8bf apply PEP8 2019-08-06 22:45:27 +02:00
Jiri BOROVEC 469941a528 pkg relative imports
* split requirements.txt
* pytest verbose
2019-08-05 10:52:09 +02:00
Jiri BOROVEC 92f8c57ff5 cutout examples 2019-08-05 09:51:47 +02:00
William Falcon 1eda58fa93 adding tests 2019-07-24 07:19:50 -04:00
William Falcon d7409afed9 added arg docs 2019-07-18 12:11:59 -04:00
William Falcon f01cb63234 added arg docs 2019-07-18 12:10:07 -04:00
William Falcon 8be7480f31 added arg docs 2019-07-18 12:09:25 -04:00
William Falcon 2ca0864ce8 added arg docs 2019-07-18 12:07:11 -04:00
William Falcon b1041220ac added arg docs 2019-07-18 12:05:52 -04:00
William Falcon da842c0cd6 added arg docs 2019-07-18 12:04:45 -04:00
William Falcon e81dbce38c set dp as default backend 2019-07-18 11:59:14 -04:00
William Falcon 0d992689d5 set dp as default backend 2019-07-18 11:58:27 -04:00
William Falcon b684bb55c5 set dp as default backend 2019-07-18 11:56:48 -04:00
William Falcon f98f88ff08 set dp as default backend 2019-07-18 11:51:43 -04:00
William Falcon bc3a805202 set dp as default backend 2019-07-18 11:16:16 -04:00
William Falcon 9ee8f93483 scaled batch size 2019-07-08 20:05:45 -04:00
William Falcon 7285598e11 scaled batch size 2019-07-08 19:57:51 -04:00
William Falcon 9d35b5b4f7 scaled batch size 2019-07-08 19:57:06 -04:00
William Falcon 2b16c75499 scaled batch size 2019-07-08 19:56:52 -04:00
William Falcon 0bd9152e0a scaled batch size 2019-07-08 19:55:26 -04:00
William Falcon a87073bffd scaled batch size 2019-07-08 19:54:00 -04:00
William Falcon f95fad864d scaled batch size 2019-07-08 19:53:24 -04:00
William Falcon f2c1f0221e scaled batch size 2019-07-08 19:48:22 -04:00
William Falcon 25dbd7a936 scaled batch size 2019-07-08 19:45:52 -04:00
William Falcon 971a6c4184 scaled batch size 2019-07-08 19:44:23 -04:00
William Falcon f95cc6144c scaled batch size 2019-07-08 19:42:53 -04:00
William Falcon 96314cbf46 updated dist sampler 2019-07-08 19:26:51 -04:00
William Falcon cf4b25e455 moved sampler 2019-07-08 18:59:16 -04:00
William Falcon 3873850ad4 moved sampler 2019-07-08 18:33:29 -04:00
William Falcon 85dd78f3a4 moved sampler 2019-07-08 18:32:28 -04:00
William Falcon 493a98d591 moved sampler 2019-07-08 18:28:30 -04:00
William Falcon bd2d1ddc07 moved sampler 2019-07-08 18:02:41 -04:00
William Falcon c494e6d305 added cpu example 2019-07-08 17:45:09 -04:00
William Falcon 7a354668ff added cpu example 2019-07-08 17:42:33 -04:00
William Falcon bd43c4417f added single node example 2019-07-08 17:29:46 -04:00
William Falcon 64bdd1c46d cleaning up demo file 2019-07-08 14:31:40 -04:00
William Falcon 5c56295421 updated demo name 2019-07-08 14:29:03 -04:00
William Falcon bba51dde8c updated parser help 2019-07-08 14:27:19 -04:00
William Falcon d2a717d31e using slurm flag to find node nb 2019-07-08 14:14:36 -04:00
William Falcon 553223334f using slurm flag to find node nb 2019-07-08 14:11:48 -04:00
William Falcon 52a3c3137a using slurm flag to find node nb 2019-07-08 13:48:59 -04:00