Commit Graph

1500 Commits

Author SHA1 Message Date
William Falcon 481aa24974
always calls the lr scheduler with epoch nb. Fixes #98 (#252)
* always calls the lr scheduler  with epoch nb

* added docs for cluster grid search

* undo test changes
2019-09-26 16:36:41 -04:00
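
For readers following the change above, a minimal sketch of stepping a plain torch.optim LR scheduler with the epoch number; the model, optimizer, and schedule below are illustrative only, not the trainer's internals.

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

for epoch in range(100):
    # ... run the training batches for this epoch ...
    # the fix described above: the scheduler is always stepped with the epoch number
    scheduler.step(epoch)
```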
William Falcon cf04ff73e9 undo test changes 2019-09-26 16:10:51 -04:00
William Falcon de9fc0587b added docs for cluster grid search 2019-09-26 16:10:16 -04:00
William Falcon 059b2fae29
Update Distributed training.md 2019-09-26 15:30:54 -04:00
William Falcon cefcf4cd12
Update Distributed training.md 2019-09-26 15:27:34 -04:00
Adrian Wälchli e713e2e1e0 fix typo in early stopping (#260) 2019-09-26 15:04:57 -04:00
William Falcon 25d2f93256
enables samplers which don't need set epoch (or when ppl don't need a sampler) (#254)
* enables samplers which don't need set epoch

* added docs for single gpu ddp

* added docs for cluster grid search
2019-09-26 14:39:04 -04:00
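
A hedged sketch of the guard described in the entry above, assuming the check lives in a small helper; `maybe_set_epoch` is a hypothetical name, not the trainer's actual function.

```python
from torch.utils.data.distributed import DistributedSampler


def maybe_set_epoch(dataloader, epoch):
    # hypothetical helper: only samplers that implement set_epoch
    # (e.g. DistributedSampler) are stepped; plain samplers, or loaders
    # without a sampler at all, are simply skipped
    sampler = getattr(dataloader, "sampler", None)
    if isinstance(sampler, DistributedSampler):
        sampler.set_epoch(epoch)
```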
William Falcon 8b2a2aeda3
Dim 0 warning (#256)
* added ignore warnings module

* Fixes #249

* Update ignored_warnings.py
2019-09-26 13:20:54 -04:00
William Falcon acb4ebea56 added docs for cluster grid search 2019-09-26 12:02:03 -04:00
William Falcon 3cab3b2f8c
Update README.md 2019-09-26 10:45:08 -04:00
William Falcon 5a9320d822 Merge branch 'master' of https://github.com/williamFalcon/pytorch-lightning 2019-09-26 10:42:38 -04:00
William Falcon c2a0846011 release v0.5.0 2019-09-26 10:42:24 -04:00
William Falcon 97b6ebccc0
expanded apex install (#255) 2019-09-26 09:36:03 -04:00
William Falcon 3337c0237b
Fixes #250 (#253) 2019-09-26 09:13:00 -04:00
Alok Singh b0a0a47a0b Rename variables (#124)
-   data_batch → batch
-   batch_i → batch_idx
-   dataloader_i → dataloader_idx
-   tng → training
-   training_dataloader → train_dataloader
-   add_log_row_interval → row_log_interval
-   gradient_clip → gradient_clip_val
-   prog → progress
-   tqdm_dic → tqdm_dict
2019-09-25 19:05:06 -04:00
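
A brief sketch using the renamed hooks and Trainer keywords from the list above; the module body and the keyword values are illustrative, and assume the Trainer of that release accepts these argument names.

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x)

    # hooks now receive `batch` / `batch_idx` instead of `data_batch` / `batch_i`
    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self(x), y)
        return {'loss': loss}


# Trainer arguments follow the renamed keywords as well
trainer = pl.Trainer(gradient_clip_val=0.5, row_log_interval=10)
```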
Cola 3d16a686b3 Add EarlyStop documentation (#245)
* Update Training Loop.md

* Update index.md

* Update README.md
2019-09-25 14:52:40 -04:00
Oscar A. Rangel eb268c4184 Added missing parameters (#237)
* Added missing parameters

added the missing distributed_backend parameter and passed it to the Trainer in step 4 (Init Trainer).

* Update single_gpu_node_dp_template.py
2019-09-21 09:45:12 -04:00
Oscar A. Rangel 6803018a49 changed hard-coded parameter and moved it to parent_parser (#238)
* changed hard-coded parameter and moved it to parent_parser

```python

    # ------------------------
    # 4 INIT TRAINER
    # ------------------------
    trainer = Trainer(
        experiment=exp,
        checkpoint_callback=checkpoint,
        early_stop_callback=early_stop,
        gpus=hparams.gpus,
        distributed_backend=hparams.dist_bak_end
    )


    parent_parser.add_argument('--dist_bak_end', type=str, default='ddp',
                                help='When using multiple GPUs set Trainer(distributed_backend=dp) (or ddp)')  
```

* Update single_gpu_node_ddp_template.py
2019-09-21 09:44:08 -04:00
William Falcon 87708157bc
Update trainer.py (#233) 2019-09-19 08:23:48 -04:00
William Falcon 2a1bc22f42 updated docs 2019-09-17 09:57:16 -04:00
William Falcon d3afc8acd5 updated docs 2019-09-17 09:53:31 -04:00
William Falcon 4c61d1f30a updated docs 2019-09-16 11:07:16 -04:00
William Falcon e1adbe80f9 updated docs 2019-09-16 11:04:40 -04:00
William Falcon 286625a02f updated docs 2019-09-16 11:02:04 -04:00
William Falcon b354988255 updated docs 2019-09-16 10:59:28 -04:00
William Falcon b3c1911813
Update README.md 2019-09-16 10:56:37 -04:00
William Falcon 974afba2be release v0.4.9 2019-09-16 10:50:59 -04:00
William Falcon 55e7322747
Metrics load (#228)
* load from metrics defaults to CPU
2019-09-16 10:47:19 -04:00
Ananya Harsh Jha c0f3b6b035 added set_epoch for distributed sampler, fix for #224 (#225) 2019-09-16 10:21:00 -04:00
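
A short sketch of the set_epoch fix above, assuming a standard DistributedSampler and an already-initialized process group; the dataset and loop bounds are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(100, 10))
sampler = DistributedSampler(dataset)  # assumes torch.distributed is initialized
loader = DataLoader(dataset, sampler=sampler, batch_size=8)

for epoch in range(5):
    # without set_epoch, every epoch reuses the same shuffled order on each process
    sampler.set_epoch(epoch)
    for batch in loader:
        pass  # training step goes here
```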
William Falcon e339799a0a
Update README.md 2019-09-14 09:55:42 -04:00
William Falcon 50f5e4bec8
Update single_cpu_template.py 2019-09-14 02:23:49 -04:00
William Falcon 330a21ea91
Update README.md 2019-09-14 02:18:33 -04:00
William Falcon f3221a5014
Update multi_node_cluster_auto_slurm.py 2019-09-14 02:14:08 -04:00
William Falcon fe17d14ade
Update multi_node_cluster_auto_slurm.py 2019-09-13 17:05:49 -04:00
William Falcon 9576dd28b2
added load on CPU first (#221)
* added load on CPU first

* added print logs

* changed close order
2019-09-11 07:52:36 -04:00
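
A minimal sketch of the load-on-CPU-first behavior described above, assuming a Lightning-style checkpoint dict with a 'state_dict' key; the path and model are illustrative.

```python
import torch

model = torch.nn.Linear(10, 1)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# map_location='cpu' restores the weights on CPU first, avoiding GPU-id mismatches
checkpoint = torch.load('checkpoint.ckpt', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
model.to(device)  # move to the target device only afterwards
```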
William Falcon 90353ac54e changed examples scripts 2019-09-11 07:05:15 -04:00
William Falcon cf7dbf6d7c changed examples scripts 2019-09-11 07:03:31 -04:00
William Falcon 30b25c8146
Sai prasanna master (#219)
* Fix incorrect warning for DistributedSampler.

Check whether `dataloader.sampler` is an instance of DistributedSampler instead of checking the `dataloader`.

* Update trainer.py

* merged
2019-09-09 11:36:24 -04:00
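
A hedged sketch of the corrected check from the entry above; `warn_if_not_distributed_sampler` and the warning text are hypothetical, not the trainer's exact wording.

```python
import warnings

from torch.utils.data.distributed import DistributedSampler


def warn_if_not_distributed_sampler(dataloader):
    # the fix: inspect dataloader.sampler rather than the dataloader object itself
    if not isinstance(dataloader.sampler, DistributedSampler):
        warnings.warn('dataloader is not using a DistributedSampler; '
                      'data may be duplicated across processes')
```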
William Falcon ac0111c196
Update multi_node_cluster_auto_slurm.py 2019-09-09 10:55:47 -04:00
William Falcon cbc619afa1
Update multi_node_own_slurm_script.py 2019-09-09 10:54:43 -04:00
William Falcon 3393086cb6
Update multi_node_cluster_auto_slurm.py 2019-09-09 10:53:47 -04:00
William Falcon 506d5da68b
enable single gpu per node (#218)
* enable single gpu per node
2019-09-09 07:37:20 -04:00
William Falcon a6fe6f0917
Update README.md 2019-09-08 18:21:05 -04:00
William Falcon 8f289f9fa8
Update README.md 2019-09-08 18:19:00 -04:00
William Falcon 6c947f4e0d
Update README.md 2019-09-08 18:18:21 -04:00
William Falcon 396047ffa0
Updated distributed Demos (#215)
* added simple cluster template

* sets correct backend for possible combinations of gpu inputs

* simple slurm example
2019-09-08 18:17:33 -04:00
William Falcon 83b756f77b
Update tox.ini 2019-09-08 15:46:30 -04:00
William Falcon 10d190e045
Simplified gpu api. No NVIDIA flag managing by lightning for cluster (#213)
* added nvidia flag set

* added simple cluster template

* sets correct backend for possible combinations of gpu inputs
2019-09-08 15:36:58 -04:00
William Falcon b3434943c7
Update multi_node_cluster_template.py 2019-09-07 10:31:20 -04:00
Alok Singh 81df2259ef Make print_nan_grads print grad (#208)
This seems more useful for debugging.
2019-09-07 01:08:09 -04:00
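
A hedged, standalone sketch of what the change above describes: reporting the offending gradient tensor (not just the parameter) when NaNs appear; the function is written here for illustration and is not the library's exact implementation.

```python
import torch


def print_nan_grads(model):
    # illustrative version: print the gradient itself so the NaN pattern is visible
    for name, param in model.named_parameters():
        if param.grad is not None and torch.isnan(param.grad).any():
            print(name, param.grad)
```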