220 Commits

Author SHA1 Message Date
Nic Eggert 614cb3c03b Initialize loggers only once (#270)
* Create underlying loggers lazily

This avoids creating duplicate experiments or runs in multi-node DDP.

* Save hyperparameters automatically

* Update docs for snapshotting hyperparams

* Fix test tube

* Fix test tube pickling
2019-10-02 11:10:40 -04:00
William Falcon 133d6b3ec1 updated docs 2019-10-01 06:38:10 -04:00
William Falcon fbc2cfd513 updated docs 2019-10-01 06:29:12 -04:00
Nic Eggert 480eed5cb6 Enable any ML experiment tracking framework (#223)
* Implement generic loggers for experiment tracking

* Add tests for loggers

* Get model tests passing

* Test and fix logger pickling

* Expand pickle test and fix bug

* Missed exp -> logger conversion

* Remove commented code

* Add docstrings

* Update logging docs

* Add mlflow to test requirements

* Make linter happy

* Fix mlflow timestamp

* Update Logging.md

* Update test_models.py

* Update properties.md

* Fix tests

* Line length
2019-09-27 12:05:29 -04:00
William Falcon 1d7ffd11da delete ref to old update_training_log_metrics (#262) 2019-09-26 17:53:15 -04:00
William Falcon 059b2fae29 Update Distributed training.md 2019-09-26 15:30:54 -04:00
William Falcon cefcf4cd12 Update Distributed training.md 2019-09-26 15:27:34 -04:00
Adrian Wälchli e713e2e1e0 fix typo in early stopping (#260) 2019-09-26 15:04:57 -04:00
William Falcon acb4ebea56 added docs for cluster grid search 2019-09-26 12:02:03 -04:00
William Falcon 97b6ebccc0 expanded apex install (#255) 2019-09-26 09:36:03 -04:00
William Falcon 3337c0237b Fixes #250 (#253) 2019-09-26 09:13:00 -04:00
Alok Singh b0a0a47a0b Rename variables (#124)
-   data_batch → batch
-   batch_i → batch_idx
-   dataloader_i → dataloader_idx
-   tng → training
-   training_dataloader → train_dataloader
-   add_log_row_interval → row_log_interval
-   gradient_clip → gradient_clip_val
-   prog → progress
-   tqdm_dic → tqdm_dict
2019-09-25 19:05:06 -04:00
Cola 3d16a686b3 Add EarlyStop documentation (#245)
* Update Training Loop.md

* Update index.md

* Update README.md

* Update Training Loop.md

* Update Training Loop.md
2019-09-25 14:52:40 -04:00
William Falcon 2a1bc22f42 updated docs 2019-09-17 09:57:16 -04:00
William Falcon d3afc8acd5 updated docs 2019-09-17 09:53:31 -04:00
William Falcon 4c61d1f30a updated docs 2019-09-16 11:07:16 -04:00
William Falcon e1adbe80f9 updated docs 2019-09-16 11:04:40 -04:00
William Falcon 286625a02f updated docs 2019-09-16 11:02:04 -04:00
William Falcon b354988255 updated docs 2019-09-16 10:59:28 -04:00
William Falcon 10d190e045 Simplified gpu api. No NVIDIA flag managing by lightning for cluster (#213)
* added nvidia flag set

* added simple cluster template

* sets correct backend for possible combinations of gpu inputs
2019-09-08 15:36:58 -04:00
William Falcon 3e74ea15d8 Fixes #120 (#210) 2019-09-06 14:27:24 -04:00
William Falcon 60633eaa32 Moves hpc auto-resubmit to trainer from test-tube (#207)
* added slurm signal handler

* added restore weight functions

* set slurm signal handling inside process

* added resubmit docs

* added resubmit docs

* fixed missing param

* Update trainer.py

* fixed missing param

* fixed missing param

* debugging tests
2019-09-06 11:54:51 -04:00
Nic Eggert 1733dba735 Pass outputs from all dataloaders to test_end and validation_end (#203)
* Pass outputs from all dataloaders to test_end and validation_end

* Update tests

* Update docs

* Update trainer.py

* Update test_models.py
2019-09-06 07:37:25 -04:00
Max Horn dac41030d4 Allow to deactivate GPU memory logging in Trainer (#190)
* Allow to deactivate GPU memory logging in Trainer

Adds the flag `log_gpu_memory` to Trainer to deactivate logging of GPU
memory utilization. On some servers logging the GPU memory usage can
significantly slow down training.

* Update Logging.md

* Update trainer.py
2019-09-04 10:43:46 -04:00
William Falcon c4ce347f3e testing loop docs 2019-09-02 07:15:45 -04:00
William Falcon 9e6ce3b0d6 testing loop docs 2019-09-02 07:15:45 -04:00
William Falcon a327596b79 add training loop docs 2019-09-02 07:15:45 -04:00
Verena Haunschmid 25d5b25792 Expectopatronum implement #89 (#182)
* rename validate -> evaluate; implement test logic; allow multiple test_loaders

* add test_step and test_end to LightningModule

* add in_test_mode to pretraining to implement case 2 (test pretrained model)

* fix code style issues

* LightningTestModel: add optional second test set, implement test_step and test_end

* implemented test for multiple test_dataloaders; fixed typo

* add two test cases for #89

* add documentation for test_step, test_end; fix computation of loss in validation_step example

* Update trainer.py

* Added proper dp ddp routing calls for test mode

* Update trainer.py

* Update test_models.py

* Update trainer.py

* Update trainer.py

* Update override_data_parallel.py

* Update test_models.py

* Update test_models.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update test_models.py

* Update test_models.py

* debug

* Update trainer.py

* Update override_data_parallel.py

* Update debug.py

* Update lm_test_module.py

* Update test_models.py
2019-09-02 07:15:27 -04:00
Ir1dXD c2247350bb feat(val_sanity): enable skipping validation sanity (#176)
* feat(val_sanity): enable skipping validation sanity when self.nb_sanity_val_steps is 0

* docs: elaborate on skipping
2019-08-28 06:41:31 -04:00
Ir1dXD 6eb6daa278 enable highlight (#170) 2019-08-27 07:09:46 -04:00
William Falcon 4104a0fc47 cleaned up progbar (#165)
* cleaned up progbar

* updated base files

* flake 8
2019-08-23 21:23:27 -04:00
Sebastian Præsius 9fc66026f1 train = False in test_dataloader (#162)
A small change to the CoolModel example.
Now test_dataloader returns the MNIST test dataset.
2019-08-22 17:44:06 -04:00
sebftw a7a14dadb6 F.cross_entropy(y_hat, y)(y_hat, y) typo. (#137)
This seems to be a typo. Throws TypeError: 'Tensor' object is not callable.
2019-08-18 18:17:43 -04:00
sebftw b2a49197e4 tensorboarX to tensorboardX (#136)
* tensorboarX to tensorboardX

* Update properties.md
2019-08-18 18:17:05 -04:00
Ir1dXD 48de39ed50 elaborate on the correlation between overfit_pct and xxx_percent_check (#132)
* Update Training Loop.md

* update docs and elaborate on the correlation
2019-08-17 10:23:25 -04:00
Ir1dXD 24a97956e4 fix typo in docs (#129)
* fix typo

* fix typo

* fix typo

* fix list
2019-08-17 07:48:33 -04:00
William Falcon e60e002f17 updated docs 2019-08-16 17:14:31 -04:00
William Falcon bdd86087e6 updated docs 2019-08-16 10:07:56 -04:00
William Falcon 50f0de094f updated docs 2019-08-16 10:07:44 -04:00
William Falcon bc401d0f59 updated docs 2019-08-16 10:02:28 -04:00
William Falcon 4b97319c2e updated docs 2019-08-15 21:29:25 -04:00
William Falcon 0e92a9d7af updated docs 2019-08-15 21:19:29 -04:00
William Falcon 44da88fd15 updated docs 2019-08-15 13:59:27 -04:00
William Falcon 3dea127edb updated docs 2019-08-13 13:05:47 -04:00
William Falcon d4b1ac94a0 updated docs 2019-08-13 13:03:39 -04:00
William Falcon b89b7f0a8c updated docs 2019-08-13 13:02:17 -04:00
William Falcon 699fbabda7 updated optimizer_step docs 2019-08-13 11:59:33 -04:00
William Falcon fd845d41c0 updated optimizer_step docs 2019-08-13 11:57:02 -04:00
William Falcon d7660d3c64 updated optimizer_step docs 2019-08-13 11:55:10 -04:00
William Falcon 7e38f1f246 updated optimizer_step docs 2019-08-13 11:54:19 -04:00