88 Commits

Author SHA1 Message Date
Adrian Wälchli 6e3e740a7f Param printing (#336)
* print thousands as K, M, B, T, ...

* add option to print top-level modules only

* added doc string and added spacing

* do not print summary if neither "full" nor "top"

* updated docs showing summary print options

* fix line length for travis
2019-10-08 15:30:06 -04:00
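
The first bullet of the Param printing commit above (print thousands as K, M, B, T) can be illustrated with a minimal sketch. The function name `format_count`, the one-decimal rounding, and the space before the suffix are assumptions made for illustration; they are not taken from the commit itself.

```python
def format_count(number):
    """Abbreviate a non-negative count with K, M, B, T suffixes.

    Hypothetical helper for illustration only; not the code from #336.
    """
    if number < 0:
        raise ValueError("count must be non-negative")
    suffixes = ["", "K", "M", "B", "T"]
    index = 0
    value = float(number)
    # Divide by 1000 until the value fits, or we run out of suffixes.
    while value >= 1000 and index < len(suffixes) - 1:
        value /= 1000.0
        index += 1
    if index == 0:
        # Small counts are printed unchanged.
        return str(number)
    return f"{value:.1f} {suffixes[index]}"
```

For example, `format_count(1500)` gives `1.5 K`. Anything beyond the last suffix stays expressed in T rather than growing the suffix list.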
William Falcon 491100abdd
Docs (#315)
* cleaning up demos

* cleaning up docs

* cleaned up test_tube logger
2019-10-05 23:52:32 -04:00
William Falcon 6cc3f1757f
decouple returns from each step (#307)
* decoupled training metrics from logging metrics

* decoupled validation metrics from log metrics

* updated docs

* Fixed test

* merged master
2019-10-05 13:35:20 -04:00
William Falcon 8f5a06bfb8
Gpu mem (#308)
* Fixes #289

* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* updated docs

* added lbfgs support (#310)

* added lbfgs support

* Fixes #280 (#309)

* added test seeds (#306)

* added test seeds

* updated docs

* added lbfgs support

* Fixes #289

* merged master
2019-10-05 11:29:34 -04:00
William Falcon bf09060fef
Fixes #292 (#303)
* early stopping callback is not default

* added a default logger

* added default checkpoint callback

* added default checkpoint/loggers

* updated docs

* cleaned demos

* clean up docs around loggers
2019-10-04 19:48:57 -04:00
William Falcon a578de511d
clean up docs around loggers (#304) 2019-10-04 18:53:38 -04:00
William Falcon 32e74b8f36
Ddp2 (#261)
* adds ddp2 option where on each node a single process uses all gpus

* added ddp2 test

* added ddp2 docs

* Update Distributed training.md

* delete ref to old update_training_log_metrics

* debug

* banana pancakes

* debug

* cheesecake
2019-10-04 15:07:54 -04:00
Wouter van Amsterdam 63c475c600 tiny spelling error (#295) 2019-10-04 07:14:30 -04:00
Nic Eggert 614cb3c03b Initialize loggers only once (#270)
* Create underlying loggers lazily

This avoids creating duplicate experiments or runs in multi-node DDP.

* Save hyperparameters automatically

* Update docs for snapshotting hyperparams

* Fix test tube

* Fix test tube pickling
2019-10-02 11:10:40 -04:00
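
The idea behind "Create underlying loggers lazily" in the commit above can be sketched as follows: the wrapper holds only a factory and builds the real experiment object on first access, so constructing the wrapper in several processes does not by itself create several experiments. The class name `LazyLogger`, the `factory` argument, and the `experiment` property are hypothetical names chosen for this sketch, not the API introduced in #270.

```python
class LazyLogger:
    """Hypothetical wrapper that defers experiment creation until first use."""

    def __init__(self, factory):
        # factory: zero-argument callable that creates the real experiment.
        self._factory = factory
        self._experiment = None

    @property
    def experiment(self):
        # Create the underlying experiment once, on first access only.
        if self._experiment is None:
            self._experiment = self._factory()
        return self._experiment
```

Under this assumption, a process that never touches `experiment` never triggers creation, which is what avoids the duplicate runs mentioned in the commit message.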
William Falcon fbc2cfd513 updated docs 2019-10-01 06:29:12 -04:00
Nic Eggert 480eed5cb6 Enable any ML experiment tracking framework (#223)
* Implement generic loggers for experiment tracking

* Add tests for loggers

* Get model tests passing

* Test and fix logger pickling

* Expand pickle test and fix bug

* Missed exp -> logger conversion

* Remove commented code

* Add docstrings

* Update logging docs

* Add mlflow to test requirements

* Make linter happy

* Fix mlflow timestamp

* Update Logging.md

* Update test_models.py

* Update properties.md

* Fix tests

* Line length
2019-09-27 12:05:29 -04:00
William Falcon 059b2fae29
Update Distributed training.md 2019-09-26 15:30:54 -04:00
William Falcon cefcf4cd12
Update Distributed training.md 2019-09-26 15:27:34 -04:00
Adrian Wälchli e713e2e1e0 fix typo in early stopping (#260) 2019-09-26 15:04:57 -04:00
William Falcon acb4ebea56 added docs for cluster grid search 2019-09-26 12:02:03 -04:00
William Falcon 97b6ebccc0
expanded apex install (#255) 2019-09-26 09:36:03 -04:00
Alok Singh b0a0a47a0b Rename variables (#124)
-   data_batch → batch
-   batch_i → batch_idx
-   dataloader_i → dataloader_idx
-   tng → training
-   training_dataloader → train_dataloader
-   add_log_row_interval → row_log_interval
-   gradient_clip → gradient_clip_val
-   prog → progress
-   tqdm_dic → tqdm_dict
2019-09-25 19:05:06 -04:00
Cola 3d16a686b3 Add EarlyStop documentation (#245)
* Update Training Loop.md

* Update index.md

* Update README.md

* Update Training Loop.md
2019-09-25 14:52:40 -04:00
William Falcon 2a1bc22f42 updated docs 2019-09-17 09:57:16 -04:00
William Falcon d3afc8acd5 updated docs 2019-09-17 09:53:31 -04:00
William Falcon 4c61d1f30a updated docs 2019-09-16 11:07:16 -04:00
William Falcon e1adbe80f9 updated docs 2019-09-16 11:04:40 -04:00
William Falcon 286625a02f updated docs 2019-09-16 11:02:04 -04:00
William Falcon b354988255 updated docs 2019-09-16 10:59:28 -04:00
William Falcon 10d190e045
Simplified gpu api. No NVIDIA flag managing by lightning for cluster (#213)
* added nvidia flag set

* added simple cluster template

* sets correct backend for possible combinations of gpu inputs
2019-09-08 15:36:58 -04:00
William Falcon 60633eaa32
Moves hpc auto-resubmit to trainer from test-tube (#207)
* added slurm signal handler

* added restore weight functions

* set slurm signal handling inside process

* added resubmit docs

* fixed missing param

* Update trainer.py

* fixed missing param

* debugging tests
2019-09-06 11:54:51 -04:00
Max Horn dac41030d4 Allow to deactivate GPU memory logging in Trainer (#190)
* Allow to deactivate GPU memory logging in Trainer

Adds the flag `log_gpu_memory` to Trainer to deactivate logging of GPU
memory utilization. On some servers logging the GPU memory usage can
significantly slow down training.

* Update Logging.md

* Update trainer.py
2019-09-04 10:43:46 -04:00
William Falcon c4ce347f3e testing loop docs 2019-09-02 07:15:45 -04:00
William Falcon 9e6ce3b0d6 testing loop docs 2019-09-02 07:15:45 -04:00
William Falcon a327596b79 add training loop docs 2019-09-02 07:15:45 -04:00
Ir1dXD c2247350bb feat(val_sanity): enable skipping validation sanity (#176)
* feat(val_sanity): enable skipping validation sanity when self.nb_sanity_val_steps is 0

* docs: elaborate on skipping
2019-08-28 06:41:31 -04:00
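
The behaviour described in the val_sanity commit above (skip the validation sanity check when `nb_sanity_val_steps` is 0) can be sketched in isolation. Only the name `nb_sanity_val_steps` comes from the commit message; the function `run_sanity_check` and its other parameters are assumptions for illustration and do not reflect the actual Trainer code.

```python
def run_sanity_check(val_batches, nb_sanity_val_steps, validate_step):
    """Run up to nb_sanity_val_steps validation batches before training.

    Hypothetical stand-in for the trainer logic; returns the number of
    batches that were actually checked.
    """
    if nb_sanity_val_steps == 0:
        # A value of 0 disables the sanity check entirely.
        return 0
    steps_run = 0
    for batch in val_batches:
        if steps_run >= nb_sanity_val_steps:
            break
        validate_step(batch)
        steps_run += 1
    return steps_run
```

With `nb_sanity_val_steps=0` the validation step is never called, so training can start immediately.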
William Falcon 4104a0fc47
cleaned up progbar (#165)
* cleaned up progbar

* updated base files

* flake 8
2019-08-23 21:23:27 -04:00
Ir1dXD 48de39ed50 elaborate on the correlation between overfit_pct and xxx_percent_check (#132)
* Update Training Loop.md

* update docs and elaborate on the correlation
2019-08-17 10:23:25 -04:00
Ir1dXD 24a97956e4 fix typo in docs (#129)
* fix typo

* fix list
2019-08-17 07:48:33 -04:00
William Falcon e60e002f17 updated docs 2019-08-16 17:14:31 -04:00
William Falcon 699fbabda7 updated optimizer_step docs 2019-08-13 11:59:33 -04:00
William Falcon fd845d41c0 updated optimizer_step docs 2019-08-13 11:57:02 -04:00
William Falcon 53ec3bc5bc updated optimizer_step docs 2019-08-13 11:47:35 -04:00
Lorenzo Fabbri 09d4475cc7 Update Checkpointing.md (#83)
* Update Checkpointing.md

Modified import for ModelCheckpoint.

* Update Checkpointing.md
2019-08-09 15:02:36 -04:00
Rich Lewis dd0db4aba2 docs(trainer): fix gradient clipping entry (#85)
- replace copy and paste error
- write brief description
- add link to pytorch docs for specific clipping implementation
- add example configuration
2019-08-09 15:02:14 -04:00
William Falcon 5c6cdc0f27 updated docs 2019-08-07 16:01:51 -04:00
William Falcon b198435d0e added single gpu train doc 2019-08-07 14:16:40 -04:00
William Falcon cca6d2c65d added single gpu train doc 2019-08-07 14:14:23 -04:00
William Falcon 73b50abb57 updated docs 2019-08-07 13:23:47 -04:00
William Falcon 08a6a250c7 updated docs 2019-08-07 13:17:51 -04:00
William Falcon 48bc3465e9 updated docs 2019-08-07 13:15:42 -04:00
William Falcon 35f23bbc82
Merge pull request #55 from williamFalcon/continue
add training restore
2019-08-07 09:02:16 -04:00
William Falcon 47a691f158 updated tests and docs 2019-08-07 07:09:37 -04:00
Jiri BOROVEC 632d07b490 fix prints for py3.5 2019-08-06 22:45:46 +02:00
William Falcon 181d69a727 updated docs 2019-08-04 13:05:56 -05:00