Nic Eggert
614cb3c03b
Initialize loggers only once ( #270 )
...
* Create underlying loggers lazily
This avoids creating duplicate experiments or runs in multi-node DDP.
* Save hyperparameters automatically
* Update docs for snapshotting hyperparams
* Fix test tube
* Fix test tube pickling
2019-10-02 11:10:40 -04:00
William Falcon
133d6b3ec1
updated docs
2019-10-01 06:38:10 -04:00
William Falcon
fbc2cfd513
updated docs
2019-10-01 06:29:12 -04:00
Nic Eggert
480eed5cb6
Enable any ML experiment tracking framework ( #223 )
...
* Implement generic loggers for experiment tracking
* Add tests for loggers
* Get model tests passing
* Test and fix logger pickling
* Expand pickle test and fix bug
* Missed exp -> logger conversion
* Remove commented code
* Add docstrings
* Update logging docs
* Add mlflow to test requirements
* Make linter happy
* Fix mlflow timestamp
* Update Logging.md
* Update test_models.py
* Update properties.md
* Fix tests
* Line length
2019-09-27 12:05:29 -04:00
William Falcon
1d7ffd11da
delete ref to old update_training_log_metrics ( #262 )
2019-09-26 17:53:15 -04:00
William Falcon
059b2fae29
Update Distributed training.md
2019-09-26 15:30:54 -04:00
William Falcon
cefcf4cd12
Update Distributed training.md
2019-09-26 15:27:34 -04:00
Adrian Wälchli
e713e2e1e0
fix typo in early stopping ( #260 )
2019-09-26 15:04:57 -04:00
William Falcon
acb4ebea56
added docs for cluster grid search
2019-09-26 12:02:03 -04:00
William Falcon
97b6ebccc0
expanded apex install ( #255 )
2019-09-26 09:36:03 -04:00
William Falcon
3337c0237b
Fixes #250 ( #253 )
2019-09-26 09:13:00 -04:00
Alok Singh
b0a0a47a0b
Rename variables ( #124 )
...
- data_batch → batch
- batch_i → batch_idx
- dataloader_i → dataloader_idx
- tng → training
- training_dataloader → train_dataloader
- add_log_row_interval → row_log_interval
- gradient_clip → gradient_clip_val
- prog → progress
- tqdm_dic → tqdm_dict
2019-09-25 19:05:06 -04:00
Cola
3d16a686b3
Add EarlyStop documentation ( #245 )
...
* Update Training Loop.md
* Update index.md
* Update README.md
* Update Training Loop.md
* Update Training Loop.md
2019-09-25 14:52:40 -04:00
William Falcon
2a1bc22f42
updated docs
2019-09-17 09:57:16 -04:00
William Falcon
d3afc8acd5
updated docs
2019-09-17 09:53:31 -04:00
William Falcon
4c61d1f30a
updated docs
2019-09-16 11:07:16 -04:00
William Falcon
e1adbe80f9
updated docs
2019-09-16 11:04:40 -04:00
William Falcon
286625a02f
updated docs
2019-09-16 11:02:04 -04:00
William Falcon
b354988255
updated docs
2019-09-16 10:59:28 -04:00
William Falcon
10d190e045
Simplified gpu api. No NVIDIA flag managing by lightning for cluster ( #213 )
...
* added nvidia flag set
* added simple cluster template
* sets correct backend for possible combinations of gpu inputs
2019-09-08 15:36:58 -04:00
William Falcon
3e74ea15d8
Fixes #120 ( #210 )
2019-09-06 14:27:24 -04:00
William Falcon
60633eaa32
Moves hpc auto-resubmit to trainer from test-tube ( #207 )
...
* added slurm signal handler
* added restore weight functions
* set slurm signal handling inside process
* added resubmit docs
* fixed missing param
* Update trainer.py
* fixed missing param
* debugging tests
2019-09-06 11:54:51 -04:00
Nic Eggert
1733dba735
Pass outputs from all dataloaders to test_end and validation_end ( #203 )
...
* Pass outputs from all dataloaders to test_end and validation_end
* Update tests
* Update docs
* Update trainer.py
* Update test_models.py
2019-09-06 07:37:25 -04:00
Max Horn
dac41030d4
Allow to deactivate GPU memory logging in Trainer ( #190 )
...
* Allow to deactivate GPU memory logging in Trainer
Adds the flag `log_gpu_memory` to Trainer to deactivate logging of GPU
memory utilization. On some servers logging the GPU memory usage can
significantly slow down training.
* Update Logging.md
* Update trainer.py
2019-09-04 10:43:46 -04:00
William Falcon
c4ce347f3e
testing loop docs
2019-09-02 07:15:45 -04:00
William Falcon
9e6ce3b0d6
testing loop docs
2019-09-02 07:15:45 -04:00
William Falcon
a327596b79
add training loop docs
2019-09-02 07:15:45 -04:00
Verena Haunschmid
25d5b25792
Expectopatronum implement #89 ( #182 )
...
* rename validate -> evaluate; implement test logic; allow multiple test_loaders
* add test_step and test_end to LightningModule
* add in_test_mode to pretraining to implement case 2 (test pretrained model)
* fix code style issues
* LightningTestModel: add optional second test set, implement test_step and test_end
* implemented test for multiple test_dataloaders; fixed typo
* add two test cases for #89
* add documentation for test_step, test_end; fix computation of loss in validation_step example
* Update trainer.py
* Added proper dp ddp routing calls for test mode
* Update trainer.py
* Update test_models.py
* Update trainer.py
* Update override_data_parallel.py
* Update test_models.py
* Update trainer.py
* Update test_models.py
* debug
* Update trainer.py
* Update override_data_parallel.py
* Update debug.py
* Update lm_test_module.py
* Update test_models.py
2019-09-02 07:15:27 -04:00
Ir1dXD
c2247350bb
feat(val_sanity): enable skipping validation sanity ( #176 )
...
* feat(val_sanity): enable skipping validation sanity when self.nb_sanity_val_steps is 0
* docs: elaborate on skipping
2019-08-28 06:41:31 -04:00
Ir1dXD
6eb6daa278
enable highlight ( #170 )
2019-08-27 07:09:46 -04:00
William Falcon
4104a0fc47
cleaned up progbar ( #165 )
...
* cleaned up progbar
* updated base files
* flake 8
2019-08-23 21:23:27 -04:00
Sebastian Præsius
9fc66026f1
train = False in test_dataloader ( #162 )
...
A small change to the CoolModel example.
Now test_dataloader returns the MNIST test dataset.
2019-08-22 17:44:06 -04:00
sebftw
a7a14dadb6
F.cross_entropy(y_hat, y)(y_hat, y) typo. ( #137 )
...
This seems to be a typo. Throws TypeError: 'Tensor' object is not callable.
2019-08-18 18:17:43 -04:00
sebftw
b2a49197e4
tensorboarX to tensorboardX ( #136 )
...
* tensorboarX to tensorboardX
* Update properties.md
2019-08-18 18:17:05 -04:00
Ir1dXD
48de39ed50
elaborate on the correlation between overfit_pct and xxx_percent_check ( #132 )
...
* Update Training Loop.md
* update docs and elaborate on the correlation
2019-08-17 10:23:25 -04:00
Ir1dXD
24a97956e4
fix typo in docs ( #129 )
...
* fix typo
* fix list
2019-08-17 07:48:33 -04:00
William Falcon
e60e002f17
updated docs
2019-08-16 17:14:31 -04:00
William Falcon
bdd86087e6
updated docs
2019-08-16 10:07:56 -04:00
William Falcon
50f0de094f
updated docs
2019-08-16 10:07:44 -04:00
William Falcon
bc401d0f59
updated docs
2019-08-16 10:02:28 -04:00
William Falcon
4b97319c2e
updated docs
2019-08-15 21:29:25 -04:00
William Falcon
0e92a9d7af
updated docs
2019-08-15 21:19:29 -04:00
William Falcon
44da88fd15
updated docs
2019-08-15 13:59:27 -04:00
William Falcon
3dea127edb
updated docs
2019-08-13 13:05:47 -04:00
William Falcon
d4b1ac94a0
updated docs
2019-08-13 13:03:39 -04:00
William Falcon
b89b7f0a8c
updated docs
2019-08-13 13:02:17 -04:00
William Falcon
699fbabda7
updated optimizer_step docs
2019-08-13 11:59:33 -04:00
William Falcon
fd845d41c0
updated optimizer_step docs
2019-08-13 11:57:02 -04:00
William Falcon
d7660d3c64
updated optimizer_step docs
2019-08-13 11:55:10 -04:00
William Falcon
7e38f1f246
updated optimizer_step docs
2019-08-13 11:54:19 -04:00