Lightning offers options for logging information about the model, GPU usage, etc., via several different logging frameworks. It also offers printing options for monitoring training.
---
### default_save_path
Lightning sets up a default TestTubeLogger and checkpoint callback for you, both of which save to
```os.getcwd()``` by default. To modify the save path you can set:
```python
Trainer(default_save_path='/your/path/to/save/checkpoints')
```
If you need more custom behavior (different paths for each, different metrics, etc.)
from the logger and the checkpoint callback, pass in your own instances as explained below.
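For example, here is a rough sketch of passing both instances explicitly (this assumes the `ModelCheckpoint` callback from `pytorch_lightning.callbacks`; check the checkpointing docs for its exact arguments):
```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.logging import TestTubeLogger

# log and checkpoint to two different locations
logger = TestTubeLogger(save_dir='/your/path/to/logs', name='my_exp')
checkpoint = ModelCheckpoint(filepath='/your/path/to/checkpoints', monitor='val_loss')

trainer = Trainer(logger=logger, checkpoint_callback=checkpoint)
```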
---
### Setting up logging
The trainer initializes a default logger for you (TestTubeLogger). All logs will
go to the current working directory under ```os.getcwd()/lightning_logs```.
If you want to modify the default logging behavior even more, pass in a logger
(which should inherit from `LightningLoggerBase`).
```{.python}
my_logger = MyLightningLogger(...)
trainer = Trainer(logger=my_logger)
```
The path in this logger will override `default_save_path`.
Lightning supports several common experiment tracking frameworks out of the box.
---
#### Test tube
Log using [test tube](https://williamfalcon.github.io/test-tube/).
```{.python}
from pytorch_lightning.logging import TestTubeLogger
tt_logger = TestTubeLogger(
    save_dir=".",
    name="default",
    debug=False,
    create_git_tag=False
)
trainer = Trainer(logger=tt_logger)
```
---
#### MLFlow
Log using [mlflow](https://mlflow.org).
```{.python}
from pytorch_lightning.logging import MLFlowLogger
mlf_logger = MLFlowLogger(
    experiment_name="default",
    tracking_uri="file:/."
)
trainer = Trainer(logger=mlf_logger)
```
---
#### Custom logger
You can implement your own logger by writing a class that inherits from
`LightningLoggerBase`. Use the `rank_zero_only` decorator to make sure that
only the first process in DDP training logs data.
```{.python}
from pytorch_lightning.logging import LightningLoggerBase, rank_zero_only
class MyLogger(LightningLoggerBase):

    @rank_zero_only
    def log_hyperparams(self, params):
        # params is an argparse.Namespace
        # your code to record hyperparameters goes here
        pass

    @rank_zero_only
    def log_metrics(self, metrics, step_num):
        # metrics is a dictionary of metric names and values
        # your code to record metrics goes here
        pass

    def save(self):
        # Optional. Any code necessary to save logger data goes here
        pass

    @rank_zero_only
    def finalize(self, status):
        # Optional. Any code that needs to be run after training
        # finishes goes here
        pass
```
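Once defined, a custom logger is passed to the trainer the same way as the built-in ones:
```{.python}
my_logger = MyLogger()
trainer = Trainer(logger=my_logger)
```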
If you write a logger that may be useful to others, please send
a pull request to add it to Lightning!
---
#### Using loggers
You can access the logger from anywhere in your LightningModule:
```python
self.logger
# add an image if using TestTubeLogger
self.logger.experiment.add_image(...)
```
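For example, here is a minimal sketch of logging an image from inside `training_step` (this assumes a `training_step(self, batch, batch_idx)` signature, image inputs, and a TestTubeLogger, whose `experiment` exposes the TensorBoard-style `add_image` method):
```python
import torch.nn.functional as F
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    # __init__, forward, configure_optimizers, dataloaders omitted for brevity

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)

        # every 100 batches, send the first input image (a CHW tensor)
        # straight to the underlying experiment object
        if batch_idx % 100 == 0:
            self.logger.experiment.add_image('input_images', x[0], batch_idx)

        return {'loss': loss}
```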
#### Display metrics in progress bar
``` {.python}
# DEFAULT
trainer = Trainer(show_progress_bar=True)
```
---
#### Log metric row every k batches
Every k batches, Lightning will make an entry in the metrics log.
``` {.python}
# DEFAULT (ie: add a row to the metrics log every 10 batches)
trainer = Trainer(row_log_interval=10)
```
---
#### Log GPU memory
Logs GPU memory when metrics are logged.
``` {.python}
# DEFAULT
trainer = Trainer(log_gpu_memory=None)
# log only the min/max utilization
trainer = Trainer(log_gpu_memory='min_max')
# log all the GPU memory (if on DDP, logs only that node)
trainer = Trainer(log_gpu_memory='all')
```
---
#### Process position
When running multiple models on the same machine, you can choose where each model's progress bar appears.
Lightning will stack progress bars according to this value.
``` {.python}
# DEFAULT
trainer = Trainer(process_position=0)
# if this is the second model on the node, show the second progress bar below
trainer = Trainer(process_position=1)
```
---
#### Save a snapshot of all hyperparameters
Automatically log the hyperparameters stored in the model's `hparams` attribute (an `argparse.Namespace`).
``` {.python}
class MyModel(pl.LightningModule):

    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        ...

args = parser.parse_args()
model = MyModel(args)

logger = TestTubeLogger(...)
trainer = Trainer(logger=logger)
trainer.fit(model)
```
---
#### Write logs file to csv every k batches
Every k batches, Lightning will write the new logs to disk.
``` {.python}
# DEFAULT (ie: save a .csv log file every 100 batches)
trainer = Trainer(log_save_interval=100)
```