# Logging

Lightning offers options for logging information about your model, GPU usage, etc., via several different logging frameworks. It also offers printing options for monitoring training.
---
### default_save_path
Lightning sets up a default TestTubeLogger and checkpoint callback for you, both of which log to
`os.getcwd()` by default. To modify the logging path you can set:
```python
Trainer(default_save_path='/your/path/to/save/checkpoints')
```
If you need more custom behavior from the logger or the checkpoint callback (different paths for each,
different metrics, etc.), pass in your own instances as explained below.
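For example, a minimal sketch of passing both instances (assuming a `ModelCheckpoint` callback and the Trainer's `checkpoint_callback` argument are available in your version; the paths are placeholders):
```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.logging import TestTubeLogger

# assumption: send logs and checkpoints to two different locations
logger = TestTubeLogger(save_dir='/your/path/to/logs', name='my_experiment')
checkpoint_callback = ModelCheckpoint(filepath='/your/path/to/save/checkpoints')

trainer = Trainer(logger=logger, checkpoint_callback=checkpoint_callback)
```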
---
### Setting up logging
The trainer inits a default logger for you (TestTubeLogger). All logs will
go to the current working directory under a folder named `os.getcwd()/lightning_logs`.
If you want to modify the default logging behavior even more, pass in a logger
(which should inherit from `LightningLoggerBase`).
```{.python}
my_logger = MyLightningLogger(...)
trainer = Trainer(logger=my_logger)
```
The path in this logger will override `default_save_path`.

Lightning supports several common experiment tracking frameworks out of the box.
---
#### Test tube
Log using [test tube](https://williamfalcon.github.io/test-tube/).
```{.python}
from pytorch_lightning.logging import TestTubeLogger
tt_logger = TestTubeLogger(
    save_dir=".",
    name="default",
    debug=False,
    create_git_tag=False
)
trainer = Trainer(logger=tt_logger)
```
---
#### MLFlow
Log using [mlflow](https://mlflow.org).
```{.python}
from pytorch_lightning.logging import MLFlowLogger
mlf_logger = MLFlowLogger(
    experiment_name="default",
    tracking_uri="file:/."
)
trainer = Trainer(logger=mlf_logger)
```
---
#### Custom logger
You can implement your own logger by writing a class that inherits from
`LightningLoggerBase`. Use the `rank_zero_only` decorator to make sure that
only the first process in DDP training logs data.
```{.python}
from pytorch_lightning.logging import LightningLoggerBase, rank_zero_only
class MyLogger(LightningLoggerBase):

    @rank_zero_only
    def log_hyperparams(self, params):
        # params is an argparse.Namespace
        # your code to record hyperparameters goes here
        pass

    @rank_zero_only
    def log_metrics(self, metrics, step_num):
        # metrics is a dictionary of metric names and values
        # your code to record metrics goes here
        pass

    def save(self):
        # Optional. Any code necessary to save logger data goes here
        pass

    @rank_zero_only
    def finalize(self, status):
        # Optional. Any code that needs to be run after training
        # finishes goes here
        pass
```
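Then, as a quick usage sketch (continuing from the class above), pass an instance of your logger to the Trainer just like the built-in ones:
```{.python}
my_logger = MyLogger()
trainer = Trainer(logger=my_logger)
```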
If you write a logger that may be useful to others, please send
a pull request to add it to Lightning!
---
#### Using loggers
You can access the logger anywhere in your LightningModule by doing:
```python
self.logger
# add an image if using TestTubeLogger
self.logger.experiment.add_image(...)
```
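For instance, a minimal sketch of logging an image from inside `training_step` with the TestTubeLogger (the argument names, the batch layout, and the `compute_loss` helper are assumptions for illustration):
```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):

    def training_step(self, batch, batch_idx):
        x, y = batch                     # assumption: (images, labels) batches
        loss = self.compute_loss(x, y)   # hypothetical loss helper

        # every 100 batches, log the first input image via the underlying experiment
        if batch_idx % 100 == 0:
            self.logger.experiment.add_image('sample_input', x[0], batch_idx)

        return {'loss': loss}
```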
#### Display metrics in progress bar
``` {.python}
# DEFAULT
trainer = Trainer(show_progress_bar=True)
```
---
#### Log metric row every k batches
Every k batches, Lightning will make an entry in the metrics log.
``` {.python}
# DEFAULT (ie: add a row to the metrics log every 10 batches)
trainer = Trainer(row_log_interval=10)
```
---
#### Log GPU memory
Logs GPU memory when metrics are logged.
``` {.python}
# DEFAULT
trainer = Trainer(log_gpu_memory=None)
# log only the min/max utilization
trainer = Trainer(log_gpu_memory='min_max')
# log all the GPU memory (if on DDP, logs only that node)
trainer = Trainer(log_gpu_memory='all')
```
---
#### Process position
When running multiple models on the same machine we want to decide which progress bar to use.
Lightning will stack progress bars according to this value.
``` {.python}
# DEFAULT
trainer = Trainer(process_position=0)
# if this is the second model on the node, show the second progress bar below
trainer = Trainer(process_position=1)
```
---
#### Save a snapshot of all hyperparameters
Automatically log hyperparameters stored in the `hparams` attribute as an `argparse.Namespace`:
``` {.python}
class MyModel(pl.LightningModule):

    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        ...


args = parser.parse_args()
model = MyModel(args)

logger = TestTubeLogger(...)
trainer = Trainer(logger=logger)
trainer.fit(model)
```
---
#### Write log files to disk every k batches
Every k batches, Lightning will write the new logs to disk.
``` {.python}
# DEFAULT (ie: save a .csv log file every 100 batches)
trainer = Trainer(log_save_interval=100)
```