Lightning offers options for logging information about the model, GPU usage, etc., via several different logging frameworks. It also offers printing options for monitoring training.

---
### default_save_path

Lightning sets a default TestTubeLogger and checkpoint callback for you, which log to `os.getcwd()` by default. To modify the logging path, set:

```python
from pytorch_lightning import Trainer

trainer = Trainer(default_save_path='/your/path/to/save/checkpoints')
```
If you need more custom behavior (different paths for each, different metrics, etc.) from the logger and the checkpoint callback, pass in your own instances as explained below.
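For example, a minimal sketch that overrides both at once (this assumes the `ModelCheckpoint` callback and the `checkpoint_callback` Trainer argument from your Lightning version):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.logging import TestTubeLogger

# send experiment logs to one path, checkpoints to another
logger = TestTubeLogger(save_dir='/your/log/path', name='my_experiment')
checkpoint_callback = ModelCheckpoint(filepath='/your/checkpoint/path')

trainer = Trainer(logger=logger, checkpoint_callback=checkpoint_callback)
```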

---

### Setting up logging
The trainer initializes a default logger for you (TestTubeLogger). All logs will go to the current working directory, under `os.getcwd()/lightning_logs`.

If you want to modify the default logging behavior even more, pass in your own logger (which should inherit from `LightningLoggerBase`):
```python
my_logger = MyLightningLogger(...)
trainer = Trainer(logger=my_logger)
```
The path set in this logger overrides `default_save_path`.

Lightning supports several common experiment tracking frameworks out of the box.

---
#### Test tube

Log using [test tube](https://williamfalcon.github.io/test-tube/).
```python
from pytorch_lightning.logging import TestTubeLogger

tt_logger = TestTubeLogger(
    save_dir=".",
    name="default",
    debug=False,
    create_git_tag=False
)
trainer = Trainer(logger=tt_logger)
```
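Test tube versions each run for you; with the settings above, logs would typically land under `./default/version_<n>` (the exact layout is up to test tube).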

---

#### MLflow

Log using [MLflow](https://mlflow.org).
```python
from pytorch_lightning.logging import MLFlowLogger

mlf_logger = MLFlowLogger(
    experiment_name="default",
    tracking_uri="file:/."
)
trainer = Trainer(logger=mlf_logger)
```
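You can then browse the logged runs locally by running `mlflow ui` from the directory containing the tracking files.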

---

#### Custom logger

You can implement your own logger by writing a class that inherits from `LightningLoggerBase`. Use the `rank_zero_only` decorator to make sure that only the first process in DDP training logs data.
```python
from pytorch_lightning.logging import LightningLoggerBase, rank_zero_only


class MyLogger(LightningLoggerBase):

    @rank_zero_only
    def log_hyperparams(self, params):
        # params is an argparse.Namespace
        # your code to record hyperparameters goes here
        pass

    @rank_zero_only
    def log_metrics(self, metrics, step_num):
        # metrics is a dictionary of metric names and values
        # your code to record metrics goes here
        pass

    def save(self):
        # Optional. Any code necessary to save logger data goes here
        pass

    @rank_zero_only
    def finalize(self, status):
        # Optional. Any code that needs to be run after training
        # finishes goes here
        pass
```
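The logger above can then be passed to the Trainer like any built-in one:

```python
my_logger = MyLogger()
trainer = Trainer(logger=my_logger)
```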
If you write a logger that may be useful to others, please send a pull request to add it to Lightning!

---

#### Using loggers

You can call the logger anywhere in your LightningModule by doing:
```python
self.logger

# add an image if using TestTubeLogger
self.logger.experiment.add_image(...)
```
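As a minimal sketch, assuming a TestTubeLogger whose `experiment` exposes a TensorBoard-style `add_image(tag, image)` writer (check your logger's docs), logging a sample input from a training step might look like:

```python
import torch.nn.functional as F
import pytorch_lightning as pl


class MyModel(pl.LightningModule):

    def training_step(self, batch, batch_nb):
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)

        # log the first image of the batch every 100 steps
        # (add_image is assumed from the test-tube/TensorBoard API;
        #  other loggers expose different methods)
        if batch_nb % 100 == 0:
            self.logger.experiment.add_image('sample_input', x[0])

        return {'loss': loss}
```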

---

#### Display metrics in progress bar

```python
# DEFAULT
trainer = Trainer(show_progress_bar=True)
```

---

#### Log metric row every k batches

Every k batches, Lightning will make an entry in the metrics log.

```python
# DEFAULT (ie: add a metrics row every 10 batches)
trainer = Trainer(row_log_interval=10)
```

---

#### Log GPU memory

Logs GPU memory when metrics are logged.

```python
# DEFAULT (GPU memory is not logged)
trainer = Trainer(log_gpu_memory=None)

# log only the min/max utilization
trainer = Trainer(log_gpu_memory='min_max')

# log all the GPU memory (if on DDP, logs only that node)
trainer = Trainer(log_gpu_memory='all')
```

---

#### Process position

When running multiple models on the same machine, you need to decide which progress bar each one uses. Lightning stacks progress bars according to this value.

```python
# DEFAULT
trainer = Trainer(process_position=0)

# if this is the second model on the node, show its progress bar one row below
trainer = Trainer(process_position=1)
```

---

#### Save a snapshot of all hyperparameters

Automatically log hyperparameters stored in the `hparams` attribute as an `argparse.Namespace`:

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams

    ...

args = parser.parse_args()
model = MyModel(args)

logger = TestTubeLogger(...)
trainer = Trainer(logger=logger)
trainer.fit(model)
```

---

#### Write logs to disk every k batches

Every k batches, Lightning will write the accumulated logs to disk.

```python
# DEFAULT (ie: save a .csv log file every 100 batches)
trainer = Trainer(log_save_interval=100)
```