From 2c90a06fee86128a504d95e5caf0e15ad439ebac Mon Sep 17 00:00:00 2001
From: svlandeg
Date: Mon, 31 Aug 2020 13:43:17 +0200
Subject: [PATCH] some more information about the loggers

---
 website/docs/api/top-level.md | 48 ++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/website/docs/api/top-level.md b/website/docs/api/top-level.md
index 6fbb1c821..518711a8a 100644
--- a/website/docs/api/top-level.md
+++ b/website/docs/api/top-level.md
@@ -4,6 +4,7 @@ menu:
   - ['spacy', 'spacy']
   - ['displacy', 'displacy']
   - ['registry', 'registry']
+  - ['Loggers', 'loggers']
   - ['Batchers', 'batchers']
   - ['Data & Alignment', 'gold']
   - ['Utility Functions', 'util']
@@ -345,19 +346,26 @@ See the [`Transformer`](/api/transformer) API reference and
 > return span_getter
 > ```

-
 | Registry name | Description |
 | ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
 | [`span_getters`](/api/transformer#span_getters) | Registry for functions that take a batch of `Doc` objects and return a list of `Span` objects to process by the transformer, e.g. sentences. |

 ## Loggers {#loggers source="spacy/gold/loggers.py" new="3"}

-A logger records the training results for each step. When a logger is created,
-it returns a `log_step` function and a `finalize` function. The `log_step`
-function is called by the [training script](/api/cli#train) and receives a
-dictionary of information, including
+A logger records the training results. When a logger is created, two functions
+are returned: one for logging the information for each training step, and a
+second function that is called to finalize the logging when the training is
+finished. To log each training step, a
+[dictionary](/usage/training#custom-logging) is passed on from the
+[training script](/api/cli#train), including information such as the training
+loss and the accuracy scores on the development set.

-# TODO
+There are two built-in logging functions: a logger printing results to the
+console in tabular format (which is the default), and one that also sends the
+results to a [Weights & Biases](https://www.wandb.com/) dashboard.
+Instead of using one of the built-in loggers listed here, you can also
+[implement your own](/usage/training#custom-logging) and register it in the
+`loggers` registry so that it can be used in the `[training.logger]` config block.

 > #### Example config
 >
@@ -366,10 +374,6 @@ dictionary of information, including
 > @loggers = "spacy.ConsoleLogger.v1"
 > ```

-Instead of using one of the built-in batchers listed here, you can also
-[implement your own](/usage/training#custom-code-readers-batchers), which may or
-may not use a custom schedule.
-
 #### spacy.ConsoleLogger.v1 {#ConsoleLogger tag="registered function"}

 Writes the results of a training step to the console in a tabular format.
@@ -384,14 +388,18 @@ Writes the results of a training step to the console in a tabular format.
 > ```

 Built-in logger that sends the results of each training step to the dashboard of
-the [Weights & Biases`](https://www.wandb.com/) dashboard. To use this logger,
-Weights & Biases should be installed, and you should be logged in. The logger
-will send the full config file to W&B, as well as various system information
-such as GPU
+the [Weights & Biases](https://www.wandb.com/) tool. To use this logger, Weights
+& Biases should be installed, and you should be logged in. The logger will send
+the full config file to W&B, as well as various system information such as
+memory utilization, network traffic, disk IO, GPU statistics, etc. This will
+also include information such as your hostname and operating system, as well as
+the location of your Python executable.

-| Name | Description |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
-| `project_name` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. ~~str~~ |
+Note that by default, the full (interpolated) training config file is sent over
+to the W&B dashboard. If you prefer to exclude certain information such as path
+names, you can list those fields in "dot notation" in the `remove_config_values`
+parameter. These fields will then be removed from the config before uploading,
+but will otherwise remain in the config file stored on your local system.

 > #### Example config
 >
@@ -399,8 +407,14 @@ such as GPU
 > [training.logger]
 > @loggers = "spacy.WandbLogger.v1"
 > project_name = "monitor_spacy_training"
+> remove_config_values = ["paths.train", "paths.dev", "training.dev_corpus.path", "training.train_corpus.path"]
 > ```

+| Name | Description |
+| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
+| `project_name` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. ~~str~~ |
+| `remove_config_values` | A list of values to exclude from the config before it is uploaded to W&B (default: empty). ~~List[str]~~ |
+
 ## Batchers {#batchers source="spacy/gold/batchers.py" new="3"}

 A data batcher implements a batching strategy that essentially turns a stream of
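The logger contract described in this patch (creating a logger yields one function that is called for every training step with a dictionary of results, plus one function that is called once when training is finished) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not spaCy's implementation: spaCy is not imported, `make_list_logger` is a hypothetical name, and the dictionary keys `step`, `losses` and `score` are assumed for the sketch rather than taken from the training script.

```python
from typing import Any, Callable, Dict, List, Tuple

LogStep = Callable[[Dict[str, Any]], None]
Finalize = Callable[[], None]


def make_list_logger(records: List[str]) -> Tuple[LogStep, Finalize]:
    """Hypothetical logger: returns a per-step function and a finalize function."""

    def log_step(info: Dict[str, Any]) -> None:
        # Called once per training step. The keys "step", "losses" and
        # "score" are assumptions made for this sketch.
        losses = ", ".join(f"{name}={value:.2f}" for name, value in info["losses"].items())
        records.append(f"step {info['step']}: {losses}, score={info['score']:.2f}")

    def finalize() -> None:
        # Called once after training has finished.
        records.append("training finished")

    return log_step, finalize


records: List[str] = []
log_step, finalize = make_list_logger(records)
log_step({"step": 0, "losses": {"ner": 12.5}, "score": 0.1})
log_step({"step": 200, "losses": {"ner": 3.25}, "score": 0.75})
finalize()
print("\n".join(records))
```

Both returned functions close over the same `records` list, so `finalize` can act on whatever `log_step` accumulated; that shared state is what a real logger would use to, for example, flush a file or close a remote run at the end of training.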
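To make the "dot notation" of `remove_config_values` concrete, the sketch below shows how dotted paths such as `paths.train` or `training.dev_corpus.path` can address fields in a nested config. It assumes the config is represented as nested Python dicts and uses a hypothetical helper name, `drop_config_values`; it is not the code the W&B logger actually runs. The helper works on a deep copy, mirroring the statement that the removed fields remain in your local config.

```python
import copy
from typing import Any, Dict, List


def drop_config_values(config: Dict[str, Any], dot_paths: List[str]) -> Dict[str, Any]:
    """Return a copy of a nested config dict without the given dot-notation fields.

    Hypothetical helper for illustration; the input dict is not modified.
    """
    cleaned = copy.deepcopy(config)
    for dot_path in dot_paths:
        *parents, leaf = dot_path.split(".")
        section: Any = cleaned
        # Walk down the parent sections, e.g. "training" -> "dev_corpus".
        for key in parents:
            section = section.get(key)
            if not isinstance(section, dict):
                break  # the path does not exist in this config: skip it
        else:
            # Only reached when every parent section exists.
            section.pop(leaf, None)
    return cleaned


config = {
    "paths": {"train": "corpus/train.spacy", "dev": "corpus/dev.spacy"},
    "training": {"logger": {"project_name": "monitor_spacy_training"}},
}
# Prints {'paths': {}, 'training': {'logger': {'project_name': 'monitor_spacy_training'}}}
print(drop_config_values(config, ["paths.train", "paths.dev"]))
# The original dict is untouched, so this prints corpus/train.spacy
print(config["paths"]["train"])
```

Paths that do not exist in the config are skipped silently in this sketch; whether the real logger behaves the same way is not stated in the patch.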