Generating result retries

Hosts may fail to process and return results for various reasons; such results are said to be lost. A combination of lost and erroneous results may prevent finding a canonical result for a workunit. The result retry mechanism generates additional results as needed to find a canonical result.

The result retry mechanism has the following project-supplied parameters:

D_WU: the expected delay (in seconds) between creating a WU and getting a canonical result.
D_result: the expected delay (in seconds) between creating a result and getting a confirmation.
N_error: give up on a workunit if it gets this many error results (i.e., there must be a bug in the application).
N_det: give up on a workunit if it gets this many non-error results without finding a canonical result (i.e., the algorithm must be nondeterministic).
N_redundancy: try to get at least this many non-error results.

Each workunit has a retry check time. This is initially set to now + D_WU, and is set to zero if a canonical result is found for the WU.

Each result has a deadline, a time by which a confirmation is expected for the result. This is initially set to now + D_result.
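The two timestamps above can be sketched as follows. This is only an illustration, not the project's actual code: the names RetryParams, initial_check_time, initial_deadline, is_due and GIVEN_UP are hypothetical, and the reading that a zero check time means "never check this WU again" is an assumption drawn from the description above.

```python
from dataclasses import dataclass

# Assumption: a check time of zero marks a WU that needs no further checks.
GIVEN_UP = 0


@dataclass
class RetryParams:
    """Hypothetical container for the two delay parameters (seconds)."""
    d_wu: int      # D_WU: expected delay from WU creation to canonical result
    d_result: int  # D_result: expected delay from result creation to confirmation


def initial_check_time(params: RetryParams, now: int) -> int:
    """Retry check time assigned to a newly created workunit."""
    return now + params.d_wu


def initial_deadline(params: RetryParams, now: int) -> int:
    """Deadline assigned to a newly created result."""
    return now + params.d_result


def is_due(check_time: int, now: int) -> bool:
    """True if the workunit is past its check time and has not been zeroed."""
    return check_time != GIVEN_UP and check_time <= now
```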

Retry generation is handled by the program result_retry, invoked as

result_retry -appname name

This program continually checks for workunits past their check time and without pending validation. For each such workunit, it does the following:

If any result has not been sent, generate a project warning and give up on the WU (i.e., set its check time to zero).
If at least N_error results have an error, generate a project warning and give up on the WU.
If at least N_det results are done, generate a project warning and give up on the WU.
Generate N_redundancy - n new results for the WU, where n is the number of results that are done. The deadline of these results is now + D_result.
Set the check time of the WU to now + D_WU.
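The per-workunit steps above can be sketched roughly as follows. This is a minimal illustration, not the result_retry source: all class and function names are hypothetical, warnings are simply collected in a list, and "done" is assumed to mean "completed without error". The sketch also assumes that each give-up step ends processing of that workunit, and that no results are generated when n already meets N_redundancy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    UNSENT = "unsent"    # created but not yet sent to a host
    PENDING = "pending"  # sent, outcome not yet known
    DONE = "done"        # assumption: completed without error
    ERROR = "error"      # completed with an error


@dataclass
class Result:
    state: State
    deadline: int


@dataclass
class Workunit:
    check_time: int
    results: List[Result] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def give_up(wu: Workunit, reason: str) -> None:
    """Record a project warning and zero the check time."""
    wu.warnings.append(reason)
    wu.check_time = 0


def check_workunit(wu, now, d_wu, d_result, n_error, n_det, n_redundancy):
    """Apply the retry steps to one workunit that is past its check time."""
    states = [r.state for r in wu.results]
    if State.UNSENT in states:
        give_up(wu, "a result was never sent")
        return
    if states.count(State.ERROR) >= n_error:
        give_up(wu, "too many error results")
        return
    n_done = states.count(State.DONE)
    if n_done >= n_det:
        give_up(wu, "too many results without a canonical result")
        return
    # Top up to N_redundancy results; create nothing if n_done already meets it.
    for _ in range(max(0, n_redundancy - n_done)):
        wu.results.append(Result(State.UNSENT, now + d_result))
    wu.check_time = now + d_wu
```

For example, a workunit with one done, one error and one pending result, checked with N_redundancy = 3, would receive two new results under these assumptions.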

You should use crontab to make sure that result_retry is always running.