<title>Generating result retries</title>
<h2>Generating result retries</h2>
<p>
Hosts may fail to process and return results for various reasons;
such results are said to be <b>lost</b>.
A combination of lost and erroneous results may prevent
finding a canonical result for a workunit.
The <b>result retry</b> mechanism generates additional
results as needed to find a canonical result.
<p>
The result retry mechanism has the following project-supplied parameters:
<ul>
<li> D<sub>WU</sub>: the expected delay (in seconds) between
creating a WU and getting a canonical result.
<li> D<sub>result</sub>: the expected delay (in seconds) between
creating a result and getting a confirmation.
<li> N<sub>error</sub>: give up on a workunit if it gets this many error results
(this usually indicates a bug in the application).
<li> N<sub>det</sub>: give up on a workunit if it gets this many
non-error results without finding a canonical result
(this suggests that the algorithm is nondeterministic).
<li> N<sub>redundancy</sub>: try to get at least this many non-error results.
</ul>
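<p>
As an illustration, the five parameters could be collected in one configuration object.
The following Python sketch is hypothetical: the class name <b>RetryParams</b>, its field names,
and the example values are assumptions made for this page, not BOINC's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryParams:
    """Hypothetical container for the result retry parameters.

    Field names mirror the symbols on this page; this is not BOINC's real config format.
    """
    d_wu: float         # D_WU: expected seconds from creating a WU to getting a canonical result
    d_result: float     # D_result: expected seconds from creating a result to getting a confirmation
    n_error: int        # N_error: give up after this many error results
    n_det: int          # N_det: give up after this many non-error results without a canonical result
    n_redundancy: int   # N_redundancy: try to get at least this many non-error results

    def __post_init__(self) -> None:
        # Reject values for which the retry rules would be meaningless.
        if not (self.d_wu > 0 and self.d_result > 0):
            raise ValueError("delays must be positive")
        if min(self.n_error, self.n_det, self.n_redundancy) < 1:
            raise ValueError("counts must be at least 1")


# Illustrative values only: one week per WU, three days per result.
params = RetryParams(d_wu=7 * 86400, d_result=3 * 86400,
                     n_error=5, n_det=10, n_redundancy=3)
```

<p>
Validating the values when the object is built means a misconfigured project fails at startup
instead of silently giving up on every workunit.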
<p>
Each workunit has a <b>retry check time</b>.
It is initially set to now + D<sub>WU</sub>,
and is set to zero once a canonical result is found for the WU.
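<p>
A minimal sketch of these check-time rules, assuming an in-memory <b>Workunit</b> record
(a hypothetical stand-in for the real database record) and using zero as the
"no further checks" value described above. The function names are also hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workunit:
    # Hypothetical stand-in for a workunit record.
    # check_time == 0 means the WU needs no further retry checks.
    check_time: float = 0.0


def create_workunit(d_wu: float, now: Optional[float] = None) -> Workunit:
    """A new WU gets its first retry check D_WU seconds after creation."""
    if now is None:
        now = time.time()
    return Workunit(check_time=now + d_wu)


def record_canonical_result(wu: Workunit) -> None:
    """Once a canonical result is found, no more retries are needed."""
    wu.check_time = 0.0


def is_due(wu: Workunit, now: float) -> bool:
    """True if the WU is still active and its check time has been reached."""
    return wu.check_time != 0 and now >= wu.check_time


wu = create_workunit(d_wu=3600, now=1000.0)
print(is_due(wu, 2000.0), is_due(wu, 4600.0))  # False True
record_canonical_result(wu)
print(is_due(wu, 10**9))                       # False
```

<p>
Using zero as a sentinel lets a single field record both "when to look again" and
"this WU is finished".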
<p>
Each result has a <b>deadline</b>,
the time by which a confirmation is expected for the result.
It is initially set to now + D<sub>result</sub>.
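<p>
A corresponding sketch for result deadlines. The <b>Result</b> record, its <b>confirmed</b> flag,
and the <b>is_overdue</b> helper are hypothetical. Treating an unconfirmed result whose deadline
has passed as probably lost is an assumption; this page does not spell that rule out.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    # Hypothetical stand-in for a result record.
    deadline: float
    confirmed: bool = False   # set when the host's confirmation arrives


def create_result(d_result: float, now: Optional[float] = None) -> Result:
    """A new result is expected to be confirmed within D_result seconds."""
    if now is None:
        now = time.time()
    return Result(deadline=now + d_result)


def is_overdue(result: Result, now: float) -> bool:
    """Assumption: an unconfirmed result past its deadline is probably lost."""
    return not result.confirmed and now > result.deadline


r = create_result(d_result=600, now=0.0)
print(is_overdue(r, 500.0), is_overdue(r, 601.0))  # False True
r.confirmed = True
print(is_overdue(r, 601.0))                         # False
```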
<p>
Retry generation is handled by the program <b>result_retry</b>, invoked as
<pre>
result_retry -appname name
</pre>
This program continually checks for workunits whose check time has passed
and that are not pending validation.
For each such workunit, the program does the following:
<ul>
<li> If any result has not been sent, generate an error message
and give up on the WU (i.e., set its check time to zero).
This condition indicates that
1) the resource requirements of the WU are too large for any host;
2) there are too few hosts to handle the rate of work generation; or
3) the scheduling servers have been out of service.
<li> If at least N<sub>error</sub> results have an error,
generate an error message and give up on the WU.
<li> If at least N<sub>det</sub> results are done,
generate an error message and give up on the WU.
<li> Generate N<sub>redundancy</sub> - n new results for the WU,
where n is the number of results that are done (i.e., returned without error).
The deadline of each new result is now + D<sub>result</sub>.
<li> Set the check time of the WU to now + D<sub>WU</sub>.
</ul>
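<p>
The per-workunit steps above can be sketched in Python as follows. This is a minimal sketch,
not BOINC's implementation: the <b>State</b> values, the <b>Result</b> and <b>Workunit</b> classes,
and the <b>check_workunit</b> function are hypothetical. The rules are applied in the order listed,
and "done" is taken to mean "returned without error".

```python
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import List

log = logging.getLogger("result_retry_sketch")


class State(Enum):
    # Hypothetical result states, covering only the cases this page distinguishes.
    UNSENT = "unsent"            # created but not yet sent to a host
    IN_PROGRESS = "in_progress"  # sent, outcome not yet known
    ERROR = "error"              # returned with an error
    DONE = "done"                # returned without error


@dataclass
class Result:
    state: State
    deadline: float = 0.0


@dataclass
class Workunit:
    name: str
    check_time: float
    results: List[Result] = field(default_factory=list)


def check_workunit(wu: Workunit, now: float, *, d_wu: float, d_result: float,
                   n_error: int, n_det: int, n_redundancy: int) -> str:
    """Apply the retry rules to one WU whose check time has passed.

    Returns a short label describing the action taken.
    """
    states = [r.state for r in wu.results]

    # Rule 1: some result was never sent; give up.
    if State.UNSENT in states:
        log.error("%s: result never sent; giving up", wu.name)
        wu.check_time = 0
        return "unsent"

    # Rule 2: too many error results, probably an application bug; give up.
    n_errors = states.count(State.ERROR)
    if n_errors >= n_error:
        log.error("%s: %d error results; giving up", wu.name, n_errors)
        wu.check_time = 0
        return "too_many_errors"

    # Rule 3: many done results but no canonical one; give up.
    n = states.count(State.DONE)
    if n >= n_det:
        log.error("%s: %d results without a canonical result; giving up", wu.name, n)
        wu.check_time = 0
        return "nondeterministic"

    # Rule 4: add N_redundancy - n new results. range() of a non-positive
    # count is empty, so nothing is added when n >= N_redundancy.
    for _ in range(n_redundancy - n):
        wu.results.append(Result(State.UNSENT, deadline=now + d_result))

    # Rule 5: schedule the next check.
    wu.check_time = now + d_wu
    return "retried"


p = dict(d_wu=1000, d_result=100, n_error=3, n_det=5, n_redundancy=3)
w = Workunit("wu_1", check_time=50, results=[Result(State.DONE), Result(State.ERROR)])
print(check_workunit(w, 60, **p), len(w.results), w.check_time)  # retried 4 1060
```

<p>
Note that results still in progress are not counted in n, exactly as the rule is stated,
so a WU with outstanding results may receive extra ones.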
<p>
Use crontab to keep <b>result_retry</b> running continuously.
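<p>
Inside the program, the continual scan for workunits that are past their check time and not
pending validation can be sketched as a polling loop. Everything here is hypothetical:
the <b>pending_validation</b> flag, the poll interval, and the <b>max_passes</b> bound
(added only so the loop can stop in a test). A real implementation would presumably query
the project database; this sketch uses an in-memory list instead.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Workunit:
    # Hypothetical stand-in for a workunit record.
    name: str
    check_time: float               # 0 means no further checks
    pending_validation: bool = False


def due_workunits(wus: List[Workunit], now: float) -> List[Workunit]:
    """WUs whose check time has passed and that are not pending validation."""
    return [wu for wu in wus
            if wu.check_time != 0 and now >= wu.check_time
            and not wu.pending_validation]


def run(wus: List[Workunit], handle: Callable[[Workunit, float], None],
        poll_interval: float = 60.0, max_passes: Optional[int] = None,
        clock: Callable[[], float] = time.time,
        sleep: Callable[[float], None] = time.sleep) -> int:
    """Repeatedly scan for due WUs; return how many were handled in total."""
    handled = 0
    passes = 0
    while max_passes is None or passes < max_passes:
        now = clock()
        for wu in due_workunits(wus, now):
            handle(wu, now)   # e.g. apply the per-workunit retry rules
            handled += 1
        passes += 1
        sleep(poll_interval)
    return handled


def handle_stub(wu: Workunit, now: float) -> None:
    wu.check_time = 0   # stand-in handler: give up immediately


wus = [Workunit("a", 100.0),
       Workunit("b", 100.0, pending_validation=True),
       Workunit("c", 0.0)]
handled = run(wus, handle_stub, max_passes=3, clock=lambda: 150.0, sleep=lambda s: None)
print(handled)  # 1
```

<p>
Passing the clock and sleep functions as parameters keeps the loop testable without real waiting.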