timeout_check -> transitioner

svn path=/trunk/boinc/; revision=2122
David Anderson 2003-08-15 22:19:25 +00:00
parent 6085e9a817
commit f7edfd2657
7 changed files with 143 additions and 79 deletions

@@ -5795,3 +5795,6 @@ David Aug 15 2003
tools/
backend_lib.C,h
create_work.C
David Aug 15 2003
- changed "timeout_check" to "transitioner"

@@ -2,22 +2,19 @@
<body bgcolor=ffffff>
<h2>Results</h2>
<p>
A <b>result</b> describes an instance of a computation, either to be
performed, in progress, or completed.
Results are stored in the <b>result</b> table of the BOINC DB.
A <b>result</b> describes an instance of a computation, either unstarted,
in progress, or completed.
The attributes of a result include:
<ul>
<li> The name of the result (unique across all results in the project).
<li> The associated workunit.
<li> The time when the completed result should be reported to a
scheduling server.
This is assigned by the project, and is used by
clients to prioritize operations and to initiate scheduler RPCs.
There is no guarantee that the result will actually be reported by this time.
<li> An XML document listing the names of its output files; see below.
<li> The time when the result was dispatched.
<li> Its <b>state</b>. Values include:
<dl>
<dt><b>name</b><dd>
The name of the result (unique across all results in the project).
<dt><b>workunit name</b><dd>
<dt><b>output files</b><dd>
A list of the names of the output files,
and the names by which the application refers to them.
<dt><b>state</b><dd>
Values include:
<ul>
<li> Inactive (not ready to dispatch)
<li> Unsent (ready to dispatch, but not dispatched)
@@ -26,7 +23,12 @@ There is no guarantee that the result will actually be reported by this time.
<li> Timed out
<li> Done with error
</ul>
</ul>
<dt><b>host</b><dd>
The host that executed the computation.
<dt><b>CPU time</b><dd>
The CPU time that was used.
<dt><b>exit status</b><dd>
</dl>
<p>
The following attributes are defined after the result is completed:
<ul>
@@ -35,8 +37,6 @@ files (filled in after the result is completed).
<li> The stderr output of the result.
<li> The host that was sent the result.
<li> The times when the result was received.
<li> The exit status of the application.
<li> The reported CPU time.
</ul>
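The result states listed above can be read as a small state machine. The following is a minimal sketch, not BOINC code: the names ResultState, Result and check_timeout are hypothetical, and because this diff hides part of the documented state list, InProgress and DoneSuccess are inferred from the prose ("in progress, or completed"). It illustrates the kind of timeout check a daemon such as timeout_check (renamed transitioner in this commit) might perform. Treating the report deadline as the exact timeout point is an assumption, since the document notes there is no guarantee a result is reported by that time.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical state names, not BOINC's actual constants.
// InProgress and DoneSuccess are inferred from the prose; the diff
// hides part of the documented state list.
enum class ResultState { Inactive, Unsent, InProgress, DoneSuccess, TimedOut, DoneError };

// Hypothetical record holding only the fields this sketch needs.
struct Result {
    ResultState state;
    std::time_t report_deadline;  // when the completed result should be reported
};

// Assumption: an in-progress result counts as timed out once the
// current time passes its report deadline. Returns true if the
// state was changed.
bool check_timeout(Result& r, std::time_t now) {
    if (r.state == ResultState::InProgress && now > r.report_deadline) {
        r.state = ResultState::TimedOut;
        return true;
    }
    return false;
}
```

Only in-progress results are affected; unsent or finished results are left alone.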
<p>
Results are normally created using the

@@ -29,6 +29,46 @@ create_work
infile_1 ... infile_m // input files
</pre>
<p>
The WU template file has the form
<pre>
[ &lt;file_info>...&lt;/file_info> ]
[ ... ]
&lt;workunit>
[ &lt;command_line>-flags xyz&lt;/command_line> ]
[ &lt;env_vars>name=val&amp;name=val&lt;/env_vars> ]
[ &lt;max_processing>...&lt;/max_processing> ]
[ &lt;max_disk>...&lt;/max_disk> ]
[ &lt;file_ref>...&lt;/file_ref> ]
[ ... ]
&lt;/workunit>
</pre>
The components are:
<table border=1 cellpadding=6>
<tr><td>&lt;command_line></td>
<td>The command-line arguments to be passed to the main program.
</td></tr>
<tr><td>&lt;env_vars></td>
<td>A list of environment variables in the form
name=value&name=value&name=value.
</td></tr>
<tr><td valign=top>&lt;max_processing></td>
<td>Maximum processing
(measured in <a href=credit.html>Cobblestones</a>).
An instance of the computation that exceeds this bound will be aborted.
This mechanism prevents an infinite-loop bug from
indefinitely incapacitating a host.
The default is determined by the client; typically it is 1.
</td></tr>
<tr><td>&lt;max_disk></td>
<td>Maximum disk usage (in bytes).
The default is determined by the client; typically it is 1,000,000.
</td></tr>
<tr><td>&lt;file_ref></td>
<td>Describes a <a
href="files.html">reference</a> to an input file;
each input file is described by a <b>&lt;file_info></b> element.
</td></tr></table>
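The env_vars component packs several variables into one string of the form name=value&name=value. As a minimal sketch of how such a string could be split, here is a hypothetical parse_env_vars function (not part of BOINC). It assumes the text has already been XML-unescaped (&amp; turned back into &) and that names contain no '='; pairs with no '=' or an empty name are skipped, which is a choice of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: split an env_vars value of the form
// "name=value&name=value" into a name -> value map.
// Assumes the string is already XML-unescaped.
std::map<std::string, std::string> parse_env_vars(const std::string& s) {
    std::map<std::string, std::string> vars;
    std::istringstream in(s);
    std::string pair;
    // Each '&'-separated piece is one name=value pair.
    while (std::getline(in, pair, '&')) {
        std::string::size_type eq = pair.find('=');
        if (eq == std::string::npos || eq == 0) continue;  // skip malformed pairs
        // Split at the first '='; the value may itself contain '='.
        vars[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    return vars;
}
```

For example, parse_env_vars("A=1&B=hello") yields a map with A set to "1" and B set to "hello".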
<p>
The workunit template file is processed as follows:
<ul>
<li>

@@ -2,73 +2,37 @@
<body bgcolor=ffffff>
<h2>Workunits</h2>
<p>
TODO: don't separate into XML/other parts;
describe XML format only in the tools doc.
<p>
A <b>workunit</b> describes a computation to be performed.
Workunits are maintained in the <b>workunit</b> table in the BOINC DB.
The attributes of a workunit include:
</p>
<ul>
<li> Its name (unique across all workunits in the project).
<li> Its application.
<li> An XML document describing its input files and other parameters
(see below).
<li> The estimated resource requirements of the work unit
(computation, memory, disk space).
<li> A <b>delay bound</b>: upper bound on how long (in real time)
a result associated with this work unit should take to complete.
This determines which hosts the workunit can be sent to,
and it's used to assign result deadlines and
times for retrying results.
</ul>
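The delay bound is used in two ways: to decide which hosts a result may be sent to, and to assign the result's deadline. The following minimal sketch shows both uses. All function names are hypothetical, and the completion estimate (estimated operations divided by host speed) is an assumed model, not BOINC's actual scheduler formula.

```cpp
#include <cassert>
#include <ctime>
#include <limits>

// Hypothetical estimate: seconds needed on a host, from the workunit's
// estimated computation and the host's speed (operations per second).
double estimated_completion_secs(double est_ops, double host_ops_per_sec) {
    if (host_ops_per_sec <= 0) {
        return std::numeric_limits<double>::infinity();  // unknown speed: never feasible
    }
    return est_ops / host_ops_per_sec;
}

// Only send a result if the host should finish within the delay bound.
bool can_send(double est_completion_secs, double delay_bound) {
    return est_completion_secs <= delay_bound;
}

// Deadline for a result sent at sent_time: the send time plus the
// workunit's delay bound (seconds).
std::time_t report_deadline(std::time_t sent_time, double delay_bound) {
    return sent_time + static_cast<std::time_t>(delay_bound);
}
```

With a one-day delay bound (86400 seconds), a host expected to need 1000 seconds qualifies, and a result sent at time 1000 gets a deadline of 87400.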
<p>
Some parameters of a workunit are described by an XML document of the form
<pre>
[ &lt;file_info>...&lt;/file_info> ]
[ ... ]
&lt;workunit>
[ &lt;command_line>-flags xyz&lt;/command_line> ]
[ &lt;env_vars>name=val&amp;name=val&lt;/env_vars> ]
[ &lt;max_processing>...&lt;/max_processing> ]
[ &lt;max_disk>...&lt;/max_disk> ]
[ &lt;file_ref>...&lt;/file_ref> ]
[ ... ]
&lt;/workunit>
</pre>
The components are:
<table border=1 cellpadding=6>
<tr><td>&lt;command_line></td>
<td>The command-line arguments to be passed to the main program.
</td></tr>
<tr><td>&lt;env_vars></td>
<td>A list of environment variables in the form
name=value&name=value&name=value.
</td></tr>
<tr><td valign=top>&lt;max_processing></td>
<td>Maximum processing
(measured in <a href=credit.html>Cobblestones</a>).
An instance of the computation that exceeds this bound will be aborted.
This mechanism prevents an infinite-loop bug from
indefinitely incapacitating a host.
The default is determined by the client; typically it is 1.
</td></tr>
<tr><td>&lt;max_disk></td>
<td>Maximum disk usage (in bytes).
The default is determined by the client; typically it is 1,000,000.
</td></tr>
<tr><td>&lt;file_ref></td>
<td> describes a <a
href="files.html">reference</a> to an input file, each of which is
described by a <b>&lt;file_info></b> element.
</td></tr></table>
<p>
<dl>
<dt><b>name</b>
<dd>
Unique across all workunits in the project.
<dt><b>application</b>
<dd>
Which application performs the computation.
A workunit is associated with an application, not with a particular
version or range of versions.
If the format of your input data changes in
a way that is incompatible with older versions,
you must create a new application.
This can often be avoided by using an XML data format.
<dt><b>input files</b>
<dd>
A list of its input files: their names,
and the names by which the application refers to them.
<dt><b>resource estimates</b>
<dd> The estimated resource requirements of the work unit
(computation, memory, disk space).
<dt><b>scheduling parameters</b>
<dd>
A <a href=wu_sched_params.html> set of parameters</a>
determining the redundancy and error policies for this work unit.
</dl>
<p>
<p>
The <a href="tools_work.html">create_work</a> utility program provides a
simplified interface for creating workunits.

doc/wu_sched_params.html Normal file
@@ -0,0 +1,57 @@
<h2>Workunit scheduling parameters</h2>
<p>
BOINC
<p>
Each workunit has several parameters related to redundancy and scheduling.
Values for these parameters are supplied by the project
when the workunit is created.
<dl>
<dt>
<b>delay_bound</b>
<dd>
An upper bound on the time (in seconds) between sending
a result to a client and receiving a reply.
The scheduler won't issue a result if the estimated
completion time exceeds this.
Set this to several times the average execution time
of a workunit on a typical PC.
If you set it too low,
BOINC may not be able to send some results,
and the corresponding workunit will be flagged with an error.
If you set it too high,
there may be a corresponding delay in getting results back.
<dt>
<b>min_quorum</b>
<dd>
The minimum size of a quorum.
Set this to two or more if you want redundant computing.
<dt>
<b>target_nresults</b>
<dd>
How many successful results to get.
This must be at least <b>min_quorum</b>.
It may be larger, to compensate for results that are lost,
or to get a quorum more quickly.
<dt>
<b>max_error_results</b>
<dd>
If the number of client error results exceeds this,
the workunit is declared to have an error;
no further results are issued, and the assimilator is triggered.
This safeguards against workunits that exercise a bug
in the application.
<dt>
<b>max_total_results</b>
<dd>
If the total number of results for this workunit exceeds this,
the workunit is declared to be in error.
<dt>
<b>max_success_results</b>
<dd>
If the number of successful results for this workunit exceeds this,
and a consensus has not been reached,
the workunit is declared to be in error.
</dl>
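The error conditions defined by these parameters can be summarized as a few comparisons. The following is a minimal sketch under stated assumptions: the SchedParams and ResultCounts structs and the wu_in_error and params_valid functions are hypothetical names, and whether a consensus has been reached is passed in as a flag, because computing it is outside the scope of this page. BOINC's transitioner may organize these checks differently.

```cpp
#include <cassert>

// Hypothetical container for the per-workunit parameters above.
struct SchedParams {
    int min_quorum;
    int target_nresults;
    int max_error_results;
    int max_total_results;
    int max_success_results;
};

// Hypothetical tally of a workunit's results so far.
struct ResultCounts {
    int total;      // all results created for the workunit
    int errors;     // client error results
    int successes;  // successful results
};

// target_nresults must be at least min_quorum.
bool params_valid(const SchedParams& p) {
    return p.target_nresults >= p.min_quorum;
}

// True if the workunit should be declared in error under the three
// limits described above. consensus_reached is supplied by the caller.
bool wu_in_error(const SchedParams& p, const ResultCounts& c, bool consensus_reached) {
    if (c.errors > p.max_error_results) return true;
    if (c.total > p.max_total_results) return true;
    if (c.successes > p.max_success_results && !consensus_reached) return true;
    return false;
}
```

Note that exceeding max_success_results is only an error when no consensus has been reached; the other two limits apply unconditionally.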

@@ -4,7 +4,7 @@ include $(top_srcdir)/Makefile.incl
noinst_PROGRAMS = \
cgi feeder show_shmem file_upload_handler \
validate_test validate_trivial make_work timeout_check file_deleter \
validate_test validate_trivial make_work transitioner file_deleter \
assimilator db_dump update_stats
noinst_LIBRARIES = libsched.a
@@ -83,9 +83,9 @@ make_work_SOURCES = make_work.C
make_work_DEPENDENCIES = $(LIBRSA) $(LIB_SCHED)
make_work_LDADD = $(LDADD) $(RSA_LIBS)
timeout_check_SOURCES = timeout_check.C
timeout_check_DEPENDENCIES = $(LIBRSA) $(LIB_SCHED)
timeout_check_LDADD = $(LDADD) $(RSA_LIBS)
transitioner_SOURCES = transitioner.C
transitioner_DEPENDENCIES = $(LIBRSA) $(LIB_SCHED)
transitioner_LDADD = $(LDADD) $(RSA_LIBS)
fcgi_SOURCES = \
handle_request.C \