<?xml version="1.0"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<sect1 id="boinc">
  <title>BOINC</title>

  <sect2>
    <title>Terminology</title>

    <para>
      The term <emphasis>workunit</emphasis> means approximately the same thing
      for DC-API as for BOINC.
    </para>

    <para>
      Both the DC-API and BOINC use the term <emphasis>result</emphasis>, but
      they mean different things by it. In BOINC, results are instances of a
      work unit waiting to be downloaded or currently under execution. The
      DC-API result is what BOINC calls the <emphasis>canonical result</emphasis>.
      This means that when BOINC generates multiple results (e.g. for
      redundant computation), the DC-API will not be notified about the status
      of individual BOINC results; instead, it will be notified only if the
      canonical result is found or the whole work unit is marked as failed by
      BOINC.
    </para>

    <para>
      In the following sections, <emphasis>result</emphasis> will mean the
      DC-API term, while <emphasis>BOINC result</emphasis> will refer to the
      BOINC definition.
    </para>

    <para>
      The DC-API master application is an <emphasis>assimilator</emphasis> in
      BOINC terms.
    </para>

  </sect2>

  <sect2>
    <title>Configuration options</title>

    <sect3>
      <title>Master side</title>

      <warning>
        Important note: the directories specified by
        <literal>WorkingDirectory</literal>, <literal>ProjectRootDir</literal>
        and the upload &amp; download directories specified in BOINC's
        <filename>config.xml</filename> must all reside on the same filesystem
        since the DC-API uses the <function>link()</function> and
        <function>rename()</function> system calls, which cannot cross
        filesystem boundaries.
      </warning>
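
      <para>
        As an illustration, BOINC's upload and download directories are set by
        the <literal>&lt;upload_dir&gt;</literal> and
        <literal>&lt;download_dir&gt;</literal> elements of
        <filename>config.xml</filename>. The fragment below is only a sketch:
        the <filename>/srv/boinc/example_project</filename> prefix is a
        hypothetical path, chosen so that both directories sit under one
        project root on a single filesystem, and all other options are
        omitted.
      </para>
      <programlisting><![CDATA[<boinc>
  <config>
    <!-- hypothetical paths; they must share a filesystem with the
         directories named by WorkingDirectory and ProjectRootDir -->
    <upload_dir>/srv/boinc/example_project/upload</upload_dir>
    <download_dir>/srv/boinc/example_project/download</download_dir>
  </config>
</boinc>]]></programlisting>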

      <variablelist>
        <varlistentry>
          <term>InstanceUUID</term>
          <listitem>
            <para>
              REQUIRED. The value must be a Universally Unique Identifier. The
              value must be unique for every master application running on the
              same grid backend. If two master applications are started with the
              same <literal>InstanceUUID</literal> value, their behaviour is
              undefined.
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>BoincConfigXML</term>
          <listitem>
            <para>
              REQUIRED. The location of the BOINC <filename>config.xml</filename>
              configuration file.
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>ProjectRootDir</term>
          <listitem>
            <para>
              REQUIRED. The location of the project's root directory. This is
              the directory that contains the
              <filename>templates</filename>, <filename>upload</filename>,
              <filename>download</filename> and other BOINC-related
              subdirectories.
            </para>
          </listitem>
        </varlistentry>
      </variablelist>
    </sect3>

    <sect3>
      <title>Per-client configuration</title>

      <variablelist>
        <varlistentry>
          <term>Redundancy</term>
          <listitem>
            <anchor id="DC-API-Boinc-Redundancy"/>
            <para>
              OPTIONAL. Integer value specifying the quorum required to consider
              the work unit as valid. The default value is 1. If this value is N,
              then:

              <itemizedlist>
                <listitem>
                  <para>
                    N + log(N) initial BOINC results will be created. If one of
                    them finishes, a new one will be created automatically until
                    the work unit either succeeds or fails.
                  </para>
                </listitem>
                <listitem>
                  <para>
                    The work unit will be considered failed if more than N +
                    log(N + 2) + 1 BOINC results fail.
                  </para>
                </listitem>
                <listitem>
                  <para>
                    The work unit will be considered failed if there are N +
                    log(N + 2) + 1 successful results but the validator could
                    not find a canonical result.
                  </para>
                </listitem>
                <listitem>
                  <para>
                    The work unit will be considered failed if the state of the
                    work unit is still not decided after 2 * (N + log(N + 2))
                    BOINC results have been received.
                  </para>
                </listitem>
              </itemizedlist>
              <note>
                When the redundancy is greater than 1, the work unit cannot be
                suspended using <function><link
                linkend="DC-suspendWU">DC_suspendWU()</link></function>.
              </note>
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>MaxOutputSize</term>
          <listitem>
            <para>
              OPTIONAL. Maximum size of any output file the client application
              generates. The default is 256 KiB. If the size of an output file
              exceeds this value, the BOINC core client will not upload that
              file and will report the BOINC result as failed.
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>MaxMemUsage</term>
          <listitem>
            <para>
              OPTIONAL. Maximum memory usage of the client application. The
              default is 128 MiB. Hosts with less available memory will not
              download work units for this application. Also, if the
              application's real memory usage exceeds this limit, the BOINC
              core client aborts the application and reports the BOINC result
              as failed.
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>MaxDiskUsage</term>
          <listitem>
            <para>
              OPTIONAL. Maximum disk usage of the client application, including
              all output and temporary files. The default is 64 MiB. Hosts with
              less usable disk space will not download work units for this
              application. Also, if the application's disk usage exceeds this
              limit, the BOINC core client aborts the application and reports
              the BOINC result as failed.
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>EstimatedFPOps</term>
          <listitem>
            <para>
              OPTIONAL. The estimated run time of the client application,
              expressed in the number of floating point operations. The default
              is 10<superscript>13</superscript>. This value is used by the
              BOINC server to decide whether a given host is eligible to run a
              work unit and is also used by the BOINC core client for scheduling
              decisions.
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>MaxFPOps</term>
          <listitem>
            <para>
              OPTIONAL. Maximum CPU usage of the client application, expressed
              in the number of floating point operations. The default is
              10<superscript>15</superscript>. If the application uses more CPU
              time than this value divided by the CPU's speed, then the BOINC
              core client aborts the application and reports the BOINC result as
              failed.
            </para>
            <note>
              As per recommendations in the BOINC documentation, the value of
              <literal>MaxFPOps</literal> should be several times larger than
              the expected run time of a work unit on an average host.
            </note>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>DelayBound</term>
          <listitem>
            <para>
              Time in seconds the BOINC server waits for a result to finish. If
              a client has downloaded a BOINC result but does not finish it
              within the given time, the result is considered failed and a new
              one is generated.
            </para>
            <note>
              If <literal>DelayBound</literal> is smaller than the estimated run
              time of the application on a given host (calculated by dividing
              <literal>EstimatedFPOps</literal> by the host's speed), then the
              BOINC result will not be offered for download. If no host is fast
              enough to complete the application within the specified time
              limit, the result will remain unsent for an unspecified amount of
              time and DC-API will receive no feedback for it.
            </note>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>EnableSuspend</term>
          <listitem>
            <para>
              OPTIONAL. Boolean value telling whether work units for this client
              can be suspended using <function><link
              linkend="DC-suspendWU">DC_suspendWU()</link></function>. The
              default value is true.
              <note>
                When the redundancy is greater than 1, the work unit cannot be
                suspended using <function><link
                linkend="DC-suspendWU">DC_suspendWU()</link></function>,
                regardless of the value of this configuration option.
              </note>
            </para>
          </listitem>
        </varlistentry>
      </variablelist>
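
      <para>
        As a rough worked example of how <literal>EstimatedFPOps</literal>,
        <literal>MaxFPOps</literal> and <literal>DelayBound</literal>
        interact, assume a hypothetical host that sustains
        10<superscript>9</superscript> floating point operations per second.
        This speed is an assumption chosen for easy arithmetic, not a
        measured value. With the default values listed above:
      </para>
      <itemizedlist>
        <listitem>
          <para>
            The estimated run time is <literal>EstimatedFPOps</literal>
            divided by the host's speed:
            10<superscript>13</superscript> / 10<superscript>9</superscript>
            = 10<superscript>4</superscript> seconds, or roughly 2.8 hours.
          </para>
        </listitem>
        <listitem>
          <para>
            The application is aborted once it has used
            <literal>MaxFPOps</literal> divided by the host's speed:
            10<superscript>15</superscript> / 10<superscript>9</superscript>
            = 10<superscript>6</superscript> seconds of CPU time, or about
            11.6 days.
          </para>
        </listitem>
        <listitem>
          <para>
            A <literal>DelayBound</literal> below
            10<superscript>4</superscript> seconds (for example 3600) would
            therefore keep BOINC results from being offered to this host.
          </para>
        </listitem>
      </itemizedlist>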
    </sect3>

    <sect3>
      <anchor id="Boinc-Config">
        <title>Considerations for BOINC configuration</title>
      </anchor>

      <para>
        If you want to use master-to-client messaging, you must enable it in the
        BOINC project's configuration by making sure that the
        <literal>&lt;msg_to_host/&gt;</literal> tag is present in
        <filename>config.xml</filename>. Client-to-master messaging is always
        enabled and does not require configuration.
      </para>
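      <para>
        A minimal sketch of where the tag goes, assuming the usual BOINC
        project layout in which project options live inside the
        <literal>&lt;config&gt;</literal> element. Every other option of a
        real <filename>config.xml</filename> is omitted here.
      </para>
      <programlisting><![CDATA[<boinc>
  <config>
    <!-- other project options omitted -->
    <!-- enables delivery of master-to-client messages -->
    <msg_to_host/>
  </config>
</boinc>]]></programlisting>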
    </sect3>

  </sect2>

  <sect2>
    <title>Backend-specific issues</title>

    <sect3>
      <title>Deploying the application</title>

      <para>
        Deploying the application consists of two steps: registering the client
        application(s) in the BOINC database, and running the master daemon.
      </para>
      <para>
        All client applications should be compiled for every platform you need,
        and installed under the project's <filename>apps</filename> directory.
        The BOINC name of the client application must be the same as the master
        uses when it calls <function><link
        linkend="DC-createWU">DC_createWU()</link></function>. See the BOINC
        documentation about how the client binaries should be named and placed
        and how they should be registered in the database.
      </para>
      <para>
        The most common method of deploying the master application is to run it
        as a BOINC daemon by adding it to BOINC's
        <filename>config.xml</filename>. See the BOINC documentation for
        details. Other ways of deploying the master application are also
        possible, depending on how it was designed, but the following
        requirements must be met:
        <itemizedlist>
          <listitem>
            <para>
              The master application must have access to the BOINC project's
              <filename>config.xml</filename>.
            </para>
          </listitem>
          <listitem>
            <para>
              The BOINC <application>file_deleter</application> process must
              have enough privileges to be able to remove files and directories
              created by the master application. If the master runs under the
              same user account as the BOINC daemons, this is usually not a
              problem.
            </para>
          </listitem>
          <listitem>
            <para>
              The master application must be able to create files and
              directories under the project's <filename>download</filename>
              directory, and it must be able to access files under the project's
              <filename>upload</filename> directory.
            </para>
          </listitem>
        </itemizedlist>
      </para>

      <!-- XXX Update when the validator API is added -->
      <para>
        Besides the master and client applications, you must also define a
        validator for the application in <filename>config.xml</filename>. If you
        are not using redundancy then you may use the
        <filename>sample_trivial_validator</filename> that comes with BOINC.
        This validator accepts everything without checking.
      </para>
      <para>
        If redundancy is desired, you may use the
        <application>validator_for_dcapi</application> validator, which does a
        textual comparison of the first output file. Textual here means that
        UNIX and Windows line endings are converted before comparing.
      </para>
      <warning>
        If you are running multiple master applications under the same BOINC
        project, and you want to use
        <filename>sample_trivial_validator</filename> for any of them, then
        you must use it for all of them. This restriction applies to any
        other validator that is not DC-API aware, since such a validator
        cannot determine which work unit belongs to which master and
        therefore which results it should validate and which ones it should
        leave alone.
      </warning>
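      <para>
        The daemon setup described above can be sketched as follows, assuming
        the usual <literal>&lt;daemons&gt;</literal> section of
        <filename>config.xml</filename>. The names
        <literal>example_master</literal> and
        <literal>example_app</literal> are hypothetical placeholders, and the
        validator's command-line options are an assumption that may differ
        between BOINC versions, so check the BOINC documentation before
        copying them.
      </para>
      <programlisting><![CDATA[<boinc>
  <daemons>
    <daemon>
      <!-- hypothetical name of the DC-API master binary -->
      <cmd>example_master</cmd>
    </daemon>
    <daemon>
      <!-- validator for the non-redundant case; "example_app" is a
           placeholder and the option spelling is an assumption -->
      <cmd>sample_trivial_validator -app example_app</cmd>
    </daemon>
  </daemons>
</boinc>]]></programlisting>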
    </sect3>

    <sect3>
      <title>Redundant computation</title>

      <para>
        Redundancy is very important if you are running computations on
        untrusted clients, but it may even be useful on dedicated clients to
        protect against hardware failures. Besides deliberate tampering with
        the output, clients may also produce incorrect results due to hardware
        problems like bad memory, overheating, a faulty CPU or simply disk
        corruption.
      </para>
      <para>
        Redundant computing means sending the same work unit to multiple
        different clients and comparing the results. The comparison is performed
        by a tool that BOINC calls the <emphasis>validator</emphasis>. The
        validator is usually application-specific, as it must understand the
        output file format to filter out unimportant noise (like different line
        endings on different operating systems, or small differences between
        floating point results due to the different rounding characteristics of
        different CPU architectures).
      </para>
      <para>
        Redundancy can be enabled in DC-API on a per client application basis by
        adding the appropriate <link
        linkend="DC-API-Boinc-Redundancy"><literal>Redundancy</literal></link>
        value to the client's configuration group.
      </para>
      <para>
        If redundancy is enabled for a client application, work units for that
        client cannot be suspended. The reason is that it is generally
        impossible to compare the state of two BOINC results suspended at two
        different stages of their execution. If one of the suspended results is
        already corrupt and is restarted, the validator will no longer
        recognize the corruption, since all new results starting from the
        corrupted initial state will produce the same but bad output.
      </para>

    </sect3>

    <sect3>
      <title>Messaging</title>

      <para>
        BOINC provides limited messaging support that is accessible through the
        DC-API <function><link
        linkend="DC-sendWUMessage">DC_sendWUMessage()</link></function> and
        <function><link
        linkend="DC-sendMessage">DC_sendMessage()</link></function>
        functions on the master and client side, respectively.
        <note>
          See the note about <link linkend="Boinc-Config">configuration</link>
          requirements for master-to-client messaging.
        </note>
      </para>
      <para>
        BOINC messaging has several restrictions:

        <itemizedlist>
          <listitem>
            <para>
              Messages can only be sent to BOINC results that are currently
              running. If a work unit has no running result, messages sent to it
              are silently discarded.
            </para>
          </listitem>
          <listitem>
            <para>
              If redundancy is enabled, master-to-client messages are sent to
              all running BOINC results regardless of their state. In the case
              of client-to-master messages, the master cannot tell which BOINC
              result sent the message. This means that "request-response" style
              messaging is hard to implement correctly when redundancy is
              enabled.
            </para>
          </listitem>
          <listitem>
            <para>
              Messages sent by the master are delivered only the next time the
              client connects to the master. Since the master has no control
              over this, the client should periodically send messages to the
              master to force a connection if timely delivery of the master's
              messages is important. Be careful about the extra load placed
              on the BOINC server by clients sending messages too frequently.
            </para>
          </listitem>
          <listitem>
            <para>
              When multiple messages are queued in either direction because
              the client does not connect to the server frequently enough, they
              will be delivered to the peer in an undefined order.
            </para>
          </listitem>
        </itemizedlist>
      </para>
    </sect3>

    <sect3>
      <title>Cancelling a running work unit</title>

      <para>
        The <function><link
        linkend="DC-cancelWU">DC_cancelWU()</link></function> function can
        be used to cancel a running work unit. This function is implemented
        by sending a special message to all running BOINC results. This implies
        that unless the clients running BOINC results for this work unit
        connect back to the BOINC server, the cancel request may not be
        delivered until the client finishes the computation.
      </para>
      <para>
        Due to a race condition between various components of the BOINC system,
        it is also possible that a new BOINC result is created and sent out
        after the work unit has been cancelled. Such a BOINC result will not
        receive the cancellation request and will run until it finishes its
        computation. Its result, however, will not be reported to the DC-API
        master, so the application does not need to be concerned about this.
      </para>
    </sect3>

    <sect3>
      <anchor id="Boinc-Result">
        <title>When is a result reported</title>
      </anchor>

      <!-- XXX Rework when the validator API is merged -->
      <para>
        The BOINC core client handles the completion of a BOINC result in two
        phases: first it uploads the output files, then it notifies the BOINC
        server that the result has been finished. The validator will notice the
        completion of the BOINC result only when this notification is received.
      </para>
      <para>
        However, this notification is sent only the next time the core client
        has to connect to the BOINC server, which may take a long time if the
        core client has already started processing the next BOINC result while
        the output files of the previous result were being uploaded.
      </para>
      <para>
        When there are no more work units to download, the client sleeps for a
        couple of minutes before trying again. This means that the reporting of
        the completion of the last work unit may be delayed for a couple of
        minutes even after all its output files have been uploaded.
      </para>
      <para>
        The DC-API master application will receive notification about a result
        when the validator has made its decision. This may also introduce some
        delay after all BOINC results have been completed.
      </para>
    </sect3>

    <sect3>
      <title>Work unit priority</title>

      <para>
        The priority of a work unit can be set either by using the
        <function><link
        linkend="DC-setWUPriority">DC_setWUPriority()</link></function>
        function or by specifying it in the configuration file using the
        <literal>DefaultPriority</literal> key. Either way, the priority can
        be an arbitrary 32-bit integer.
      </para>
      <para>
        The BOINC scheduler dispatches higher priority work units first. Results
        belonging to work units with lower priorities will not be offered to
        clients until all the higher priority work units are exhausted.
      </para>
    </sect3>

    <sect3>
      <title>Common errors</title>

      <para>
        There are some common errors:

        <variablelist>
          <varlistentry>
            <term>No results are reported</term>
            <listitem>
              <para>
                Check the validator. When there is no validator defined in
                <filename>config.xml</filename> or the validator fails for some
                reason, the DC-API master will not receive result notifications.
              </para>
            </listitem>
          </varlistentry>
          <varlistentry>
            <term>The final result is not reported</term>
            <listitem>
              <para>
                There is a <link
                linkend="Boinc-Result">delay</link> of a couple of minutes
                before the final result is reported. This is normal.
              </para>
            </listitem>
          </varlistentry>
          <varlistentry>
            <term>
              I've fixed a bug in a client application, but results are still
              computed using the old client
            </term>
            <listitem>
              <para>
                Be sure to give the new client binary a version number
                greater than that of the old client; otherwise the clients
                will not notice that the binary has been updated and will not
                download it.
              </para>
            </listitem>
          </varlistentry>
        </variablelist>
      </para>
    </sect3>

    <sect3>
      <title>Open issues</title>

      <para>
        The following list contains the known problems with DC-API's BOINC
        backend:
      </para>

      <itemizedlist>
        <listitem>
          <para>
            Messages are not removed from the <literal>msg_to_host</literal>
            and <literal>msg_from_host</literal> tables by the
            <application>db_purge</application> tool, so they need to be cleaned
            up manually from time to time to prevent the database from being
            filled up.
          </para>
        </listitem>
        <listitem>
          <para>
            The DC-API creates result template files in the
            <filename>templates</filename> subdirectory in the project's root
            directory, but those files are never removed.
          </para>
        </listitem>
      </itemizedlist>
    </sect3>

  </sect2>
</sect1>
<!-- vim: set ai sw=2 tw=80: -->