BOINC
Terminology
The term workunit means approximately the same thing
for DC-API as for BOINC.
Both the DC-API and BOINC use the term result, but
they mean different things. In BOINC, results are instances of a work
unit waiting to be downloaded or currently under execution. The DC-API
result is what BOINC calls the canonical result.
This means that when BOINC generates multiple results (e.g. for
redundant computation), the DC-API will not be notified about the status
of individual BOINC results; instead, it will be notified only if the
canonical result is found or the whole work unit is marked as failed by
BOINC.
In the following sections, result will mean the
DC-API term, while BOINC result will refer to the
BOINC definition.
The DC-API master application is an assimilator in
BOINC terms.
Configuration options
Master side
Important note: the directories specified by
WorkingDirectory,
ProjectRootDir and the upload & download
directories specified in BOINC's config.xml
must all reside on the same filesystem since the DC-API uses the
link() and rename() system
calls.
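The same-filesystem requirement can be checked before starting the master. The following is a minimal sketch, not part of DC-API: it compares the device IDs reported by the operating system for each directory. The function name and any paths passed to it are hypothetical.

```python
import os


def on_same_filesystem(paths):
    """Return True if all given paths reside on the same device.

    link() and rename() fail across filesystem boundaries, so every
    directory involved must report the same st_dev value.
    """
    devices = {os.stat(path).st_dev for path in paths}
    return len(devices) <= 1
```

It would be called with the values of WorkingDirectory and ProjectRootDir plus the upload and download directories taken from config.xml.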
InstanceUUID
REQUIRED. The value must be a Universally Unique Identifier. The
value must be unique for every master application running on the
same grid backend. If two master applications are started with the
same InstanceUUID value, their behaviour is
undefined.
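A suitable value can be produced with any UUID generator. As a small illustration (the helper name is hypothetical), Python's standard library can generate a random UUID in the usual textual form:

```python
import uuid


def new_instance_uuid():
    """Return a freshly generated random (version 4) UUID as a string."""
    return str(uuid.uuid4())
```

Generate the value once per master application and store it in the configuration. Generating a new one on every start would defeat its purpose as a stable identifier.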
ProjectRootDir
OPTIONAL. The location of the project's root directory.
This is the directory that contains
config.xml and other BOINC-related
subdirectories.
UploadURL
OPTIONAL. The upload handler's URL to send output files of
processed BOINC workunits to.
InputURLRewriteRegExpMatch
OPTIONAL. Together with the InputURLRewriteRegExpReplace variable, this
variable can be used to rewrite input file URLs based on regular expressions.
InputURLRewriteRegExpMatch defines the match part of the regular expression,
whereas InputURLRewriteRegExpReplace defines the replacement part.
An example value of this variable
is attic://([^/]*).*/([^/]*)$.
InputURLRewriteRegExpReplace
OPTIONAL. Together with the InputURLRewriteRegExpMatch variable, this
variable can be used to rewrite input file URLs based on regular expressions.
InputURLRewriteRegExpReplace defines the replacement part of the regular
expression, whereas InputURLRewriteRegExpMatch defines the match part.
An example value of this variable
is http://\1/dl/redir/\2\nhttp://localhost:12345/data/\2.
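The effect of the two example values can be tried out with any regular expression engine that supports numbered back-references. The sketch below uses Python's re module rather than whatever engine DC-API uses internally, so small syntax differences are possible. It applies only the first part of the example replacement value, on the unconfirmed assumption that \n separates alternative replacement URLs. The input URL is made up for illustration.

```python
import re

# Example values from the option descriptions.
MATCH = r"attic://([^/]*).*/([^/]*)$"
# Only the part before "\n" of the example replacement (assumption).
REPLACE = r"http://\1/dl/redir/\2"


def rewrite_input_url(url, match=MATCH, replace=REPLACE):
    """Rewrite url if it matches; URLs that do not match are returned unchanged."""
    return re.sub(match, replace, url)


# Hypothetical input URL: group 1 captures the host, group 2 the file name.
print(rewrite_input_url("attic://host.example.org/some/path/file.dat"))
# prints "http://host.example.org/dl/redir/file.dat"
```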
Per-client configuration
Redundancy
OPTIONAL. Integer value specifying the quorum required to consider
the work unit as valid. The default value is 1. If this value is N,
N + log(N) initial BOINC results will be created. If one of
them finishes, a new one will be created automatically until
the work unit either succeeds or fails.
The work unit will be considered failed if more than N +
log(N + 2) + 1 BOINC results fail.
The work unit will be considered failed if there are N +
log(N + 2) + 1 successful results but the validator could
not find a canonical result.
The work unit will be considered failed if the state of the
work unit is still not decided after 2 * (N + log(N + 2))
BOINC results have been received.
When the redundancy is greater than 1, the work unit can not
be suspended using DC_suspendWU().
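To get a feel for the numbers these formulas produce, they can be evaluated for a given N. The text does not state the base of the logarithm or how fractional values are rounded. The sketch below assumes the natural logarithm truncated to an integer, so treat the results as illustrative only.

```python
import math


def redundancy_limits(n):
    """Evaluate the Redundancy formulas for quorum n.

    Assumption: log is the natural logarithm, truncated to an integer.
    """
    if n < 1:
        raise ValueError("quorum must be at least 1")
    return {
        # N + log(N) initial BOINC results
        "initial_results": n + int(math.log(n)),
        # N + log(N + 2) + 1, the threshold used in the failure rules
        "failure_threshold": n + int(math.log(n + 2)) + 1,
        # 2 * (N + log(N + 2)) results before giving up
        "max_total_results": 2 * (n + int(math.log(n + 2))),
    }
```

Under these assumptions, a quorum of 3 gives 4 initial results, a failure threshold of 5 and at most 8 results in total.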
The options listed below allow fine-tuning redundancy. They are
mutually exclusive with Redundancy.
MinQuorum
OPTIONAL. Integer value specifying the quorum required to consider
the work unit as valid. The default value is 1.
This option is mutually exclusive with Redundancy.
MinQuorum, TargetNResults,
MaxErrorResults and MaxTotalResults
should be used together.
TargetNResults
OPTIONAL. Integer value specifying the number of initial BOINC results to be
created. The default value is MinQuorum.
This option is mutually exclusive with Redundancy.
MinQuorum, TargetNResults,
MaxErrorResults and MaxTotalResults
should be used together.
MaxErrorResults
OPTIONAL. Integer value specifying the maximum number of failed BOINC results
for a work unit. The default value is 0.
This option is mutually exclusive with Redundancy.
MinQuorum, TargetNResults,
MaxErrorResults and MaxTotalResults
should be used together.
MaxTotalResults
OPTIONAL. Integer value specifying the total number of BOINC results for a
work unit. The default value is MinQuorum.
This option is mutually exclusive with Redundancy.
MinQuorum, TargetNResults,
MaxErrorResults and MaxTotalResults
should be used together.
MaxSuccessResults
OPTIONAL. Integer value specifying the maximum number of successful BOINC results
for a work unit. The default value is MinQuorum.
This option is mutually exclusive with Redundancy.
MinQuorum, TargetNResults,
MaxErrorResults and MaxTotalResults
should be used together.
MaxOutputSize
OPTIONAL. Max. size of any output files the client application
generates. The default is 256 KiB. If the size of an output file
exceeds this value, the BOINC core client will not upload that
file and will report the BOINC result as failed.
MaxMemUsage
OPTIONAL. Max. memory usage of the client application. The default
is 128 MiB. Hosts with less available memory will not download
work units for this application. Also, if the application's real
memory usage exceeds this limit, the BOINC core client aborts the
application and reports the BOINC result as failed.
MaxDiskUsage
OPTIONAL. Max. disk usage of the client application, including all
output and temporary files. The default is 64 MiB. Hosts with less
usable disk space will not download work units for this
application. Also, if the application's disk usage exceeds this
limit, the BOINC core client aborts the application and reports
the BOINC result as failed.
EstimatedFPOps
OPTIONAL. The estimated run-time of the client application,
expressed in the number of floating point operations. The default
is 10^13. This value is used by the
BOINC server to decide whether a given host is eligible to run a
work unit and is also used by the BOINC core client for scheduling
decisions.
MaxFPOps
OPTIONAL. Max. CPU usage of the client application, expressed in
the number of floating point operations. The default is
10^15. If the application uses more CPU
time than this value divided by the CPU's speed, then the BOINC
core client aborts the application and reports the BOINC result as
failed.
As per recommendations in the BOINC documentation, the value of
MaxFPOps should be several times larger than
the expected run time of a work unit on an average host.
DelayBound
Time in seconds the BOINC server waits for a result to finish. If
a client has downloaded a BOINC result and has not finished it in the
given time, the result is considered failed and a new one is
generated.
If DelayBound is smaller than the estimated
run time of the application on a given host (calculated by
dividing EstimatedFPOps by the host's speed),
then the BOINC result will not be offered for download. If no
host is fast enough to complete the application within the
specified time limit, the result will remain unsent for an
unspecified amount of time and DC-API will receive no feedback
for it.
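The relationship between EstimatedFPOps, MaxFPOps, DelayBound and a host's speed is simple arithmetic. The sketch below restates it in code. The function names and the host speed used in the example are assumptions chosen for illustration; the real decisions are made by the BOINC server and core client.

```python
def estimated_runtime_seconds(estimated_fpops, host_flops):
    """Estimated run time: EstimatedFPOps divided by the host's speed."""
    return estimated_fpops / host_flops


def cpu_time_limit_seconds(max_fpops, host_flops):
    """CPU time after which the core client aborts: MaxFPOps / speed."""
    return max_fpops / host_flops


def is_offered(estimated_fpops, host_flops, delay_bound):
    """A result is not offered if DelayBound is smaller than the estimate."""
    return delay_bound >= estimated_runtime_seconds(estimated_fpops, host_flops)
```

For example, on a hypothetical host that sustains 10^9 floating point operations per second, the default estimate of 10^13 operations corresponds to 10,000 seconds and the default MaxFPOps of 10^15 corresponds to 1,000,000 seconds of CPU time. A DelayBound below 10,000 would keep results from being offered to that host.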
EnableSuspend
OPTIONAL. Boolean value telling if work units for this client can
be suspended using DC_suspendWU() or
not. The default value is false.
When the redundancy is greater than 1, the work unit can not
be suspended using DC_suspendWU(),
regardless of the value of this configuration option.
NativeClient
OPTIONAL. Boolean value telling if the client application uses the
native BOINC API instead of DC-API. This will prevent adding
DC-API specific input and output files to the workunit
description.
Considerations for BOINC configuration
If you want to use master-to-client messaging, you must enable it in the
BOINC project's configuration by making sure that the
<msg_to_host/> tag is present in
config.xml. Client-to-master messaging is always
enabled and does not require configuration.
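Whether the tag is present can be verified with a few lines of code. The sketch below only tests for the presence of a msg_to_host element anywhere in the document, as described above. It does not interpret any value inside the element, and the sample XML in the usage note is a simplified assumption about the file's layout.

```python
import xml.etree.ElementTree as ET


def master_to_client_messaging_enabled(config_xml_text):
    """Return True if a <msg_to_host> element occurs in the given XML text."""
    root = ET.fromstring(config_xml_text)
    return next(root.iter("msg_to_host"), None) is not None
```

For instance, the function returns True for the text `<boinc><config><msg_to_host/></config></boinc>` and False when the tag is missing.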
Backend-specific issues
Deploying the application
Deploying the application consists of two steps: registering the client
application(s) in the BOINC database, and running the master daemon.
All client applications should be compiled for every platform you need,
and installed under the project's apps directory.
The BOINC name of the client application must be the same as the master
uses when it calls DC_createWU(). See the BOINC
documentation about how the client binaries should be named and placed
and how they should be registered in the database.
The most common method of deploying the master application is to run it
as a BOINC daemon by adding it to BOINC's
config.xml. See the BOINC documentation for
details. Other deployment methods are also possible, depending on
how the master application was designed, but the following rules must be
followed:
The master application must have access to the BOINC project's
config.xml
The BOINC file_deleter process must
have enough privileges to be able to remove files and directories
created by the master application. If the master runs under the
same user account as the BOINC daemons, this is usually not a
problem.
The master application must be able to create files and
directories under the project's download
directory, and it must be able to access files under the project's
upload directory.
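Two of the rules above can be checked from the account the master will run under. The following is a rough sketch with hypothetical names. It covers only the master's own access to config.xml and to the download and upload directories, and does not check the privileges of the file_deleter process.

```python
import os


def deployment_problems(config_xml, download_dir, upload_dir):
    """Return a list of access problems for the current user (empty if none)."""
    problems = []
    if not os.access(config_xml, os.R_OK):
        problems.append("cannot read " + config_xml)
    # Creating entries in a directory needs write and search permission.
    if not os.access(download_dir, os.W_OK | os.X_OK):
        problems.append("cannot create files under " + download_dir)
    # Accessing files in a directory needs read and search permission.
    if not os.access(upload_dir, os.R_OK | os.X_OK):
        problems.append("cannot access files under " + upload_dir)
    return problems
```

Run it as the same user that will run the master; an empty list means these checks passed.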
Besides the master and client applications, you must also define a
validator for the application in config.xml. If you
are not using redundancy then you may use the
sample_trivial_validator that comes with BOINC.
This validator accepts everything without checking.
If redundancy is desired, you may use the
validator_for_dcapi validator, which does a
textual comparison of the first output file (that is, it ignores the
difference between UNIX and Windows line endings).
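The idea behind such a textual comparison can be sketched as follows. This is not the code of validator_for_dcapi, only an illustration of comparing two outputs while ignoring the difference between Windows (CR LF) and UNIX (LF) line endings.

```python
def normalize_newlines(data):
    """Convert Windows line endings to UNIX line endings in a bytes value."""
    return data.replace(b"\r\n", b"\n")


def outputs_match(first, second):
    """Compare two output files' contents, ignoring line-ending style."""
    return normalize_newlines(first) == normalize_newlines(second)
```

Outputs that differ in anything other than line endings, such as floating point rounding, would still be reported as different by this kind of check.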
If you are running multiple master applications under the same BOINC
project, and you want to use
sample_trivial_validator for any of them, then
you must use it for all of them. This restriction applies to any
other validator that is not DC-API aware, since such a validator cannot
determine which work unit belongs to which master, and therefore which
results it should validate and which ones it should leave alone.
Redundant computation
Redundancy is very important if you are running computations on
untrusted clients, but it may also be useful on dedicated clients to
protect against hardware failures. Besides deliberate tampering with the
output, clients may also produce incorrect results due to hardware
problems such as bad memory, overheating, a faulty CPU, or simply disk
corruption.
Redundant computing means sending the same work unit to multiple
different clients and comparing the results. The comparison is performed
by a tool BOINC calls the validator. The validator
is usually application-specific as it must understand the output file
format to filter out unimportant noise (like different line endings on
different operating systems, or small differences between floating point
results due to the different rounding characteristics of different CPU
architectures).
Redundancy can be enabled in DC-API on a per client application basis by
adding the appropriate Redundancy
value to the client's configuration group.
If redundancy is enabled for a client application, work units for that
client can not be suspended. The reason is that it is generally impossible
to compare the state of two BOINC results suspended at two different
stages of their execution. If one of the suspended results is already
corrupt and is restarted, the validator will no longer recognize the
corruption since all new results starting from the corrupted initial
state will produce the same but bad output.
Messaging
BOINC provides limited messaging support that is accessible through the
DC-API DC_sendWUMessage() and
DC_sendMessage()
functions on the master and client side, respectively.
See the note about configuration
requirements for master-to-client messaging.
BOINC messaging has several restrictions:
Messages can only be sent to BOINC results that are currently
running. If a work unit has no running result, messages sent to it
are silently discarded.
If redundancy is enabled, master-to-client messages are sent to
all running BOINC results regardless of their state. In case of
client-to-master messages, the master cannot tell which BOINC
result sent the message. This means that "request-response" style
messaging is hard to implement correctly when redundancy is
enabled.
Messages sent by the master are delivered only when the client
connects to the master next time. Since the master has no control
over this, the client should periodically send messages to the
master to force a connection if timely receipt of messages sent
by the master is important. Be careful about the extra load placed
on the BOINC server by clients sending messages too frequently.
When multiple messages are being queued in either direction due to
the client not connecting to the server frequently enough, they
will be delivered to the peer in an undefined order.
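If the order of messages matters to the application, one possible workaround is to number them on the sending side and sort them on the receiving side. The sketch below shows such an application-level convention. It is not a DC-API feature, the helper names are hypothetical, and with redundancy enabled it cannot tell apart messages from different BOINC results.

```python
def tag_message(seq, payload):
    """Prefix the payload with a sequence number before sending it."""
    return f"{seq}:{payload}"


def restore_order(received):
    """Sort tagged messages by sequence number and strip the tags."""
    def seq_of(message):
        return int(message.split(":", 1)[0])

    ordered = sorted(received, key=seq_of)
    return [message.split(":", 1)[1] for message in ordered]
```

The sender would pass the output of tag_message to the messaging function, and the receiver would collect the queued messages and run them through restore_order before processing.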
Cancelling a running work unit
The DC_cancelWU() function can
be used to cancel a running work unit. This function is implemented
by sending a special message to all running BOINC results. This implies
that the cancel request is delivered only when the clients running BOINC
results for this work unit connect back to the BOINC server, which may
not happen until the client finishes the computation.
Due to a race condition between various components of the BOINC system,
it is also possible that a new BOINC result is created and is sent out
after the work unit has been cancelled. Such a BOINC result will not
receive the cancellation request and will run until it finishes its
computation. Its result however will not be reported to the DC-API
master, so the application should not be concerned about this.
When is a result reported
The BOINC core client handles the completion of a BOINC result in two
phases: first it uploads the output files, then it notifies the BOINC
server that the result has been finished. The validator will notice the
completion of the BOINC result only when this notification is received.
However, this notification is sent only the next time the core client has
to connect to the BOINC server, which may take a long time if the
core client has already started processing the next BOINC result while
the output files of the previous result were being uploaded.
When there are no more work units to download, the client sleeps for a
couple of minutes before trying again. This means that the reporting of
the completion of the last work unit may be delayed for a couple of
minutes even after all its output files have been uploaded.
The DC-API master application will receive notification about a result
when the validator has made its decision. This may also introduce some
delay after all BOINC results have been completed.
Work unit priority
The priority of a work unit can be set either by using the
DC_setWUPriority()
function or by specifying it in the configuration file using the
DefaultPriority key. Either way, the priority can
be an arbitrary 32-bit integer.
The BOINC scheduler dispatches higher priority work units first. Results
belonging to work units with lower priorities will not be offered to
clients until all the higher priority work units are exhausted.
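The dispatch rule can be pictured as sorting by priority in descending order. The sketch below is only a model of the behaviour described above, not of the BOINC scheduler's implementation, and the work unit names are made up.

```python
def dispatch_order(work_units):
    """Order (name, priority) pairs so that higher priorities come first.

    Work units with equal priority keep their original relative order.
    """
    return sorted(work_units, key=lambda wu: wu[1], reverse=True)


print(dispatch_order([("wu_a", 0), ("wu_b", 10), ("wu_c", 5)]))
# prints [('wu_b', 10), ('wu_c', 5), ('wu_a', 0)]
```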
Common errors
There are some common errors:
No results are reported
Check the validator. When there is no validator defined in
config.xml or the validator fails for some
reason, the DC-API master will not receive result notifications.
The final result is not reported
There is a delay of a couple of minutes before the final
result is reported. This is normal.
I've fixed a bug in a client application, but results are still
computed using the old client
Be sure to give the new client binary a version number
greater than that of the old client; otherwise the clients will not
notice that the binary has been updated and will not download
it.
Open issues
The following list contains the known problems with DC-API's BOINC
backend:
Messages are not removed from the msg_to_host
and msg_from_host tables by the
db_purge tool, so they need to be cleaned
up manually from time to time to prevent the database from being
filled up.
The DC-API creates result template files in the
templates subdirectory in the project's root
directory, but those files are never removed.