mirror of https://github.com/BOINC/boinc.git
Update CondorBoinc.md file
Signed-off-by: Vitalii Koshura <lestat.de.lionkur@gmail.com>
parent 7fef92cf65
commit 8bfe5dbf17
174
CondorBoinc.md
@@ -1,4 +1,4 @@
-[[PageOutline]]
+# Condor-B: BOINC/Condor integration
 
 This document describes the design of Condor-B, extensions to BOINC and Condor
 so that a BOINC-based volunteer computing project can provide computing resources to a Condor pool.
@@ -14,7 +14,7 @@ Condor-B must address some basic differences between Condor and BOINC:
 and the file associated with a given physical name is immutable.
 Files may be used by many jobs.
 In Condor, a file is associated with a job, and has a single name.
-* BOINC is designed for apps for which the number and names of output files
+* BOINC is designed for apps for which the number and names of output files
 is fixed at the time of job submission.
 Condor doesn't have this restriction.
 
@@ -26,7 +26,7 @@ Condor-B must address some basic differences between Condor and BOINC:
 e.g. versions for different platforms, GPU types, etc.
 A job is associated with an application, not an app version.
 
-# Assumptions
+## Assumptions
 
 For simplicity, we'll assume that the BOINC project has been
 configured to run a certain set of applications
@@ -43,7 +43,7 @@ For each of these applications, admins must
 * Build the app for one or more platforms (ways of doing this are discussed below).
 * Create BOINC "app versions".
 
-# Job submission mechanism
+## Job submission mechanism
 
 We'll use Condor's existing mechanism for sending jobs to non-Condor back ends.
 This will involve 2 components:
@@ -53,7 +53,7 @@ This will involve 2 components:
 * A new class in Condor's job_router for managing communication
 with the BOINC GAHP.
 
-[[Image(condor.png)]]
+![Image](condor.png)
 
 The GAHP protocol will be based on the one used for HTCondor's interactions with Globus GRAM. That protocol's description can be found at http://research.cs.wisc.edu/htcondor/gahp/gahp_protocol.txt.
 From that protocol, we will take the basic syntax and command structure, and these commands:
@@ -78,28 +78,28 @@ a RESULTS command fetches the results of completed asynchronous commands.
 
 The commands are:
 
-## Specify BOINC project and credentials
-
-BOINC_SELECT_PROJECT project_url authenticator
-
+### Specify BOINC project and credentials
+```
+BOINC_SELECT_PROJECT project_url authenticator
+```
 Result (immediate): NULL or error message
 
 Specify the URL of a BOINC project and the authenticator of
 an account on that project to which requests will be sent.
 
-## Submit a new job batch
-
-BOINC_SUBMIT <req id> <batch name> <app name>
-<#jobs>
-<job name> <#args> <arg1> <arg2> ...
-<#input files>
-<src path> <dst filename>
-...
-...
-
-Result:
-NULL (success) or <err msg>
-
+### Submit a new job batch
+```
+BOINC_SUBMIT <req id> <batch name> <app name>
+<#jobs>
+<job name> <#args> <arg1> <arg2> ...
+<#input files>
+<src path> <dst filename>
+...
+...
+
+Result:
+NULL (success) or <err msg>
+```
 Notes:
 * The batch name and job names must be unique over all submissions.
 * Each job will have its own set of arguments and input files.
@@ -107,37 +107,37 @@ Notes:
 * The input <dst filename>s must agree with the app's template.
 * As of now, <dst filename> will always be the filename part
 of <src path>
-* We could add a <dir> argument to prepend to input paths.
+* We could add a \<dir> argument to prepend to input paths.
 
-## Query the status of the jobs of one or more batches
-
-BOINC_QUERY_BATCH <req id> min_mod_time #batches <batch name1> ...
-
-Result:
-<err msg> | NULL server_time <batch size 1> <job name 1> <status1> ... <batch size 2> ...
-
+### Query the status of the jobs of one or more batches
+```
+BOINC_QUERY_BATCHES <req id> min_mod_time #batches <batch name1> ...
+
+Result:
+<err msg> | NULL server_time <batch size 1> <job name 1> <status1> ... <batch size 2> ...
+```
 
 Query the jobs in a given set of batches.
 Only jobs whose DB record has changed (e.g. whose status has changed)
-since the given *min_mod_time* are reported
-(*min_mod_time* = 0 returns all jobs).
+since the given **min_mod_time** are reported
+(**min_mod_time** = 0 returns all jobs).
 
-The output includes the current on the server;
-you can pass this as *min_mod_time* in a subsequent call.
+The output includes the current time on the server;
+you can pass this as **min_mod_time** in a subsequent call.
 
 The status of each job is either IN_PROGRESS, DONE, or ERROR
 
-## Retrieve the outputs of a completed job
+### Retrieve the outputs of a completed job
 
-
-BOINC_FETCH_OUTPUT <req id> <job name> <dir> <stderr filename>
-<mode: ALL | SOME>
-<#file-specs>
-<src name> <dst>
-...
-Result:
-error_msg | NULL <exit status> <elapsed time> <CPU time>
-
+```
+BOINC_FETCH_OUTPUT <req id> <job name> <dir> <stderr filename>
+<mode: ALL | SOME>
+<#file-specs>
+<src name> <dst>
+...
+Result:
+error_msg | NULL <exit status> <elapsed time> <CPU time>
+```
 
 Get the results of a completed job, including some or all of its output files.
 BOINC may replicate jobs to ensure that results are valid.
@@ -145,17 +145,17 @@ One replica, the "canonical instance", is designated as the authoritative result
 If the status is DONE, then the output files of the canonical instance,
 and its stderr output, are fetched.
 <exit status> will be zero in this case.
-
-* <dir> is a directory on the local machine where output files are placed by default.
+
+* \<dir> is a directory on the local machine where output files are placed by default.
 * If mode is ALL, all the job's output files are fetched.
 File specs are then applied to rename or move output files.
 * If mode is SOME, only those output files described by file specs are fetched.
-* Each file spec consists of <src name> and <dst>. <src_name> is a filename written by the job.
-<dst> specifies where that file should be placed on the local machine.
+* Each file spec consists of <src name> and \<dst>. \<src_name> is a filename written by the job.
+\<dst> specifies where that file should be placed on the local machine.
 It may be either:
-* An absolute path
-* A relative path, in which case <dir> is prepended.
-Any directories within <dst> must already exist.
+* An absolute path
+* A relative path, in which case \<dir> is prepended.
+Any directories within \<dst> must already exist.
 
 If the status is ERROR, the BOINC GAHP looks for an instance
 for which some information is available (e.g., exit status and stderr output),
@@ -165,38 +165,38 @@ If there is no such instance, it returns an error message.
 or there is no consensus among the instances,
 or no instances could be dispatched.)
 
-## Abort jobs
-
-BOINC_ABORT_JOBS <req id> <job name> ...
-Result:
-NULL|<err msg>
-
+### Abort jobs
+```
+BOINC_ABORT_JOBS <req id> <job name> ...
+Result:
+NULL|<err msg>
+```
 
-## Retire a batch
-
-BOINC_RETIRE_BATCH <req id> <batch name>
-Result:
-NULL|<err msg>
-
+### Retire a batch
+```
+BOINC_RETIRE_BATCH <req id> <batch name>
+Result:
+NULL|<err msg>
+```
 The batch's files and database records can be deleted.
 
-## Set the "lease time" for a batch
-
-BOINC_SET_LEASE <req id> <batch name> <new lease time>
-Result:
-NULL|<err msg>
-
+### Set the "lease time" for a batch
+```
+BOINC_SET_LEASE <req id> <batch name> <new lease time>
+Result:
+NULL|<err msg>
+```
 After this time its files and database records can be deleted.
 
-## Results command
-
-RESULTS
-
-Result:
-# of completed commands
-<req_id1> result1
-...
-
+### Results command
+```
+RESULTS
+
+Result:
+# of completed commands
+<req_id1> result1
+...
+```
 
 If any commands have completed, return their results.
 
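Reviewer's sketch for the RESULTS reply documented in the hunk above: a minimal Python parser, assuming the reply arrives as a list of text lines in which the first line holds only the number of completed commands and each following line is `<req_id> <result>`. The function name `parse_results` and this line framing are assumptions made for illustration; the real GAHP wire format may frame the reply differently.

```python
def parse_results(reply_lines):
    """Parse a RESULTS reply into a dict mapping req_id -> result text.

    Assumed layout (hypothetical framing, taken from the doc's listing):
        line 0:      number of completed commands
        lines 1..n:  "<req_id> <result>"
    """
    if not reply_lines:
        raise ValueError("empty RESULTS reply")
    count = int(reply_lines[0].strip())
    entries = reply_lines[1:1 + count]
    if len(entries) != count:
        raise ValueError("reply announces %d results but carries %d"
                         % (count, len(entries)))
    results = {}
    for line in entries:
        # Split at the first space only: the result text may contain spaces.
        req_id, _, result = line.strip().partition(" ")
        results[req_id] = result
    return results
```

Calling `parse_results(["2", "17 NULL", "18 no such batch"])` would map request `17` to `NULL` (success) and request `18` to its error message. Because the document treats polling as cheap, a caller could simply invoke RESULTS on a timer and feed each reply to a parser like this one.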
@@ -204,12 +204,12 @@ Note: the GAHP protocol defines an "async mode" where the GAHP can notify
 the grid manager that a command has completed by sending "R\n".
 This is probably not worth doing since polling is very cheap.
 
-# Project selection and authentication
+## Project selection and authentication
 
 For the time being we'll do it this way:
 Each job submitter has a separate account on the BOINC project
 (these accounts can be assigned [access rights and quotas](MultiUser)).
-The account has a private *authenticator* (a random string).
+The account has a private **authenticator** (a random string).
 
 The job submitter will create a configuration file containing
 * the URL of the BOINC project
@@ -221,7 +221,7 @@ and will handle requests using that account on that project.
 Note: we could generalize this a bit by including the
 project URL and authenticator as an argument to each GAHP request.
 
-# Data model
+## Data model
 
 The BOINC GAHP uses BOINC's
 [content-based file management system](RemoteInputFiles#Content-basedfilemanagement)
@@ -235,7 +235,7 @@ a given file is used by many jobs or batches.
 The BOINC database stores records associating files and batches;
 a file is deleted only when it is no longer associated with any batches.
 
-# Implementation notes
+## Implementation notes
 
 The BOINC GAHP handles BOINC_SUBMIT as follows:
 
@@ -249,15 +249,15 @@ The BOINC GAHP handles BOINC_SUBMIT as follows:
 and create batch/file associations for these files.
 * Do an RPC create jobs
 
-# Ways to deploy applications on BOINC
+## Ways to deploy applications on BOINC
 
 BOINC offers three "environments" in which applications can be deployed:
-* *Native*:
+* **Native**:
 This requires making source-code modifications and building the app
 for different platforms, linking with the BOINC API library.
-* *BOINC wrapper*:
+* **BOINC wrapper**:
 Requires apps to be built for different platforms, but no source code mods.
-* *Virtual machine-based*:
+* **Virtual machine-based**:
 This would eliminate multi-platform issues
-but would require volunteer hosts to have VirtualBox installed.
+but would require volunteer hosts to have [VirtualBox](VirtualBox) installed.
 
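Reviewer's sketch for the BOINC_SUBMIT layout documented in this file: a small Python helper that flattens a batch description into the ordered fields of the request (request id, batch name, app name, job count, then per job: name, argument count, arguments, input-file count, and `<src path> <dst filename>` pairs). The function name `build_submit_tokens` and the dictionary keys `name`, `args`, and `input_paths` are hypothetical. How the fields are joined and framed on the wire is left to the caller, since the document does not pin that down. Following the note that `<dst filename>` is currently always the filename part of `<src path>`, the helper derives it with `posixpath.basename`.

```python
import posixpath


def build_submit_tokens(req_id, batch_name, app_name, jobs):
    """Return the BOINC_SUBMIT fields, in order, as a list of strings.

    `jobs` is a list of dicts with the (hypothetical) keys:
        "name"        - job name, unique over all submissions
        "args"        - list of command-line argument strings
        "input_paths" - list of source paths of the job's input files
    """
    tokens = ["BOINC_SUBMIT", str(req_id), batch_name, app_name, str(len(jobs))]
    for job in jobs:
        args = list(job["args"])
        inputs = list(job["input_paths"])
        tokens += [job["name"], str(len(args))] + args
        tokens.append(str(len(inputs)))
        for src in inputs:
            # As of now, <dst filename> is the filename part of <src path>.
            tokens += [src, posixpath.basename(src)]
    return tokens
```

For a one-job batch with arguments `-n 5` and a single input `/data/in/a.dat`, the helper yields the fields `BOINC_SUBMIT 1 batch_a myapp 1 job_0 2 -n 5 1 /data/in/a.dat a.dat`. Emitting explicit counts before each variable-length group is what lets the receiver parse the request without delimiters.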