JobSubmission
David Anderson edited this page 2024-01-11 12:17:08 -08:00

Submitting jobs locally

On the command line

create_work is a command-line tool for submitting jobs. Run it from the project root directory.

create_work [ arguments ] infile_1 ... infile_n

Create a job with the given input files (which must already be staged).

Mandatory arguments:

--appname name

application name

Optional arguments:

--wu_name name

workunit name (default: appname_PID_TIME)

--wu_template filename

Input template filename relative to project root; usually in templates/. Default: templates/appname_in.

--result_template filename

Output template filename, relative to project root; usually in templates/. Default: templates/appname_out.

--batch n

associate the job with the given batch.

--priority n

Jobs with higher priority values are dispatched before jobs with lower values; see the --priority_order and --priority_order_create_time options of the feeder.

--stdin

Read stdin, and create one job per line (see below).

--continue_on_error

keep going if an error occurs (used with --stdin)

--broadcast, --target_host, etc.

Assign or broadcast this job.

--keywords "1 2 3"

list of keyword IDs

--app_version_num N

Process the job using only app versions with version number N.

The following job parameters may be passed in the input template or as command-line arguments to create_work; if a parameter is given in both places, the input template takes precedence. If a parameter is given in neither place, the default listed below is used.

--command_line "-flags foo"

--rsc_fpops_est x

Estimated number of floating-point operations; default 3600e9

--rsc_fpops_bound x

Upper bound on the number of floating-point operations; default 86400e9

--rsc_memory_bound x

default 5e8

--rsc_disk_bound x

default 1e9

--rsc_bandwidth_bound x

default 0 (no bound)

--credit X

Set the pre-assigned credit for this job.

--delay_bound x

default 1 week

--hr_class N

homogeneous redundancy class

--min_quorum x

default 2

--target_nresults x

default 2

--max_error_results x

default 3

--max_total_results x

default 10

--max_success_results x

default 6

--opaque N
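
As an illustration of how the arguments above combine, here is a minimal sketch of a wrapper around create_work. The function name submit_job, the CREATE_WORK variable, the app name myapp, and the input file names are all hypothetical; the chosen parameter values are arbitrary. By default the sketch only prints the command instead of running it.

```shell
# Sketch only: submit_job and CREATE_WORK are illustrative names, not part of BOINC.
# With CREATE_WORK unset, the command is echoed (dry run). To really submit,
# run from the project root with CREATE_WORK=bin/create_work.
submit_job() {
    # $1 = app name, $2 = workunit name, remaining args = staged input files
    app="$1"; wu="$2"; shift 2
    ${CREATE_WORK:-echo bin/create_work} \
        --appname "$app" --wu_name "$wu" \
        --priority 10 --rsc_fpops_est 1e12 --rsc_fpops_bound 1e13 \
        "$@"
}

# Placeholder app and file names (assumptions for the example).
submit_job myapp myapp_test_001 in1.dat in2.dat
```

Keeping the input files in "$@" means each file name stays a single argument even if it contains spaces.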

Remote input files

By default, input files are staged locally on the project server, and are identified by their filename.

However, you can also use input files that are remote, i.e. on a web server other than the project server. In that case you must specify them as

--remote_file URL nbytes MD5

where nbytes is the file's size in bytes and MD5 is the file's MD5 hash. The resulting file will have the physical name jf_MD5. On the client, the project directory will contain the file under that physical name, and the slot directory will contain a link file of the form

<soft_link>../../projects/PROJECT_URL/jf_MD5</soft_link>
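
The size and hash for a --remote_file argument can be computed from a local copy of the file. The following is a rough sketch, assuming the md5sum utility from GNU coreutils is available and that a lowercase hexadecimal digest is what create_work expects. The function name remote_file_args and the example URL are hypothetical.

```shell
# Sketch only: remote_file_args is an illustrative helper, not part of BOINC.
remote_file_args() {
    # $1 = URL where the file is served, $2 = local copy of the same file
    url="$1"; path="$2"
    nbytes=$(wc -c < "$path" | tr -d ' ')       # file size in bytes
    md5=$(md5sum "$path" | cut -d ' ' -f 1)     # hex MD5 digest (assumes GNU md5sum)
    echo "--remote_file $url $nbytes $md5"
}

# Example with a throwaway file; the URL is a placeholder.
tmpfile=$(mktemp)
printf 'hello' > "$tmpfile"
remote_file_args "https://example.com/data/hello.txt" "$tmpfile"
rm -f "$tmpfile"
```

The printed words can be appended to a create_work command line in place of a staged file name. The local copy must be byte-identical to the file served at the URL, otherwise the size and hash will not match.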

Creating multiple jobs

The --stdin option lets you create many jobs with a single invocation of create_work, increasing the efficiency of creating large batches of jobs.

Descriptions of the jobs are read from standard input. Each line specifies a job, and may include the following options:

--command_line "x"

the command line

--wu_name name

the job name

--target_host ID, --target_user ID

assign this job to a host or user.

--wu_template filename

input template file

--result_template filename

output template file

--priority N

job priority

The remaining items specify input files: either physical filenames, or --remote_file arguments as described above.

For example, suppose you have input files named file1 ... filen (already staged), and you want to submit a job for each file. You could create a file file_list containing

file1
file2
...
filen

and then submit the jobs by typing

bin/create_work --appname name --stdin < file_list
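
Each line can also carry per-job options ahead of the file names. As a rough sketch, the function below generates such lines. The function name make_job_lines, the workunit name prefix, and the --seed flag passed to the application are hypothetical, and the sketch assumes that create_work handles the double-quoted --command_line value on a stdin line the same way as on the command line.

```shell
# Sketch only: make_job_lines is an illustrative helper, not part of BOINC.
make_job_lines() {
    # $1 = workunit name prefix, remaining args = staged input file names
    prefix="$1"; shift
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # One job per line: per-job options first, then the input file.
        echo "--wu_name ${prefix}_${n} --command_line \"--seed ${n}\" ${f}"
    done
}

# Print job descriptions for two placeholder input files.
make_job_lines run1 file1 file2
```

The output could then be piped to create_work from the project root, for example: make_job_lines run1 file1 file2 | bin/create_work --appname name --stdin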

From a C++ program

BOINC's library provides a function for submitting jobs:

int create_work(
    DB_WORKUNIT& wu,
    const char* wu_template,                  // contents, not path
    const char* result_template_filename,     // relative to project root
    const char* result_template_filepath,     // absolute or relative to current dir
    const char** infiles,                     // array of input file names
    int ninfiles,
    SCHED_CONFIG&,
    const char* command_line = NULL,
    const char* additional_xml = NULL
);

The name and appid fields of the DB_WORKUNIT structure must always be initialized. Other job parameters may be passed either in the DB_WORKUNIT structure or in the input template file (the latter has priority). On a successful return, wu.id contains the database ID of the workunit.

If you want to use remote input files, use the following variant:

int create_work2(
    DB_WORKUNIT& wu,
    const char* wu_template,                  // contents, not path
    const char* result_template_filename,     // relative to project root
    const char* result_template_filepath,     // absolute or relative to current dir
    vector<INFILE_DESC> infiles,              // list of input file descriptions; see below
    SCHED_CONFIG&,
    const char* command_line = NULL,
    const char* additional_xml = NULL
);

struct INFILE_DESC {
    bool is_remote;

    // the following defined if remote (physical name is jf_MD5)
    //
    double nbytes;
    char md5[64];
    char url[1024];         // make this a vector to support multiple URLs

    // the following defined if not remote
    //
    char name[1024];     // physical name
};