CONDOR
Application in Condor environment
A master-worker application developed with DC-API can be run in a
Condor environment. The master program must be started by hand; it
then submits workunits to a Condor execution pool.
All files generated by the application, including those generated by
the master and worker programs and by the DC-API library itself, are
placed under a directory called the working directory.
Condor environment
To execute a DC-API application using the Condor version of the DC-API
library, you have to set up a Condor environment and have access to
it.
The master program of the application must be started on a Condor
submit host so that it can submit workunits as Condor jobs.
The working directory of the application must be accessible to both
the master and the worker processes, so it should be placed on a
shared filesystem (e.g. NFS) that is available to the submit and
execution hosts in the Condor pool.
Required tools
To compile the application using the Condor version of the DC-API
library, you need an additional library, libcondorapi.a, which is
included in the Condor installation. This library must be linked to
the application in addition to the DC-API library.
Do not use Condor's lib directory
Do not specify Condor's lib directory for the linker when
compiling the application. For example, do not use the option:
Linker option
... -L$CONDOR_HOME/lib ...
Instead, copy the libcondorapi.a file to another directory
and pass that directory to the linker's -L option.
Configuration options
InstanceUUID
REQUIRED. Identifier of the running instance of the
application. For the CONDOR backend it can be any string, not
just a UUID.
WorkingDirectory
REQUIRED. Name of the working directory of the
application. All files that are generated by the
application or the DC-API library are placed under this
directory. Different applications can use the same working
directory because every instance has its own subdirectory
there.
ClientMessageBox
Name of the directory in the workunit's working directory
where messages sent by the client to the master with
DC_sendMessage() are placed. Default
value is _dcapi_client_messages.
MasterMessageBox
Name of the directory in workunit's working directory
where DC_sendWUMessage()
places messages sent by the master to the client. Default
value is _dcapi_master_messages.
SubresultBox
Name of the directory in workunit's working directory
where DC_sendResult()
places subresults generated by the client. Default
value is _dcapi_client_subresults.
SystemMessageBox
Name of the directory in the workunit's working directory
where the master and client programs place management
messages, for example when the master asks the client to
suspend and the client sends back an acknowledgement. Default
value is _dcapi_system_messages.
SubmitFile
Name of the file in workunit's working directory which is
generated by the master and used as submit information for
Condor when a workunit is prepared to start. Default value
is _dcapi_condor_submit.txt.
Executable
Name of the executable file of the client (workunit). By
default it is the clientName
parameter which was passed to DC_createWU().
LeaveFiles
Specifies whether files and directories generated in the workunit's
working directory are deleted after the workunit ends. A zero
value means they are deleted; a non-zero value means they are
kept. Default value is 0.
CondorLog
Name of the file in the workunit's working directory where
Condor writes records about events that happen to the Condor
job. Default value is
_dcapi_internal_log.txt.
CheckpointFile
Name of file in workunit's working directory where
checkpoint information is written by the
client. DC_resolveFileName()
will resolve DC_CHECKPOINT_FILE
to this name. Default value is
_dcapi_checkpoint.
SavedOutputs
Name of the directory in the workunit's working directory where
the workunit's standard output is saved when it is
suspended. Default value is
_dcapi_saved_output. There is no
facility in the DC-API yet to merge the saved outputs together.
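The defaults described so far can be collected in one place. The following Python sketch is only an illustration of how the documented defaults and the two REQUIRED options combine; it is not part of the DC-API, and the names DEFAULTS, REQUIRED, effective_options and leave_files are hypothetical. Executable is left out because its default depends on the clientName passed to DC_createWU().

```python
# Defaults copied from the option descriptions above (illustration only).
DEFAULTS = {
    "ClientMessageBox": "_dcapi_client_messages",
    "MasterMessageBox": "_dcapi_master_messages",
    "SubresultBox": "_dcapi_client_subresults",
    "SystemMessageBox": "_dcapi_system_messages",
    "SubmitFile": "_dcapi_condor_submit.txt",
    "LeaveFiles": "0",
    "CondorLog": "_dcapi_internal_log.txt",
    "CheckpointFile": "_dcapi_checkpoint",
    "SavedOutputs": "_dcapi_saved_output",
}

# Options marked REQUIRED in the descriptions above.
REQUIRED = ("InstanceUUID", "WorkingDirectory")


def effective_options(user_options):
    """Return the user's options with documented defaults filled in."""
    missing = [key for key in REQUIRED if not user_options.get(key)]
    if missing:
        raise ValueError("missing required option(s): " + ", ".join(missing))
    merged = dict(DEFAULTS)
    merged.update(user_options)
    return merged


def leave_files(options):
    """Zero means the workunit's files are deleted; non-zero means kept."""
    return int(options["LeaveFiles"]) != 0
```

For example, calling effective_options() with only InstanceUUID and WorkingDirectory set yields the default names listed above for every other option, and leave_files() on the result is false.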
CondorSubmitTemplate
Name of the file which is used as a template when generating
the Condor submit file. If not specified, a built-in template
is used. The % character can be used to include variable data
in the generated file. Recognized % instructions:
%%
Literal %.
%d
Current date and time.
%n
Name of the workunit.
%i
Internal ID of the workunit.
%w
Name of working directory of the workunit.
%c
Client name.
%r
Number of the arguments.
%x
Name of the executable.
%a
Argument list.
%u
Condor universe (always vanilla).
%o
File for standard output of the job.
%e
File for standard error of the job.
%l
File for Condor user log.
%I
Comma separated list of input files (physical filenames with path). Capital 'i'.
%O
Comma separated list of output files.
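To make the substitution concrete, here is a minimal Python sketch of how such % instructions could be expanded. It is an illustration under stated assumptions, not the DC-API's own implementation: the function name expand_template and all sample values are hypothetical, and leaving unrecognized instructions untouched is an assumption, since the behaviour for them is not described here. The submit commands in the sample template are ordinary Condor ones, but the built-in template may look different.

```python
import re


def expand_template(template, values):
    """Replace %-instructions in template using the values mapping."""
    def substitute(match):
        code = match.group(1)
        if code == "%":
            return "%"                    # %% is a literal percent sign
        if code in values:
            return str(values[code])
        return match.group(0)             # assumption: unknown codes are kept

    return re.sub(r"%(.)", substitute, template)


# Hypothetical template and values, for illustration only.
template = (
    "universe = %u\n"
    "executable = %x\n"
    "arguments = %a\n"
    "output = %o\n"
    "error = %e\n"
    "log = %l\n"
    "transfer_input_files = %I\n"
    "queue\n"
)
values = {
    "u": "vanilla",
    "x": "worker",
    "a": "--steps 10",
    "o": "stdout.txt",
    "e": "stderr.txt",
    "l": "_dcapi_internal_log.txt",
    "I": "/shared/wu1/input.dat",
}
submit_text = expand_template(template, values)
```

The match is processed left to right, so a sequence such as %%d produces the literal text %d rather than the date.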
SubmitRetry
Number of times DC-API tries to submit a job that cannot be submitted
before giving up and reporting it as failed. Default value is 5.
SubmitRetrySleepTime
Defines the start value, in seconds, of the sleep period between job
submission retries. Default value is 2. It is multiplied by 2 after each
retry, so the sleep is 2 seconds before the first retry, 4 seconds before
the second, 8 seconds before the third, and so on.
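The doubling rule can be checked with a short calculation. The Python sketch below is illustrative only and is not taken from the DC-API sources; the function name retry_sleep_schedule is hypothetical, and it assumes one sleep before each retry, since it is not stated whether SubmitRetry counts the first submission attempt.

```python
def retry_sleep_schedule(submit_retry=5, start_sleep=2):
    """Sleep (in seconds) before each retry, doubling every time."""
    return [start_sleep * 2 ** attempt for attempt in range(submit_retry)]


schedule = retry_sleep_schedule()
print(schedule)       # [2, 4, 8, 16, 32]
print(sum(schedule))  # 62
```

Under these assumptions, the default settings spend at most 62 seconds sleeping before a submission is reported as failed.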