BOINC client emulator
The BOINC client emulator (BCE) simulates a single BOINC client interacting with one or more projects. BCE uses the same source code as the client for the CPU scheduling and work-fetch policies, so it models the BOINC client accurately.
The intended uses of BCE include:
- Identifying scenarios (combinations of host and project characteristics) where the current scheduling policies don't behave well.
- Studying experimental policies.
However, BCE is not necessarily perfect: in some cases its results may differ significantly from what the actual client would do, or its inputs may be inadequate to describe a real-life scenario. If you find such cases, please send email to David Anderson.
You can use BCE in either of two ways:
- Through a web interface. This lets you do one simulation at a time, and shows you results graphically.
- Compile it yourself and run from a command line. This provides a more flexible interface.
Input files
The input consists of the following files:
client_state.xml
This describes a set of attached projects. The format is an extension of the state file generated by the client; you can use the state file of a running client as an input to the simulator.
The fields used by the simulator are as follows (fields marked with * are not generated by the client).
host_info
    p_ncpus
    p_fpops
    m_nbytes
    coprocs
These describe the host's processing hardware. The simulator doesn't model disk usage.
time_stats
    on_frac
    connected_frac
    active_frac
    gpu_active_frac
    *on_lambda
    *connected_lambda
    *active_lambda
    *gpu_active_lambda
These describe the host's availability:
- on_frac: the fraction of total time this host runs the client
- connected_frac: of the time this host runs the client, the fraction it is connected to the Internet.
- active_frac: of the time this host runs the client, the fraction it is enabled to use CPU
- gpu_active_frac: of the time this host runs the client, the fraction it is enabled to use GPU (always <= active_frac).
The periods of activity and inactivity are exponentially distributed. The mean duration of the active periods can be specified with on_lambda, connected_lambda, active_lambda, and gpu_active_lambda; the default is 1 hour.
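As an illustration, here is a minimal sketch in C++ (not the simulator's actual code; the variable names are assumptions) of how alternating on/off periods with exponentially distributed durations can be generated so that the long-run "on" fraction is approximately on_frac:

    // Generate alternating on/off periods. The mean of the "on" periods is
    // on_lambda; the mean of the "off" periods is chosen so that
    // on_lambda / (on_lambda + off_lambda) == on_frac.
    #include <cstdio>
    #include <random>

    int main() {
        double on_frac = 0.8;        // e.g. time_stats.on_frac
        double on_lambda = 3600;     // mean "on" period in seconds (default 1 hour)
        double off_lambda = on_lambda * (1 - on_frac) / on_frac;

        std::mt19937 rng(1);
        std::exponential_distribution<double> on_dist(1.0/on_lambda);
        std::exponential_distribution<double> off_dist(1.0/off_lambda);

        double t = 0;
        bool on = true;
        while (t < 86400) {          // one simulated day
            double dur = on ? on_dist(rng) : off_dist(rng);
            printf("%-3s at %8.0f s for %6.0f s\n", on ? "on" : "off", t, dur);
            t += dur;
            on = !on;
        }
        return 0;
    }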
project
    project_name
    resource_share
    *available
        frac
        lambda
app
    name
    *latency_bound
    *fpops_est
    *fpops_actual
        mean
        stddev
    *weight
    *max_concurrent
app_version
    app_name
    avg_ncpus
    flops
    plan_class
    coproc
        type
        count
    gpu_ram
    *working_set
workunit
    app_name
    rsc_fpops_est
    rsc_fpops_bound
result
    name
    report_deadline
    received_time
active_task
    result_name
    working_set_size
Notes:
- Each application has a fixed latency bound. It can be specified in app.latency_bound. If not specified, and there is a result for that app, it is computed as report_deadline - received_time for one such result. Otherwise it is 1 week. (This defaulting rule is sketched in code after these notes.)
- An application has a fixed FLOP count estimate. It can be specified as app.fpops_est. If not, and there is a workunit for that app, wu.rsc_fpops_est is used. Otherwise it is 3600*1e9 FLOPs (i.e., one hour of work at 1 GFLOPS).
- An application's actual FLOP count is normally distributed. The distribution can be specified as app.fpops_actual; otherwise the mean is app.fpops_est and the standard deviation is 0.
- An application has an associated weight that determines the fraction of the project's dispatched jobs that use that application (apps are chosen with probability proportional to weight). The weight defaults to 1.
- An application version has a fixed working set size. This can be specified as app_version.working_set. If not, and there is an active task for that app version, active_task.working_set_size is used. Otherwise it defaults to 0.
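As an illustration of the latency-bound rule (the other defaults follow the same pattern), here is a minimal sketch in C++; the struct and field names are assumptions, not the simulator's actual types:

    #include <cstdio>
    #include <string>
    #include <vector>

    struct Result {
        std::string app_name;
        double report_deadline;    // Unix time
        double received_time;      // Unix time
    };

    struct App {
        std::string name;
        double latency_bound;      // 0 if *latency_bound is not specified
    };

    // Latency bound: use app.latency_bound if given, else
    // report_deadline - received_time of some result for that app,
    // else one week.
    double effective_latency_bound(const App& app, const std::vector<Result>& results) {
        if (app.latency_bound > 0) return app.latency_bound;
        for (const Result& r : results) {
            if (r.app_name == app.name) {
                return r.report_deadline - r.received_time;
            }
        }
        return 7*24*3600.0;
    }

    int main() {
        App app = {"example_app", 0};                                // hypothetical app
        std::vector<Result> results = {{"example_app", 2000000, 1900000}};
        printf("latency bound: %.0f seconds\n", effective_latency_bound(app, results));
        return 0;
    }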
The availability of a project (i.e., the periods when scheduler RPCs to it succeed) is modeled with two parameters: the durations of available periods are exponentially distributed with the given mean (lambda), and the durations of unavailable periods are exponentially distributed with a mean chosen so that the long-run available fraction equals the given frac. A project's availability can be specified as project.available; otherwise the project is always available.
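For example, under this model, frac = 0.9 with a mean available period of 10 hours gives a mean unavailable period of 10 * (1 - 0.9) / 0.9 ≈ 1.1 hours (these numbers are illustrative, not defaults).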
The algorithm for simulating a scheduler RPC to project P is:
    while need more work
        X = list of P's apps with versions for requested resources
        if X is empty
            break
        choose an app A from X, randomly based on weights
        V = version that uses requested resources and has highest FLOPS
        J = generate job
        if J is feasible
            update request
        else
            infeasible_count++
            if infeasible_count == 10
                break
The available periods (i.e., when BOINC is running) and the idle periods (i.e. when there is no user input) are modeled as above.
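The step "choose an app A from X, randomly based on weights" in the algorithm above amounts to weighted random selection: each app is chosen with probability proportional to its weight. A minimal sketch in C++ (not the simulator's actual code; the names are assumptions):

    #include <cstdio>
    #include <random>
    #include <string>
    #include <vector>

    struct App {
        std::string name;
        double weight;    // app.weight from client_state.xml; defaults to 1
    };

    // Pick an app with probability proportional to its weight.
    const App& choose_app(const std::vector<App>& apps, std::mt19937& rng) {
        double total = 0;
        for (const auto& a : apps) total += a.weight;
        std::uniform_real_distribution<double> u(0, total);
        double x = u(rng);
        for (const auto& a : apps) {
            if (x < a.weight) return a;
            x -= a.weight;
        }
        return apps.back();    // guard against floating-point round-off
    }

    int main() {
        std::vector<App> apps = {{"app_a", 1}, {"app_b", 3}};    // hypothetical apps
        std::mt19937 rng(1);
        for (int i = 0; i < 10; i++) {
            printf("chose %s\n", choose_app(apps, rng).name.c_str());
        }
        return 0;
    }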
global_prefs.xml
format described here.
cc_config.xml
format described here.
Building and running the simulator
The simulator can be built with 'makefile_sim' on Unix or the 'sim' project on Windows. The usage is:
sim [--duration X] [--delta X] [--server_uses_workload] [--dcf_dont_use] [--dcf_stats] [--dirs d1 ...]
--duration
simulate this much time.
--delta
time step of simulation.
--server_uses_workload
servers take existing workload into account when deciding whether to send jobs.
--dcf_dont_use
Duration correction factor (DCF) is one.
--dcf_stats
Use formula for DCF based on completion time mean/stdev.
--dirs d1 ...
chdir into each of the given directories and run a simulation based on the input files there. Print a summary of each scenario separately, plus an overall summary.
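For example, assuming --duration and --delta are given in seconds, the following invocation (the values and directory names are illustrative) simulates a day of client activity in one-minute steps for two scenario directories and prints per-scenario and overall summaries:

    sim --duration 86400 --delta 60 --dirs scenario1 scenario2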
Output files
The simulator creates several output files:
index.html: an index of other files
log.txt: This is the message log (same as would be generated by the client). Its contents are controlled by cc_config.xml.
time_line.html: When viewed in a web browser, a 'time line' showing what's running when.
summary.xml: Contains four performance metrics:
wasted_frac
Of the total CPU time, the fraction spent computing results that missed their deadline.
idle_frac
Of the total available CPU time, the fraction that went unused (no computing was done).
share_violation
A measure (0 to 1) of how badly resource shares were violated.
monotony
A measure (0 to 1) of how long a single project used all CPUs (so that the user would see only that project on their screensaver, and get bored).
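For example (illustrative numbers only): if, out of 100 available CPU-hours, 10 were spent on results that missed their deadlines and 20 went unused, then wasted_frac = 0.1 and idle_frac = 0.2.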
In addition, information is printed about the per-project CPU time and waste.