Goals and motivation

The BOINC client result computation scheduling aims to achieve the following goals:

  1. Efficiently and effectively use system resources.
    This is clearly desirable.
  2. Maintain a minimum variety of projects for which results are computed in a given amount of time.
    A user participating in multiple projects can get bored seeing his computer work on only one project for a long time.
  3. Respect the resource share allocation for each project.
    The user specifies the resource shares and thus expects them to be honored.

The motivation for the second goal stems from the potential orders-of-magnitude differences in expected completion time for results from different projects. Some projects will have results that complete in hours, while other projects may have results that take months to complete. A scheduler that runs result computations to completion before starting a new computation will keep projects with short-running result computations stuck behind projects with long-running result computations. A participant in multiple projects will expect to see his computer work on each of these projects in a reasonable time period, not just the project with the long-running result computations.

A project's resource share represents how much computing resources (CPU time, network bandwidth, storage space) a user wants to allocate to the project relative to the resources allocated to all of the other projects in which he is participating. The client should respect this allocation to be faithful to the user. In the case of CPU time, the result computation scheduling should achieve the expected time shares over a reasonable time period.
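As a concrete illustration, here is a minimal Python sketch of how resource shares translate into expected CPU time over a period. The helper and project names are hypothetical, not the client's actual code:

```python
# A minimal sketch (hypothetical helper, not BOINC code) of how resource
# shares translate into expected CPU time over some period.

def expected_cpu_secs(resource_shares, period_secs, num_cpus):
    """Split the period's total CPU seconds in proportion to each
    project's resource share."""
    total_share = sum(resource_shares.values())
    total_cpu_secs = period_secs * num_cpus
    return {name: total_cpu_secs * share / total_share
            for name, share in resource_shares.items()}

# Two projects at 75%/25% on a 2-CPU host over a one-hour period:
alloc = expected_cpu_secs({"A": 75, "B": 25}, 3600, 2)
# A should receive 5400 of the 7200 CPU-seconds, B the remaining 1800.
```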

At the same time, the scheduler RPC policy needs to complement the result scheduling. We have the following goals for this policy:

  1. Have enough work to keep all CPUs busy
  2. Have enough work to provide for minimum variety of projects
  3. Respect work_buf_min and work_buf_max

BOINC client result computation scheduling

We address the goals using result preemption. After a given time period, the client decides on a new set of projects for which results will be computed in the next time period. This decision will consider the projects' resource shares by tracking the debt owed to a project. The debt to a project accrues according to the project's resource share, and is paid off when CPU time is devoted to the project.

A consequence of result preemption is that projects can have multiple active result computations at a given time. For example, consider a two processor system participating in two projects, A and B, with resource shares 75% and 25%, respectively. Ideally, one processor will run a result computation for A, while the other processor will switch between running result computations for A and B. Thus, A will have two active result computations. This consequence implies a desirable property of the result preemption scheme: that the number of active result computations for a project be minimized. For example, it's better to have one result from project P complete in time T than to have two results from project P simultaneously complete in time 2T. Maintaining more active result computations than necessary increases the mean-time-to-completion if the client switches between these active result computations.

We will attempt to minimize the number of active result computations for a project by dynamically choosing results to compute from a global pool. When we allocate CPU time to a project, we will choose results to compute intelligently: choose already running tasks first, then preempted tasks, and start a new result computation only as a last resort. This will not guarantee the above property, but we hope it will come close to achieving it.

A sketch of the result preemption algorithm

The algorithm requires that a time period length be defined (e.g. one hour). The result preemption algorithm is run at the beginning of each period. It proceeds as follows:

  1. Pay off debts to projects according to the amount of work done for the projects in the last period.
  2. Accrue debts to projects according to the projects' resource shares.
  3. Let the expected future debt for each project be initialized to its actual debt.
  4. Repeat until we decide on a result to compute for each processor:
    1. Choose the project that has the largest expected future debt and a ready-to-compute result.
    2. Decrease the expected future debt of the project by the amount we expect to pay off, and return the project to consideration for running on another processor.
  5. Preempt the current result computations with the new ones.

Because result computations may finish before the time period expires, we need to account for the resulting gap in a project's debt payment. So, we also need to keep track of the amount of work done during the current time period for each project as results finish. This accounting is reset at the start of each time period.

Finally, starting new result computations in the middle of a time period needs to use this accounting instead of the expected future debts that were estimated at the beginning of the time period. Beyond that, the decision is similar to choosing which tasks to run at the beginning of a time period.
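The selection loop in steps 3 and 4 of the sketch can be written as runnable Python under simplified assumptions (each pick is charged one full CPU's expected pay-off; the data structures here are illustrative, not the client's):

```python
# A runnable sketch, under simplified assumptions, of the per-period
# selection loop: repeatedly pick the project with the largest expected
# future debt that still has a ready-to-compute result, and charge it
# one CPU's worth of expected pay-off.

def choose_projects(debts, ready_counts, num_cpus, expected_pay_off):
    """Return the projects chosen to run this period, one per CPU filled."""
    future_debt = dict(debts)        # step 3: initialize to actual debt
    remaining = dict(ready_counts)   # ready-to-compute results per project
    chosen = []
    for _ in range(num_cpus):        # step 4: one choice per processor
        candidates = [p for p in future_debt if remaining.get(p, 0) > 0]
        if not candidates:
            break
        p = max(candidates, key=lambda q: future_debt[q])
        chosen.append(p)
        remaining[p] -= 1
        future_debt[p] -= expected_pay_off  # charge the expected work
    return chosen

# Two CPUs; A is owed more than B, but not two full pay-offs more:
picks = choose_projects({"A": 3.0, "B": 2.5}, {"A": 5, "B": 5}, 2, 1.0)
# A is chosen first (3.0 > 2.5); after charging one pay-off its expected
# future debt (2.0) falls below B's, so B gets the second CPU.
```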

Pseudocode

We'll initialize total_work_done_this_period to num_cpus * period_length.

preempt_apps(): // called after every period_length

// finish accounting
foreach T in running_tasks:
    T.project.work_done_this_period += T.work_done_this_period
    total_work_done_this_period += T.work_done_this_period

// pay off and accrue debts
foreach P in projects:
    P.debt += P.resource_share * total_work_done_this_period
            - P.work_done_this_period

// make preemption decisions
expected_pay_off = total_work_done_this_period / num_cpus
foreach P in projects:
    P.expected_future_debt = P.debt
to_preempt.addAll(running_tasks) // assume we'll preempt everything at first
to_run = ()
do num_cpus times:
    found = false
    do projects.size times:
        // choose the project with the largest expected future debt
        P = argmax { P.expected_future_debt } over all P in projects
        if (some T in to_preempt is for P):
            // P has a task that ran last period, so just don't preempt it
            to_preempt.remove(T)
            T.expected_pay_off = expected_pay_off
            found = true
            break
        if (some T in preempted_tasks is for P):
            // P has a task that was preempted
            preempted_tasks.remove(T)
            to_run.add(T)
            T.expected_pay_off = expected_pay_off
            found = true
            break
        if (some R in results is for P, not active, and ready to run):
            T = new ACTIVE_TASK for R
            to_run.add(T)
            T.expected_pay_off = expected_pay_off
            found = true
            break
        remove P from consideration in the argmax
    if found:
        P.expected_future_debt -= expected_pay_off
    else:
        break
suspend tasks in to_preempt (reset T.expected_pay_off for each T in to_preempt)
run or unsuspend tasks in to_run (and put in running_tasks)

// reset accounting
foreach P in projects:
    P.work_done_this_period = 0
total_work_done_this_period = 0

----------

start_apps(): // called at each iteration of the BOINC main loop

foreach P in projects:
    // expected_future_debt should account for any tasks that finished
    // and for tasks that are still running
    P.expected_future_debt = P.debt - P.work_done_this_period
foreach T in running_tasks:
    T.project.expected_future_debt -= T.expected_pay_off

to_run = ()
while running_tasks.size + to_run.size < num_cpus:
    found = false
    do projects.size times:
        // choose the project with the largest expected future debt
        P = argmax { P.expected_future_debt } over all P in projects
        if (some T in preempted_tasks is for P):
            // P has a task that was preempted
            preempted_tasks.remove(T)
            to_run.add(T)
            T.expected_pay_off = fraction_of_period_left * expected_pay_off
            found = true
            break
        if (some R in results is for P, not active, and ready to run):
            T = new ACTIVE_TASK for R
            to_run.add(T)
            T.expected_pay_off = fraction_of_period_left * expected_pay_off
            found = true
            break
        remove P from consideration in the argmax
    if found:
        P.expected_future_debt -= fraction_of_period_left * expected_pay_off
    else:
        break
run or unsuspend tasks in to_run

----------

handle_finished_apps(): // called at each iteration of the BOINC main loop

foreach T in running_tasks:
    if T finished:
        // do some accounting
        T.project.work_done_this_period += T.work_done_this_period
        total_work_done_this_period += T.work_done_this_period
        do other clean up stuff

Debt computation

The notion of debt is used to respect the resource share allocation for each project. The debt to a project represents the amount of work (in CPU time) we owe to a project. Debt is paid off when CPU time is devoted to a project. We accrue the debt to a project according to the total amount of work done in a time period scaled by the project's resource share.

For example, consider a system participating in two projects, A and B, with resource shares 75% and 25%, respectively. Suppose in some time period, the system devotes 25 minutes of CPU time to project A and 15 minutes of CPU time to project B. We decrease the debt to A by 25 minutes (the work done for it) and accrue it by 30 minutes (75% of the 40 total minutes). So A's debt increases by 5 minutes overall. This makes sense because we expected to devote a larger percentage of the system resources to project A than it actually got.
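The arithmetic of this example can be checked with a short Python snippet; the helper mirrors the debt update in the pseudocode, and its name is illustrative:

```python
# The worked example above as a runnable check. Values are CPU minutes.

def debt_delta(resource_share, total_work, work_done):
    """Accrue resource_share * total_work, pay off work_done."""
    return resource_share * total_work - work_done

total = 25 + 15   # 40 CPU minutes of work done in the period
delta_a = debt_delta(0.75, total, 25)  # A accrues 30, pays off 25
delta_b = debt_delta(0.25, total, 15)  # B accrues 10, pays off 15
# A's debt rises by 5 minutes; B's falls by 5.
```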

The choosing of projects for which to start result computations at the beginning of each time period can simply follow the debt ordering of the projects. The algorithm computes the expected future debt to a project (the debt we expect to owe after the time period expires) as it chooses result computations to run.

expected future debt = debt - expected pay off * number of tasks to run this period

However, choosing projects to run in the middle of a time period is a little different. The preemption algorithm expected each of the tasks it started to last for the entire time period, so when a task finishes in the middle of a time period, the expected future debt of the respective project is an overestimate. We thus change the expected future debt to reflect what has taken place: it is the debt owed to the project at the beginning of the time period, minus the amount of work that has already been done this time period, and minus the amount of work we expect to complete by the end of the time period. As projects have results chosen to run, we decrease the expected future debt by the amount of work we expect to be done for the project in the remainder of the time period.

expected future debt = debt - (work completed + expected pay off of tasks already running this period + expected pay off * fraction of period left * number of new tasks for this period)
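With hypothetical numbers, this mid-period formula can be checked directly (the function name and values are illustrative):

```python
# A direct transcription of the mid-period formula above.

def mid_period_future_debt(debt, work_completed, running_pay_offs,
                           expected_pay_off, fraction_left, new_tasks):
    return (debt
            - work_completed
            - sum(running_pay_offs)
            - expected_pay_off * fraction_left * new_tasks)

# A project owed 10 units at period start: 2 units already completed,
# one running task expected to pay off 3 more, and one new task started
# halfway through a period whose full pay-off would be 4.
efd = mid_period_future_debt(10.0, 2.0, [3.0], 4.0, 0.5, 1)
# 10 - 2 - 3 - 4*0.5 = 3 units of debt expected to remain.
```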

Scheduler RPC policy

The client should get more work when any of the following is true:

  1. The client will have no work in at most work_buf_min days.
  2. The client will not have enough work for a project to get its fair share of computation time (according to its resource share).
  3. The client will have fewer than num_cpus tasks.

Ignoring the second case can cause long-running result computations to monopolize the CPU, even with result preemption. For example, suppose a project has work units that finish on the order of months. Then, when work_buf_min is on the order of days, the client will never think it is out of work. However, projects with shorter result computations may run out of work. So, even with preemption, minimum variety cannot be maintained.

need_to_get_work()

The second case (running out of work for one project) is addressed by capping the amount of work counted for a project. We cap it at the total amount of work that can be done in min_work_buf_secs, scaled by the project's resource share. Thus, the client will get more work when any one project has too little work.
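This cap can be sketched in a few lines of Python (the helper is hypothetical, not the client's code):

```python
# A small sketch of the per-project cap described above: a project
# contributes at most its resource-share slice of the buffered work, so
# one project's long queue cannot mask another project's empty one.

def capped_work_secs(avail_work_secs, resource_share,
                     min_work_buf_secs, num_cpus):
    cap = resource_share * min_work_buf_secs * num_cpus
    return min(avail_work_secs, cap)

# A 25%-share project with a week of queued work, on a 2-CPU host with a
# one-day buffer, counts for at most 0.25 * 86400 * 2 = 43200 seconds:
counted = capped_work_secs(7 * 86400, 0.25, 86400, 2)
```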

The case of having fewer results than CPUs is addressed by "packing" results into CPU "bins".

need_to_get_work():

    num_cpus_busy = 0
    total_work_secs = 0
    work_secs_for_one_cpu = 0
    foreach P in projects:
        P.avail_work_secs = 0

    sort results in order of decreasing estimated_cpu_time

    // pack results into CPU bins
    foreach R in results:
        result_work_secs = estimated_cpu_time(R)
        work_secs_for_one_cpu += result_work_secs
        R.project.avail_work_secs += result_work_secs
        if work_secs_for_one_cpu >= min_work_buf_secs:
            work_secs_for_one_cpu = 0
            num_cpus_busy += 1

    // count total amount of work, but cap amount any one project contributes
    // to this total
    foreach P in projects:
        total_work_secs += min { P.avail_work_secs,
                            P.resource_share * min_work_buf_secs * num_cpus }

    return (num_cpus_busy < num_cpus)
        || (total_work_secs < min_work_buf_secs * num_cpus)

XXX it will be useful to know what caused this predicate to return true, so maybe it should be split into separate predicates.

XXX also need to factor in whether we are currently able to contact a project (according to min_rpc_time).
