The CPU scheduling policy aims to achieve the following goals (in decreasing priority):
A result is 'active' if there is a slot directory for it. There can be more active results than CPUs.
The notion of 'debt' is used to respect the resource share allocation for each project. The debt to a project represents the amount of work (in CPU time) we owe it. Debt is decreased when CPU time is devoted to a project. We increase the debt to a project according to the total amount of work done in a time period scaled by the project's resource share.
For example, consider a system participating in two projects, A and B, with resource shares 75% and 25%, respectively. Suppose that in some time period the system devotes 25 minutes of CPU time to project A and 15 minutes to project B. We decrease the debt to A by 25 minutes and increase it by 30 minutes (75% of the 40 total minutes), for a net increase of 5 minutes. This makes sense because project A received a smaller fraction of the system's resources than its share entitled it to.
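The per-period debt update can be checked with a short sketch. The debt_delta helper below is illustrative (not part of any client); it assumes resource shares are expressed as fractions summing to 1 and debts are measured in minutes of CPU time:

```python
def debt_delta(resource_share, work_done, total_work):
    """Net change in debt to a project over one period:
    increase by the project's share of the total work done,
    decrease by the work it actually received."""
    return resource_share * total_work - work_done

total = 25 + 15  # minutes of CPU time in the period
print(debt_delta(0.75, 25, total))  # project A: 0.75*40 - 25 = 5.0
print(debt_delta(0.25, 15, total))  # project B: 0.25*40 - 15 = -5.0
```

Note that the deltas sum to zero: debt is only redistributed among projects, never created.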
The choice of projects for which to start result computations can simply follow the debt ordering of the projects. The algorithm computes the 'anticipated debt' to a project (the debt we expect to owe after the time period expires) as it chooses result computations to run.
This algorithm is run:
We will attempt to minimize the number of active result computations for a project by dynamically choosing results to compute from a global pool. When we allocate CPU time to a project, we choose already-running tasks first, then preempted tasks, and start a new result computation only as a last resort. This does not guarantee the above property, but we hope it comes close to achieving it.
data structures:

ACTIVE_TASK:
    double cpu_time_at_last_sched
    double current_cpu_time
    scheduler_state: PREEMPTED or RUNNING
    next_scheduler_state            // temp
PROJECT:
    double work_done_this_period    // temp
    double debt
    double anticipated_debt         // temp
    RESULT next_runnable_result

schedule_cpus():
    foreach project P:
        P.work_done_this_period = 0
    total_work_done_this_period = 0
    foreach task T that is RUNNING:
        x = T.current_cpu_time - T.cpu_time_at_last_sched
        T.project.work_done_this_period += x
        total_work_done_this_period += x
    foreach P in projects:
        P.debt += P.resource_share * total_work_done_this_period
                  - P.work_done_this_period
    expected_pay_off = total_work_done_this_period / num_cpus
    foreach P in projects:
        P.anticipated_debt = P.debt
    foreach task T:
        T.next_scheduler_state = PREEMPTED
    do num_cpus times:
        // choose the project with the largest anticipated debt
        // among projects with a runnable result
        P = argmax { P.anticipated_debt }
        if none: break
        if some T (not already scheduled to run) for P is RUNNING:
            T.next_scheduler_state = RUNNING
            P.anticipated_debt -= expected_pay_off
            continue
        if some T (not already scheduled to run) for P is PREEMPTED:
            T.next_scheduler_state = RUNNING
            P.anticipated_debt -= expected_pay_off
            continue
        if some R in results is for P, not active, and ready to run:
            choose the R with the earliest deadline
            T = new ACTIVE_TASK for R
            T.next_scheduler_state = RUNNING
            P.anticipated_debt -= expected_pay_off
    foreach task T:
        if T.scheduler_state == PREEMPTED and T.next_scheduler_state == RUNNING:
            unsuspend or run T
        if T.scheduler_state == RUNNING and T.next_scheduler_state == PREEMPTED:
            suspend (or kill) T
    foreach task T:
        T.cpu_time_at_last_sched = T.current_cpu_time
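The core of the pseudocode above, the anticipated-debt selection loop, can be sketched in Python. The dict shape used for projects ('name', 'debt', 'runnable') is a hypothetical simplification: it collapses the running/preempted/new distinction into a single count of runnable results and returns only the order in which projects are granted CPUs:

```python
def schedule_cpus(projects, num_cpus, total_work):
    """Sketch of the anticipated-debt scheduling loop.

    projects: list of dicts {'name': str, 'debt': float, 'runnable': int},
    where 'runnable' counts results available to run for that project.
    Returns the projects chosen for each CPU, in selection order.
    """
    expected_pay_off = total_work / num_cpus
    for p in projects:
        p['anticipated_debt'] = p['debt']
    chosen = []
    for _ in range(num_cpus):
        # only projects that still have a runnable result are candidates
        candidates = [p for p in projects if p['runnable'] > 0]
        if not candidates:
            break
        # pick the candidate with the largest anticipated debt
        p = max(candidates, key=lambda q: q['anticipated_debt'])
        chosen.append(p['name'])
        p['runnable'] -= 1
        # charge the project its expected share of the coming period
        p['anticipated_debt'] -= expected_pay_off
    return chosen
```

Continuing the earlier example (A with debt +5, B with debt -5, two CPUs, 40 units of work per period), the first CPU goes to A; charging A the expected pay-off drops its anticipated debt below B's, so the second CPU goes to B.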
The work fetch policy has the following goal:
At a given time, the CPU scheduler may need as many as
min_results(P) = ceil(ncpus * P.resource_share)
results for project P to avoid starvation. The client can estimate the amount of time that will elapse until the number of runnable results falls below min_results(P) for some project P. When this length of time is less than T, it is time to get more work for project P.
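The bound above is a direct formula; a minimal transcription (assuming ncpus is known and the resource share is a fraction in [0, 1]) is:

```python
import math

def min_results(ncpus, resource_share):
    """Maximum number of results a project may need at once to avoid
    starvation: ceil(ncpus * resource_share)."""
    return math.ceil(ncpus * resource_share)

# e.g. on a 4-CPU host: a 25% share needs at most 1 result,
# a 75% share needs at most 3.
```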
compute_work_request() returns one of three urgency levels:

NEED_WORK_IMMEDIATELY: the CPU scheduler is currently starved (there may not be an idle CPU)
NEED_WORK: the client will starve within T days
DONT_NEED_WORK: otherwise

It can be called whenever the client can make a scheduler RPC.
The mechanism for actually getting work checks whether a project has a non-zero work request and, if so, makes a scheduler RPC to that project to request the work.
data structures:

PROJECT:
    double work_request

compute_work_request():
    urgency = DONT_NEED_WORK
    foreach project P:
        work_remaining = 0
        results_to_skip = min_results(P) - 1
        P.work_request = 0
        foreach result R for P in order of decreasing deadline:
            if results_to_skip > 0:
                results_to_skip--
                continue
            work_remaining += E(R)    // E(R): estimated CPU time remaining for R
        if work_remaining / SECONDS_PER_DAY < T:
            if work_remaining == 0:
                urgency = NEED_WORK_IMMEDIATELY
            P.work_request = (2*T - work_remaining / SECONDS_PER_DAY)
                             * P.resource_share
            urgency = max(NEED_WORK, urgency)
    return urgency
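A runnable sketch of this pass, under stated assumptions: each project is a dict carrying a fractional 'resource_share' and a 'results' list of (deadline, estimated_cpu_seconds) pairs, E(R) is the estimated CPU seconds, and work requests are expressed in days of work. The dict shape and the integer urgency encoding are illustrative, not a real client API:

```python
import math

DONT_NEED_WORK, NEED_WORK, NEED_WORK_IMMEDIATELY = 0, 1, 2
SECONDS_PER_DAY = 86400

def compute_work_request(projects, ncpus, T):
    """Sketch of the work-fetch pass. Fills in each project's
    'work_request' (days of work, scaled by resource share) and
    returns the overall urgency level."""
    urgency = DONT_NEED_WORK
    for p in projects:
        work_remaining = 0.0
        # skip the min_results(P) - 1 results with the latest deadlines
        results_to_skip = math.ceil(ncpus * p['resource_share']) - 1
        p['work_request'] = 0.0
        for deadline, est_seconds in sorted(p['results'], reverse=True):
            if results_to_skip > 0:
                results_to_skip -= 1
                continue
            work_remaining += est_seconds
        if work_remaining / SECONDS_PER_DAY < T:
            if work_remaining == 0:
                urgency = NEED_WORK_IMMEDIATELY
            p['work_request'] = ((2 * T - work_remaining / SECONDS_PER_DAY)
                                 * p['resource_share'])
            urgency = max(NEED_WORK, urgency)
    return urgency
```

For instance, a single-CPU host attached to one project (share 100%) with no results at all is starved: the call returns NEED_WORK_IMMEDIATELY and requests 2*T days of work; with two days of queued work and T = 1, it requests nothing.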