CPU scheduling aims to achieve the following goals (decreasing priority):
A result is 'active' if there is a slot directory for it. A consequence of result preemption is that there can be more active results than CPUs.
The notion of 'debt' is used to respect each project's resource share allocation. The debt to a project represents the amount of work (in CPU time) we owe it. Debt decreases when CPU time is devoted to the project, and increases by the total amount of work done in a time period scaled by the project's resource share.
For example, consider a system participating in two projects, A and B, with resource shares 75% and 25%, respectively. Suppose that in some time period the system devotes 25 minutes of CPU time to project A and 15 minutes to project B, for a total of 40 minutes. We decrease the debt to A by 25 minutes (the CPU time it received) and increase it by 30 minutes (75% of 40). So A's debt grows by 5 minutes overall. This makes sense because we expected project A to get a larger fraction of the system's CPU time than it actually got.
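The arithmetic above can be checked with a minimal Python sketch of the per-period debt update (the function name and values are illustrative, not from the client source):

```python
# Per-period debt update for one project: debt grows by the project's
# resource share of the total work done, and shrinks by the CPU time
# the project actually received.
def debt_delta(resource_share, work_done, total_work):
    return resource_share * total_work - work_done

total = 25 + 15  # 40 minutes of CPU time in the period

# Project A: 75% share, received 25 of the 40 minutes.
print(debt_delta(0.75, 25, total))   # 5.0  -> A's debt grows

# Project B: 25% share, received 15 of the 40 minutes.
print(debt_delta(0.25, 15, total))   # -5.0 -> B's debt shrinks
```

Note that the deltas sum to zero: over any period, total debt is conserved and merely shifts toward under-served projects.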
The choice of projects for which to start result computations can simply follow the debt ordering of the projects. The algorithm computes each project's 'anticipated debt' (the debt we expect to owe after the time period expires) as it chooses which result computations to run.
This algorithm is run:
We will attempt to minimize the number of active result computations for a project by dynamically choosing results to compute from a global pool. When we allocate CPU time to a project, we will choose already-running tasks first, then preempted tasks, and start a new result computation only as a last resort. This does not guarantee the above property, but we hope it will come close to achieving it.
data structures:

ACTIVE_TASK:
    double cpu_at_last_schedule_point
    double current_cpu_time
    scheduler_state: PREEMPTED or RUNNING
    next_scheduler_state    // temp

PROJECT:
    double work_done_this_period    // temp
    double debt
    double anticipated_debt    // temp
    bool has_runnable_result

schedule_cpus():
    foreach project P:
        P.work_done_this_period = 0
    total_work_done_this_period = 0
    foreach task T that is RUNNING:
        x = T.current_cpu_time - T.cpu_at_last_schedule_point
        T.project.work_done_this_period += x
        total_work_done_this_period += x
    foreach P in projects:
        P.debt += P.resource_share * total_work_done_this_period
                    - P.work_done_this_period
    expected_pay_off = total_work_done_this_period / num_cpus
    foreach P in projects:
        P.anticipated_debt = P.debt
    foreach task T:
        T.next_scheduler_state = PREEMPTED
    do num_cpus times:
        // choose the project with the largest anticipated debt
        P = argmax { P.anticipated_debt } over all P in projects
            with a runnable result
        if none: break
        if some task T of P is RUNNING:
            T.next_scheduler_state = RUNNING
            P.anticipated_debt -= expected_pay_off
            continue
        if some task T of P is PREEMPTED:
            T.next_scheduler_state = RUNNING
            P.anticipated_debt -= expected_pay_off
            continue
        if some result R of P is not active and ready to run:
            T = new ACTIVE_TASK for R
            T.next_scheduler_state = RUNNING
            P.anticipated_debt -= expected_pay_off
    foreach task T:
        if T.scheduler_state == PREEMPTED and T.next_scheduler_state == RUNNING:
            unsuspend or run T
        if T.scheduler_state == RUNNING and T.next_scheduler_state == PREEMPTED:
            suspend (or kill) T
    foreach task T:
        T.cpu_at_last_schedule_point = T.current_cpu_time
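The selection loop at the heart of schedule_cpus() can be sketched in runnable Python. The dict-based project records and names here are illustrative stand-ins for the PROJECT structure, and task selection is reduced to awarding CPUs to projects:

```python
def choose_projects(projects, num_cpus, total_work_done_this_period):
    """Award each CPU to the project with the largest anticipated debt,
    charging it the expected pay-off after each award."""
    expected_pay_off = total_work_done_this_period / num_cpus
    for p in projects:
        p["anticipated_debt"] = p["debt"]
    awards = []
    for _ in range(num_cpus):
        runnable = [p for p in projects if p["has_runnable_result"]]
        if not runnable:
            break
        # Choose the project with the largest anticipated debt.
        best = max(runnable, key=lambda p: p["anticipated_debt"])
        best["anticipated_debt"] -= expected_pay_off
        awards.append(best["name"])
    return awards

projects = [
    {"name": "A", "debt": 5.0, "has_runnable_result": True},
    {"name": "B", "debt": -5.0, "has_runnable_result": True},
]
print(choose_projects(projects, 2, 40.0))  # ['A', 'B']
```

Because each award charges the project the expected pay-off, a project with a large debt can be awarded several CPUs, while a project with very negative debt may get none this period.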
The work fetch policy has the following goal:
The CPU scheduler needs a minimum number of results from a project in order to respect the project's resource share. We effectively have too little work for a project when its number of results falls below this minimum:
min_results(P) = ceil(ncpus * P.resource_share)
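As a quick illustration of this formula (the host and share values here are hypothetical):

```python
import math

def min_results(ncpus, resource_share):
    # Fewest results a project needs queued for the CPU scheduler
    # to be able to respect its resource share.
    return math.ceil(ncpus * resource_share)

# A 2-CPU host attached to projects with 75% and 25% shares:
print(min_results(2, 0.75))  # 2
print(min_results(2, 0.25))  # 1
```

The ceiling ensures that even a project with a small resource share always needs at least one result.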
The client can estimate how much time will elapse until it has too little work for a project. When this length of time falls below a threshold T, it is time to get more work.
This algorithm determines whether a project needs more work and, if so, computes the amount of work it needs. It is called whenever the client is able to make a scheduler RPC.
The mechanism for actually getting work checks whether a project has a nonzero work request and, if so, makes a scheduler RPC to request the work.
data structures:

PROJECT:
    double work_request_days

check_work_needed(Project P):
    if num_results(P) < min_results(P):
        P.work_request_days = 2T
        return NEED_WORK_IMMEDIATELY
    top_results = top (min_results(P) - 1) results of P
        by expected completion time
    work_remaining = 0
    foreach result R of P that is not in top_results:
        work_remaining += R.expected_completion_time
    work_remaining *= P.resource_share * active_frac / ncpus
    if work_remaining < T:
        P.work_request_days = 2T - work_remaining / seconds_per_day
        return NEED_WORK
    else:
        P.work_request_days = 0
        return DONT_NEED_WORK
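The same logic can be sketched in runnable Python. Because the pseudocode mixes units, this sketch assumes T and the expected completion times are both in seconds, converting to days only when computing the work request; the constants and inputs are illustrative, not the client's actual globals:

```python
import math

NEED_WORK_IMMEDIATELY, NEED_WORK, DONT_NEED_WORK = range(3)
SECONDS_PER_DAY = 86400.0

def check_work_needed(completion_times, resource_share, ncpus, active_frac, T):
    """completion_times: expected completion time (seconds) of each
    queued result for one project. Returns (status, work_request_days)."""
    need = math.ceil(ncpus * resource_share)  # min_results(P)
    if len(completion_times) < need:
        return NEED_WORK_IMMEDIATELY, 2 * T / SECONDS_PER_DAY
    # Skip the (need - 1) results closest to completion; sum the rest.
    rest = sorted(completion_times)[need - 1:]
    work_remaining = sum(rest) * resource_share * active_frac / ncpus
    if work_remaining < T:
        return NEED_WORK, (2 * T - work_remaining) / SECONDS_PER_DAY
    return DONT_NEED_WORK, 0.0

# A project with a 50% share on a 2-CPU host, buffer threshold ~2.8 hours:
print(check_work_needed([1000, 2000, 3000], 0.5, 2, 1.0, 10000))
```

Requesting 2T minus the remaining work means each fetch aims to refill the buffer to twice the threshold, so the client does not contact the scheduler again immediately.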