svn path=/trunk/boinc/; revision=9997
David Anderson 2006-04-21 00:02:04 +00:00
parent c6ae204235
commit 496d791574
5 changed files with 68 additions and 55 deletions


@ -279,13 +279,13 @@ public:
// Don't start new results if these exceed 2.
double work_request;
// the unit is "normalized CPU seconds",
// the unit is "project-normalized CPU seconds",
// i.e. the work should take 1 CPU on this host
// X seconds of wall-clock time to complete,
// taking into account
// 1) other projects and resource share;
// 1) this project's fractional resource share
// 2) on_frac, active_frac, and cpu_efficiency
// see doc/work_req.php
// see doc/sched.php
int work_request_urgency;
int nresults_returned;


@ -113,6 +113,7 @@ language("French", array(
site("http://boinc-quebec.org", "boinc-quebec.org")
));
language("German", array(
site("http://www.rechenkraft.net/", "Rechenkraft"),
site("http://www.seti-leipzig.de/", "SETI-Leipzig"),
site("http://www.dc-gemeinschaft.de/", "DC - Gemeinschaft"),
site("http://boinccast.podhost.de/", "BOINCcast (Podcast)"),


@ -14,7 +14,8 @@ where NCPUS is the minimum of the physical number of CPUs
<dt><b>CPU scheduling enforcement</b>
<dd>
When to actually enforce (by preemption) the schedule?
When to actually enforce the schedule
(i.e. by preempting and starting tasks)?
Sometimes it's preferable to delay the preemption of
an application until it checkpoints.
@ -26,9 +27,8 @@ and how much work should it ask for?
</dl>
<p>
The goals of the CPU scheduler and work-fetch policies are
(in descending priority):
<ul>
The goals of these policies are (in descending priority):
<ol>
<li> Results should be completed and reported by their deadline
(because results reported after their deadline
may not have any value to the project and may not be granted credit).
@ -39,7 +39,7 @@ min_queue days (min_queue is a user preference).
<li> Project resource shares should be honored over the long term.
<li> Variety: if a computer is attached to multiple projects,
execution should rotate among projects on a frequent basis.
</ul>
</ol>
<p>
In previous versions of BOINC,
@ -66,22 +66,56 @@ at the expense of variety.
<h2>Concepts and terms</h2>
<h3>Wall CPU time</h3>
A result's <b>wall CPU time</b> is the amount of wall-clock time
its process has been runnable at the OS level.
<p>
<b>Wall CPU time</b> is the amount of wall-clock time
a process has been runnable at the OS level.
The actual CPU time may be less than this,
e.g. if the process does a lot of paging,
or if other (non-BOINC) processing jobs run at the same time.
<p>
BOINC uses wall CPU time as the measure of how much resource
has been given to each project.
Why not use actual CPU time instead?
<ul>
<li> Wall CPU time is more fair in the case of paging apps.
<li> The measurement of actual CPU time depends on apps to
report it correctly.
Sometimes apps have bugs that cause them to always report zero.
</ul>
BOINC uses wall CPU time as the measure of CPU resource usage.
Wall CPU time is fairer than actual CPU time for apps that page heavily.
In addition, the measurement of actual CPU time depends on apps
reporting it correctly, which they may fail to do
(e.g. because of bugs that cause them to always report zero).
<h3>Normalized CPU time</h3>
<p>
The <b>normalized CPU time</b> of a result is an estimate
of the wall time it will take to complete, taking into account
<ul>
<li> the fraction of time BOINC runs ('on-fraction')
<li> the fraction of time computation is enabled ('active-fraction')
<li> CPU efficiency (the ratio of actual CPU to wall CPU)
</ul>
but not taking into account the project's resource share.
<h3>Project-normalized CPU time</h3>
<p>
The <b>project-normalized CPU time</b> of a result is an estimate
of the wall time it will take to complete, taking into account
the above factors plus the project's resource share
relative to other potentially runnable projects.
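The two definitions above can be sketched in code.
The text lists the factors but not the formula, so this sketch assumes
each factor divides the CPU time (a smaller fraction means a longer wall time).
The function and parameter names are hypothetical and are not taken from the BOINC source.

```cpp
#include <cassert>

// Sketch only: assumes the estimate is the CPU time divided by each factor.
// Names are illustrative, not BOINC identifiers.

// Estimated wall time to complete a result that needs cpu_seconds of CPU,
// ignoring the project's resource share.
double normalized_cpu_time(double cpu_seconds, double on_frac,
                           double active_frac, double cpu_efficiency) {
    assert(on_frac > 0 && on_frac <= 1);
    assert(active_frac > 0 && active_frac <= 1);
    assert(cpu_efficiency > 0 && cpu_efficiency <= 1);
    return cpu_seconds / (on_frac * active_frac * cpu_efficiency);
}

// Same estimate, additionally scaled by the project's resource share
// fraction relative to other potentially runnable projects.
double project_normalized_cpu_time(double normalized_seconds,
                                   double resource_share_frac) {
    assert(resource_share_frac > 0 && resource_share_frac <= 1);
    return normalized_seconds / resource_share_frac;
}
```

Under these assumptions, a result needing 360 CPU seconds on a host with
on-fraction 0.8, active-fraction 0.9 and CPU efficiency 0.5
has a normalized CPU time of about 1000 seconds,
and about 2000 seconds for a project whose resource share fraction is 0.5.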
<p>
The 'work_req' element of a scheduler RPC request
is in units of project-normalized CPU time.
In deciding how much work to send,
the scheduler must take into account
the project's resource share fraction,
and the host's on-fraction and active-fraction.
<p>
For example, suppose a host has 1 GFLOP/sec CPUs,
the project's resource share fraction is 0.5,
the host's on-fraction is 0.8
and the host's active-fraction is 0.9.
Then the expected processing rate per CPU is
<pre>
(1 GFLOP/sec)*0.5*0.8*0.9 = 0.36 GFLOP/sec
</pre>
If the host requests 1000 project-normalized CPU seconds of work,
the scheduler should send it at least 1000 * 0.36 = 360 GFLOPs
(i.e. 360 billion floating-point operations) of work.
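The arithmetic of this example can be written as a short sketch.
The struct and function names are hypothetical (they are not the scheduler's actual code);
only the multiplication of the four factors comes from the text.

```cpp
#include <cassert>

// Hypothetical container for the factors named in the example;
// not a BOINC data structure.
struct HostFactors {
    double flops_per_cpu;        // speed of one CPU, in FLOP/sec
    double resource_share_frac;  // project's resource share fraction, in (0, 1]
    double on_frac;              // fraction of time BOINC runs, in (0, 1]
    double active_frac;          // fraction of time computation is enabled, in (0, 1]
};

// Expected processing rate per CPU available to the project, in FLOP/sec.
double expected_flops_per_cpu(const HostFactors& h) {
    assert(h.flops_per_cpu > 0);
    assert(h.resource_share_frac > 0 && h.resource_share_frac <= 1);
    assert(h.on_frac > 0 && h.on_frac <= 1);
    assert(h.active_frac > 0 && h.active_frac <= 1);
    return h.flops_per_cpu * h.resource_share_frac * h.on_frac * h.active_frac;
}

// Minimum amount of work (in floating-point operations) to send for a
// request of work_req_seconds project-normalized CPU seconds.
double min_flops_to_send(const HostFactors& h, double work_req_seconds) {
    assert(work_req_seconds >= 0);
    return work_req_seconds * expected_flops_per_cpu(h);
}
```

With the values from the example (1e9 FLOP/sec, 0.5, 0.8, 0.9) and a request of 1000 seconds,
the first function gives roughly 0.36e9 FLOP/sec and the second roughly 360e9 operations.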
<h3>Result states</h3>
R is <b>runnable</b> if
@ -176,19 +210,18 @@ while honoring resource shares over the long term.
<p>
The scheduler starts by doing a simulation of weighted round-robin scheduling
applied to the current work queue.
This produces the following outputs:
The simulation takes into account on-fraction and active-fraction.
It produces the following outputs:
<ul>
<li> deadline_missed(R): whether result R misses its deadline.
<li> deadlines_missed(P):
the number of results R of P for which deadline_missed(R).
<li> total_shortfall:
the additional wall CPU time needed to keep all CPUs busy
for the next min_queue seconds
(this is used by the work-fetch policy, see below).
the additional normalized CPU time needed to keep all CPUs busy
for the next min_queue seconds.
<li> shortfall(P):
the additional wall CPU time needed for project P
to keep it from running out of work in the next min_queue seconds
(this is used by the work-fetch policy, see below).
the additional normalized CPU time needed for project P
to keep it from running out of work in the next min_queue seconds.
</ul>
<p>
In the example below, projects A and B have resource shares
@ -201,8 +234,7 @@ From time 4 to 8, project A gets only a 0.5 share
because it has only one result.
At time 8, result A1 finishes.
<p>
In this case, shortfall(A) is 4,
and total_shortfall is 2.
In this case, shortfall(A) is 4, shortfall(B) is 0, and total_shortfall is 2.
<br>
<img src=rr_sim.png>
@ -281,7 +313,7 @@ P's fractional resource share among fetchable projects.
<p>
The work-fetch policy function is called every few minutes
(or as needed) by the scheduler RPC polling function.
It sets the variable <b>work_request_size(P)</b> for each project P,
It sets the variable <b>P.work_request_size</b> for each project P,
which is the number of seconds of work to request
if we do a scheduler RPC to P.
This is computed as follows:
@ -320,7 +352,11 @@ Otherwise, the RPC mechanism chooses the project P for which
P.work_request_size>0 and
P.long_term_debt + shortfall(P) is greatest
</pre>
and gets work from that project.
and requests work from that project.
Note: P.work_request_size is in units of normalized CPU time,
so the actual work request (which is in project-normalized CPU time)
is P.work_request_size divided by P's resource share fraction
relative to potentially runnable projects.
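A minimal sketch of this selection and conversion follows.
It covers only the "otherwise" branch described above,
and the struct, its fields and the function names are hypothetical stand-ins
for whatever the client actually stores per project.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-project state; field names mirror the text, not the client code.
struct ProjectState {
    std::string name;
    double work_request_size;    // normalized CPU seconds
    double long_term_debt;
    double shortfall;            // shortfall(P) from the round-robin simulation
    double resource_share_frac;  // relative to potentially runnable projects, in (0, 1]
};

// Among projects with work_request_size > 0, pick the one for which
// long_term_debt + shortfall is greatest. Returns nullptr if none qualifies.
const ProjectState* choose_project(const std::vector<ProjectState>& projects) {
    const ProjectState* best = nullptr;
    for (const ProjectState& p : projects) {
        if (p.work_request_size <= 0) continue;
        if (!best ||
            p.long_term_debt + p.shortfall > best->long_term_debt + best->shortfall) {
            best = &p;
        }
    }
    return best;
}

// Convert the normalized request into the value actually sent in the RPC
// by dividing by the project's resource share fraction.
double actual_work_request(const ProjectState& p) {
    assert(p.resource_share_frac > 0 && p.resource_share_frac <= 1);
    return p.work_request_size / p.resource_share_frac;
}
```

For instance, a project with work_request_size 100 and resource share fraction 0.5
would send a request of 200 seconds under this sketch.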
<hr>
<h2>Scheduler work-send policy</h2>
<p>


@ -1,25 +0,0 @@
The work_req element of a scheduler RPC request
is in units of 'normalized CPU seconds'.
A request of X normalized CPU seconds
is asking for enough work to keep one CPU
busy for X seconds of wall-clock time.
In deciding how much work this is,
the scheduler must take into account:
1) the project's resource share fraction;
2) the host's on-fraction
3) the host's active-fraction.
For example, suppose a host has a 1 GFLOP/sec CPU,
the project's resource share fraction is 0.5,
the host's on-fraction is 0.8
and the host's active-fraction is 0.9.
Then the expected processing rate per CPU is
1*0.5*0.8*0.9 GFLOP/sec = 0.36 GFLOP/sec
Suppose the host requests 1000 seconds of work.
Then the scheduler should send it at least 360 GFLOPs of work.


@ -23,6 +23,7 @@
#ifdef _WIN32
#include "boinc_win.h"
#else
#include "config.h"
#include <cstdio>
#include <cstdlib>
#include <string>