mirror of https://github.com/BOINC/boinc.git
parent c6ae204235
commit 496d791574
|
@ -279,13 +279,13 @@ public:
|
||||||
// Don't start new results if these exceed 2.
|
// Don't start new results if these exceed 2.
|
||||||
|
|
||||||
double work_request;
|
double work_request;
|
||||||
// the unit is "normalized CPU seconds",
|
// the unit is "project-normalized CPU seconds",
|
||||||
// i.e. the work should take 1 CPU on this host
|
// i.e. the work should take 1 CPU on this host
|
||||||
// X seconds of wall-clock time to complete,
|
// X seconds of wall-clock time to complete,
|
||||||
// taking into account
|
// taking into account
|
||||||
// 1) other projects and resource share;
|
// 1) this project's fractional resource share
|
||||||
// 2) on_frac, active_frac, and cpu_efficiency
|
// 2) on_frac, active_frac, and cpu_efficiency
|
||||||
// see doc/work_req.php
|
// see doc/sched.php
|
||||||
int work_request_urgency;
|
int work_request_urgency;
|
||||||
|
|
||||||
int nresults_returned;
|
int nresults_returned;
|
||||||
|
|
|
@ -113,6 +113,7 @@ language("French", array(
|
||||||
site("http://boinc-quebec.org", "boinc-quebec.org")
|
site("http://boinc-quebec.org", "boinc-quebec.org")
|
||||||
));
|
));
|
||||||
language("German", array(
|
language("German", array(
|
||||||
|
site("http://www.rechenkraft.net/", "Rechenkraft"),
|
||||||
site("http://www.seti-leipzig.de/", "SETI-Leipzig"),
|
site("http://www.seti-leipzig.de/", "SETI-Leipzig"),
|
||||||
site("http://www.dc-gemeinschaft.de/", "DC - Gemeinschaft"),
|
site("http://www.dc-gemeinschaft.de/", "DC - Gemeinschaft"),
|
||||||
site("http://boinccast.podhost.de/", "BOINCcast (Podcast)"),
|
site("http://boinccast.podhost.de/", "BOINCcast (Podcast)"),
|
||||||
|
|
|
@ -14,7 +14,8 @@ where NCPUS is the minimum of the physical number of CPUs
|
||||||
|
|
||||||
<dt><b>CPU scheduling enforcement</b>
|
<dt><b>CPU scheduling enforcement</b>
|
||||||
<dd>
|
<dd>
|
||||||
When to actually enforce (by preemption) the schedule?
|
When to actually enforce the schedule
|
||||||
|
(i.e. by preempting and starting tasks)?
|
||||||
Sometimes it's preferable to delay the preemption of
|
Sometimes it's preferable to delay the preemption of
|
||||||
an application until it checkpoints.
|
an application until it checkpoints.
|
||||||
|
|
||||||
|
@ -26,9 +27,8 @@ and how much work should it ask for?
|
||||||
</dl>
|
</dl>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
The goals of the CPU scheduler and work-fetch policies are
|
The goals of these policies are (in descending priority):
|
||||||
(in descending priority):
|
<ol>
|
||||||
<ul>
|
|
||||||
<li> Results should be completed and reported by their deadline
|
<li> Results should be completed and reported by their deadline
|
||||||
(because results reported after their deadline
|
(because results reported after their deadline
|
||||||
may not have any value to the project and may not be granted credit).
|
may not have any value to the project and may not be granted credit).
|
||||||
|
@ -39,7 +39,7 @@ min_queue days (min_queue is a user preference).
|
||||||
<li> Project resource shares should be honored over the long term.
|
<li> Project resource shares should be honored over the long term.
|
||||||
<li> Variety: if a computer is attached to multiple projects,
|
<li> Variety: if a computer is attached to multiple projects,
|
||||||
execution should rotate among projects on a frequent basis.
|
execution should rotate among projects on a frequent basis.
|
||||||
</ul>
|
</ol>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
In previous versions of BOINC,
|
In previous versions of BOINC,
|
||||||
|
@ -66,22 +66,56 @@ at the expense of variety.
|
||||||
<h2>Concepts and terms</h2>
|
<h2>Concepts and terms</h2>
|
||||||
|
|
||||||
<h3>Wall CPU time</h3>
|
<h3>Wall CPU time</h3>
|
||||||
A result's <b>wall CPU time</b> is the amount of wall-clock time
|
<p>
|
||||||
its process has been runnable at the OS level.
|
<b>Wall CPU time</b> is the amount of wall-clock time
|
||||||
|
a process has been runnable at the OS level.
|
||||||
The actual CPU time may be less than this,
|
The actual CPU time may be less than this,
|
||||||
e.g. if the process does a lot of paging,
|
e.g. if the process does a lot of paging,
|
||||||
or if other (non-BOINC) processing jobs run at the same time.
|
or if other (non-BOINC) processing jobs run at the same time.
|
||||||
<p>
|
<p>
|
||||||
BOINC uses wall CPU time as the measure of how much resource
|
BOINC uses wall CPU time as the measure of CPU resource usage.
|
||||||
has been given to each project.
|
Wall CPU time is more fair than actual CPU time in the case of paging apps.
|
||||||
Why not use actual CPU time instead?
|
In addition, the measurement of actual CPU time depends on apps to
|
||||||
<ul>
|
report it correctly, and they may not do this.
|
||||||
<li> Wall CPU time is more fair in the case of paging apps.
|
|
||||||
<li> The measurement of actual CPU time depends on apps to
|
|
||||||
report it correctly.
|
|
||||||
Sometimes apps have bugs that cause them to always report zero.
|
|
||||||
</ul>
|
|
||||||
|
|
||||||
|
<h3>Normalized CPU time</h3>
|
||||||
|
<p>
|
||||||
|
The <b>normalized CPU time</b> of a result is an estimate
|
||||||
|
of the wall time it will take to complete, taking into account
|
||||||
|
<ul>
|
||||||
|
<li> the fraction of time BOINC runs ('on-fraction')
|
||||||
|
<li> the fraction of time computation is enabled ('active-fraction')
|
||||||
|
<li> CPU efficiency (the ratio of actual CPU to wall CPU)
|
||||||
|
</ul>
|
||||||
|
but not taking into account the project's resource share.
|
||||||
|
|
||||||
|
<h3>Project-normalized CPU time</h3>
|
||||||
|
<p>
|
||||||
|
The <b>project-normalized CPU time</b> of a result is an estimate
|
||||||
|
of the wall time it will take to complete, taking into account
|
||||||
|
the above factors plus the project's resource share
|
||||||
|
relative to other potentially runnable projects.
|
||||||
|
<p>
|
||||||
|
The 'work_req' element of a scheduler RPC request
|
||||||
|
is in units of project-normalized CPU time.
|
||||||
|
In deciding how much work to send,
|
||||||
|
the scheduler must take into account
|
||||||
|
the project's resource share fraction,
|
||||||
|
and the host's on-fraction and active-fraction.
|
||||||
|
|
||||||
|
<p>
|
||||||
|
For example, suppose a host has 1 GFLOP/sec CPUs,
|
||||||
|
the project's resource share fraction is 0.5,
|
||||||
|
the host's on-fraction is 0.8
|
||||||
|
and the host's active-fraction is 0.9.
|
||||||
|
Then the expected processing rate per CPU is
|
||||||
|
|
||||||
|
<pre>
|
||||||
|
(1 GFLOP/sec)*0.5*0.8*0.9 = 0.36 GFLOP/sec
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
If the host requests 1000 project-normalized CPU seconds of work,
|
||||||
|
the scheduler should send it at least 360 GFLOPs of work.
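Note on the example above: the arithmetic can be checked with a small C++ sketch. The function and parameter names here (`expected_rate_per_cpu`, `min_work_to_send`, `cpu_flops`, `share_frac`) are hypothetical and are not taken from the BOINC sources; the sketch only assumes that the rate is the plain product of the factors listed in the text.

```cpp
#include <cassert>

// Hypothetical helper (not a BOINC function): expected per-CPU processing
// rate, in FLOP/sec, that one project receives on this host.
double expected_rate_per_cpu(double cpu_flops, double share_frac,
                             double on_frac, double active_frac) {
    // Each fraction is expected to lie in [0, 1].
    assert(share_frac >= 0.0 && share_frac <= 1.0);
    assert(on_frac >= 0.0 && on_frac <= 1.0);
    assert(active_frac >= 0.0 && active_frac <= 1.0);
    return cpu_flops * share_frac * on_frac * active_frac;
}

// Hypothetical helper: minimum amount of work, in FLOPs, that fills
// `req_seconds` project-normalized CPU seconds at the rate above.
double min_work_to_send(double req_seconds, double cpu_flops,
                        double share_frac, double on_frac,
                        double active_frac) {
    return req_seconds *
           expected_rate_per_cpu(cpu_flops, share_frac, on_frac, active_frac);
}
```

With the numbers from the text, `expected_rate_per_cpu(1e9, 0.5, 0.8, 0.9)` is about 0.36e9 FLOP/sec, and `min_work_to_send(1000.0, 1e9, 0.5, 0.8, 0.9)` is about 360e9 FLOPs, up to floating-point rounding.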
|
||||||
|
|
||||||
<h3>Result states</h3>
|
<h3>Result states</h3>
|
||||||
R is <b>runnable</b> if
|
R is <b>runnable</b> if
|
||||||
|
@ -176,19 +210,18 @@ while honoring resource shares over the long term.
|
||||||
<p>
|
<p>
|
||||||
The scheduler starts by doing a simulation of weighted round-robin scheduling
|
The scheduler starts by doing a simulation of weighted round-robin scheduling
|
||||||
applied to the current work queue.
|
applied to the current work queue.
|
||||||
This produces the following outputs:
|
The simulation takes into account on-fraction and active-fraction.
|
||||||
|
It produces the following outputs:
|
||||||
<ul>
|
<ul>
|
||||||
<li> deadline_missed(R): whether result R misses its deadline.
|
<li> deadline_missed(R): whether result R misses its deadline.
|
||||||
<li> deadlines_missed(P):
|
<li> deadlines_missed(P):
|
||||||
the number of results R of P for which deadline_missed(R).
|
the number of results R of P for which deadline_missed(R).
|
||||||
<li> total_shortfall:
|
<li> total_shortfall:
|
||||||
the additional wall CPU time needed to keep all CPUs busy
|
the additional normalized CPU time needed to keep all CPUs busy
|
||||||
for the next min_queue seconds
|
for the next min_queue seconds.
|
||||||
(this is used by the work-fetch policy, see below).
|
|
||||||
<li> shortfall(P):
|
<li> shortfall(P):
|
||||||
the additional wall CPU time needed for project P
|
the additional normalized CPU time needed for project P
|
||||||
to keep it from running out of work in the next min_queue seconds
|
to keep it from running out of work in the next min_queue seconds.
|
||||||
(this is used by the work-fetch policy, see below).
|
|
||||||
</ul>
|
</ul>
|
||||||
<p>
|
<p>
|
||||||
In the example below, projects A and B have resource shares
|
In the example below, projects A and B have resource shares
|
||||||
|
@ -201,8 +234,7 @@ From time 4 to 8, project A gets only a 0.5 share
|
||||||
because it has only one result.
|
because it has only one result.
|
||||||
At time 8, result A1 finishes.
|
At time 8, result A1 finishes.
|
||||||
<p>
|
<p>
|
||||||
In this case, shortfall(A) is 4,
|
In this case, shortfall(A) is 4, shortfall(B) is 0, and total_shortfall is 2.
|
||||||
and total_shortfall is 2.
|
|
||||||
|
|
||||||
<br>
|
<br>
|
||||||
<img src=rr_sim.png>
|
<img src=rr_sim.png>
|
||||||
|
@ -281,7 +313,7 @@ P's fractional resource share among fetchable projects.
|
||||||
<p>
|
<p>
|
||||||
The work-fetch policy function is called every few minutes
|
The work-fetch policy function is called every few minutes
|
||||||
(or as needed) by the scheduler RPC polling function.
|
(or as needed) by the scheduler RPC polling function.
|
||||||
It sets the variable <b>work_request_size(P)</b> for each project P,
|
It sets the variable <b>P.work_request_size</b> for each project P,
|
||||||
which is the number of seconds of work to request
|
which is the number of seconds of work to request
|
||||||
if we do a scheduler RPC to P.
|
if we do a scheduler RPC to P.
|
||||||
This is computed as follows:
|
This is computed as follows:
|
||||||
|
@ -320,7 +352,11 @@ Otherwise, the RPC mechanism chooses the project P for which
|
||||||
P.work_request_size>0 and
|
P.work_request_size>0 and
|
||||||
P.long_term_debt + shortfall(P) is greatest
|
P.long_term_debt + shortfall(P) is greatest
|
||||||
</pre>
|
</pre>
|
||||||
and gets work from that project.
|
and requests work from that project.
|
||||||
|
Note: P.work_request_size is in units of normalized CPU time,
|
||||||
|
so the actual work request is P.work_request_size
|
||||||
|
divided by P's resource share fraction relative to
|
||||||
|
potentially runnable projects.
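Note on the selection rule and unit conversion above: a minimal C++ sketch under stated assumptions. The struct `ProjectInfo` and both functions are hypothetical illustrations, not the client's actual classes. The condition that precedes "Otherwise" is not visible in this hunk, so only the quoted rule is covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-project record. Field names follow the text,
// but this is not the client's actual project class.
struct ProjectInfo {
    double work_request_size;  // normalized CPU seconds
    double long_term_debt;
    double shortfall;          // shortfall(P) from the round-robin simulation
    double share_frac;         // share among potentially runnable projects
};

// Returns the index of the project with work_request_size > 0 and the
// greatest long_term_debt + shortfall, or -1 if no project qualifies.
int choose_project(const std::vector<ProjectInfo>& projects) {
    int best = -1;
    for (std::size_t i = 0; i < projects.size(); ++i) {
        const ProjectInfo& p = projects[i];
        if (p.work_request_size <= 0.0) continue;
        if (best < 0 ||
            p.long_term_debt + p.shortfall >
                projects[best].long_term_debt + projects[best].shortfall) {
            best = static_cast<int>(i);
        }
    }
    return best;
}

// Converts normalized CPU seconds to the project-normalized figure that
// is sent in the request, by dividing by the resource share fraction.
double actual_work_request(const ProjectInfo& p) {
    assert(p.share_frac > 0.0 && p.share_frac <= 1.0);
    return p.work_request_size / p.share_frac;
}
```

For instance, a project with a work_request_size of 500 normalized seconds and a share fraction of 0.5 would send a request for 1000 project-normalized seconds.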
|
||||||
<hr>
|
<hr>
|
||||||
<h2>Scheduler work-send policy</h2>
|
<h2>Scheduler work-send policy</h2>
|
||||||
<p>
|
<p>
|
||||||
|
|
|
@ -1,25 +0,0 @@
|
||||||
The work_req element of a scheduler RPC request
|
|
||||||
is in units of 'normalized CPU seconds'.
|
|
||||||
|
|
||||||
A request of X normalized CPU seconds
|
|
||||||
is asking for enough work to keep one CPU
|
|
||||||
busy for X seconds of wall-clock time.
|
|
||||||
|
|
||||||
In deciding how much work this is,
|
|
||||||
the scheduler must take into account:
|
|
||||||
|
|
||||||
1) the project's resource share fraction;
|
|
||||||
2) the host's on-fraction
|
|
||||||
3) the host's active-fraction.
|
|
||||||
|
|
||||||
For example, suppose a host has a 1 GFLOP/sec CPU,
|
|
||||||
the project's resource share fraction is 0.5,
|
|
||||||
the host's on-fraction is 0.8
|
|
||||||
and the host's active-fraction is 0.9.
|
|
||||||
|
|
||||||
Then the expected processing rate per CPU is
|
|
||||||
|
|
||||||
1*0.5*0.8*0.9 GFLOP/sec = 0.36 GFLOP/sec
|
|
||||||
|
|
||||||
Suppose the host requests 1000 seconds of work.
|
|
||||||
Then the scheduler should send it at least 360 GFLOPs of work.
|
|
|
@ -23,6 +23,7 @@
|
||||||
#ifdef _WIN32
|
#ifdef _WIN32
|
||||||
#include "boinc_win.h"
|
#include "boinc_win.h"
|
||||||
#else
|
#else
|
||||||
|
#include "config.h"
|
||||||
#include <cstdio>
|
#include <cstdio>
|
||||||
#include <cstdlib>
|
#include <cstdlib>
|
||||||
#include <string>
|
#include <string>
|
||||||
|
|