by simulating time-slicing explicitly.
Also simulate changes in project REC
and hence in scheduling priority.
- client: add a log flag "rrsim_detail" that prints
time-slice-level info.
svn path=/trunk/boinc/; revision=24161
- client: cc_config.xml: if <devnum> is omitted from an <exclude_gpu>,
it means exclude all instances of that GPU type (see the example below)
- client: if all instances of a GPU type are excluded for a project,
don't ask the project for jobs of that type
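For example, a cc_config.xml fragment like the following excludes every
NVIDIA instance for one project (the <url> and <type> child elements are
shown as an assumption about the usual <exclude_gpu> layout):

    <cc_config>
      <options>
        <exclude_gpu>
          <url>http://example-project.org/</url>
          <type>NVIDIA</type>
          <!-- <devnum> omitted: exclude all instances of this GPU type -->
        </exclude_gpu>
      </options>
    </cc_config>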
svn path=/trunk/boinc/; revision=23898
adjust project REC by the amount of work queued, to increase variety
NOTE: at some point I think I had a reason to not do this,
but I can't remember what it is.
- client, job scheduling policy: fix how project REC is adjusted
svn path=/trunk/boinc/; revision=23838
as non-high-priority
- client: don't print spurious "domino prevention"
and "thrashing prevention" msgs
- manager: show project descriptions in same size font
as the rest of the dialog
svn path=/trunk/boinc/; revision=23831
- new GPU types can be added easily
- users can specify GPUs in cc_config.xml,
referred to by app_info.xml,
and they will be scheduled by BOINC
and passed --device N options
Note: the parsing of cc_config.xml is not done yet.
- RPC protocols (account manager and scheduler)
can now specify GPU types in separate elements
rather than embedding them in tag names
e.g. <no_rsc>NVIDIA</no_rsc> rather than <no_cuda/>
- client: in account manager replies, parse elements of the form
<no_rsc>NAME</no_rsc>
indicating the GPUs of type NAME should not be used.
This allows account managers to control GPU types
not hardwired into the client.
Note: <no_cuda/> and <no_ati/> will continue to be supported.
- scheduler RPC reply: add
<no_rsc_apps>NAME</no_rsc_apps>
(NAME = GPU name)
to indicate that the project has no jobs for the indicated GPU type.
<no_cuda_apps> etc. are still supported
- client/lib: remove set_debts() GUI RPC
- client/scheduler RPC
remove <cuda_backoff> etc. (superseded by no_app)
Exception: <ip_result> elements in sched request
still have <ncudas> and <natis>.
Fix this later.
Implementation notes:
- client/lib: change "CUDA" to "NVIDIA" in type/variable names, and in XML
Continue to recognize "CUDA" for compatibility
- host_info.coprocs no longer used within the client;
use a global var (COPROCS coprocs) instead.
COPROCS now has an array of COPROCs;
GPU types are identified by the array index.
Index zero means CPU.
- a bunch of other resource-specific structs (like RSC_WORK_FETCH)
are now stored in arrays, with same indices as COPROCS
(i.e. index 0 is CPU)
- COPROCS still has COPROC_NVIDIA and COPROC_ATI structs to hold vendor-specific info
- APP_VERSION now has a struct GPU_USAGE to describe its GPU usage
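A minimal sketch of this layout; only the type names (COPROC, COPROCS,
COPROC_NVIDIA, COPROC_ATI, RSC_WORK_FETCH) come from the notes above,
and the member names and array bound are illustrative:

    #define MAX_RSC 8    // illustrative bound on # of resource types

    struct COPROC_NVIDIA { /* vendor-specific info, e.g. driver version */ };
    struct COPROC_ATI    { /* vendor-specific info */ };

    struct COPROC {
        char type[64];      // e.g. "NVIDIA", "ATI"
        int count;          // # of instances
        double peak_flops;
    };

    struct COPROCS {
        int n_rsc;                  // # of resource types, including the CPU
        COPROC coprocs[MAX_RSC];    // coprocs[0] describes the CPU
        COPROC_NVIDIA nvidia;       // vendor-specific detail kept separately
        COPROC_ATI ati;
    };

    COPROCS coprocs;    // global var; replaces host_info.coprocs in the client

    // Other per-resource structs use the same indices:
    struct RSC_WORK_FETCH { /* per-resource work-fetch state */ };
    RSC_WORK_FETCH rsc_work_fetch[MAX_RSC];    // [0] is the CPU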
svn path=/trunk/boinc/; revision=23253
for RR sim's pending-job lists.
Erasing head of vector is slow.
- lib: allow GPU peak FLOPS to be specified in XML (for simulator)
- simulator work
- client: old work fetch policy: projects may need enough jobs
for all device instances, not just resource_share*ninst.
E.g. a project that has only CPU jobs in a CPU/GPU client
- client: with REC scheduling, don't ask for work for
secondary resources if project has negative priority.
- client: in RR sim, make sure we saturate devices if possible.
Otherwise we may report a shortfall incorrectly
svn path=/trunk/boinc/; revision=22894
since we now do round-robin for GPUs as well as CPU.
NOTE: this bug was found using the client simulator!
- client simulator: generate REC graph
svn path=/trunk/boinc/; revision=22746
recent estimated credit (REC) instead of debt.
These changes are enabled by
#define USE_REC
in work_fetch.h.
If this is commented out (the default) the client uses
debt-based scheduling, same as before.
TODO: work-fetch policy changes
- client simulator: various fixes:
- compute idle and wasted fraction based on all processing resources,
not just CPU
- compute job completion times based on FLOPS, not CPU seconds
- compute and use project->no_X_apps
etc.
svn path=/trunk/boinc/; revision=22741
- scheduler: improve the deadline check mechanism slightly.
When updating "estimated delay" (a rough measure of how long
a resource is saturated with high-priority work)
take into account the # of instances used by the job,
and the total # of instances
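One way to read the weighting (illustrative names, not the scheduler's
actual code): a job's contribution to the estimated delay is scaled by
the fraction of the resource's instances it uses.

    // Illustrative only: weight each high-priority job's contribution to
    // "estimated delay" by the fraction of instances it occupies.
    double update_estimated_delay(
        double estimated_delay,    // current estimate for this resource
        double job_duration,       // remaining runtime of the job
        double job_ninstances,     // # of instances the job uses
        double total_instances     // total # of instances of the resource
    ) {
        return estimated_delay + job_duration*(job_ninstances/total_instances);
    }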
svn path=/trunk/boinc/; revision=22612
cmdline arg.
Suppresses the fetch of project list and of current client version #.
Use when running on grid nodes.
- debugging on client simulator. Not done yet.
svn path=/trunk/boinc/; revision=22414
Old: when a job finished, we cleared the backoffs for the
resources it used. The idea was to get more jobs
immediately in the case where the client was at
a jobs-in-progress limit.
Problem: this resulted in an RPC immediately,
typically before the output files were uploaded.
So the client is still at the limit, and doesn't get jobs.
New: clear the backoffs at the point when output files
have been uploaded and the job is ready to report.
- client: change range in resource backoff from (0, x) to (.5*x, 1.5*x)
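A sketch of the new range; drand() here is a local stand-in for a
uniform [0,1) random helper:

    #include <cstdlib>

    // Uniform random in [0,1); stands in for the client's own RNG helper.
    static double drand() { return (double)rand()/((double)RAND_MAX + 1); }

    // Old: backoff uniform in (0, x).
    // New: backoff uniform in (.5*x, 1.5*x), so a small random draw
    // can't collapse the backoff to nearly zero.
    static double resource_backoff(double x) {
        return x*(0.5 + drand());
    }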
svn path=/trunk/boinc/; revision=22411
if job A is unstarted and EDF,
and there's a job B later in the list that is started,
has the same app version, and has the same arrival time,
move A after B.
- client: remove the "temp_dcf" mechanism,
which had the same goal but didn't work.
- client: in computing overall debt for a project,
subtract a term that reflects pending work.
This should reduce repeated fetches from the same project.
- client simulator: tweaks
svn path=/trunk/boinc/; revision=20223
will have enough jobs to use its share of resource instances.
This avoids situations where e.g. on a 2-CPU system
a project has 75% resource share and 1 CPU job,
and its STD increases without bound.
Did a general cleanup of the logic for computing
work request sizes (seconds and instances).
svn path=/trunk/boinc/; revision=20036
It computed an "overall STD" as the sum of CPU and coprocs,
weighted by the coproc's speed, as we do for LTD.
This was the wrong idea; in the presence of GPUs,
STDs quickly get pushed to +/- 1 day and are truncated there.
New scheme: STD is maintained per (resource type, project).
This fixes the above problem,
and it opens the door to round-robin scheduling of GPUs.
- client: the calculation of "anticipated debt" was scaling
by relative resource share.
This wasn't correct, seems to me.
- client: rename "debt" to "long_term_debt" in a few places
(but not in the client state file, for compatibility)
svn path=/trunk/boinc/; revision=19777
Old: it's based entirely on CPU time.
So a GPU project, whose app uses only a fraction
of a CPU, accrues positive debt.
This is OK if the project has only GPU apps,
since STD is not (currently) used for GPU scheduling.
But some projects have both CPU and GPU apps.
New: STD is based on total processing.
It has terms for each resource type.
The notion of "runnable resource share" is specific to a type.
Note: the notion of "resource share fraction" appears in
a couple of other places:
- it's passed to apps in app_init_data.xml
- it's passed in scheduler requests.
It should be broken down by resource type in these cases too.
Note to self: do this later.
svn path=/trunk/boinc/; revision=19762
(estimated throughput of all GPUs)/(estimated throughput of all CPUs)
rather than the ratio of 1 GPU to 1 CPU.
This change will hopefully cause ratios of granted credit
to more closely match resource shares.
svn path=/trunk/boinc/; revision=19311
to accept CPU, NVIDIA and ATI jobs.
These prefs are shown only where relevant:
e.g., only for processor types for which the project has app versions,
and if it has versions for only one type, no pref is shown.
These prefs affect both client and scheduler.
The client won't ask for work for a device blocked by prefs,
and the scheduler won't send it.
This replaces earlier optional project-specific prefs for
"no CPU jobs" and "no GPU jobs".
(However, these prefs continue to be honored on the server side).
- client: if NVIDIA driver is unknown, say that rather than 0
svn path=/trunk/boinc/; revision=19194
and <ati_backoff> elements to scheduler reply.
These specify backoffs for the resource types,
overriding the existing backoff mechanism.
Projects can supply these if they don't have apps of a particular type
and don't want to get periodic requests for them.
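For example, a project with no CUDA or ATI apps might include the
following in its reply (values illustrative; the backoff is assumed to
be in seconds):

    <cuda_backoff>86400</cuda_backoff>
    <ati_backoff>86400</ati_backoff>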
svn path=/trunk/boinc/; revision=19059
If you have 2 CPUs and a 1-day job in EDF mode,
the busy time should be zero, not .5 days.
Add a class BUSY_TIME_ESTIMATOR that makes a somewhat better
(though still fairly crude) estimate.
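A minimal sketch of the idea (not the actual class): keep a projected
busy time per instance, load each deadline-miss job onto the least-busy
instances, and report the smallest per-instance total. With 2 CPUs and
one 1-day job, one instance stays idle and the estimate is zero, as desired.

    #include <algorithm>
    #include <vector>

    // Illustrative sketch of a crude per-resource busy-time estimate.
    struct BUSY_TIME_ESTIMATOR {
        std::vector<double> instance_busy;   // projected busy time per instance

        explicit BUSY_TIME_ESTIMATOR(int ninstances)
            : instance_busy(ninstances, 0.0) {}

        // Account for a job projected to miss its deadline under RR sim.
        void add_job(double duration, int ninstances_used) {
            // greedily load the least-busy instances
            for (int i = 0; i < ninstances_used; i++) {
                std::vector<double>::iterator it =
                    std::min_element(instance_busy.begin(), instance_busy.end());
                *it += duration;
            }
        }

        // A new job can start as soon as some instance frees up.
        double busy_time() const {
            return *std::min_element(instance_busy.begin(), instance_busy.end());
        }
    };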
svn path=/trunk/boinc/; revision=19003
with a GPU request if project is anonymous platform
AND it has an app for that GPU type
- client: report overall work request as well as per-resource-type requests
svn path=/trunk/boinc/; revision=18994
to the max of the requests for different resource types.
Otherwise projects with old schedulers won't send us work.
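A sketch of the bound, with illustrative names:

    #include <algorithm>

    // Illustrative: make the overall (old-style) work request at least as
    // large as the largest per-resource request, so pre-multi-resource
    // schedulers still send work.
    double overall_request(double cpu_req, double nvidia_req, double ati_req) {
        return std::max(cpu_req, std::max(nvidia_req, ati_req));
    }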
svn path=/trunk/boinc/; revision=18945
runs GPU jobs in a seemingly random order,
or preempts GPU jobs needlessly.
The change has two parts:
1) sort the "results" vector by received_time,
so that the RR simulation processes GPU jobs FIFO.
2) in the CPU scheduler (earliest_deadline_result())
instead of choosing the earliest-deadline GPU job that
misses its deadline,
pick the earliest-deadline GPU job from a project that
has a deadline miss for that GPU type
(this is what's done in the CPU case; see the sketch after this list)
- client: fix bug where if you have an exclusive app,
then remove it from cc_config.xml and do "update config",
it doesn't go away.
Need to clear the list before parsing.
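A sketch of part 1) of the change above; RESULT here is a stand-in
holding only the field the sort needs:

    #include <algorithm>
    #include <vector>

    struct RESULT {
        double received_time;   // when the job arrived from the server
        // ... other job state
    };

    // Sort jobs by arrival time so the RR simulation (and hence GPU
    // scheduling) processes them FIFO rather than in arbitrary order.
    bool arrived_first(const RESULT* a, const RESULT* b) {
        return a->received_time < b->received_time;
    }

    void sort_results(std::vector<RESULT*>& results) {
        std::sort(results.begin(), results.end(), arrived_first);
    }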
svn path=/trunk/boinc/; revision=18842
We need to estimate 2 different delays for each resource type:
1) "saturated time": the time the resource will be fully utilized
(new name for the old "estimated delay").
This is used to compute work requests.
2) "busy time": the time a new job would have to wait
to start using this resource.
This is passed to the scheduler and used for a crude deadline check.
Note: this is ill-defined; a single number doesn't suffice.
But as a very rough estimate, I'll use the sum of
(J.duration * J.ninstances)/ninstances
over all jobs that miss their deadline under RR sim.
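A worked sketch of that rough estimate, with illustrative names:

    #include <vector>

    struct JOB {
        double duration;      // estimated remaining runtime
        double ninstances;    // # of instances of this resource the job uses
        bool deadline_miss;   // projected to miss its deadline under RR sim
    };

    // busy time ~= sum over deadline-miss jobs of
    //   (J.duration * J.ninstances) / ninstances
    double busy_time_estimate(const std::vector<JOB>& jobs, double ninstances) {
        double sum = 0;
        for (size_t i = 0; i < jobs.size(); i++) {
            if (!jobs[i].deadline_miss) continue;
            sum += jobs[i].duration * jobs[i].ninstances / ninstances;
        }
        return sum;
    }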
svn path=/trunk/boinc/; revision=18629
(passed to server for crude deadline check) is computed.
Old: estimated delay is the interval for which the resource
is fully used (i.e., all instances busy).
Problem: this may cause unnecessary project starvation.
example: a 1-CPU machine has a month-long CPDN job
with a 1-year deadline (it's not in deadline trouble).
Then the CPU estimated delay will be 1 month,
and the client won't get any work from projects
with deadlines shorter than 1 month.
New: estimated delay is the latest time at which the
resource is fully used and is being used by at least 1 job
that is projected to miss its deadline under RR.
Note: this isn't precise, but I don't think we can improve it
much without getting a lot more complex.
svn path=/trunk/boinc/; revision=18607
- first schedule jobs projected to miss deadline in EDF order
- then schedule remaining jobs in FIFO order
This is intended to reduce the number of preemptions of coproc jobs,
and hence (since they are always preempted by quit)
to reduce the wasted time due to checkpoint gaps
(see the sketch after this list).
- client: the CPU scheduling policy made use of the number
of deadline misses in various places.
This should include only the deadline misses of CPU jobs.
So move "deadlines_missed" from RR_SIM_STATUS and PROJECT
to RSC_PROJECT_WORK_FETCH so that we have separate counts
for CPU and coproc jobs, and use the count for CPU jobs.
- GUI RPC: removed the rr_sim_deadlines_missed field
from project descriptor.
This is no longer meaningful, and it didn't seem to be used anywhere.
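A sketch of the coproc job ordering described at the top of this entry
(names illustrative):

    #include <algorithm>
    #include <vector>

    struct RESULT {
        double deadline;
        double received_time;
        bool deadline_miss;   // projected to miss deadline under RR sim
    };

    // Jobs projected to miss their deadline come first, in EDF order;
    // the remaining jobs follow in FIFO (arrival) order.
    bool schedule_order(const RESULT* a, const RESULT* b) {
        if (a->deadline_miss != b->deadline_miss) return a->deadline_miss;
        if (a->deadline_miss) return a->deadline < b->deadline;
        return a->received_time < b->received_time;
    }

    void order_coproc_jobs(std::vector<RESULT*>& jobs) {
        std::stable_sort(jobs.begin(), jobs.end(), schedule_order);
    }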
svn path=/trunk/boinc/; revision=17785
i.e., list both win/x86 and win/x86 + NVIDIA.
This will allow the manager to show which projects can
use the host's coprocessors,
and also to gray out projects that require an absent coproc.
- fix compile warnings
svn path=/trunk/boinc/; revision=17735
if resource is saturated for < work_buf_min()
(rather than saturated for 0).
So now the only significance of "overworked" is that we
won't ask overworked projects for work if the resource is saturated
for more than work_buf_min() but less than work_buf_total().
svn path=/trunk/boinc/; revision=17620
project, it must have no runnable jobs for ANY resource.
- client: work-fetch bug fix: when setting requests in the
shortfall case, don't request anything if project is backed off
or overworked for the resource.
svn path=/trunk/boinc/; revision=17338
There are situations where multiple projects can legitimately
have large negative LTD on a uniprocessor.
Instead...
- client: add <zero_debts> option to cc_config.xml
svn path=/trunk/boinc/; revision=17328