54 Commits

Author SHA1 Message Date
David Anderson b7d48765a8 - client: if have coproc jobs but coproc is missing,
skip those jobs in RR sim.
    Otherwise we add stuff to uninitialized data structures,
    and a crash can result.
- client: initialize the above data structures anyway


svn path=/trunk/boinc/; revision=20753
2010-02-28 04:32:10 +00:00
David Anderson f716dcf7ae - client: if a project has zero resource share,
treat it as a "backup project":
    fetch work from it only if there is an idle instance
    and no other projects have work.


svn path=/trunk/boinc/; revision=20286
2010-01-28 05:21:14 +00:00
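A minimal sketch of the backup-project rule described in f716dcf7ae. The `Project` class, its fields, and `may_fetch_from` are hypothetical names, not the client's actual code. It also assumes that "no other projects have work" means no other project currently has runnable jobs on the host.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    resource_share: float
    has_work: bool  # assumption: True if the project has runnable jobs on this host


def may_fetch_from(project, all_projects, idle_instances):
    """Return True if the backup rule allows fetching work from `project`."""
    if project.resource_share > 0:
        # Not a backup project; the ordinary work-fetch policy
        # (not modeled here) decides.
        return True
    others_have_work = any(p.has_work for p in all_projects if p is not project)
    # Backup project: needs an idle instance and no work from anyone else.
    return idle_instances > 0 and not others_have_work
```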
David Anderson b5124fe729 - client: brute-force attempt at eliminating domino-effect preemption:
if job A is unstarted and EDF,
    and there's a job B that is later in the list,
    is started, has the same app version,
    and has the same arrival time,
    move A after B.
- client: remove the "temp_dcf" mechanism,
    which had the same goal but didn't work.
- client: in computing overall debt for a project,
    subtract a term that reflects pending work.
    This should reduce repeated fetches from the same project.
- client simulator: tweaks

svn path=/trunk/boinc/; revision=20223
2010-01-21 00:14:56 +00:00
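The reordering rule in b5124fe729 can be sketched as follows. This is an illustration under assumptions, not the client's code: `Job` and `reorder` are hypothetical names, and when several jobs B match, the sketch moves A after the last one (the commit message does not say which).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    started: bool
    edf: bool  # assumption: job was selected for earliest-deadline-first scheduling
    app_version: str
    arrival_time: float


def reorder(jobs):
    """Move each unstarted EDF job A after the last later job B that is
    started and has the same app version and arrival time."""
    result = list(jobs)
    i = 0
    while i < len(result):
        a = result[i]
        target = None
        if not a.started and a.edf:
            for j in range(i + 1, len(result)):
                b = result[j]
                if (b.started and b.app_version == a.app_version
                        and b.arrival_time == a.arrival_time):
                    target = j
        if target is None:
            i += 1
            continue
        # After the pop, B sits at target - 1, so inserting at `target`
        # places A immediately after B. Re-examine index i afterwards,
        # since a different job has shifted into it.
        result.insert(target, result.pop(i))
    return result
```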
David Anderson fe7d8b34f3 - client simulator: done for now
svn path=/trunk/boinc/; revision=20204
2010-01-20 06:35:57 +00:00
David Anderson d6b6f8d5db - client (Mac): append /usr/local/cuda/lib to LD_LIBRARY_PATH
and DYLD_LIBRARY_PATH
- client simulator: compile fixes

svn path=/trunk/boinc/; revision=20117
2010-01-09 16:41:17 +00:00
David Anderson 37aae854f3 - client: scheduling problem:
    - a project overestimates job FLOP counts
    - the client starts jobs in EDF mode
    - as job progresses and fraction done increases,
        its completion time estimate decreases until
        it's no longer a deadline miss.
    - job gets preempted by other job from that project;
        you end up with lots of partly completed jobs.
    Solution (I hope): if an app version has running jobs,
        compute a "temp DCF" for the app version,
        which is the min of dynamic/static estimates for its jobs.
        Apply this scaling factor to completion time estimates
        for unstarted jobs in RR simulation
- client: the estimation of remaining time of running jobs was wrong
    (how did this bug survive so long?)

svn path=/trunk/boinc/; revision=20077
2010-01-06 06:01:23 +00:00
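A rough sketch of the "temp DCF" idea from 37aae854f3 (the mechanism was later removed in b5124fe729). It assumes "the min of dynamic/static estimates" means the smallest ratio of dynamic to static estimate over the app version's running jobs. The function names are hypothetical.

```python
def temp_dcf(running_jobs):
    """running_jobs: list of (dynamic_estimate, static_estimate) pairs in
    seconds for the running jobs of one app version.
    Returns the smallest dynamic/static ratio, or 1.0 (no scaling)
    if the app version has no running jobs."""
    ratios = [dynamic / static for dynamic, static in running_jobs if static > 0]
    return min(ratios) if ratios else 1.0


def scaled_estimate(static_estimate, dcf):
    """Completion time estimate for an unstarted job in the RR simulation."""
    return static_estimate * dcf
```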
David Anderson 876522c6aa - client: add logic to work fetch so that each project
will have enough jobs to use its share of resource instances.
    This avoids situations where e.g. on a 2-CPU system
    a project has 75% resource share and 1 CPU job,
    and its STD increases without bound.
    
    Did a general cleanup of the logic for computing
    work request sizes (seconds and instances).

svn path=/trunk/boinc/; revision=20036
2009-12-24 20:40:27 +00:00
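The 2-CPU example in 876522c6aa works out as below. This is only a sketch: `instances_to_request` and its parameters are hypothetical, and the real work request also involves seconds, which are not modeled here.

```python
def instances_to_request(share_fraction, ninstances, instances_in_use):
    """Number of additional instances a project needs jobs for in order
    to use its share of a resource.

    share_fraction: the project's fraction of total resource share (0..1)
    ninstances: number of instances of the resource on the host
    instances_in_use: instances the project's current jobs can occupy
    """
    entitled = share_fraction * ninstances
    return max(0.0, entitled - instances_in_use)
```

With a 75% share on 2 CPUs and one CPU job, the project is entitled to 1.5 instances and is half an instance short.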
David Anderson e9a4debf9c - client: scheduling tweak.
    Old: if a project has RR sim deadline misses,
        select jobs to run high-priority on the basis of:
        1) deadline (earliest first)
        2) estimated time to completion (least first)
        This ignores whether jobs missed their deadline in RR sim,
        so it may choose to run a job that's actually in no
        danger of missing its deadline over one that is.
    New: choose only jobs that miss their deadline in RR sim

svn path=/trunk/boinc/; revision=19826
2009-12-08 20:39:46 +00:00
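The old and new selection policies in e9a4debf9c differ only in a filter step, which can be sketched like this. The dictionary keys and the function name are hypothetical. The sketch also assumes the new policy keeps the old ordering (deadline, then estimated time to completion), which the commit message does not state.

```python
def pick_high_priority(jobs):
    """Return the jobs to run high-priority: only those that miss their
    deadline in the RR simulation, earliest deadline first, ties broken
    by least estimated time to completion."""
    misses = [j for j in jobs if j["misses_deadline_in_rr_sim"]]
    return sorted(misses, key=lambda j: (j["deadline"], j["est_time_to_completion"]))
```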
David Anderson 4d96415576 - client: fix bug introduced in [19035] that causes wrong nidle instances
(and resulting work fetch problems)
- Unix build: don't touch svn_version.sh if it hasn't changed,
    to avoid remake of sched/ (from Gabor Gombas)

svn path=/trunk/boinc/; revision=19096
2009-09-18 19:26:34 +00:00
David Anderson f5a6f862bf - client: fix bug in RR simulation:
start only enough jobs to fill CPUs per project,
    not all the CPU jobs at once.
    I'm not sure how much difference this makes,
    but this is how it's supposed to work.
- client: if app_info.xml doesn't specify flops,
    use an estimate that takes GPUs into account.
- client: if it's been more than 2 weeks since time stats update,
    don't decay on_frac at all.

svn path=/trunk/boinc/; revision=19035
2009-09-09 22:18:02 +00:00
David Anderson c3fe504e1d - client: add ATI support to job scheduling and work fetch
svn path=/trunk/boinc/; revision=18850
2009-08-17 16:50:40 +00:00
David Anderson 0a523d5f3f svn path=/trunk/boinc/; revision=18843 2009-08-14 17:10:52 +00:00
David Anderson e606170b14 - client: try to fix situations where the scheduler
runs GPU jobs in a seemingly random order,
    or preempts GPU jobs needlessly.
    The change has two parts:
    1) sort the "results" vector by received_time,
        so that the RR simulation processes GPU jobs FIFO.
    2) in the CPU scheduler (earliest_deadline_result())
        instead of choosing the earliest-deadline GPU job that
        misses its deadline,
        pick the earliest-deadline GPU job from a project that
        has a deadline miss for that GPU type
        (this is what's done in the CPU case)
- client: fix bug where if you have an exclusive app,
    then remove it from cc_config.xml and do "update config",
    it doesn't go away.
    Need to clear the list before parsing.

svn path=/trunk/boinc/; revision=18842
2009-08-14 16:54:45 +00:00
David Anderson b358089006 svn path=/trunk/boinc/; revision=18632 2009-07-20 17:30:10 +00:00
David Anderson 5753153909 - client: 2nd try on my last checkin.
We need to estimate 2 different delays for each resource type:
    1) "saturated time": the time the resource will be fully utilized
        (new name for the old "estimated delay").
        This is used to compute work requests.
    2) "busy time": the time a new job would have to wait
        to start using this resource.
        This is passed to the scheduler and used for a crude deadline check.
        Note: this is ill-defined; a single number doesn't suffice.
        But as a very rough estimate, I'll use the sum of
            (J.duration * J.ninstances)/ninstances
        over all jobs that miss their deadline under RR sim.

svn path=/trunk/boinc/; revision=18629
2009-07-17 18:29:10 +00:00
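The rough "busy time" estimate from 5753153909 is the sum of (J.duration * J.ninstances)/ninstances over jobs that miss their deadline under RR sim. A small sketch, with a hypothetical `SimJob` tuple standing in for the client's job records:

```python
from collections import namedtuple

# Hypothetical stand-in for a job as seen by the RR simulation.
SimJob = namedtuple("SimJob", ["duration", "ninstances", "misses_deadline"])


def busy_time(jobs, ninstances):
    """Sum of (J.duration * J.ninstances) / ninstances over the jobs
    that miss their deadline; `ninstances` is the resource's instance count."""
    total = sum(j.duration * j.ninstances for j in jobs if j.misses_deadline)
    return total / ninstances
```

For example, on a 2-instance resource, deadline-missing jobs of 100 s on 1 instance and 200 s on 2 instances give (100 + 400) / 2 = 250 s. Jobs that meet their deadline do not contribute.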
David Anderson 8a1c0816ed - client: change the way a resource's "estimated delay"
(passed to server for crude deadline check) is computed.
    Old: estimated delay is the interval for which the resource
        is fully used (i.e., all instances busy).
    Problem: this may cause unnecessary project starvation.
        example: 1 CPU machine, has a month-long CPDN job
        with a 1-year deadline (it's not in deadline trouble).
        Then the CPU estimated delay will be 1 month,
        and the client won't get any work from projects
        with deadlines shorter than 1 month.
    New: estimated delay is the latest time at which the
        resource is fully used and is being used by at least 1 job
        that is projected to miss its deadline under RR.

    Note: this isn't precise, but I don't think we can improve it
    much without getting a lot more complex.


svn path=/trunk/boinc/; revision=18607
2009-07-16 21:21:47 +00:00
David Anderson c2097091fe - client: show "est. delay" correctly in work fetch debug msgs
- client: show times correctly in rr_sim debug msgs
- client: in "requesting new tasks" msg,
    say what resources we're requesting (if there's more than CPU)
- client: estimated delay was possibly being calculated incorrectly
    because of roundoff error

svn path=/trunk/boinc/; revision=18269
2009-06-02 22:53:57 +00:00
David Anderson cf638ae3a6 - client: instead of scheduling coproc jobs EDF:
    - first schedule jobs projected to miss deadline in EDF order
    - then schedule remaining jobs in FIFO order
    This is intended to reduce the number of preemptions of coproc jobs,
    and hence (since they are always preempted by quit)
    to reduce the wasted time due to checkpoint gaps.
- client: the CPU scheduling policy made use of the number
    of deadline misses in various places.
    This should include only the deadline misses of CPU jobs.
    So move "deadlines_missed" from RR_SIM_STATUS and PROJECT
    to RSC_PROJECT_WORK_FETCH so that we have separate counts
    for CPU and coproc jobs, and use the count for CPU jobs.
- GUI RPC: removed the rr_sim_deadlines_missed field
    from project descriptor.
    This is no longer meaningful, and it didn't seem to be used anywhere.

svn path=/trunk/boinc/; revision=17785
2009-04-10 19:01:38 +00:00
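The two-phase coproc ordering in cf638ae3a6 can be sketched as follows. `CoprocJob` and `coproc_order` are hypothetical names, and using the received time as the FIFO key is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a coprocessor job.
CoprocJob = namedtuple("CoprocJob", ["name", "deadline", "received_time", "misses_deadline"])


def coproc_order(jobs):
    """Jobs projected to miss their deadline come first, in EDF order;
    the remaining jobs follow in FIFO order (by received time)."""
    missers = sorted((j for j in jobs if j.misses_deadline), key=lambda j: j.deadline)
    rest = sorted((j for j in jobs if not j.misses_deadline), key=lambda j: j.received_time)
    return missers + rest
```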
David Anderson 7e256c0995 - client: work fetch: in RR sim, keep track of the number
of device instances used by jobs that miss deadline.
    Don't do "variety" work fetch if this is >= # of instances

svn path=/trunk/boinc/; revision=17631
2009-03-19 16:55:04 +00:00
David Anderson edca22818e - client: in RR simulation, use app_version.flops
instead of host_info.fpops as the FLOPS estimate for non-GPU apps.
    I don't see why this would make any difference
    (these two are equal for non-GPU apps)
    but people have reported that this change improves estimates.

svn path=/trunk/boinc/; revision=17624
2009-03-18 17:24:56 +00:00
David Anderson fb1187e398 svn path=/trunk/boinc/; revision=17501 2009-03-04 22:07:16 +00:00
David Anderson 346ac348b3 - client: RR sim FLOPS estimate for GPU jobs should reflect
fraction of time BOINC is running.


svn path=/trunk/boinc/; revision=17412
2009-02-27 21:44:39 +00:00
David Anderson 125c90d1da - client: work-fetch bug fix: if we're fetching work for a starved
project, it must have no runnable jobs for ANY resource.
- client: work-fetch bug fix: when setting requests in the
    shortfall case, don't request anything if project is backed off
    or overworked for the resource.

svn path=/trunk/boinc/; revision=17338
2009-02-23 21:34:13 +00:00
David Anderson 6a75b78de4 - client: don't ignore jobs with fraction_done=1 (but still running)
in RR simulation; we may need to mark them as deadline miss.
- web: replace & with &amp; various places


svn path=/trunk/boinc/; revision=17278
2009-02-17 17:39:57 +00:00
David Anderson a4a2a68f7d - fix tabs
svn path=/trunk/boinc/; revision=17101
2009-02-02 18:47:34 +00:00
David Anderson 9f170696a4 - client: code cleanup
svn path=/trunk/boinc/; revision=17100
2009-02-02 18:45:00 +00:00
David Anderson 6120b02306 - client: code cleanup
svn path=/trunk/boinc/; revision=17098
2009-02-02 05:15:12 +00:00
David Anderson 89188fca84 - client: there was a problem with how the round-robin simulator
worked in the presence of coprocessors.
    The simulator maintained per-project queues of pending jobs.
    When a job finished (in the simulation) it would get
    one or more jobs from that project's pending queue.

    The problem: this could cause "holes" in the scheduling of GPUs,
    and produce an erroneous nonzero shortfall for GPUs,
    leading to infinite work fetch.

    The solution: maintain a separate (per-resource, not per-project)
    queue of pending coprocessor jobs.
    When a coprocessor job finishes,
    start pending jobs from the queue for that resource.

    Another change: the simulator did strict reservation of coprocessors.
    If there are 2 instances of CUDA,
    and a 1-instance job is running in the simulation,
    it wouldn't start an additional 2-instance job.
    This also can cause erroneous nonzero shortfalls.

    So instead, schedule coprocessors like CPUs, i.e. saturate them.
    This can cause distorted completion time estimates,
    but it's better than infinite work fetch.

svn path=/trunk/boinc/; revision=17093
2009-02-01 04:37:19 +00:00
David Anderson 9e7cb42084 - client: computation of # idle CUDA instances was wrong
svn path=/trunk/boinc/; revision=17087
2009-01-30 21:49:20 +00:00
David Anderson b7a2c227ca - Work fetch / scheduler:
There are two mechanisms to prevent the scheduler from
    sending jobs that won't finish by their deadline.
    Simple mechanism:
        The client sends the interval x for which CPUs are projected
        to be saturated.
        Given a job with estimated duration y,
        the scheduler doesn't send it if x + y exceeds the delay bound.
        If it does send it, x is incremented by y.
    Complex mechanism:
        Client sends workload description.
        Scheduler does EDF simulation, sees if deadlines are missed.
        The only project using this AFAIK is BOINC alpha test.
    Neither of these mechanisms takes coprocessors into account,
    and as a result jobs could be sent that are doomed to
    miss their deadline.
    This checkin adds coprocessor awareness to the Simple mechanism.

    Changes:
    Client:
        compute estimated delay (i.e. time until non-saturation)
        for coprocessors as well as CPU.
        Send them in scheduler request as part of coproc descriptor.
    Scheduler:
        Keep track of estimated delays separately for different resources
- client: fixed bug that computed CPU estimated delay incorrectly
- client: the work request (req_secs) for a resource is the min
    of the project's share and the shortfall.

svn path=/trunk/boinc/; revision=17086
2009-01-30 21:25:24 +00:00
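The Simple mechanism from b7a2c227ca, extended to per-resource delays, can be sketched as below. This is an illustration only: the function name, the tuple layout, and the resource keys are hypothetical and do not reflect the scheduler's actual data structures.

```python
def select_jobs(est_delay, candidates):
    """Apply the crude deadline check per resource.

    est_delay: dict mapping resource name -> estimated delay x (seconds)
    candidates: list of (name, resource, est_duration, delay_bound)
    Returns the names of the jobs sent and the updated delays.
    """
    x = dict(est_delay)  # copy so the caller's values are not modified
    sent = []
    for name, resource, y, delay_bound in candidates:
        if x[resource] + y > delay_bound:
            continue  # job would miss its deadline: don't send it
        sent.append(name)
        x[resource] += y  # the sent job adds to that resource's delay
    return sent, x
```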
David Anderson f33631cbbc - client: fix messages
svn path=/trunk/boinc/; revision=16960
2009-01-20 18:06:49 +00:00
David Anderson f90dddc9a6 - client: clamp long term debts to +- 1 week
- client: fix CUDA debt calculation
- client: don't accumulate debt if project->dont_request_more_work
- client: improve messages

svn path=/trunk/boinc/; revision=16909
2009-01-14 23:56:07 +00:00
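Clamping a debt to +- 1 week, as in f90dddc9a6, amounts to the following. The sketch assumes debts are measured in seconds; the names are hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def clamp_long_term_debt(debt):
    """Limit a long-term debt (assumed to be in seconds) to +- 1 week."""
    return max(-SECONDS_PER_WEEK, min(SECONDS_PER_WEEK, debt))
```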
David Anderson 132cc6bba3 - client: debugging CUDA-related stuff
- client: if reset a project, clear its overall and per-resource backoffs

svn path=/trunk/boinc/; revision=16862
2009-01-10 00:48:22 +00:00
David Anderson 2860574fa5 compile fixes and debug message fixes
svn path=/trunk/boinc/; revision=16836
2009-01-08 00:20:04 +00:00
David Anderson 8740ffdc94 - client: more work-fetch stuff.
No more per-project shortfall.
    It's getting pretty close.

svn path=/trunk/boinc/; revision=16765
2009-01-03 06:01:17 +00:00
David Anderson 72937e5c4f win compile fixes
svn path=/trunk/boinc/; revision=16756
2008-12-31 23:30:38 +00:00
David Anderson 8c591e31df - client: first whack at new work-fetch logic. Very preliminary.
svn path=/trunk/boinc/; revision=16754
2008-12-31 23:07:59 +00:00
David Anderson cd4ca5fb17 - client: fix calculation of a job's FLOPS rate in round-robin simulation
svn path=/trunk/boinc/; revision=16662
2008-12-09 20:01:01 +00:00
David Anderson fbb899f1c0 - client: in round-robin simulation, don't count a project in
total resource share if it has coproc jobs and no CPU jobs.

svn path=/trunk/boinc/; revision=16652
2008-12-08 23:00:23 +00:00
David Anderson af183bc2db - client: in round-robin simulation, remove code that sets CPU shortfall
for projects with no active results.
        This is now wrong because coproc apps might have pending results.
        Also remove nidle_cpus > 0 conditional that increments CPU shortfall;
        I think this is vestigial code.

svn path=/trunk/boinc/; revision=16646
2008-12-08 18:26:25 +00:00
Charlie Fenton 6d18e79466 client: fix compiler warning.
svn path=/trunk/boinc/; revision=16615
2008-12-04 02:18:01 +00:00
David Anderson ea0146d154 - client: fix calculation of CPU shortfall;
don't fetch work from projects with zero CPU shortfall

svn path=/trunk/boinc/; revision=16613
2008-12-03 23:30:54 +00:00
David Anderson 84f1193a9d - client: use FLOPs, rather than CPU time,
as the basis for estimating job completion times.
    This should improve estimates for GPU apps,
    and prevent the DCF from getting messed up.

svn path=/trunk/boinc/; revision=16598
2008-12-02 03:58:32 +00:00
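One way to read the FLOPs-based estimate in 84f1193a9d is sketched below. The formula and the parameter names are assumptions made for illustration. The commit message only says that FLOPs, rather than CPU time, are the basis for the estimate.

```python
def est_remaining_secs(fpops_est, fraction_done, flops):
    """Estimated remaining run time of a job.

    fpops_est: estimated total floating-point operations for the job
    fraction_done: progress reported so far (0..1)
    flops: estimated speed at which the job's app version runs (FLOPS)
    """
    remaining_fpops = fpops_est * (1.0 - fraction_done)
    return remaining_fpops / flops
```

Because the speed is a property of the app version rather than of the CPU, a GPU app with a high FLOPS figure gets a correspondingly shorter estimate for the same job size.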
David Anderson e3fd56f5e8 - client: work-fetch tweak: don't increment overall CPU shortfall
if any jobs pending in simulation

svn path=/trunk/boinc/; revision=16595
2008-12-01 22:06:24 +00:00
David Anderson 57639bdaae - client: in round-robin simulation, only increment CPU shortfall
(per-project or overall) if there are no pending tasks.
        This is needed when there are coproc (i.e. CUDA) jobs;
        CPUs may be idle because pending jobs are waiting for active jobs
        to release coprocs.
        In this situation the CPU idleness should not be counted as shortfall;
        otherwise (if there are only coproc jobs) there will always be a shortfall,
        and the client will fetch infinite work.

svn path=/trunk/boinc/; revision=16545
2008-11-24 18:57:04 +00:00
David Anderson 98d6931d63 - client (Unix): if app uses < 1 CPU, run at nice 10 (not 0)
- client: suppress specious error message

svn path=/trunk/boinc/; revision=16496
2008-11-14 22:08:50 +00:00
Charlie Fenton deaeae4eda client: fix compiler warning indicating real error in RR simulation
svn path=/trunk/boinc/; revision=16391
2008-11-03 10:19:25 +00:00
David Anderson 719921bfaf - client: fix the updating of CPU time left in RR simulation;
don't print msgs about non-CPU-intensive projects.

svn path=/trunk/boinc/; revision=16386
2008-11-01 21:10:08 +00:00
Charlie Fenton 046146317e client: fix compiler warning
svn path=/trunk/boinc/; revision=16383
2008-11-01 00:14:20 +00:00
David Anderson 9987f9d245 - client: revise round-robin simulation to take variable avg_ncpus into account
svn path=/trunk/boinc/; revision=16366
2008-10-30 21:07:35 +00:00