Commit Graph

58 Commits

Author SHA1 Message Date
David Anderson c5462e4917 - client: more hysteresis work fetch policy stuff
- client simulator work

svn path=/trunk/boinc/; revision=22858
2010-12-30 22:41:50 +00:00
David Anderson 7aeef3070a - client: enabled REC-based scheduling with a cmdline option
rather than a compile flag

svn path=/trunk/boinc/; revision=22855
2010-12-25 19:05:57 +00:00
David Anderson f3169fb77a - client: initial, partial checkin for hysteresis work-fetch
svn path=/trunk/boinc/; revision=22853
2010-12-23 23:39:30 +00:00
David Anderson a129c0d8cd - client: do exponential backoff (from 10 min to 24 hours)
on account manager RPC failures,
    rather than always waiting 24 hours

svn path=/trunk/boinc/; revision=22747
2010-11-25 04:35:50 +00:00
David Anderson b39615d461 - client: work fetch fix: try to maintain GPU work all projects,
since we now do round-robin for GPUs as well as CPU.
    NOTE: this bug was found using the client simulator!
- client simulator: generate REC graph

svn path=/trunk/boinc/; revision=22746
2010-11-24 20:51:25 +00:00
David Anderson 6478b3e05d - client: implement more scheduler changes that use
recent estimated credit (REC) instead of debt.
    These changes are enabled by
        #define USE_REC
    in work_fetch.h.
    If this is commented out (the default) the client uses
    debt-based scheduling, same as before.
    TODO: work-fetch policy changes
- client simulator: various fixes:
    - compute idle and wasted fraction based on all processing resources,
        not just CPU
    - compute job completion times based on FLOPS, not CPU seconds
    - compute and use project->no_X_apps
    etc.


svn path=/trunk/boinc/; revision=22741
2010-11-23 19:39:47 +00:00
David Anderson ef472e3df7 - client simulator: model the scheduler's deadline check mechanism
- scheduler: improve the deadline check mechanism slightly.
    When updating "estimated delay" (a rough measure of how long
    a resource is saturated with high-priority work)
    take into account the # of instances used by the job,
    and the # of total instances


svn path=/trunk/boinc/; revision=22612
2010-11-01 16:53:41 +00:00
David Anderson 4edfe2ec28 - client: small initial checkin for new scheduling system.
Keep track of per-project recent estimated credit

svn path=/trunk/boinc/; revision=22608
2010-10-29 23:41:34 +00:00
David Anderson 1c4422985f - client: add <no_info_fetch> config option and --no_info_fetch
cmdline arg.
    Suppresses the fetch of project list and of current client version #.
    Use when running on grid nodes.
- debugging on client simulator.  Not done yet.

svn path=/trunk/boinc/; revision=22414
2010-09-27 20:34:47 +00:00
David Anderson 31db3207e4 - client: fix bug that cause wasted scheduler RPC
Old: when a job finished, we cleared the backoffs for the
        resources it used.  The idea was to get more jobs
        immediately in the case where the client was at
        a jobs-in-progress limit.
    Problem: this resulted in an RPC immediately,
        typically before the output files were uploaded.
        So the client is still at the limit, and doesn't get jobs.
    New: clear the backoffs at the point when output files
        have been uploaded and the job is ready to report.
- client: change range in resource backoff from (0,x) to (.5, 1.5*x)


svn path=/trunk/boinc/; revision=22411
2010-09-24 21:24:02 +00:00
David Anderson 6faf48e3a8 - client: fix crashing bug on VC 2008/10;
don't memset(0,) structures containing vectors.

svn path=/trunk/boinc/; revision=21963
2010-07-15 21:43:51 +00:00
David Anderson d78b5fb79a - client: if a project is anonymous platform and it has no
app versions that use a resource,
    don't request work from it for that resource.

svn path=/trunk/boinc/; revision=20549
2010-02-11 22:19:22 +00:00
David Anderson ee889ac9dd svn path=/trunk/boinc/; revision=20284 2010-01-27 19:14:29 +00:00
David Anderson b5124fe729 - client: brute-force attempt at eliminating domino-effect preemption:
if job A is unstarted and EDF,
    and there's a job B that is later in the list,
    is started, has the same app version,
    and has the same arrival time,
    move A after B.
- client: remove the "temp_dcf" mechanism,
    which had the same goal but didn't work.
- client: in computing overall debt for a project,
    subtract a term that reflects pending work.
    This should reduce repeated fetches from the same project.
- client simulator: tweaks

svn path=/trunk/boinc/; revision=20223
2010-01-21 00:14:56 +00:00
David Anderson 876522c6aa - client: add logic to work fetch so that each project
will have enough jobs to use its share of resource instances.
    This avoids situations where e.g. on a 2-CPU system
    a project has 75% resource share and 1 CPU job,
    and its STD increases without bound.
    
    Did a general cleanup of the logic for computing
    work request sizes (seconds and instances).

svn path=/trunk/boinc/; revision=20036
2009-12-24 20:40:27 +00:00
David Anderson 37ea627866 - Win compile fixes. Also, needed to provide a replacement
for strptime() on Win.  WTF?

svn path=/trunk/boinc/; revision=20003
2009-12-21 19:20:28 +00:00
David Anderson 2ef5c5895b - client: fix bug in debt calculation
- client: <zero_debts> zeroes STD too

svn path=/trunk/boinc/; revision=19783
2009-12-04 21:21:18 +00:00
David Anderson 2d4ceb618a - client: my STD-related checkin of Dec 1 was bad.
It computed an "overall STD" as the sum of CPU and coprocs,
    weighted by the coproc's speed, as we do for LTD.
    This was the wrong idea; in the presence of GPUs,
    STDs quickly get pushed to +- 1 day and are truncated there.

    New scheme: STD is maintained per (resource type, project).
    This fixes the above problem,
    and it opens to door to round-robin scheduling of GPUs.
- client: the calculation of "anticipated debt" was scaling
    by relative resource share.
    This wasn't correct, seems to me.
- client: rename "debt" to "long_term_debt" in a few places
    (but not in the client state file, for compatibility)

svn path=/trunk/boinc/; revision=19777
2009-12-03 23:09:25 +00:00
David Anderson 59328aaccb - client: change how short term debt is updated.
Old: it's based entirely on CPU time.
        So a GPU project, whose app uses only a fraction
        of a CPU, accrues positive debt.
        This is OK if the project has only GPU apps,
        since STD is not (currently) used for GPU scheduling.
        But some projects have both CPU and GPU apps.
    New: STD is based on total processing.
        It has terms for each resource type.
        The notion of "runnable resource share" is specific to a type.
    Note: the notion of "resource share fraction" appears in
        a couple of other places:
        - it's passed to apps in app_init_data.xml
        - it's passed in scheduler requests.
        It should be broken down by resource type in these cases too.
        Note to self: do this later.

svn path=/trunk/boinc/; revision=19762
2009-12-02 03:41:52 +00:00
David Anderson e86584f6cc - client: the weight of GPU debt in computing total debt should be
(estimated throughput of all GPUs)/(estimated throughput of all CPUs)
    rather than the ratio of 1 GPU to 1 CPU.
    This change will hopefully cause ratios of granted credit
    to more closely match resource shares.

svn path=/trunk/boinc/; revision=19311
2009-10-16 02:48:55 +00:00
David Anderson a7b32b486e - client: fix crash with <ncpus>0</ncpus>
svn path=/trunk/boinc/; revision=19208
2009-09-29 02:12:35 +00:00
David Anderson 71c7e7a74b - client/scheduler/web: add per-project preferences for whether
to accept CPU, NVIDIA and ATI jobs.
    These prefs are shown only where relevant:
    e.g., only for processor types for which the project has app versions,
    and if it has versions for only one type, no pref is shown.

    These prefs affect both client and scheduler.
    The client won't ask for work for a device blocked by prefs,
    and the scheduler won't send it.

    This replaces earlier optional project-specific prefs for
    "no CPU jobs" and "no GPU jobs".
    (However, these prefs continue to be honored on the server side).

- client: if NVIDIA driver is unknown, say that rather than 0


svn path=/trunk/boinc/; revision=19194
2009-09-28 04:24:18 +00:00
David Anderson 41e3b06b23 - client and scheduler RPC: add optional <cpu_backoff>, <cuda_backoff>,
and <ati_backoff> elements to scheduler reply.
    These specify backoffs for the resource types,
    overriding the existing backoff mechanism.
    Projects can supply these if they don't have apps of a particular type
    and don't want to get periodic requests for them.

svn path=/trunk/boinc/; revision=19059
2009-09-16 17:34:19 +00:00
David Anderson 42e8d1137d svn path=/trunk/boinc/; revision=19058 2009-09-16 16:54:42 +00:00
David Anderson 4c52989f59 - client: improve the estimation of "busy time" (see 17 July checkin).
If you have 2 CPUs and a 1-day job in EDF mode,
        the busy time should be zero, not .5 days.

        Add a class BUSY_TIME_ESTIMATOR that makes a somewhat better
        (though still fairly crude) estimate.

svn path=/trunk/boinc/; revision=19003
2009-09-03 20:31:04 +00:00
David Anderson 112cec62a5 - client: fix to [18945]; we only want to max the overall request
with a GPU request if project is anonymous platform
    AND it has an app for that GPU type
- client: report overall work request as well as per-resource-type requests

svn path=/trunk/boinc/; revision=18994
2009-09-02 21:36:25 +00:00
David Anderson 29c1751898 - client: if project is anonymous platform, set the overall work req
to the max of the requests for different resource types.
    Otherwise projects with old schedulers won't send us work.

svn path=/trunk/boinc/; revision=18945
2009-08-31 03:42:01 +00:00
David Anderson c3fe504e1d - client: add ATI support to job scheduling and work fetch
svn path=/trunk/boinc/; revision=18850
2009-08-17 16:50:40 +00:00
David Anderson e606170b14 - client: try to fix situations where the scheduler
runs GPU jobs in a seemingly random order,
        or preempts GPU jobs needlessly.
        The change has two parts:
        1) sort the "results" vector by received_time,
            so that the RR simulation processes GPU jobs FIFO.
        2) in the CPU scheduler (earliest_deadline_result())
            instead of choosing the earliest-deadline GPU job that
            misses its deadline,
            pick the earliest_deadline GPU from a project that
            has a deadline miss for that GPU type
            (this is what's done in the CPU case)
    - client: fix bug where if you have an exclusive app,
        then remove it from cc_config.xml and do "update config",
        it doesn't go away.
        Need to clear the list before parsing.

svn path=/trunk/boinc/; revision=18842
2009-08-14 16:54:45 +00:00
David Anderson 5753153909 - client: 2nd try on my last checkin.
We need to estimate 2 different delays for each resource type:
    1) "saturated time": the time the resource will be fully utilized
        (new name for the old "estimated delay").
        This is used to compute work requests.
    2) "busy time": the time a new job would have to wait
        to start using this resource.
        This is passed to the scheduler and used for a crude deadline check.
        Note: this is ill-defined; a single number doesn't suffice.
        But as a very rough estimate, I'll use the sum of
            (J.duration * J.ninstances)/ninstances
        over all jobs that miss their deadline under RR sim.

svn path=/trunk/boinc/; revision=18629
2009-07-17 18:29:10 +00:00
David Anderson 8a1c0816ed - client: change the way a resource's "estimated delay"
(passed to server for crude deadline check) is computed.
    Old: estimated delay is the interval for which the resource
        is fully used (i.e., all instances busy).
    Problem: this may cause unnecessary project starvation.
        example: 1 CPU machine, has a month-long CPDN job
        with a 1-year deadline (it's not in deadline trouble).
        Then the CPU estimated delay will be 1 month,
        and the client won't get any work from projects
        with deadlines shorter than 1 month.
    New: estimated delay is the latest time at which the
        resource is fully used and is being used by at least 1 job
        that is projected to miss its deadline under RR.

    Note: this isn't precise, but I don't think we can improve it
    much without getting a lot more complex.


svn path=/trunk/boinc/; revision=18607
2009-07-16 21:21:47 +00:00
David Anderson cf638ae3a6 - client: instead of scheduling coproc jobs EDF:
- first schedule jobs projected to miss deadline in EDF order
    - then schedule remaining jobs in FIFO order
    This is intended to reduce the number of preemptions of coproc jobs,
    and hence (since they are always preempted by quit)
    to reduce the wasted time due to checkpoint gaps.
- client: the CPU scheduling policy made use of the number
    of deadline misses in various places.
    This should include only the deadline misses of CPU jobs.
    So move "deadlines_missed" from RR_SIM_STATUS and PROJECT
    to RSC_PROJECT_WORK_FETCH so that we have separate counts
    for CPU and coproc jobs, and use the count for CPU jobs.
- GUI RPC: removed the rr_sim_deadlines_missed field
    from project descriptor.
    This is no longer meaningful, and it didn't seem to be used anywhere.

svn path=/trunk/boinc/; revision=17785
2009-04-10 19:01:38 +00:00
David Anderson 837d3fc0a1 - get_project_config.php: include plan classes in platform list;
i.e., list both win/x86 and win/x86 + NVIDIA.
    This will allow the manager to show which projects can
    use the hosts's coprocessors,
    and also graying out projects that require an absent coproc.
- fix compile warnings

svn path=/trunk/boinc/; revision=17735
2009-04-03 21:55:26 +00:00
David Anderson 7e256c0995 - client: work fetch: in RR sim, keep track of the number
of device instances used by jobs that miss deadline.
    Don't do "variety" work fetch if this is >= # of instances

svn path=/trunk/boinc/; revision=17631
2009-03-19 16:55:04 +00:00
David Anderson 88a4482894 - client: consider fetching work from overworked projects
if resource is saturated for < work_buf_min()
    (rather than saturated for 0).
    So now the only significance of "overworked" is that we
    won't ask overworked projects for work if resource is saturated
    more than work_buf_min() but less than work_buf_total()

svn path=/trunk/boinc/; revision=17620
2009-03-18 15:53:02 +00:00
David Anderson 3709c1e9f4 - scheduler: include driver version in the CUDA description string
storing in the database;
- web: display the above

svn path=/trunk/boinc/; revision=17341
2009-02-24 00:06:45 +00:00
David Anderson 125c90d1da - client: work-fetch bug fix: if we're fetching work for a starved
project, it most have no runnable jobs for ANY resource.
- client: work-fetch bug fix: when setting requests in the
    shortfall case, don't request anything if project is backed off
    or overworked for the resource.

svn path=/trunk/boinc/; revision=17338
2009-02-23 21:34:13 +00:00
David Anderson f257101d36 - client: fix work-fetch bug that caused infinite fetch;
cleanup/reorganization of work fetch logic

svn path=/trunk/boinc/; revision=17337
2009-02-23 20:35:52 +00:00
David Anderson 3b31a9d803 - client: remove the "debt repair" mechanism added earlier today.
There are situations where multiple projects can legitimately
    have large negative LTD on a uniprocessor.
    Instead...
- client: add <zero_debts> option to cc_config.xml

svn path=/trunk/boinc/; revision=17328
2009-02-20 22:16:03 +00:00
David Anderson 6241fff21f - client: new work-fetch policy:
1) if an instance is idle, get work from highest-debt project,
        even if it's overworked.
    2) if resource has a shortfall, get work from highest-debt
        non-overworked project
    3) if there's a fetchable non-overworked project with no runnable jobs,
        get from from the highest-debt one.
    (each step is done first for GPU, then CPU)
    Clause 3) is new.
    It will cause the client to get jobs for as many projects as possible,
    even if there is no shortfall.
    This is necessary to make the notion of "overworked" meaningful
    (otherwise, any project with long jobs can become overworked).
    It also maintains as much variety as possible (like pre-6.6 clients).

    Also (small bug fix) if a project is overworked for resource R,
    request work for R only in case 1).


svn path=/trunk/boinc/; revision=17327
2009-02-20 21:44:39 +00:00
David Anderson f7f2f85b79 - client: if a project is at max backoff for a resource,
stop accumulating debt if it's at or around zero.
        This prevents other projects from being driven unboundedly negative.
    - client: if the number of overworked projects exceeds the number
        of device instances, clear debts; this indicates that an earlier
        client was buggy and produced bad debt values.

svn path=/trunk/boinc/; revision=17325
2009-02-20 18:37:27 +00:00
David Anderson a4a2a68f7d - fix tabs
svn path=/trunk/boinc/; revision=17101
2009-02-02 18:47:34 +00:00
David Anderson 9f170696a4 - client: code cleanup
svn path=/trunk/boinc/; revision=17100
2009-02-02 18:45:00 +00:00
David Anderson 6120b02306 - client: code cleanup
svn path=/trunk/boinc/; revision=17098
2009-02-02 05:15:12 +00:00
David Anderson 89188fca84 - client: there was a problem with how the round simulator
worked in the presence of coprocessors.
    The simulator maintained per-project queues of pending jobs.
    When a job finished (in the simulation) it would get
    one or more jobs from that project's pending queue.

    The problem: this could cause "holes" in the scheduling of GPUs,
    and produce an erroneous nonzero shortfall for GPUs,
    leading to infinite work fetch.

    The solution: maintain a separate (per-resource, not per--project)
    queue of pending coprocessor jobs.
    When a coprocessor job finishes,
    start pending jobs from the queue for that resource.

    Another change: the simulator did strict reservation of coprocessors.
    If there are 2 instances of CUDA,
    and a 1-instance job is running in the simulation,
    it wouldn't start an additional 2-instance job.
    This also can cause erroneous nonzero shortfalls.

    So instead, schedule coprocessors like CPUs, i.e. saturate them.
    This can cause distorted completion time estimates,
    but it's better than infinite work fetch.

svn path=/trunk/boinc/; revision=17093
2009-02-01 04:37:19 +00:00
David Anderson b7a2c227ca - Work fetch / scheduler:
There are two mechanisms to prevent the scheduler from
    sending jobs that won't finish by their deadline.
    Simple mechanism:
        The client sends the interval x for which CPUs are projected
        to be saturated.
        Given a job with estimated duration y,
        the scheduler doesn't send it if x + y exceeds the delay bound.
        If it does send it, x is incremented by y.
    Complex mechanism:
        Client sends workload description.
        Scheduler does EDF simulation, sees if deadlines are missed.
        The only project using this AFAIK is BOINC alpha test.
    Neither of these mechanisms takes coprocessors into account,
    and as a result jobs could be sent that are doomed to
    miss their deadline.
    This checkin adds coprocessor awareness to the Simple mechanism.

    Changes:
    Client:
        compute estimated delay (i.e. time until non-saturation)
        for coprocessors as well as CPU.
        Send them in scheduler request as part of coproc descriptor.
    Scheduler:
        Keep track of estimated delays separately for different resources
- client: fixed bug that computed CPU estimated delay incorrectly
- client: the work request (req_secs) for a resource is the min
    of the project's share and the shortfall.

svn path=/trunk/boinc/; revision=17086
2009-01-30 21:25:24 +00:00
David Anderson 195c0403b7 - scheduler: don't count host as reliable if avg_turnaround is zero
- client: restore notion of overworked;
    if a project is overworked for a resource R,
    don't fetch work for R unless there are idle instances

svn path=/trunk/boinc/; revision=17057
2009-01-28 04:58:01 +00:00
David Anderson 51b468dfc8 - client: remove the "deadlines_missed" and "overworked"
clauses from RSC_WORK_FETCH::choose_project()


svn path=/trunk/boinc/; revision=17056
2009-01-27 23:40:42 +00:00
David Anderson dfa1996975 - scheduler: in get_app_version(), if we previously sent a CUDA app,
but we don't need to send any more CUDA jobs,
    delete the BEST_APP_VERSION record and look for another app version.
    This lets the scheduler send both CUDA and CPU app versions
    for a given app in a single RPC.


svn path=/trunk/boinc/; revision=17051
2009-01-27 21:18:06 +00:00
David Anderson 574d1fe087 - client: don't request work for a resource if it has no shortfall.
- client and server: get rid of coproc_cuda global.

svn path=/trunk/boinc/; revision=17019
2009-01-26 05:00:49 +00:00