69 Commits

Author SHA1 Message Date
David Anderson a151ad6cb3 - client/scheduler: deal with situation where GPU has enough
RAM to run job, but when we actually run the job
    not enough GPU RAM is free, so the application fails.
    This can cause a large number of jobs to fail.
    Solution:
    - app_plan() can specify the GPU RAM requirements of an app version.
        This is passed to the client in a new field
        <gpu_ram> of the <app_version> element.
    - prior to starting or restarting a GPU app, the client
        checks the amount of free RAM on the particular GPU.
        If it's not enough for the app version,
        the client doesn't start it,
        and arranges for the scheduler to ignore it for 5 minutes
        (by which point there might be more free GPU RAM)
    Notes:
    1) this change will have effect only when
        both client and scheduler are updated.
    2) the check is done in enforce_schedule(),
        rather than schedule_cpus(),
        because only at that point
        have we assigned a specific GPU to the job.
    3) there's another case to deal with:
        a GPU app's malloc of GPU RAM fails in the middle of the job.
        Currently the job fails.
        I plan to add an API call boinc_temporary_exit(x) so
        that the job can exit and potentially restart in x seconds.
        (In principle this mechanism is sufficient for all cases,
        but it could lead to a lot of starting/exiting,
        so the current change is worthwhile).

svn path=/trunk/boinc/; revision=19864
2009-12-11 22:45:59 +00:00
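
The GPU RAM check described in revision 19864 could be sketched roughly as follows. This is a minimal illustration, not the actual client code: the `AppVersion` and `Job` structs, the `defer_until` field, and the function `may_start_gpu_job()` are hypothetical names. Only the `<gpu_ram>` requirement and the 5-minute deferral come from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the client's data structures.
struct AppVersion {
    double gpu_ram;      // bytes of GPU RAM required (from <gpu_ram>)
};

struct Job {
    AppVersion av;
    double defer_until;  // scheduler ignores the job before this time (seconds)
};

// 5 minutes, as stated in the commit message.
const double GPU_RAM_RETRY_DELAY = 300.0;

// Returns true if the job may be started (or restarted) now on a GPU
// that currently has free_gpu_ram bytes free.
// If there is not enough free RAM, the job is deferred for 5 minutes.
bool may_start_gpu_job(Job& job, double free_gpu_ram, double now) {
    assert(free_gpu_ram >= 0);
    if (now < job.defer_until) {
        return false;    // still inside an earlier deferral window
    }
    if (free_gpu_ram < job.av.gpu_ram) {
        job.defer_until = now + GPU_RAM_RETRY_DELAY;
        return false;
    }
    return true;
}
```

The check takes the free RAM of one particular GPU as input, which matches note 2 above: it can only run once a specific GPU has been assigned to the job.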
David Anderson e27659858d - result of code shuffle: the HOST_INFO structure returned
by the get_host_info() GUI RPC now contains GPU info

svn path=/trunk/boinc/; revision=19798
2009-12-07 06:13:17 +00:00
David Anderson b70229c093 - code shuffle: move client-specific GPU code to a separate file
svn path=/trunk/boinc/; revision=19794
2009-12-07 00:42:03 +00:00
David Anderson 4bf2ef5198 - client: add new config options:
<ignore_cuda_dev>n</ignore_cuda_dev>
    <ignore_ati_dev>n</ignore_ati_dev>
    to ignore (not use) specific NVIDIA or ATI GPUs.
    You can ignore more than one.

svn path=/trunk/boinc/; revision=19566
2009-11-12 23:44:49 +00:00
David Anderson fe2a18f282 - client/scheduler: standardize the FLOPS estimate between NVIDIA and ATI.
Make them both peak FLOPS,
    according to the formula supplied by the manufacturer.

    The impact on the client is minor:
    - the startup message describing the GPU
    - the weight of the resource type in computing long-term debt

    On the server, I changed the example app_plan() function
    to assume that app FLOPS is 20% of peak FLOPS
    (that's about what it is for SETI@home)

svn path=/trunk/boinc/; revision=19310
2009-10-16 00:13:01 +00:00
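
The server-side assumption in revision 19310 amounts to a single scaling step. The sketch below only illustrates that arithmetic; the function name `estimated_app_flops()` and the constant name are hypothetical, and the manufacturer's peak-FLOPS formula is not reproduced here.

```cpp
#include <cassert>

// Fraction of peak FLOPS an application is assumed to achieve
// (20%, per the commit message).
const double APP_FLOPS_FRACTION = 0.2;

// Given a GPU's peak FLOPS, return the FLOPS estimate for an app version.
double estimated_app_flops(double peak_flops) {
    assert(peak_flops >= 0);
    return APP_FLOPS_FRACTION * peak_flops;
}
```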
David Anderson d6efa7dabb - client: address the situation where GPUs become unusable
for certain periods (e.g. when Remote Desktop is used on Win).
    - add is_usable() member function to COPROC.
        Currently this just calls the respective (CUDA or CAL)
        initialization function.
        We need to check whether this works and/or causes problems.
    - in enforce_schedule(), check whether usability has changed
        for each GPU type.
        If we've gone from usable to unusable,
        flag all jobs for that GPU as coproc_missing
        (so they won't get run, and will quit if they're running).
        If we've gone from unusable to usable, clear the flag.
    This should deal with all cases except where
    the client is started up with GPUs unusable.
- scheduler: more query optimizations for locality scheduling
    (from Oliver Bock)

svn path=/trunk/boinc/; revision=19301
2009-10-14 18:07:49 +00:00
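
The usability tracking described in revision 19301 can be pictured as a small state transition per GPU type. This is a simplified sketch under assumptions: `GpuType`, `Job`, and `update_usability()` are hypothetical names, and the result of the `is_usable()` probe is passed in as a plain bool. Only the `coproc_missing` flag and the two transitions come from the commit message.

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for the client's data structures.
struct Job {
    int gpu_type;         // which GPU type the job uses
    bool coproc_missing;  // set while that GPU type is unusable
};

struct GpuType {
    int id;
    bool was_usable;      // usability seen on the previous check
};

// Called periodically with the result of the usability probe.
// On a usable -> unusable transition, flag all jobs for that GPU type;
// on an unusable -> usable transition, clear the flag.
void update_usability(GpuType& gpu, bool usable_now, std::vector<Job>& jobs) {
    if (usable_now == gpu.was_usable) {
        return;           // no transition, nothing to do
    }
    for (Job& j : jobs) {
        if (j.gpu_type == gpu.id) {
            j.coproc_missing = !usable_now;
        }
    }
    gpu.was_usable = usable_now;
}
```

Because `was_usable` starts from whatever the client assumed at startup, a sketch like this shares the limitation named in the commit: it does not handle a client that starts with the GPUs already unusable unless `was_usable` is initialized from a real probe.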
David Anderson fca2cb8016 - client: restore calDeviceGetInfo(), add its info to COPROC_ATI struct
(some plan class might need to know this).
    Code cleanup.

svn path=/trunk/boinc/; revision=19234
2009-10-02 22:58:03 +00:00
Rom Walton ad455ab09d - client: Add support for checking for both amd* prefixed CAL libraries
and ati* prefixed CAL libraries.
    - scheduler: redefine ati class plans again.
        ati: CAL 1.0+, amd* prefixed libraries
        ati13amd: CAL 1.3+, amd* prefixed libraries
        ati13ati: CAL 1.3+, ati* prefixed libraries
        ati14: CAL 1.4+, ati* prefixed libraries

    sched/
        sched_customize.cpp
    lib/
        coproc.cpp, .h

svn path=/trunk/boinc/; revision=19162
2009-09-25 15:40:16 +00:00
David Anderson 39815033a3 - client: in GPU enumeration, separate warning msgs from GPU descriptions.
Show warning msgs only if log_flags.coproc_debug

svn path=/trunk/boinc/; revision=19153
2009-09-24 17:23:33 +00:00
David Anderson f5a6f862bf - client: fix bug in RR simulation:
start only enough jobs to fill CPUs per project,
    not all the CPU jobs at once.
    I'm not sure how much difference this makes,
    but this is how it's supposed to work.
- client: if app_info.xml doesn't specify flops,
    use an estimate that takes GPUs into account.
- client: if it's been more than 2 weeks since time stats update,
    don't decay on_frac at all.

svn path=/trunk/boinc/; revision=19035
2009-09-09 22:18:02 +00:00
David Anderson b129e71f20 - client: add code for faking ATI GPUs
svn path=/trunk/boinc/; revision=19024
2009-09-08 18:42:24 +00:00
David Anderson 2039e67638 - client: NVIDIA offers an API which tells you whether a GPU
is running a graphics application.
    Change the semantics of the "don't use GPU while computer in use" pref
    to "don't use a GPU that's running a graphics app while
    computer is in use".
    This will increase GPU utilization on multi-GPU systems.

svn path=/trunk/boinc/; revision=18942
2009-08-28 22:55:04 +00:00
David Anderson 9a8f91fb1e - client: in parsing <coproc> elements in <app_version>,
use a new type COPROC_REQ for which the count field is a double.
    Otherwise fractional GPU jobs don't work.

svn path=/trunk/boinc/; revision=18906
2009-08-24 23:16:17 +00:00
David Anderson f8977c52e7 - fixes to coproc stuff
svn path=/trunk/boinc/; revision=18881
2009-08-19 23:47:07 +00:00
David Anderson f1360e5971 - client: finish the implementation of fractional coproc jobs.
- different data structure for keeping track of coproc usage;
        instead of COPROC having per-instance pointers to ACTIVE_TASK,
        ACTIVE_TASK now has an array of device number indices
        for each instance that it's using.
    - in enforce_schedule(), we call a new function assign_coprocs()
        that decides what coproc instances each job will use,
        and prunes jobs for which we can't get an assignment.
        This function embodies lots of subtlety.
    - coproc_cmdline() no longer deals with reserving instances;
        it just has to generate the --device X cmdline

svn path=/trunk/boinc/; revision=18880
2009-08-19 23:21:55 +00:00
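
To make the idea of fractional coproc assignment in revision 18880 concrete, here is a deliberately naive first-fit sketch. It is not the real `assign_coprocs()`, which the commit says embodies a lot of subtlety. The `Job` struct, its fields, and the function name `assign_coprocs_sketch()` are hypothetical, and only jobs that use at most one GPU instance are handled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified job description.
struct Job {
    double usage;   // fraction of one GPU instance, 0 < usage <= 1
    int device;     // assigned device number, or -1 if pruned
};

// Walk the jobs in priority order and give each one the first GPU
// instance that still has enough unused capacity. Jobs that do not fit
// anywhere are pruned (device stays -1).
void assign_coprocs_sketch(std::vector<Job>& jobs, int ninstances) {
    std::vector<double> used(ninstances, 0.0);
    const double eps = 1e-9;   // tolerance for floating-point sums
    for (Job& j : jobs) {
        assert(j.usage > 0 && j.usage <= 1);
        j.device = -1;
        for (int i = 0; i < ninstances; i++) {
            if (used[i] + j.usage <= 1.0 + eps) {
                used[i] += j.usage;
                j.device = i;
                break;
            }
        }
    }
}
```

The resulting `device` value is what a function like `coproc_cmdline()` would turn into the `--device X` argument.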
David Anderson 091dba7a65
svn path=/trunk/boinc/; revision=18874
2009-08-19 20:33:04 +00:00
David Anderson 073e6ded2c - client and scheduler: lay the groundwork for "fractional coproc jobs",
e.g. the Milkyway@home ATI app, of which we can typically run
    2 or 3 instances at once on a GPU.
    Changes include:
    - In APP_VERSION, don't use a COPROCS to represent the GPU
        requirements; just use doubles ncudas and natis.
    - sufficient_coprocs() etc. are no longer members of COPROCS
    - in HOST_USAGE, ncudas and natis are doubles
    - in scheduler request, req_instances is now a double

    This checkin doesn't include the job scheduling logic,
    i.e. assigning jobs to GPUs.  That will follow.

svn path=/trunk/boinc/; revision=18868
2009-08-19 18:41:47 +00:00
David Anderson 152ee20b17 - client: fix calculation of ATI flops
svn path=/trunk/boinc/; revision=18852
2009-08-17 17:27:06 +00:00
David Anderson c3fe504e1d - client: add ATI support to job scheduling and work fetch
svn path=/trunk/boinc/; revision=18850
2009-08-17 16:50:40 +00:00
David Anderson 8df1e1ebb3 - client: ATI tweaks
svn path=/trunk/boinc/; revision=18849
2009-08-16 04:02:11 +00:00
David Anderson 4eb7097653 compile fixes
svn path=/trunk/boinc/; revision=18848
2009-08-15 00:12:51 +00:00
David Anderson 3b03707efa - client: clean up ATI code and make it work (or at least compile)
under Linux

svn path=/trunk/boinc/; revision=18847
2009-08-15 00:00:57 +00:00
David Anderson 602ad0b5b7 - client: ATI GPU detection code (from Crunch3r)
svn path=/trunk/boinc/; revision=18846
2009-08-14 22:54:34 +00:00
David Anderson 94e75fd4b1
svn path=/trunk/boinc/; revision=18770
2009-07-29 21:21:52 +00:00
David Anderson e3a730c334 - client: add <use_all_gpus> config option. If set, use GPUs
even if they're not equivalent to the most capable one.
- Validator: fix one_pass_N_WU option.

svn path=/trunk/boinc/; revision=17896
2009-04-27 23:51:46 +00:00
David Anderson 5adb25381d - client: new approach to handling multiple GPUs.
old: find fastest GPU, and pretend that others are the same.
            Problem: other GPUs might be less capable,
            and not able to handle jobs sent by server.
        new: find the most "capable" GPU, use others that are equivalent,
            don't use those that are not.
            "Capable" is defined by
            - compute capability (i.e., hardware version)
            - driver version
            - memory size
            - FLOPs
            in that priority order.
        See comments in lib/coproc.h

svn path=/trunk/boinc/; revision=17855
2009-04-22 02:09:53 +00:00
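
The "most capable GPU" ordering in revision 17855 is a lexicographic comparison over four properties. The sketch below shows one way to express that priority order. The `Gpu` struct, its field names, and the integer encoding of the compute capability are assumptions, and the comparison is exact, whereas the real code in lib/coproc.h may apply tolerances.

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical GPU description; field names are illustrative only.
struct Gpu {
    int compute_capability;  // hardware version (encoding is an assumption)
    int driver_version;
    double ram_bytes;
    double peak_flops;
};

// True if a is strictly more capable than b, comparing compute capability,
// then driver version, then memory size, then FLOPS.
bool more_capable(const Gpu& a, const Gpu& b) {
    return std::tie(a.compute_capability, a.driver_version, a.ram_bytes, a.peak_flops)
         > std::tie(b.compute_capability, b.driver_version, b.ram_bytes, b.peak_flops);
}

// True if a and b are equivalent under the same four properties.
bool equivalent(const Gpu& a, const Gpu& b) {
    return std::tie(a.compute_capability, a.driver_version, a.ram_bytes, a.peak_flops)
        == std::tie(b.compute_capability, b.driver_version, b.ram_bytes, b.peak_flops);
}

// Find the most capable GPU, then mark as usable only those equivalent to it.
std::vector<bool> mark_usable(const std::vector<Gpu>& gpus) {
    std::vector<bool> usable(gpus.size(), false);
    if (gpus.empty()) return usable;
    std::size_t best = 0;
    for (std::size_t i = 1; i < gpus.size(); i++) {
        if (more_capable(gpus[i], gpus[best])) best = i;
    }
    for (std::size_t i = 0; i < gpus.size(); i++) {
        usable[i] = equivalent(gpus[i], gpus[best]);
    }
    return usable;
}
```

Note that a GPU with more memory and more FLOPS still loses to one with a higher compute capability, since earlier properties take priority.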
David Anderson 90f863f08c - partial checkin so I can edit locally (bad network connection)
svn path=/trunk/boinc/; revision=17852
2009-04-21 08:11:28 +00:00
David Anderson c58136e5bf - client: improve CPU sched debug messages
(say what kind of job and why we're scheduling it)
- client: log messages describing GPUs: one line per GPU; fixes #879

svn path=/trunk/boinc/; revision=17847
2009-04-20 00:00:11 +00:00
David Anderson cd4786166a - client: fix crash
svn path=/trunk/boinc/; revision=17550
2009-03-06 23:27:19 +00:00
David Anderson e1b94a1e53 - client: add a new mechanism for assigning coproc instances to tasks,
and passing them the corresponding --device N cmdline args.
    This fixes a bug introduced in 17402 (Feb 26)
    that broke the --device feature,
    presumably causing problems on systems with multiple GPUs.

svn path=/trunk/boinc/; revision=17549
2009-03-06 23:10:45 +00:00
David Anderson c22b62f25b - scheduler: fix bugs in support for anonymous platform + coprocs
(app versions don't have a <coprocs> around coproc elements,
    maybe an oversight, but let's stick with it).
    Anyway, I think it's working now.
- lib: remove "owner" array from COPROC.
    This was used in client to keep track of assignment of
    coprocessors to tasks, but we got rid of the reserve/free scheme.
    NOTE: this breaks the mechanism for passing --device N to apps;
    I'll have to do this another way.  Stay tuned.

svn path=/trunk/boinc/; revision=17543
2009-03-06 22:21:47 +00:00
David Anderson 16ca7cd359
svn path=/trunk/boinc/; revision=17332
2009-02-22 04:05:34 +00:00
David Anderson 4d1544e579 - client: detect NVIDIA driver version number, show it on startup,
and include it with CUDA coprocessor descriptor in request msgs

svn path=/trunk/boinc/; revision=17275
2009-02-16 23:03:03 +00:00
David Anderson b7a2c227ca - Work fetch / scheduler:
There are two mechanisms to prevent the scheduler from
    sending jobs that won't finish by their deadline.
    Simple mechanism:
        The client sends the interval x for which CPUs are projected
        to be saturated.
        Given a job with estimated duration y,
        the scheduler doesn't send it if x + y exceeds the delay bound.
        If it does send it, x is incremented by y.
    Complex mechanism:
        Client sends workload description.
        Scheduler does EDF simulation, sees if deadlines are missed.
        The only project using this AFAIK is BOINC alpha test.
    Neither of these mechanisms takes coprocessors into account,
    and as a result jobs could be sent that are doomed to
    miss their deadline.
    This checkin adds coprocessor awareness to the Simple mechanism.

    Changes:
    Client:
        compute estimated delay (i.e. time until non-saturation)
        for coprocessors as well as CPU.
        Send them in scheduler request as part of coproc descriptor.
    Scheduler:
        Keep track of estimated delays separately for different resources
- client: fixed bug that computed CPU estimated delay incorrectly
- client: the work request (req_secs) for a resource is the min
    of the project's share and the shortfall.

svn path=/trunk/boinc/; revision=17086
2009-01-30 21:25:24 +00:00
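
The "Simple mechanism" from revision 17086, tracked separately per resource, might look like the following sketch. `Resource`, `try_send_job()`, and `work_request_secs()` are hypothetical names. The x + y test against the delay bound and the min of share and shortfall are taken from the commit message.

```cpp
#include <algorithm>
#include <cassert>

// Per-resource state on the scheduler side (hypothetical name).
// One instance would exist for the CPU and one per coprocessor type.
struct Resource {
    double estimated_delay;  // x: seconds until the resource is non-saturated
};

// Decide whether to send a job with estimated duration y (seconds).
// The job is not sent if x + y exceeds the delay bound;
// if it is sent, x is incremented by y.
bool try_send_job(Resource& r, double est_duration, double delay_bound) {
    assert(est_duration >= 0);
    if (r.estimated_delay + est_duration > delay_bound) {
        return false;
    }
    r.estimated_delay += est_duration;
    return true;
}

// Client side: the work request for a resource is the min of the
// project's share and the shortfall.
double work_request_secs(double project_share, double shortfall) {
    return std::min(project_share, shortfall);
}
```

Keeping one `Resource` per resource type is what stops a saturated GPU from being judged by the CPU's estimated delay, and vice versa.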
David Anderson 574d1fe087 - client: don't request work for a resource if it has no shortfall.
- client and server: get rid of coproc_cuda global.

svn path=/trunk/boinc/; revision=17019
2009-01-26 05:00:49 +00:00
David Anderson f90dddc9a6 - client: clamp long term debts to +- 1 week
- client: fix CUDA debt calculation
- client: don't accumulate debt if project->dont_request_more_work
- client: improve messages

svn path=/trunk/boinc/; revision=16909
2009-01-14 23:56:07 +00:00
David Anderson 377545a056 - scheduler: if we're not sending work because of the user's "no GPUs" pref,
tell them so.
- scheduler: fix bug that caused no CUDA jobs to be sent

svn path=/trunk/boinc/; revision=16893
2009-01-12 23:47:52 +00:00
David Anderson 2cc81a97d5 - scheduler: initialize COPROC fields
svn path=/trunk/boinc/; revision=16891
2009-01-12 23:08:16 +00:00
David Anderson 8740ffdc94 - client: more work-fetch stuff.
No more per-project shortfall.
    It's getting pretty close.

svn path=/trunk/boinc/; revision=16765
2009-01-03 06:01:17 +00:00
David Anderson 8c591e31df - client: first whack at new work-fetch logic. Very preliminary.
svn path=/trunk/boinc/; revision=16754
2008-12-31 23:07:59 +00:00
David Anderson fae0903c0f - scheduler: store CUDA total memory as a double,
since it can be 4GB or larger


svn path=/trunk/boinc/; revision=16737
2008-12-22 22:12:57 +00:00
David Anderson 4a65681176 - scheduler: if client has coprocs,
put a textual summary of them in host.serialnum (currently unused)
- web: show coprocs on host detail page
- db_dump: include coproc info in host XML

svn path=/trunk/boinc/; revision=16697
2008-12-16 18:46:28 +00:00
David Anderson b3bc71047e - client, CUDA detection:
1) report all devices found
    2) use the specs of the fastest one

svn path=/trunk/boinc/; revision=16669
2008-12-11 21:44:22 +00:00
David Anderson 63a81014fe - line endings
svn path=/trunk/boinc/; revision=16176
2008-10-09 19:06:01 +00:00
David Anderson 6070af4fea - client: fix bugs in coprocessor scheduling;
add new <coproc_debug> log flag

svn path=/trunk/boinc/; revision=16122
2008-10-03 21:55:34 +00:00
David Anderson a4d5d49b28 - client: attempt to fix CPU sched bug in the presence of GPUs
(if there was an idle GPU, it would run unboundedly many CPU jobs)

svn path=/trunk/boinc/; revision=16043
2008-09-25 01:04:53 +00:00
Eric J. Korpela 40e243412d - Fixed fcgi builds to use an installed version of fcgi_stdio.h rather than
a modified boinc version.
    - Added new header "boinc_fcgi.h" to be used instead of "fcgi_stdio.h".
      This header defines I/O functions in the namespace FCGI rather than using
      redefined functions the way "fcgi_stdio.h" does.  This was causing a lot
      of headaches when both <cstdio> and "fcgi_stdio.h" were included.  Using
      overloaded functions fixes this problem, except when the only difference
      between functions is the return type (for example ::fopen() returns FILE*
      and FCGI::fopen() returns FCGI_FILE*).
    - Fixed some missing "#ifdef _WIN32" blocks in filesys.C



svn path=/trunk/boinc/; revision=15984
2008-09-09 19:10:42 +00:00
Rom Walton 481e45a50a - client: Both Windows x86 and Windows x64 CUDA Runtime libraries
should be 2.0.  This avoids crashes related to data structure
        changes in the Runtime.
        
    coprocs/CUDA/mswin/Win32/Debug/bin/
        cudart.dll
    coprocs/CUDA/mswin/Win32/Release/bin/
        cudart.dll
    coprocs/CUDA/mswin/Win32/ReleaseSigned/bin/
        cudart.dll
    coprocs/CUDA/mswin/x64/Debug/bin/
        cudart.dll
    coprocs/CUDA/mswin/x64/Release/bin/
        cudart.dll
    coprocs/CUDA/mswin/x64/ReleaseSigned/bin/
        cudart.dll
    lib/
        coproc.C, .h

svn path=/trunk/boinc/; revision=15925
2008-08-22 22:15:08 +00:00
David Anderson 87cf35f89b - client: fix CPU scheduling logic related to coprocessors
Old: when checking whether an app can be run,
        check for sufficient coprocessors relative to
        the current coprocessor usage.
        Bug: if there are 2 CUDA jobs,
        the scheduler will decide to run both.
        enforce_schedule() will only be able to run one,
        and the other CPU will be idle.
    New: include coprocessor usage (along with RAM and CPUs)
        in the check, and do a simulated reservation.
        In the above scenario, the scheduler will select
        one CUDA app and one non-CUDA app.

svn path=/trunk/boinc/; revision=15904
2008-08-20 17:34:18 +00:00
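
The simulated reservation described in revision 15904 can be illustrated with a small selection loop. This is a sketch under assumptions, not the client's scheduling code: `Job` and `select_jobs()` are hypothetical names, and RAM is left out to keep the example short.

```cpp
#include <vector>

// Hypothetical, simplified job description.
struct Job {
    double ncpus;    // CPUs the job needs
    double ncudas;   // CUDA devices the job needs
    bool selected;   // set if the job is chosen to run
};

// Walk the jobs in priority order. A job is selected only if the CPUs
// and coprocessors it needs are still unreserved; selecting it reserves
// them, so later jobs are checked against what remains.
void select_jobs(std::vector<Job>& jobs, double free_cpus, double free_cudas) {
    for (Job& j : jobs) {
        j.selected = false;
        if (j.ncpus > free_cpus || j.ncudas > free_cudas) {
            continue;            // would oversubscribe a resource
        }
        free_cpus -= j.ncpus;    // simulated reservation
        free_cudas -= j.ncudas;
        j.selected = true;
    }
}
```

With 2 CPUs, 1 CUDA device, and a queue of two CUDA jobs followed by a CPU-only job, this selects the first CUDA job and the CPU-only job, which is the outcome the commit describes.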
David Anderson 4f66bb4c95 - added copyright and license info to .C, .cpp, .h files
- scheduler: fix bug in adaptive replication:
    if we send an unreplicated job to an untrusted host,
    set both wu.target_nresults and wu.min_quorum to app.target_nresults.

svn path=/trunk/boinc/; revision=15762
2008-08-06 18:36:30 +00:00