
282 Commits

Author SHA1 Message Date
David Anderson 9187cb52ba - client and scheduler RPC:
    Add more info to "project in-progress job list".
    Old: entries included only job name and app plan class;
        this was used to resend lost jobs,
        and to count the # of CPU and GPU jobs.
        But it's not usable e.g. for per-app in-progress limits.
    New: send the client's app versions (including usage info)
        and for each in-progress job, which app version it uses.
        (This reduces request-message size compared with sending
        usage info and app name per job).
- client and scheduler RPC:
    Add more info to "all in-progress job list", and make it optional.
    This list is used by schedulers that do deadline checks
    using EDF workload simulation.
    Old: the list is always sent, and it contains no info
        about job resource usage
    New: the list is sent only if the scheduler asked for it
        in a previous reply,
        and each entry now contains resource usage (CPU, GPUs)
    Note: the scheduler's EDF simulator is outdated;
        it doesn't know about GPU jobs.
        But we may as well get the info in place.


svn path=/trunk/boinc/; revision=21513
2010-05-13 20:18:27 +00:00
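The app-version-indexed job list described in 9187cb52ba can be sketched as follows. This is a minimal Python illustration, not BOINC's C++ code or its XML wire format: the class names `AppVersionInfo` and `InProgressJob`, their fields, and both helper functions are hypothetical. It shows why having each job point at an app version lets the server derive per-app and per-resource counts without repeating usage info in every job entry.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AppVersionInfo:
    # Hypothetical fields: the commit says only that app versions carry usage info.
    app_name: str
    plan_class: str
    avg_ncpus: float
    ngpus: float


@dataclass
class InProgressJob:
    name: str
    app_version_index: int  # index into the list of app versions in the request


def in_progress_per_app(versions, jobs):
    """Count in-progress jobs per app by following each job's app version."""
    return Counter(versions[j.app_version_index].app_name for j in jobs)


def gpu_job_count(versions, jobs):
    """Count in-progress jobs whose app version uses a GPU."""
    return sum(1 for j in jobs if versions[j.app_version_index].ngpus > 0)


# Illustrative data only.
versions = [
    AppVersionInfo("app_a", "", 1.0, 0.0),
    AppVersionInfo("app_a", "cuda", 0.05, 1.0),
    AppVersionInfo("app_b", "", 1.0, 0.0),
]
jobs = [InProgressJob("job_1", 0), InProgressJob("job_2", 1), InProgressJob("job_3", 2)]
print(in_progress_per_app(versions, jobs))  # Counter({'app_a': 2, 'app_b': 1})
print(gpu_job_count(versions, jobs))        # 1
```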
David Anderson 678d880c64 - client: clean up logic related to GPU available memory.
    If a driver call to get available memory fails, mark the GPU as unusable.


svn path=/trunk/boinc/; revision=21210
2010-04-19 18:35:10 +00:00
David Anderson 01402bb45a - client: improve GPU scheduling
    Old: assign GPUs, then check available RAM.
        Problem: may cause starvation on multi-GPU systems.
    New: use available RAM info in the assignment process.
        Prevents starvation, and also reduces the number of driver calls.

svn path=/trunk/boinc/; revision=21205
2010-04-18 03:00:33 +00:00
David Anderson 84861e7c55 - client: don't include graphics apps in non-BOINC CPU time
svn path=/trunk/boinc/; revision=21131
2010-04-07 05:54:20 +00:00
David Anderson b0cb81159f - client: when looking for new file xfers to start,
favor those that are partially done
- client: fix crashing bug if a project is detached
    while an RSS feed fetch for it is in progress
- code cleanup: switch from /// back to // for comments
    (so much for doxygen)

svn path=/trunk/boinc/; revision=21041
2010-04-01 05:54:29 +00:00
David Anderson 679fb7132c svn path=/trunk/boinc/; revision=21019 2010-03-29 19:08:19 +00:00
David Anderson 4f77556c74 - client: if a GPU job is blocked on available mem,
don't fetch more jobs for that resource type

svn path=/trunk/boinc/; revision=20817
2010-03-10 06:00:37 +00:00
David Anderson b415b07785 - client: revisit the domino-effect preemption problem.
Removed my changes of 19 Jan 2010, which didn't work.
    Added new mechanism: keep track of whether a job J has ever run in EDF.
    If so, and if another job of the same project and resource type as J
    is marked as deadline miss, then mark J as deadline miss,
    so that it won't get preempted.
- web: change "result" to "task" in server status page
- admin web: show server stable SVN revision, not trunk

svn path=/trunk/boinc/; revision=20805
2010-03-05 21:13:53 +00:00
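The rule added in b415b07785 can be expressed as a single pass over the job list. In this hypothetical Python sketch, `deadline_miss` stands for the round-robin simulation's verdict and `ever_ran_edf` for the new per-job flag; the field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    project: str
    resource: str           # e.g. "cpu", "nvidia", "ati"
    deadline_miss: bool     # verdict from the round-robin simulation
    ever_ran_edf: bool = False


def propagate_deadline_miss(jobs):
    """Mark J as a deadline miss if J has run in EDF mode and some job of
    the same project and resource type is marked as a deadline miss."""
    missed = {(j.project, j.resource) for j in jobs if j.deadline_miss}
    for j in jobs:
        if j.ever_ran_edf and (j.project, j.resource) in missed:
            j.deadline_miss = True


jobs = [
    Job("A", "p1", "cpu", False, ever_ran_edf=True),
    Job("B", "p1", "cpu", True),
    Job("C", "p1", "nvidia", False, ever_ran_edf=True),
]
propagate_deadline_miss(jobs)
print([j.deadline_miss for j in jobs])  # [True, True, False]
```

Job `C` keeps its status because no job with the same project and resource type is a deadline miss.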
David Anderson f716dcf7ae - client: if a project has zero resource share,
treat it as a "backup project":
    fetch work from it only if there is an idle instance
    and no other projects have work.


svn path=/trunk/boinc/; revision=20286
2010-01-28 05:21:14 +00:00
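The backup-project rule from f716dcf7ae reduces to a predicate like the one below. This is a sketch under assumptions: the dictionary keys are hypothetical, and "no other projects have work" is modeled as a per-project `has_work` flag, since the commit message doesn't say exactly how that condition is evaluated.

```python
def backup_rule_allows_fetch(project, projects, idle_instances):
    """Return False if the zero-share ("backup project") rule forbids a fetch.

    Projects with a positive resource share are not restricted by this rule.
    """
    if project["resource_share"] > 0:
        return True
    others_have_work = any(
        p["has_work"] for p in projects if p["name"] != project["name"]
    )
    return idle_instances > 0 and not others_have_work


projects = [
    {"name": "main", "resource_share": 100, "has_work": True},
    {"name": "backup", "resource_share": 0, "has_work": False},
]
print(backup_rule_allows_fetch(projects[1], projects, idle_instances=1))  # False
```

Once the main project runs dry (its `has_work` becomes False) and an instance is idle, the same call returns True.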
David Anderson b5124fe729 - client: brute-force attempt at eliminating domino-effect preemption:
    if job A is unstarted and EDF,
    and there's a job B that is later in the list,
    is started, has the same app version,
    and has the same arrival time,
    move A after B.
- client: remove the "temp_dcf" mechanism,
    which had the same goal but didn't work.
- client: in computing overall debt for a project,
    subtract a term that reflects pending work.
    This should reduce repeated fetches from the same project.
- client simulator: tweaks

svn path=/trunk/boinc/; revision=20223
2010-01-21 00:14:56 +00:00
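The reordering rule in b5124fe729 (move an unstarted EDF job A after a later, started job B with the same app version and arrival time) might look like this in Python. Jobs are plain dictionaries with hypothetical keys; the client's real data structures differ.

```python
def same_batch(a, b):
    """True if two jobs share an app version and an arrival time."""
    return a["app_version"] == b["app_version"] and a["arrival"] == b["arrival"]


def reorder(jobs):
    """Move each unstarted EDF job after any later started job from the same batch."""
    order = list(jobs)
    i = 0
    while i < len(order):
        a = order[i]
        if not a["started"] and a["edf"]:
            k = next(
                (m for m in range(i + 1, len(order))
                 if order[m]["started"] and same_batch(a, order[m])),
                None,
            )
            if k is not None:
                # After pop(i), B sits at k - 1, so inserting at k puts A right after B.
                order.insert(k, order.pop(i))
                continue  # a different job now occupies index i
        i += 1
    return order


jobs = [
    {"name": "A", "started": False, "edf": True, "app_version": "v1", "arrival": 0},
    {"name": "C", "started": True, "edf": False, "app_version": "v2", "arrival": 0},
    {"name": "B", "started": True, "edf": False, "app_version": "v1", "arrival": 0},
]
print([j["name"] for j in reorder(jobs)])  # ['C', 'B', 'A']
```

Started jobs never move and unstarted jobs only move forward, so the loop terminates.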
David Anderson 37aae854f3 - client: scheduling problem:
    - a project overestimates job FLOP counts
    - the client starts jobs in EDF mode
    - as job progresses and fraction done increases,
        its completion time estimate decreases until
        it's no longer a deadline miss.
    - job gets preempted by other job from that project;
        you end up with lots of partly completed jobs.
    Solution (I hope): if an app version has running jobs,
        compute a "temp DCF" for the app version,
        which is the min of dynamic/static estimates for its jobs.
        Apply this scaling factor to completion time estimates
        for unstarted jobs in RR simulation
- client: the estimation of remaining time of running jobs was wrong
    (how did this bug survive so long?)

svn path=/trunk/boinc/; revision=20077
2010-01-06 06:01:23 +00:00
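For reference, the "temp DCF" scaling introduced in 37aae854f3 (and later removed in b5124fe729) amounts to the arithmetic below. The function names are hypothetical, and each running job is represented only by its dynamic and static duration estimates.

```python
def temp_dcf(running_jobs):
    """Smallest dynamic/static estimate ratio among an app version's running jobs.

    running_jobs: list of (dynamic_estimate, static_estimate) pairs, in seconds.
    Returns None if the app version has no running jobs.
    """
    if not running_jobs:
        return None
    return min(dynamic / static for dynamic, static in running_jobs)


def scaled_estimate(static_estimate, dcf):
    """Scale an unstarted job's estimate for use in the RR simulation."""
    return static_estimate if dcf is None else static_estimate * dcf


dcf = temp_dcf([(50.0, 100.0), (80.0, 100.0)])
print(dcf, scaled_estimate(200.0, dcf))  # 0.5 100.0
```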
David Anderson b499654603 - client: more notice stuff. Substantial progress!
We're now saving feed lists, and fetching items from feeds.

svn path=/trunk/boinc/; revision=20021
2009-12-23 00:58:27 +00:00
David Anderson 37ea627866 - Win compile fixes. Also, needed to provide a replacement
for strptime() on Win.  WTF?

svn path=/trunk/boinc/; revision=20003
2009-12-21 19:20:28 +00:00
David Anderson 4e9fc3d595 - client: a big glob of new code related to notices.
Not functional yet.


svn path=/trunk/boinc/; revision=20002
2009-12-21 17:49:28 +00:00
David Anderson a151ad6cb3 - client/scheduler: deal with the situation where a GPU has enough
    RAM to run a job, but when we actually run the job,
    not enough GPU RAM is free, so the application fails.
    This can cause a large number of jobs to fail.
    Solution:
    - app_plan() can specify the GPU RAM requirements of an app version.
        This is passed to the client in a new field
        <gpu_ram> of the <app_version> element.
    - prior to starting or restarting a GPU app, the client
        checks the amount of free RAM on the particular GPU.
        If it's not enough for the app version,
        the client doesn't start it,
        and arranges for the scheduler to ignore it for 5 minutes
        (by which point there might be more free GPU RAM)
    Notes:
    1) this change will have effect only when
        both client and scheduler are updated.
    2) the check is done in enforce_schedule(),
        rather than schedule_cpus(),
        because only at that point
        have we assigned a specific GPU to the job.
    3) there's another case to deal with:
        a GPU app's malloc of GPU RAM fails in the middle of the job.
        Currently the job fails.
        I plan to add an API call boinc_temporary_exit(x) so
        that the job can exit and potentially restart in x seconds.
        (In principle this mechanism is sufficient for all cases,
        but it could lead to a lot of starting/exiting,
        so the current change is worthwhile).

svn path=/trunk/boinc/; revision=19864
2009-12-11 22:45:59 +00:00
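The start-time check from a151ad6cb3 can be sketched as follows. The 5-minute deferral comes from the commit message; the dictionary keys and the function name are hypothetical, and free GPU RAM is passed in rather than queried from a driver.

```python
import time

GPU_RAM_DEFER_SECS = 5 * 60  # "ignore it for 5 minutes"


def try_start(job, free_gpu_ram, now=None):
    """Return True if the job may start on a GPU with free_gpu_ram bytes free.

    If the job's declared gpu_ram doesn't fit, defer it for 5 minutes.
    """
    now = time.time() if now is None else now
    if now < job.get("defer_until", 0):
        return False  # still inside a previous deferral window
    if free_gpu_ram < job["gpu_ram"]:
        job["defer_until"] = now + GPU_RAM_DEFER_SECS
        return False
    return True


job = {"gpu_ram": 512}
print(try_start(job, 256, now=1000), job["defer_until"])  # False 1300
```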
David Anderson 2d4ceb618a - client: my STD-related checkin of Dec 1 was bad.
It computed an "overall STD" as the sum of CPU and coprocs,
    weighted by the coproc's speed, as we do for LTD.
    This was the wrong idea; in the presence of GPUs,
    STDs quickly get pushed to +- 1 day and are truncated there.

    New scheme: STD is maintained per (resource type, project).
    This fixes the above problem,
    and it opens the door to round-robin scheduling of GPUs.
- client: the calculation of "anticipated debt" was scaling
    by relative resource share.
    This wasn't correct, it seems to me.
- client: rename "debt" to "long_term_debt" in a few places
    (but not in the client state file, for compatibility)

svn path=/trunk/boinc/; revision=19777
2009-12-03 23:09:25 +00:00
David Anderson 59328aaccb - client: change how short term debt is updated.
    Old: it's based entirely on CPU time.
        So a GPU project, whose app uses only a fraction
        of a CPU, accrues positive debt.
        This is OK if the project has only GPU apps,
        since STD is not (currently) used for GPU scheduling.
        But some projects have both CPU and GPU apps.
    New: STD is based on total processing.
        It has terms for each resource type.
        The notion of "runnable resource share" is specific to a type.
    Note: the notion of "resource share fraction" appears in
        a couple of other places:
        - it's passed to apps in app_init_data.xml
        - it's passed in scheduler requests.
        It should be broken down by resource type in these cases too.
        Note to self: do this later.

svn path=/trunk/boinc/; revision=19762
2009-12-02 03:41:52 +00:00
David Anderson 545d137804 - client: no network activity if running CPU benchmarks
svn path=/trunk/boinc/; revision=19375
2009-10-23 21:57:58 +00:00
David Anderson 5e862ac495 - client: on startup, if a coproc needed by a job is missing,
    set a "coproc_missing" flag rather than aborting the job.
    If the user removes a GPU board while there's a large queue of GPU jobs,
    they'll stay queued (until their deadlines pass).

    Note: this doesn't fix the situation where the user connects via
    Remote Desktop while GPU jobs are running or queued.
    We should check for Remote Desktop every minute or so, and stop GPU jobs.

svn path=/trunk/boinc/; revision=19287
2009-10-12 16:28:17 +00:00
David Anderson 4ab5685ce4 - client: if a task is running, uses a GPU, and the system has >1 GPU,
append text to its resource string saying which GPU it's using
- manager: tweak Task properties text

svn path=/trunk/boinc/; revision=19240
2009-10-04 02:51:44 +00:00
David Anderson 71c7e7a74b - client/scheduler/web: add per-project preferences for whether
to accept CPU, NVIDIA and ATI jobs.
    These prefs are shown only where relevant:
    e.g., only for processor types for which the project has app versions,
    and if it has versions for only one type, no pref is shown.

    These prefs affect both client and scheduler.
    The client won't ask for work for a device blocked by prefs,
    and the scheduler won't send it.

    This replaces earlier optional project-specific prefs for
    "no CPU jobs" and "no GPU jobs".
    (However, these prefs continue to be honored on the server side).

- client: if NVIDIA driver is unknown, say that rather than 0


svn path=/trunk/boinc/; revision=19194
2009-09-28 04:24:18 +00:00
David Anderson 86ee2f5753 - client: fix bug that caused unstarted coproc jobs to preempt
ones already running.
    The problem: we considered a job as started if it has an ACTIVE_TASK.
    However, we were creating ACTIVE_TASKs for jobs before deciding
    to run them, because we needed a place to store the coproc reservations.
    This caused the above bug, and also had the undesirable effect
    of creating slot directories before they're needed.

    Solution: store coprocessor reservations in RESULT
    rather than ACTIVE_TASK.

svn path=/trunk/boinc/; revision=19129
2009-09-22 21:02:06 +00:00
David Anderson 073e6ded2c - client and scheduler: lay the groundwork for "fractional coproc jobs",
e.g. the Milkyway@home ATI app, of which we can typically run
    2 or 3 instances at once on a GPU.
    Changes include:
    - In APP_VERSION, don't use a COPROCS to represent the GPU
        requirements; just use doubles ncudas and natis.
    - sufficient_coprocs() etc. are no longer members of COPROCS
    - in HOST_USAGE, ncudas and natis are doubles
    - in scheduler request, req_instances is now a double

    This checkin doesn't include the job scheduling logic,
    i.e. assigning jobs to GPUs.  That will follow.

svn path=/trunk/boinc/; revision=18868
2009-08-19 18:41:47 +00:00
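Commit 073e6ded2c explicitly leaves out the logic for assigning jobs to GPUs. Purely to illustrate what fractional usage such as `ncudas = 0.5` makes possible, here is a first-fit packing sketch; first-fit is a technique chosen for this example, not necessarily what BOINC implemented, and the function name is hypothetical.

```python
def place_fractional(jobs, ngpus, eps=1e-9):
    """First-fit placement of jobs with fractional GPU usage.

    jobs: list of (name, usage) with 0 < usage <= 1, e.g. 0.5 for half a GPU.
    Returns {name: device}; jobs that fit nowhere are omitted.
    """
    used = [0.0] * ngpus
    placement = {}
    for name, usage in jobs:
        for dev in range(ngpus):
            if used[dev] + usage <= 1.0 + eps:  # eps tolerates rounding, e.g. 3 x 1/3
                used[dev] += usage
                placement[name] = dev
                break
    return placement


print(place_fractional([("a", 0.5), ("b", 0.5), ("c", 0.5)], ngpus=2))
# {'a': 0, 'b': 0, 'c': 1}
```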
David Anderson c3fe504e1d - client: add ATI support to job scheduling and work fetch
svn path=/trunk/boinc/; revision=18850
2009-08-17 16:50:40 +00:00
David Anderson 26114920fe - client: define "too many uploads" (for work fetch) as
2 * max(ncpus, ngpus);
    show this in the state displayed by <work_fetch_debug>
- manager: show project-wide backoff in transfers tab

svn path=/trunk/boinc/; revision=18662
2009-07-22 22:00:51 +00:00
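As a worked example of the threshold in 26114920fe: a host with 4 CPUs and 2 GPUs gets a limit of 2 * max(4, 2) = 8 uploads. The sketch below treats "too many" as strictly more than that limit; whether the client compares with > or >= is an assumption here, and the function names are hypothetical.

```python
def upload_limit(ncpus, ngpus):
    """Work-fetch upload threshold from the commit: 2 * max(ncpus, ngpus)."""
    return 2 * max(ncpus, ngpus)


def too_many_uploads(n_uploading, ncpus, ngpus):
    # Assumption: "too many" means strictly above the limit.
    return n_uploading > upload_limit(ncpus, ngpus)


print(upload_limit(4, 2))  # 8
```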
David Anderson e794e71c48 - client: code cleanup for project-level file xfer backoff
svn path=/trunk/boinc/; revision=18601
2009-07-16 16:35:57 +00:00
David Anderson 6a13bd12b8 - client: restored code for project-wide backoff on file
uploads and downloads.
    I originally added this on 30 Sept 2005
    and disabled it 2 weeks later because there were reports of problems.
    However, we need this functionality
    (e.g. on GPU hosts with hundreds of files to upload,
    we need to back off after a few failures, not try all of them).
    I added messages (<file_xfer_debug>) so you can see what's going on.
    Fixes #932.

svn path=/trunk/boinc/; revision=18593
2009-07-10 17:06:06 +00:00
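Commit 6a13bd12b8 doesn't give the backoff schedule. The sketch below assumes an exponential delay that doubles with each consecutive failure and is capped; the class name and the 60-second and 4-hour constants are hypothetical. What it illustrates is that a single project-wide timer stops the client from trying every pending file once a few transfers have failed.

```python
class ProjectXferBackoff:
    """Project-wide file transfer backoff (hypothetical parameters)."""

    MIN_DELAY = 60.0          # seconds; assumed
    MAX_DELAY = 4 * 3600.0    # seconds; assumed

    def __init__(self):
        self.failures = 0
        self.next_ok_time = 0.0

    def ok_to_transfer(self, now):
        """Any file of this project may be tried only after the backoff expires."""
        return now >= self.next_ok_time

    def on_failure(self, now):
        self.failures += 1
        delay = min(self.MAX_DELAY, self.MIN_DELAY * 2 ** (self.failures - 1))
        self.next_ok_time = now + delay

    def on_success(self):
        self.failures = 0
        self.next_ok_time = 0.0


b = ProjectXferBackoff()
b.on_failure(now=0)
print(b.ok_to_transfer(59), b.ok_to_transfer(60))  # False True
```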
David Anderson 46d9e8f087 - client: record the time results are received.
Process non-EDF GPU jobs in this order.


svn path=/trunk/boinc/; revision=18531
2009-06-30 20:22:54 +00:00
David Anderson 0b3ce504ff - Win: compile fixes
svn path=/trunk/boinc/; revision=18439
2009-06-16 21:58:38 +00:00
David Anderson 16e87bc84e - client: don't require that file upload URLs contain "file_upload_handler".
svn path=/trunk/boinc/; revision=18427
2009-06-16 18:27:16 +00:00
David Anderson f9222339e9 - client: simplify enforce_schedule(), and maybe fix bugs.
New approach: take the "ordered_schedule_results" list,
    add running jobs that haven't finished their time slice,
    and order the result appropriately.
    Then run jobs in order until CPUs are filled.
    Simpler and clearer than the old way.


svn path=/trunk/boinc/; revision=17992
2009-05-04 19:55:59 +00:00
David Anderson cf638ae3a6 - client: instead of scheduling coproc jobs EDF:
    - first, schedule jobs projected to miss their deadlines, in EDF order
    - then schedule remaining jobs in FIFO order
    This is intended to reduce the number of preemptions of coproc jobs,
    and hence (since they are always preempted by quit)
    to reduce the wasted time due to checkpoint gaps.
- client: the CPU scheduling policy made use of the number
    of deadline misses in various places.
    This should include only the deadline misses of CPU jobs.
    So move "deadlines_missed" from RR_SIM_STATUS and PROJECT
    to RSC_PROJECT_WORK_FETCH so that we have separate counts
    for CPU and coproc jobs, and use the count for CPU jobs.
- GUI RPC: removed the rr_sim_deadlines_missed field
    from project descriptor.
    This is no longer meaningful, and it didn't seem to be used anywhere.

svn path=/trunk/boinc/; revision=17785
2009-04-10 19:01:38 +00:00
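The coproc job ordering from cf638ae3a6 can be sketched as two sorts. Jobs are dictionaries with hypothetical keys; "FIFO" is taken here to mean order of receipt, which is an assumption (commit 46d9e8f087 later records the time results are received and processes non-EDF GPU jobs in that order).

```python
def order_coproc_jobs(jobs):
    """Deadline-miss jobs first, earliest deadline first; then the rest FIFO."""
    at_risk = sorted((j for j in jobs if j["deadline_miss"]), key=lambda j: j["deadline"])
    others = sorted((j for j in jobs if not j["deadline_miss"]), key=lambda j: j["received"])
    return at_risk + others


jobs = [
    {"name": "a", "deadline": 300, "received": 1, "deadline_miss": False},
    {"name": "b", "deadline": 200, "received": 2, "deadline_miss": True},
    {"name": "c", "deadline": 100, "received": 3, "deadline_miss": True},
    {"name": "d", "deadline": 50, "received": 0, "deadline_miss": False},
]
print([j["name"] for j in order_coproc_jobs(jobs)])  # ['c', 'b', 'd', 'a']
```

Job `d` has the earliest deadline of all but is not projected to miss it, so it is ordered by receipt time rather than ahead of `c` and `b`.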
David Anderson c6d7076464 - client: for each app version,
keep track of the largest WSS of tasks using it.
    In checking whether tasks fit in RAM,
    use this as an estimate for tasks that haven't started yet.
    This avoids a situation where the client starts a lot of
    tasks in sequence, only to find that each one doesn't fit in RAM.

svn path=/trunk/boinc/; revision=17765
2009-04-09 16:46:03 +00:00
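The per-app-version WSS tracking from c6d7076464 might be sketched like this. The class, the `startable` helper, and the fallback `default_wss` for app versions with no history are all hypothetical; the sketch only shows the idea of estimating an unstarted task's memory by the largest working set seen for its app version.

```python
class WssTracker:
    """Largest working-set size (WSS) observed per app version, in bytes."""

    def __init__(self):
        self.max_wss = {}

    def observe(self, app_version, wss):
        self.max_wss[app_version] = max(self.max_wss.get(app_version, 0.0), wss)

    def estimate(self, app_version, default):
        return self.max_wss.get(app_version, default)


def startable(tracker, candidates, ram_available, default_wss):
    """Return the leading candidates whose estimated WSS fits in available RAM."""
    chosen = []
    for av in candidates:
        est = tracker.estimate(av, default_wss)
        if est > ram_available:
            break
        ram_available -= est
        chosen.append(av)
    return chosen


t = WssTracker()
for wss in (300, 500, 400):
    t.observe("v1", wss)
print(t.estimate("v1", 100), startable(t, ["v1", "v1", "v2"], 900, 100))  # 500 ['v1']
```

Without the stored maximum, the client could start the second `v1` task and only then discover it does not fit.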
David Anderson dfc62d896d - Manager: show elapsed time instead of CPU time in Task tab.
CPU time is visible in task Properties.
- Manager: in task Properties, show final CPU and elapsed times
    if job is finished
- client: honor backoff for account-manager-requested scheduler RPCs
- client: keep track of final elapsed time for results
- GUI RPC: report final elapsed time

svn path=/trunk/boinc/; revision=17588
2009-03-11 22:01:38 +00:00
David Anderson e1b94a1e53 - client: add a new mechanism for assigning coproc instances to tasks,
and passing them the corresponding --device N cmdline args.
    This fixes a bug introduced in 17402 (Feb 26)
    that broke the --device feature,
    presumably causing problems on systems with multiple GPUs.

svn path=/trunk/boinc/; revision=17549
2009-03-06 23:10:45 +00:00
David Anderson e74f93c10d - client: if using anonymous platform, ignore (and complain about)
app versions in scheduler reply
- client: when reporting anonymous platform apps in sched request,
    don't include <file_info>s (not relevant to server)

svn path=/trunk/boinc/; revision=17507
2009-03-05 17:45:36 +00:00
David Anderson 41fe3e40bf - client: tag messages with project where possible; fixes #852
- client: show fetch share rather than run share in wfd message

svn path=/trunk/boinc/; revision=17398
2009-02-26 17:12:55 +00:00
Eric J. Korpela 8f3abcc835 - Added checks for net/*.h, arpa/*.h, netinet/*.h and code to figure out
which of those files to include
    - Modified MAC address check to work on some non-Linux unixes.
      (mac_address.cpp)
    - Added suggested change to "already attached to project" checking.
      (ProjectInfoPage.cpp)
    - changed includes of standard c header files to their c++ equivalents
      (i.e. replaced <stdio.h> with <cstdio>) for namespace protection.
    - replaced "using namespace std;" with more explicit "using std::function" in
      several files.
    - Fixed bug in checking whether the os is OS/2 and added conditional OS_OS2
      to the build environment. (boinc_platform.m4,configure.ac)
    - Changed build environment to not use -nostandardlibs unless we are using
      G++ and static linkage is specified. (configure.ac)
    - Added makefiles and package building files for solaris CSW package manager.
    - Fixed bug with attempting to find login name using logname. (configure.ac)
    - Added ifdef HAVE_* protection around some include files commonly found in
      sys.
    - Added support for unified binary for x86_64/i686-pc-solaris.
      (cs_platforms.cpp)
    - generate_host_cpid() now uses MAC address on non-linux unix.
      (hostinfo_network.cpp)
    - Macro BOINC_SET_COMPILE_FLAGS now doesn't check gcc only flags on non-gcc
      compilers. (boinc_set_compile_flags.m4)
    - Library compiles no longer depend upon the library extension or require
      the library to be prefixed with lib.
    - More fixes for fcgi builds.
    - Added declaration of "struct ether_addr" and ether_ntoa().  Have not yet
      implemented ether_ntoa() for machines that don't have it, or where it is
      buggy.  (unix_util.h)
    - Added FCGI::perror() which calls FCGI_perror(). (boinc_fcgi.{h,cpp})
    - Fixed library Makefiles so that all required headers get installed.


svn path=/trunk/boinc/; revision=17388
2009-02-26 00:23:23 +00:00
David Anderson 258dac62b2 - client: if the state file or an RPC reply has an app version
using a coprocessor we don't know about, ignore it
    (and all results using that app_version will be flushed).
    This deals with the situation where we have some GPU jobs,
    but the GPU card is removed (previously this resulted in a crash).
    This requires some code shuffling so that we check for coprocessors
    before reading the state file.


svn path=/trunk/boinc/; revision=17161
2009-02-06 00:22:21 +00:00
David Anderson 89188fca84 - client: there was a problem with how the round-robin simulator
worked in the presence of coprocessors.
    The simulator maintained per-project queues of pending jobs.
    When a job finished (in the simulation) it would get
    one or more jobs from that project's pending queue.

    The problem: this could cause "holes" in the scheduling of GPUs,
    and produce an erroneous nonzero shortfall for GPUs,
    leading to infinite work fetch.

    The solution: maintain a separate (per-resource, not per-project)
    queue of pending coprocessor jobs.
    When a coprocessor job finishes,
    start pending jobs from the queue for that resource.

    Another change: the simulator did strict reservation of coprocessors.
    If there are 2 instances of CUDA,
    and a 1-instance job is running in the simulation,
    it wouldn't start an additional 2-instance job.
    This also can cause erroneous nonzero shortfalls.

    So instead, schedule coprocessors like CPUs, i.e. saturate them.
    This can cause distorted completion time estimates,
    but it's better than infinite work fetch.

svn path=/trunk/boinc/; revision=17093
2009-02-01 04:37:19 +00:00
David Anderson be177ee7a4 - client: clear debts when a project is reset
- client: respect work-fetch backoff for non-CPU-intensive projects
- client: for a non-CPU-intensive project, fetch a new job
    if it has no jobs currently running
- client: skip non-CPU-intensive projects in debt calculations
- manager: show resource backoff times correctly

svn path=/trunk/boinc/; revision=16998
2009-01-23 18:29:28 +00:00
David Anderson 8740ffdc94 - client: more work-fetch stuff.
No more per-project shortfall.
    It's getting pretty close.

svn path=/trunk/boinc/; revision=16765
2009-01-03 06:01:17 +00:00
David Anderson 8c591e31df - client: first whack at new work-fetch logic. Very preliminary.
svn path=/trunk/boinc/; revision=16754
2008-12-31 23:07:59 +00:00
David Anderson cd4ca5fb17 - client: fix calculation of a job's FLOPS rate in round-robin simulation
svn path=/trunk/boinc/; revision=16662
2008-12-09 20:01:01 +00:00
David Anderson 79fb6e969e - Remove the notion of "CPU efficiency" from both client and server.
This wasn't being measured correctly for coproc/multithread apps,
    and its effect is now subsumed in DCF.

svn path=/trunk/boinc/; revision=16610
2008-12-03 19:50:06 +00:00
David Anderson 89548f04da - client: compute duration_correction_factor based on elapsed time, not CPU time
(otherwise it doesn't work for coproc or multi-proc apps)
    - client: in estimate of job completion time,
        weight the estimate based on fraction done more heavily
        (quadratic rather than linear)

svn path=/trunk/boinc/; revision=16603
2008-12-02 22:19:39 +00:00
David Anderson 84f1193a9d - client: use FLOPs, rather than CPU time,
as the basis for estimating job completion times.
    This should improve estimates for GPU apps,
    and prevent the DCF from getting messed up.

svn path=/trunk/boinc/; revision=16598
2008-12-02 03:58:32 +00:00
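A rough sketch of a FLOPs-based estimate in the spirit of 84f1193a9d: divide the job's estimated floating-point operation count by the speed of the device running it, then apply the duration correction factor (DCF). The exact formula and the parameter names are assumptions for illustration. Because a GPU app is rated by the device's FLOPS rather than by CPU time consumed, its estimate presumably no longer drags DCF in the wrong direction.

```python
def estimated_duration(fpops_est, flops_per_sec, dcf=1.0):
    """Estimated run time in seconds: operation count / device speed, scaled by DCF."""
    if flops_per_sec <= 0:
        raise ValueError("flops_per_sec must be positive")
    return fpops_est / flops_per_sec * dcf


# 1e13 operations on a device rated at 1e10 FLOPS, with DCF 1.5.
print(estimated_duration(1e13, 1e10, dcf=1.5))  # 1500.0
```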
David Anderson 719921bfaf - client: fix the updating of CPU time left in RR simulation;
don't print msgs about non-CPU-intensive projects.

svn path=/trunk/boinc/; revision=16386
2008-11-01 21:10:08 +00:00
David Anderson 2d1d47de15 - client: move round-robin simulation to its own file
- web: check for profile existence before trying to show it
- file deleter: add some debugging msgs

svn path=/trunk/boinc/; revision=16338
2008-10-28 21:59:25 +00:00
David Anderson a4380ee9a6 - web: make some things in sample front page translatable.
TODO: make them all translatable.
- manager: compile fix for Linux

svn path=/trunk/boinc/; revision=16207
2008-10-14 21:40:14 +00:00