Commit Graph

2002 Commits

Author SHA1 Message Date
David Anderson 8f84416ab7 - scheduler: add log messages to show VM-related request items
- fix typos in db_update script


svn path=/trunk/boinc/; revision=25183
2012-02-02 00:18:42 +00:00
David Anderson 480e28b54c - web: fix the user search feature
- scheduler: parse d_project_share
- scheduler: if vbox and vbox_mt are both available,
    use vbox for a 1-CPU machine


svn path=/trunk/boinc/; revision=25176
2012-02-01 03:30:14 +00:00
David Anderson 130d6ed4f0 - server: revamp the "assigned job" mechanism.
This now supports two main use cases:
    1) there's a job that you want to run once on all hosts,
        present and future
        (or all hosts belonging to a user, or to a team).
        The job is never transitioned, validated, or assimilated.
    2) There's a normal job for which you want to use only
        hosts belonging to a specific user (e.g. cluster or cloud hosts).
        This restriction can be made either when the job is created,
        or on the fly,
        e.g. as part of a scheme for accelerating batch completion.
        For the latter purposes we now provide a function
            restrict_wu_to_user(DB_WORKUNIT&, int userid);

        The job goes through the standard
        transitioner/validator/assimilator path.

    These cases are enabled by config flags
        <enable_assignment_multi/>
        <enable_assignment/>
    respectively.

    Assignments of type 2) are no longer stored in shared mem,
    so there is no limit on their number.

    There is no longer a rule that assigned job names must contain "asgn".

    NOTE: this requires a database update.


svn path=/trunk/boinc/; revision=25169
2012-01-30 22:39:13 +00:00
Rom Walton be9e807e31 - sched: adjust the vbox??_mt plan classes to use 1.5 CPUs instead
of the full 2 CPUs. Vboxwrapper uses ceil() to allocate enough
        whole CPUs for Virtualbox.  Ideally this will cause the BOINC
        client-side scheduler to use the remaining fraction of the CPU
        for GPU data transfer which will then free up one whole CPU for
        another job.  All without over-committing anything.
        
    sched/
        sched_customize.cpp

svn path=/trunk/boinc/; revision=25120
2012-01-21 18:51:37 +00:00
David Anderson 77086f1c6e - feeder: if we're rereading the DB because of trigger file,
do PERF_INFO::get_from_db() also.
    From Teemu Mannermaa.


svn path=/trunk/boinc/; revision=25114
2012-01-20 23:48:07 +00:00
David Anderson 8cd608605f - feeder fix
svn path=/branches/server_stable/; revision=25113
2012-01-20 23:43:37 +00:00
David Anderson 46fb7bd97a - file deleter: improved logging; from Oliver
svn path=/trunk/boinc/; revision=25050
2012-01-13 23:39:14 +00:00
David Anderson dd16170fc1 - scheduler: the p_fpops value reported by clients can't be trusted.
Some credit cheats (e.g. with credit_by_runtime) can be done
    by reporting a huge value.
    Fix this by capping the value at 1.1 times the 95th percentile
    of host.p_fpops, taken over active hosts.


svn path=/trunk/boinc/; revision=25017
2012-01-09 17:35:48 +00:00
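The capping rule described in dd16170fc1 can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual code: the function names, the nearest-rank percentile method, and the in-memory vector of host values are all assumptions (the real scheduler would take the percentile from the database).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: 95th percentile by the nearest-rank method.
// The vector is taken by value because it is sorted in place.
double percentile_95(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    // Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    std::size_t rank = (95 * samples.size() + 99) / 100;
    return samples[rank - 1];
}

// Hypothetical helper: caps a client-reported p_fpops at 1.1 times the
// 95th percentile of p_fpops over active hosts.
double capped_p_fpops(double reported,
                      const std::vector<double>& active_host_fpops) {
    if (active_host_fpops.empty()) return reported;  // nothing to compare with
    double cap = 1.1 * percentile_95(active_host_fpops);
    return std::min(reported, cap);
}
```

With a cap like this, a host reporting an absurdly large p_fpops is treated as only slightly faster than the fastest ordinary hosts, while honest values below the cap pass through unchanged.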
David Anderson 436d56e70b - scheduler: change vbox_mt plan function to use at most 2 cores, not 3
(CERN doesn't actually need 3)


svn path=/trunk/boinc/; revision=25013
2012-01-09 02:48:51 +00:00
David Anderson e8657adfd2 - scheduler: change vbox_mt app plan function to use 1, 2 or 3 CPUs
depending on how many the host has,
    and whether CPU VM extensions are present
    (this reflects the requirements of CernVM).


svn path=/trunk/boinc/; revision=25009
2012-01-08 01:28:39 +00:00
David Anderson 5020e3af2f - validator: for credit_from_runtime,
use result.flops_estimate rather than host.p_fpops;
    otherwise it doesn't work for multicore apps.
    TODO: cheat-proofing


svn path=/trunk/boinc/; revision=25006
2012-01-06 22:22:02 +00:00
David Anderson 95ebb112c2 - client: for VBox apps, check stderr for "ERR_CPU_VM_EXTENSIONS_DISABLED".
If found, set HOST_INFO::p_vm_extensions_disabled,
    and pass this to the scheduler.
- scheduler (VBox app plan function): if a host has p_vm_extensions_disabled
    set, don't send it multicore VBox jobs.

Note: if you have a host with VM extensions, and they're disabled
    in the BIOS, and you enable them, you can remove the
    <p_vm_extensions_disabled> line from client_state.xml
    and you'll be eligible to get multicore VM jobs again.


svn path=/trunk/boinc/; revision=24944
2011-12-30 09:43:58 +00:00
David Anderson 0f46b80985 - scheduler: record Vbox version correctly in host records
- remote job submission: partial checkin for new file sandbox stuff


svn path=/trunk/boinc/; revision=24937
2011-12-29 06:30:18 +00:00
David Anderson df5d595c3a - scheduler, vbox plan class function:
send only 32/64 bit version to 32/64 bit host


svn path=/trunk/boinc/; revision=24916
2011-12-27 02:29:51 +00:00
David Anderson 73eebc69fc - scheduler: we were using CPU time for elapsed time
when the latter wasn't reported.
    Do this BEFORE sanity checks on elapsed time
    to prevent cheating.


svn path=/trunk/boinc/; revision=24906
2011-12-26 14:31:14 +00:00
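One reading of the ordering issue in 73eebc69fc is sketched below: the CPU-time fallback happens first, so the substituted value also goes through the sanity check. The function name, the -1 rejection value, and the max_elapsed bound are assumptions made for illustration only.

```cpp
// Hypothetical sketch: returns the elapsed time to use, or -1 if the value
// fails the sanity check. max_elapsed is an assumed upper bound, for
// example the wall-clock time since the job was sent.
double checked_elapsed_time(double elapsed_time, double cpu_time,
                            double max_elapsed) {
    // Step 1: fall back to CPU time when elapsed time was not reported.
    if (elapsed_time <= 0) elapsed_time = cpu_time;
    // Step 2: sanity-check afterwards, so the fallback value is checked too.
    if (elapsed_time < 0 || elapsed_time > max_elapsed) return -1;
    return elapsed_time;
}
```

If the two steps were swapped, a client could omit elapsed time, report an inflated CPU time, and have it accepted unchecked.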
Rom Walton 94c7b82d3b
svn path=/trunk/boinc/; revision=24903
2011-12-26 13:42:34 +00:00
David Anderson fe16024982 - scheduler: in vbox plan class, require that host have
VM acceleration hardware feature
- remote job submission: typo fix


svn path=/trunk/boinc/; revision=24902
2011-12-26 08:27:40 +00:00
David Anderson 4774eeda52 - make_project: don't try to copy nonexistent file; fixes #1166
- scheduler: change Vbox app plan function to accommodate
    single and multithreaded variants


svn path=/trunk/boinc/; revision=24884
2011-12-24 05:07:20 +00:00
David Anderson fe90776614 - scheduler: if an app has only GPU versions,
scale their PFC by 0.1 in credit calculations.
    This reflects the fact that GPU apps are typically less efficient
    (relative to device peak FLOPS) than are CPU apps.
    The actual values from SETI@home and Milkyway are 0.05 and 0.08.


svn path=/trunk/boinc/; revision=24842
2011-12-21 03:21:52 +00:00
David Anderson 0777ab174a - scheduler: if using homogeneous app version and a WU is committed
to a superseded or deprecated app version, use it anyway.
    The current app version may not validate against the old one.


svn path=/trunk/boinc/; revision=24823
2011-12-17 22:11:26 +00:00
David Anderson 5c02170d5a - storage simulator: add stats for network load and fault tolerance.
- client: msg tweak
- client: minimum work buffer lower bound is 180 sec
- scheduler: in computing HOST_USAGE::project_flops for a job,
    if we don't have sufficient elapsed_time statistics
    for either the (host, app_version) or the app_version,
    use a conservative estimate (p_fpops*(#cpus+#ngpus))
    rather than the number returned by app_plan().
    This avoids "time limit exceeded" errors when the latter is way off.


svn path=/trunk/boinc/; revision=24820
2011-12-16 19:45:31 +00:00
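The fallback for HOST_USAGE::project_flops described in 5c02170d5a might look something like this. The struct and its field names are hypothetical and are not BOINC's actual data structures; only the formula p_fpops*(#cpus+#ngpus) comes from the commit message.

```cpp
// Hypothetical inputs for one (host, app version) pair.
struct FlopsInputs {
    double p_fpops;           // host benchmark FLOPS per CPU
    double ncpus;             // CPUs used by the app version
    double ngpus;             // GPUs used by the app version
    bool have_elapsed_stats;  // sufficient elapsed_time statistics exist
    double stats_flops;       // estimate derived from those statistics
};

// Uses the statistics-based estimate when available; otherwise falls back
// to the conservative estimate instead of the app plan function's number.
double project_flops(const FlopsInputs& in) {
    if (in.have_elapsed_stats) return in.stats_flops;
    return in.p_fpops * (in.ncpus + in.ngpus);
}
```

A low estimate makes the projected runtime longer, so a job is less likely to be aborted with "time limit exceeded" when the plan function's figure is far too high.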
David Anderson 1366dc1cdc - scheduler: if using homogeneous app version,
and a WU is committed to an app version that's been superseded,
    treat it as committed to the later version.


svn path=/trunk/boinc/; revision=24776
2011-12-12 22:57:58 +00:00
David Anderson 4e95e690f8 - client: tweak parameters of file xfer backoff to
reduce backoff intervals somewhat
- vboxwrapper: fix buffer size typo (from Attila)
- scheduler: fix crash if using homogeneous app version,
    and a WU is committed to an old or deprecated app version.
    From Kevin Reed.


svn path=/trunk/boinc/; revision=24775
2011-12-12 22:07:37 +00:00
David Anderson 819360dfe8 - scheduler: encode CAL version numbers in a way that handles
release #s > 1000


svn path=/trunk/boinc/; revision=24746
2011-12-06 19:41:14 +00:00
David Anderson 2d7573b8f7 - scheduler: in app_plan(), check for "opencl" before "ati".
Otherwise "opencl_ati" won't get handled right.
    Fixes #1158


svn path=/trunk/boinc/; revision=24716
2011-12-02 14:37:47 +00:00
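The ordering problem fixed in 2d7573b8f7 comes from substring matching: "opencl_ati" contains "ati", so the more specific test has to run first. A minimal sketch, with a hypothetical enum and function name rather than the real app_plan() code:

```cpp
#include <string>

enum class GpuKind { None, OpenCL, Ati };

// Hypothetical classifier: checks the more specific substring first.
GpuKind classify_plan_class(const std::string& plan_class) {
    // "opencl" must be tested before "ati": "opencl_ati" contains both.
    if (plan_class.find("opencl") != std::string::npos) return GpuKind::OpenCL;
    if (plan_class.find("ati") != std::string::npos) return GpuKind::Ati;
    return GpuKind::None;
}
```

With the two checks in the opposite order, "opencl_ati_101" would be classified as a plain ATI plan class.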
David Anderson dd93780787 - API and client: add "ncpus" field to APP_INIT_DATA.
Tells multicore apps how many cores to use.
    The --nthreads command line arg to the app is now deprecated
    though we'll keep it around for the time being.


svn path=/trunk/boinc/; revision=24708
2011-12-01 18:44:19 +00:00
David Anderson 4111c5696c - scheduler: fix crashing bug (don't memset SCHED_REQUEST).
svn path=/trunk/boinc/; revision=24660
2011-11-29 04:47:10 +00:00
David Anderson 8877aa5183 - web: in GPU model list page,
look for plan classes containing "nvidia" as well as "cuda".


svn path=/trunk/boinc/; revision=24614
2011-11-16 19:47:40 +00:00
David Anderson 6e1414a07f - scheduler: increase buffer for global prefs from 8K to 64K
- lay the groundwork for changing it to std::string


svn path=/trunk/boinc/; revision=24595
2011-11-15 00:11:12 +00:00
David Anderson 2ac9fe8566 - client/scheduler:
If the file "client_opaque.txt" exists on the client,
    include its contents in scheduler request messages.
    On the scheduler, parse this into SCHEDULER_REQUEST::client_opaque,
    where it can be used by the customizable scheduler functions.


svn path=/trunk/boinc/; revision=24586
2011-11-14 06:27:36 +00:00
David Anderson 56e5a2c089 - scheduler: compute result.report_deadline BEFORE passing it
as an arg to update_wu_on_send()


svn path=/trunk/boinc/; revision=24567
2011-11-09 23:50:09 +00:00
Jeff Cobb 2e526a0956 server: more fixes to DB to handle unsigned result IDs
svn path=/trunk/boinc/; revision=24564
2011-11-09 20:24:48 +00:00
David Anderson 7c201eba3f - DB: use %u when writing result IDs in SQL queries;
this is to support SETI@home, which ran out of result IDs
    and changed the DB field type to int unsigned.
    Note: eventually I'll make this change official
    and change the .h types as well.
- web: put <apps_selected> tags around <app_id> elements
    in project-specific prefs.


svn path=/trunk/boinc/; revision=24555
2011-11-09 07:41:49 +00:00
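The %u change in 7c201eba3f matters once IDs exceed the range of a signed 32-bit int. A minimal sketch, assuming a hypothetical helper and a simplified query (the real queries are built elsewhere in the DB layer):

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a result ID as unsigned so that values
// above 2147483647 are written correctly.
std::string result_query(unsigned int result_id) {
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "select * from result where id=%u", result_id);
    return std::string(buf);
}
```

For example, result_query(3000000000u) yields a query ending in id=3000000000; a signed conversion would not be able to represent that value.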
David Anderson 6d906cf523 - scheduler: fix bug in HR code (from Kevin)
svn path=/trunk/boinc/; revision=24534
2011-11-06 04:53:03 +00:00
David Anderson d53b89fe6f - feeder: fix logic error in the way app_version.pfc_scale is updated
(from Kevin Reed)


svn path=/trunk/boinc/; revision=24514
2011-11-03 07:08:52 +00:00
David Anderson 497c08b5c2 - scheduler: in update_wu_on_send(), always try to update
the hr_class and app_version_id fields,
    with the where clause that they be either zero or the target value.
    This handles the cases where
    1) because of the failure of a result, the transitioner set
        the field back to zero;
    2) another scheduler set the field to the target value


svn path=/trunk/boinc/; revision=24513
2011-11-03 06:46:05 +00:00
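The "careful update" in 497c08b5c2 can be pictured as an UPDATE whose where clause only matches if each field is still zero or already equal to the target. The sketch below just builds such a query string; the function name and exact SQL text are assumptions, not the code in update_wu_on_send().

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds an update that succeeds only if hr_class and
// app_version_id are each either 0 or already the target value.
std::string careful_wu_update_sql(int wu_id, int hr_class,
                                  int app_version_id) {
    char buf[512];
    std::snprintf(buf, sizeof(buf),
        "update workunit set hr_class=%d, app_version_id=%d "
        "where id=%d and hr_class in (0, %d) and app_version_id in (0, %d)",
        hr_class, app_version_id, wu_id, hr_class, app_version_id);
    return std::string(buf);
}
```

The caller would then check whether the query matched a row: if it did not, another scheduler instance has committed the WU to a different value and the result should not be sent.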
David Anderson 690e18bbe3 - server: plan class names containing 'nvidia' or 'cuda'
are assumed to be for NVIDIA GPU apps;
    plan class names containing 'ati' are assumed to be for AMD GPU apps.
    Clauses for 'nvidia' were missing in a couple of places.


svn path=/trunk/boinc/; revision=24512
2011-11-03 05:26:19 +00:00
David Anderson 743d687c05 - scheduler: bug fix from Kevin
svn path=/trunk/boinc/; revision=24503
2011-10-27 03:55:18 +00:00
David Anderson d2e5ed17cf - client: smoothed working-set size wasn't being computed correctly.
It was always just the most recent size.


svn path=/trunk/boinc/; revision=24500
2011-10-26 23:23:01 +00:00
David Anderson 2371bd641a - scheduler: typo in a SQL query
svn path=/trunk/boinc/; revision=24498
2011-10-26 22:05:50 +00:00
David Anderson 5d76e13277 - scheduler: tweaks to last night's checkin.
In the inner loop of scan_work_array() there are two WORKUNITs:
    - the one that's part of wu_result (in the shared-mem array)
    - a temp copy.
        quick_check() may modify this in host-specific ways
        (e.g., adjusting rsc_fpops_est or delay_bound).
        This is the one we pass to add_result_to_reply().
    When we reread hr_class and app_version_id from the DB,
    update both structs.


svn path=/trunk/boinc/; revision=24493
2011-10-26 16:51:10 +00:00
David Anderson 4b826b52a0 - scheduler: fix bug in the "homogeneous app version" (HAV) feature
(reported by Kevin Reed).
    The problem: cache inconsistency.
    If there are 2 results for the same WU in shared mem,
    and 2 scheduler instances get them around the same time,
    they can send them with different app versions.
    We already fixed this problem for HR by
    1) rereading the relevant WU fields while deciding
        whether to send the result
    2) doing a "careful update" of the WU field using a where clause
        to make sure it wasn't modified in the (short) interval
        since rereading it.
    I fixed the HAV problem in the same way,
    and merged the two mechanisms to combine the DB queries.

    Also:
    - The rereads are done in slow_check() (see below).
    - The careful updates are done in update_wu_on_send(),
        and this is called *before* doing careful updates on result fields.
        That way, if the WU updates fail, we don't have orphaned results.
    - already_sent_to_different_platform_careful() (sic)
        no longer does DB stuff, so it's merged with
        already_send_to_different_hr_class() (better name)

    NOTE: slow_check() is used in array scheduling only.
        Score-based scheduling uses other code,
        in which this bug is not yet fixed.
        Locality scheduling doesn't support HR or HAV at all.
        This should be unified.


svn path=/trunk/boinc/; revision=24484
2011-10-26 07:15:22 +00:00
David Anderson 836e8aacf7 - scheduler: in cuda_check(), ati_check() and opencl_check()
(in sched_customize.cpp)
    the flops_scale argument is intended to express the
    GPU efficiency (actual/peak).
    Pass appropriate values.


svn path=/trunk/boinc/; revision=24405
2011-10-16 06:04:13 +00:00
David Anderson 921b5c50df - client: create and destroy PERS_FILE_XFERs even if network suspended.
This will show pending uploads in the Transfers tab.
- file_upload_handler: fix message to client when can't acquire lock
- client: parse <alt_platform> in state file correctly


svn path=/trunk/boinc/; revision=24391
2011-10-13 19:05:18 +00:00
David Anderson b1456cfc0b - scheduler: fix a bug that would choose app versions erroneously.
The problem: the choice of app version was based on
    the "projected FLOPS" returned by estimate_flops(av).
    If usage stats exist for the host / app version,
    this returns a number X such that
    WU.rsc_fpops_est/X approximates the runtime of a job
    using the given app version.
    (If WU.rsc_fpops_est is way off, this will be correspondingly way off
    from the actual FLOPS the app version will get.)
    However, if there are no usage stats,
    it returns an estimate based on host hardware speed,
    which might be 100X less.
    Hence, in some cases a new app version would never get used.

    Solution: choose app versions based on the values
    returned by the app plan functions.
    Use estimate_flops() AFTER choosing the version.
- scheduler: improve the accuracy of FLOPS estimation for GPU apps.
    The "flops_scale" argument to coproc_perf
    (which expresses the difference between peak GPU FLOPS
    and actual FLOPS) should be used to scale GPU FLOPS
    prior to calling coproc_perf(),
    rather than scaling the estimate returned by coproc_perf().
- show_shmem: show have_X_apps flags


svn path=/trunk/boinc/; revision=24385
2011-10-12 23:59:38 +00:00
David Anderson cb3cdae1a5 - client/server: add a new result state RESULT_UPLOAD_FAILED
for when the job completed successfully but
    one or more output files had permanent upload failures.
    Show this state in web interfaces.
- sample_work_generator: check return value of count_unsent_results(),
    so that we don't generate infinite work if there's a DB problem
- web: RSS feed shows news items from last 90 days, rather than 14


svn path=/trunk/boinc/; revision=24377
2011-10-11 17:41:10 +00:00
David Anderson e00b080b5e - scheduler: fix crashing bug when using HR. From Kevin Reed.
svn path=/trunk/boinc/; revision=24355
2011-10-08 08:16:24 +00:00
David Anderson dd3b628748 - client: compare OpenCL-only devices the same as other devices
- code cleanup


svn path=/trunk/boinc/; revision=24354
2011-10-08 06:33:39 +00:00
David Anderson 279c3a2b37 - scheduler: problem: in the daily quota mechanism,
the boundary between days is 00:00 in server local time.
    This creates a spike of jobs being dispatched
    (and files being downloaded) after that time.

    Solution: distribute the boundary uniformly,
    using a random number determined by the host ID.
    (Make sure to save/restore the seed around this,
    so we don't destroy the randomness of other things)


svn path=/trunk/boinc/; revision=24353
2011-10-08 05:17:44 +00:00
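The per-host day boundary from 279c3a2b37 can be sketched as below. Note the swapped-in technique: the commit saves and restores the global random seed, whereas this sketch uses a local generator seeded with the host ID, which leaves global random state untouched. All names here are hypothetical.

```cpp
#include <random>

constexpr long long kSecondsPerDay = 86400;

// Hypothetical helper: deterministic per-host offset in [0, 86400).
// A local generator is used, so no global seed needs saving or restoring.
long long quota_day_offset(int host_id) {
    std::mt19937 gen(static_cast<std::mt19937::result_type>(host_id));
    // Modulo gives an approximately uniform spread over the day.
    return static_cast<long long>(gen() % kSecondsPerDay);
}

// Hypothetical helper: index of the quota "day" containing time now
// (seconds since the epoch), with the boundary shifted by the host offset.
long long quota_day(long long now, int host_id) {
    return (now - quota_day_offset(host_id)) / kSecondsPerDay;
}
```

Because each host's quota resets at a different time of day, the burst of requests that used to follow 00:00 server time is spread across the whole day.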
David Anderson 6da80764fc - scheduler: add app_plan() support for plan classes
opencl_nvidia_101 and opencl_ati_101


svn path=/trunk/boinc/; revision=24345
2011-10-07 19:23:37 +00:00