1768 Commits

Author SHA1 Message Date
David Anderson 2b33429f18 - scheduler: fix bug in single-replication decision (from Rytis)
svn path=/trunk/boinc/; revision=21576
2010-05-18 22:32:05 +00:00
David Anderson 40eebe00af - client/scheduler: in COPROCS, instead of having a vector of
pointers to dynamically allocated COPROC-derived objects,
    just have the objects themselves.
    Dynamic allocation should be avoided at all costs.

svn path=/trunk/boinc/; revision=21564
2010-05-18 19:22:34 +00:00
Bernd Machenschalk 285a41c7a4 fix query in send_old_work()
from Oliver Bock

svn path=/trunk/boinc/; revision=21561
2010-05-18 15:45:37 +00:00
David Anderson 68e5199fe7 - client: just send 1 copy of app versions
svn path=/trunk/boinc/; revision=21516
2010-05-14 03:08:23 +00:00
David Anderson 9187cb52ba - client and scheduler RPC:
Add more info to "project in-progress job list".
    Old: entries included only job name and app plan class;
        this was used to resend lost jobs,
        and to count the # of CPU and GPU jobs.
        But it's not usable e.g. for per-app in-progress limits.
    New: send the client's app versions (including usage info)
        and for each in-progress job, which app version it uses.
        (This reduces request-message size compared with sending
        usage info and app name per job).
- client and scheduler RPC:
    Add more info to "all in-progress job list", and make it optional.
    This list is used by schedulers that do deadline checks
    using EDF workload simulation.
    Old: the list is always sent, and it contains no info
        about job resource usage
    New: the list is sent only if the scheduler asked for it
        in a previous reply,
        and each entry now contains resource usage (CPU, GPUs)
    Note: the scheduler's EDF simulator is outdated;
        it doesn't know about GPU jobs.
        But we may as well get the info in place.


svn path=/trunk/boinc/; revision=21513
2010-05-13 20:18:27 +00:00
David Anderson 5470d7289a - scheduler: fix bug in daily job quota check
svn path=/trunk/boinc/; revision=21506
2010-05-13 16:45:27 +00:00
David Anderson 256c694c96 - client: make GPU available RAM measurement #ifdef-selectable,
and default it to off
- client: if we print available GPU RAM (which we now don't)
    have a separate timer per GPU type
- scheduler: add new plan classes cuda_opencl (sic) and ati_opencl

svn path=/trunk/boinc/; revision=21498
2010-05-13 03:07:33 +00:00
David Anderson 7688a6c5d6 - scheduler: fix for daily quota enforcement
svn path=/trunk/boinc/; revision=21495
2010-05-12 21:24:52 +00:00
David Anderson 63dcfabe0e - scheduler: changeset 21148 broke the scheduler.
We store pointers to BEST_APP_VERSION in both APP_VERSION and RESULT.
    We can't then fiddle with the vector that these point into.
    Switch back to using a vector of pointers.
    This restores the memory leak, which I'll deal with later.

svn path=/trunk/boinc/; revision=21494
2010-05-12 21:07:39 +00:00
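
Editor's sketch for the entry above: the message says other structures keep pointers to BEST_APP_VERSION entries, so the container holding them must not move the objects. The C++ below is a minimal illustration of that constraint, not the scheduler's code. The names BestAppVersion, BestAppVersionCache and pointer_survives_growth are hypothetical, and std::unique_ptr is swapped in for the raw pointers the commit restores, since owning pointers keep addresses stable while also freeing the objects.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the scheduler's BEST_APP_VERSION record.
struct BestAppVersion {
    int appid;
};

// Owning container. The vector may reallocate its array of pointers,
// but each pointed-to object stays at a fixed heap address.
struct BestAppVersionCache {
    std::vector<std::unique_ptr<BestAppVersion>> items;

    // Returns a pointer that stays valid while the cache is alive,
    // even if more entries are added later.
    BestAppVersion* add(int appid) {
        items.push_back(std::make_unique<BestAppVersion>(BestAppVersion{appid}));
        return items.back().get();
    }
};

// Adds one entry, saves a pointer to it, then grows the cache enough to
// force reallocations. Returns true if the saved pointer still refers
// to the same live object.
bool pointer_survives_growth() {
    BestAppVersionCache cache;
    BestAppVersion* first = cache.add(1);
    for (int i = 2; i <= 1000; ++i) {
        cache.add(i);
    }
    return first == cache.items[0].get() && first->appid == 1;
}
```

With a plain std::vector<BestAppVersion>, a pointer taken before the loop could dangle after the first reallocation, which matches the breakage the message describes.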
David Anderson c7a67e57bb
svn path=/trunk/boinc/; revision=21491
2010-05-12 20:03:25 +00:00
David Anderson b8df52bc8a - client: temporarily enable logic that deallocates memory on exit,
so that we can look for memory leaks.
- client: enable bandwidth quota limit only if both
    #MB and #days are nonzero.
- scheduler: when resending work, don't send more than
    client is requesting
- scheduler: restore Cobblestone factor to 100

svn path=/trunk/boinc/; revision=21460
2010-05-11 19:50:14 +00:00
David Anderson 6fbfee024b - client: day boundary for "transfer at most X in N days"
is midnight local time, not UTC
- update translation templates

svn path=/trunk/boinc/; revision=21362
2010-05-03 17:20:44 +00:00
David Anderson ef0019d8c3 - validator: bug fixes: bad formula for low_average();
failure to reread app_versions because of 1e6/1e-6 typo


svn path=/trunk/boinc/; revision=21302
2010-04-26 23:12:40 +00:00
David Anderson 5035007b90 - back end: new way of deciding:
- whether host is "reliable" for an app version
    - whether host is eligible for single replication for an app version
    - whether to use host scaling
    In each case, the answer is yes if the number of
    consecutive valid results is above a threshold.
    This replaces existing "error rate" and "scale probation" mechanisms.

    TODO: the # of consecutive valid results should also determine
        a limit on jobs in progress for an app version.
        Namely, if N is the threshold for host scaling, the limit should be
            ndevices*(max(1, consecutive_valid - N))
        The client currently doesn't supply enough
        app version info to do this.
        It could be approximated; that would give some protection
        against cherry-picking.
- credit: more conservative formulas for combining claimed credit
    among replicas.
    If there are normal replicas, we use a "low average"
    that weights each sample by the sum of the other samples.
    Otherwise we use the min (not the average) of the approximate samples.

NOTE: a DB update is required


svn path=/trunk/boinc/; revision=21230
2010-04-21 19:33:20 +00:00
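
Editor's sketch for the entry above: the C++ below spells out the two formulas the message gives in words. It is one reading of the description, not the back-end code, and a later entry (revision 21302) reports a "bad formula for low_average()", so the real function may differ. The function names and the strict ">" comparison are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// "Yes if the number of consecutive valid results is above a threshold."
// Whether the comparison is strict is an assumption.
bool above_threshold(int consecutive_valid, int threshold) {
    return consecutive_valid > threshold;
}

// Limit on jobs in progress proposed in the TODO:
//     ndevices*(max(1, consecutive_valid - N))
int in_progress_limit(int ndevices, int consecutive_valid, int n) {
    return ndevices * std::max(1, consecutive_valid - n);
}

// "Low average": each sample is weighted by the sum of the other samples,
// so unusually large claims get small weights.
double low_average(const std::vector<double>& samples) {
    if (samples.empty()) return 0.0;
    if (samples.size() == 1) return samples[0];
    double total = 0.0;
    for (double x : samples) total += x;
    double weighted_sum = 0.0;
    double weight_sum = 0.0;
    for (double x : samples) {
        double w = total - x;  // sum of the other samples
        weighted_sum += w * x;
        weight_sum += w;
    }
    if (weight_sum <= 0.0) return 0.0;  // e.g. all samples are zero
    return weighted_sum / weight_sum;
}
```

For the two samples 10 and 30 this gives 15, below the plain mean of 20; for two samples the formula reduces to the harmonic mean.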
David Anderson 6893691ae2 - validator: message tweak
svn path=/trunk/boinc/; revision=21212
2010-04-19 22:57:49 +00:00
David Anderson 61195cb59d - validator: fix bug where host.total_credit not incremented
svn path=/trunk/boinc/; revision=21211
2010-04-19 21:46:45 +00:00
David Anderson 01402bb45a - client: improve GPU scheduling
old: assign GPUs, then check available RAM
        Problem: may cause starvation on multi-GPU systems.
    new: use available RAM info in the assignment process.
        Prevents starvation, also reduces the number of driver calls.

svn path=/trunk/boinc/; revision=21205
2010-04-18 03:00:33 +00:00
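
Editor's sketch for the entry above: a simplified C++ illustration of using available-RAM information during GPU assignment rather than checking it afterwards. This is not the client's scheduler. The Gpu and GpuJob types, the one-job-per-GPU rule and the first-fit order are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical device record; available_ram is assumed to be measured once.
struct Gpu {
    double available_ram;
    bool in_use;
};

// Hypothetical job record.
struct GpuJob {
    double ram_needed;
    int assigned_gpu;  // index into the GPU list, or -1 if not assigned
};

// For each job, pick the first idle GPU that has enough free RAM.
// Returns the number of jobs that received a GPU.
int assign_gpus(std::vector<GpuJob>& jobs, std::vector<Gpu>& gpus) {
    int n_assigned = 0;
    for (GpuJob& job : jobs) {
        job.assigned_gpu = -1;
        for (std::size_t i = 0; i < gpus.size(); ++i) {
            if (gpus[i].in_use) continue;
            if (gpus[i].available_ram < job.ram_needed) continue;
            gpus[i].in_use = true;
            job.assigned_gpu = static_cast<int>(i);
            ++n_assigned;
            break;
        }
    }
    return n_assigned;
}
```

With GPUs offering 100 and 1000 units of RAM and jobs needing 500 and 50, both jobs run. An assign-then-check scheme would give the first job GPU 0, find too little RAM, and leave it waiting although GPU 1 could have run it.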
David Anderson b71d3e6cf4 - back end: typo and tweaks
svn path=/trunk/boinc/; revision=21196
2010-04-16 21:16:18 +00:00
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson 02717af2f3 - bug fixes
svn path=/trunk/boinc/; revision=21187
2010-04-15 21:58:44 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override those in config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
Bernd Machenschalk 061601fe28 scheduler: as db-driven client file management isn't ready yet,
adapt Einstein@home-specific file management hack to current run

svn path=/trunk/boinc/; revision=21172
2010-04-13 14:24:40 +00:00
David Anderson 2e41153d8b - scheduler: fix egregious bug which limited sending to 1 job per RPC
- scheduler: fix bug that broke anon platform

Note: Bruce Allen once advised me to take a few days and just
observe BOINC in action.
I should really do this more often; it always turns up bugs
and/or design flaws.


svn path=/trunk/boinc/; revision=21165
2010-04-11 04:42:52 +00:00
David Anderson e05a479f42 - scheduler and validator: distinguish between
1) peak FLOPS (based on benchmarks or GPU attributes).
        This does not change over time.
        It's not adjusted on the basis of statistics.
        It's not affected by wu.rsc_fpops_est.
        It can be compared across projects.
    versus
    2) projected FLOPS: the scheduler's best guess as to what will satisfy
        X * elapsed_time = wu.rsc_fpops_est;
        this is used to make server-side runtime estimates,
        and it's sent to the client and used for its runtime estimates.
        It may be based on the (host, app version) elapsed time average.
    My checkin [21153] mistakenly confounded these two.

    Notes:
    1) app_plan() now must return both peak and projected FLOPS.
    2) result.flops_estimate stores peak FLOPS
    3) the <flops> field in app_info.xml files should be
        projected FLOPS.  But its accuracy is not important;
        it's not used once the server has statistics
        for the (host, app version)

svn path=/trunk/boinc/; revision=21164
2010-04-10 05:49:51 +00:00
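
Editor's sketch for the entry above: projected FLOPS is defined as the X that satisfies X * elapsed_time = wu.rsc_fpops_est. The C++ below solves that relation and uses the result for a runtime estimate. The function names are hypothetical, and falling back to peak FLOPS when no elapsed-time average exists is an assumption, not something the message states.

```cpp
#include <cassert>
#include <cmath>

// Solves X * elapsed_time = rsc_fpops_est for X.
// If there is no usable elapsed-time average yet, this sketch falls back
// to peak FLOPS (an assumption made for the example).
double projected_flops(double rsc_fpops_est, double avg_elapsed_time,
                       double peak_flops) {
    if (avg_elapsed_time <= 0.0) return peak_flops;
    return rsc_fpops_est / avg_elapsed_time;
}

// Runtime estimate in seconds for a job, given projected FLOPS.
double estimated_runtime(double rsc_fpops_est, double projected) {
    if (projected <= 0.0) return 0.0;
    return rsc_fpops_est / projected;
}
```

For example, a job with rsc_fpops_est = 1e12 that averages 500 seconds on a host has projected FLOPS of 2e9, whatever the host's peak FLOPS figure is.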
David Anderson 132a35c38a typo
svn path=/trunk/boinc/; revision=21154
2010-04-09 03:45:25 +00:00
David Anderson 1d765245ed - scheduler: sweeping changes to the way job runtimes are estimated:
see http://boinc.berkeley.edu/trac/wiki/RuntimeEstimation


svn path=/trunk/boinc/; revision=21153
2010-04-08 23:14:47 +00:00
David Anderson 85e06afe4b - scheduler: app_plan() no longer has to guess how efficiently
an app version will run on a particular host.
- scheduler: fix memory leak: BEST_APP_VERSIONs weren't being freed


svn path=/trunk/boinc/; revision=21148
2010-04-08 18:27:27 +00:00
David Anderson 212fb765e9 - validator: detect jobs that used GPU app but fell back to CPU
(SETI@home does this if GPU initialization fails).
    Treat these like CPU apps for credit purposes.

svn path=/trunk/boinc/; revision=21130
2010-04-06 23:48:35 +00:00
David Anderson 4462fe534b - client: don't do RSS fetch if network suspended
svn path=/trunk/boinc/; revision=21123
2010-04-06 20:32:02 +00:00
David Anderson e276aa5ed6 - server: make the -d 4 feature work with FCGI
svn path=/trunk/boinc/; revision=21109
2010-04-05 23:12:02 +00:00
David Anderson 515113d7dd - server: change all backend programs so that -d 4 means
-d 3 plus print DB queries

svn path=/trunk/boinc/; revision=21106
2010-04-05 21:59:33 +00:00
David Anderson 2536797068 - validator: remove update_credit_per_cpu_sec(). Irrelevant.
TODO: remove related code
- validator: update wu.canonical_credit correctly.
    However, this field should be deprecated.
- validator: check for error return from assign_credit_set().

svn path=/trunk/boinc/; revision=21096
2010-04-05 20:03:54 +00:00
David Anderson a2a661993b - validator: -d 4 means -d 3 plus print all DB queries
(todo: do this for all daemons)
- validator: change cmdline args from -foo to --foo
    (todo: do this for all daemons)
- validator: pass max_granted_credit to assign_credit_set()

svn path=/trunk/boinc/; revision=21093
2010-04-05 18:59:16 +00:00
David Anderson 54dce55e98 - backend: fix scaling problem that was producing xe15 size credits.
This had messed up the beta DB, which I had to clean up.
    Added a cap (1e5) to prevent this in the future.

svn path=/trunk/boinc/; revision=21064
2010-04-02 23:18:47 +00:00
David Anderson 78d11a263b - backend: improved messages for app version credit updates
svn path=/trunk/boinc/; revision=21063
2010-04-02 21:45:43 +00:00
David Anderson 19f7d66b53 - backend programs: change the way PFC and elapsed-time statistics
are written to the DB.
    The incremental approach was bogus.
    New approach:
    host_app_version: write directly; R/W interval is tiny
    app_version: maintain an explicit list of update samples
        for both PFC and credit.
        When the validator flushes its app_version cache,
        do careful updates.
    Note: when using double fields in careful updates,
    you can't test for equality.  Use abs(new-old) < 1e-N

svn path=/trunk/boinc/; revision=21057
2010-04-02 19:10:37 +00:00
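
Editor's sketch for the entry above: the note about double fields can be illustrated with a tolerance comparison in C++. The helper name nearly_equal and the default tolerance of 1e-6 are assumptions; the message only gives the form abs(new-old) < 1e-N.

```cpp
#include <cassert>
#include <cmath>

// Compares two doubles with an absolute tolerance instead of ==.
// A value read back from the DB may differ from the in-memory value in
// its last digits, so exact equality is unreliable.
bool nearly_equal(double a, double b, double tolerance = 1e-6) {
    return std::fabs(a - b) < tolerance;
}
```

An absolute tolerance suits values of moderate size; for very large or very small magnitudes a relative tolerance may be the better choice.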
David Anderson 38bd1c8def - validator: improved log messages
- fix some compiler warnings


svn path=/trunk/boinc/; revision=21053
2010-04-01 22:51:19 +00:00
David Anderson ad3ed99b96 - scheduler: choose cuda_fermi over other cuda plan classes
svn path=/trunk/boinc/; revision=21052
2010-04-01 21:18:16 +00:00
David Anderson fb851311e0 - server: various changes;
see http://boinc.berkeley.edu/trac/wiki/CreditNew

    Projects will need to update DB and recompile all back-end programs.

    Summary:
    - new way of computing credit
    - "reliable host" mechanism is per app version
    - "host punishment" mechanism is per app version
    - adjustment of wu.rsc_fpops_est provides the
        equivalent of per app version DCF
    - max jobs in progress is now per app
    - max jobs per RPC is now per app

    TODO:
    - reliable mechanism:
        - populate and use host_app_version.error_rate
        - populate host_app_version.turnaround
    - host punishment:
        - populate host_app_version.max_jobs_per_day
        - populate host_app_version.n_jobs_today
        - use app.max_jobs_per_day_init
    - job limits:
        - use app.max_jobs_in_progress, max_gpu_jobs_in_progress
        - use app.max_jobs_per_rpc
    - adjust wu.rsc_fpops_est
    - remove old credit stuff
        fpops_cumulative, credit_multiplier
        credit computation in scheduler

- AVERAGE class: use the Knuth algorithm (Wikipedia)


svn path=/trunk/boinc/; revision=21021
2010-03-29 22:28:20 +00:00
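
Editor's sketch for the entry above: "the Knuth algorithm (Wikipedia)" presumably refers to the online mean and variance update usually credited to Welford and presented by Knuth. The C++ below is a minimal version of that technique under that assumption. It is not BOINC's AVERAGE class, and the names RunningAverage and average_of are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Online (single-pass) mean and variance, updated one sample at a time.
class RunningAverage {
public:
    void update(double sample) {
        ++n_;
        double delta = sample - mean_;
        mean_ += delta / static_cast<double>(n_);
        // Uses the old and the new mean, which keeps the sum of squared
        // deviations numerically stable.
        m2_ += delta * (sample - mean_);
    }
    long count() const { return n_; }
    double mean() const { return mean_; }
    // Sample variance; defined as 0 for fewer than two samples.
    double variance() const {
        return n_ > 1 ? m2_ / static_cast<double>(n_ - 1) : 0.0;
    }

private:
    long n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;  // running sum of squared deviations from the mean
};

// Convenience helper: feeds every sample through update().
RunningAverage average_of(const std::vector<double>& samples) {
    RunningAverage avg;
    for (double x : samples) avg.update(x);
    return avg;
}
```

The update needs only the count, the mean and one accumulator, so the statistics can be stored in a DB row and advanced as each result arrives without keeping the samples.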
Bernd Machenschalk ed8b0e5499 db_purge:
- fix -one_pass
- added -dont_delete (don't delete from DB, for testing)
- added -daily_dir (write archives in a new directory each day)

svn path=/trunk/boinc/; revision=20993
2010-03-24 10:57:16 +00:00
David Anderson d40be2dbf7 - feeder: compile fix
svn path=/trunk/boinc/; revision=20987
2010-03-23 17:26:05 +00:00
David Anderson f198dc76ad - feeder: with -allapps option, allow some apps to have zero weights;
no jobs will be sent for them.


svn path=/trunk/boinc/; revision=20980
2010-03-22 20:12:24 +00:00
David Anderson 3452bbbc87 - GUI RPC: more replacement of std::string
svn path=/trunk/boinc/; revision=20889
2010-03-13 04:40:37 +00:00
David Anderson 43f0c70458 - credit test program:
It's working pretty well; for S@h, new credit is 56% of old credit,
    whether or not we include small-credit jobs.
- improve AVERAGE class (from John McLeod)

svn path=/trunk/boinc/; revision=20845
2010-03-11 17:49:19 +00:00
David Anderson 08af46829a - credit test program: create a data file separately so you
don't have to do a big DB query each time


svn path=/trunk/boinc/; revision=20837
2010-03-11 04:19:07 +00:00
David Anderson 4f77556c74 - client: if a GPU job is blocked on available mem,
don't fetch more jobs for that resource type

svn path=/trunk/boinc/; revision=20817
2010-03-10 06:00:37 +00:00
David Anderson 0ad0886df3 - server credit stuff.
New policy: anon platform and old platform jobs
    get average credit, possibly scaled by elapsed time.
    We no longer attempt to guess what app version produced them.

svn path=/trunk/boinc/; revision=20816
2010-03-10 00:33:31 +00:00
David Anderson 8062f21d59 - server credit stuff (partial checkin)
svn path=/trunk/boinc/; revision=20810
2010-03-09 04:15:10 +00:00
David Anderson 295d4b54ea - server: major improvements to locality scheduling from Einstein@home.
Triggering the work generator is now done via the DB
    instead of flat files.

    Since only E@h uses locality scheduling,
    I kept the DB changes in a separate file (db/schema_locality.sql).
    There's a new field in the workunit table,
    and that's a required update (in db_update.php)
- manager: compile fix


svn path=/trunk/boinc/; revision=20807
2010-03-05 22:55:16 +00:00
David Anderson 5b7f8b8348 - web: fix bug that caused "send email" and "show hosts"
in project prefs to always select "no"


svn path=/trunk/boinc/; revision=20786
2010-03-04 04:16:00 +00:00