boinc

Commit Graph

Author	SHA1	Message	Date
David Anderson	5035007b90	- back end: new way of deciding: - whether host is "reliable" for an app version - whether host is eligible for single replication for an app version - whether to use host scaling In each case, the answer is yes if the number of consecutive valid results is above a threshold. This replaces existing "error rate" and "scale probation" mechanisms. TODO: the # of consecutive valid results should also determine a limit on jobs in progress for an app version. Namely, if N is the threshold for host scaling, the limit should be ndevices*(max(1, consecutive_valid - N)) The client currently doesn't supply enough app version info to do this. It could be approximated; that would give some protection against cherry-picking. - credit: more conservative formulas for combining claimed credit among replicas. If there are normal replicas, we use a "low average" that weights each sample by the sum of the other samples. Otherwise we use the min (not the average) of the approximate samples. NOTE: a DB update is required svn path=/trunk/boinc/; revision=21230	2010-04-21 19:33:20 +00:00
David Anderson	6893691ae2	- validator: message tweak svn path=/trunk/boinc/; revision=21212	2010-04-19 22:57:49 +00:00
David Anderson	61195cb59d	- validator: fix bug where host.total_credit not incremented svn path=/trunk/boinc/; revision=21211	2010-04-19 21:46:45 +00:00
David Anderson	b71d3e6cf4	- back end: typo and tweaks svn path=/trunk/boinc/; revision=21196	2010-04-16 21:16:18 +00:00
David Anderson	021edb02c2	- back end programs: improve log msgs svn path=/trunk/boinc/; revision=21193	2010-04-16 18:07:08 +00:00
David Anderson	02717af2f3	- bug fixes svn path=/trunk/boinc/; revision=21187	2010-04-15 21:58:44 +00:00
David Anderson	b2451544e1	- server: change the following from per-host to per-(host, app version): - daily quota mechanism - reliable mechanism (accelerated retries) - "trusted" mechanism (adaptive replication) - scheduler: enforce host scale probation only for apps with host_scale_check set. - validator: do scale probation on invalid results (need this in addition to error and timeout cases) - feeder: update app version scales every 10 min, not 10 sec - back-end apps: support --foo as well as -foo for options Notes: - If you have, say, cuda, cuda23 and cuda_fermi plan classes, a host will have separate quotas for each one. That means it could error out on 100 jobs for cuda_fermi, and when its quota goes to zero, error out on 100 jobs for cuda23, etc. This is intentional; there may be cases where one version works but not the others. - host.error_rate and host.max_results_day are deprecated TODO: - the values in the app table for limits on jobs in progress etc. should override rather than config.xml. Implementation notes: scheduler: process_request(): read all host_app_versions for host at start; Compute "reliable" and "trusted" for each one. write modified records at end get_app_version(): add "reliable_only" arg; if set, use only reliable versions skip over-quota versions Multi-pass scheduling: if have at least one reliable version, do a pass for jobs that need reliable, and use only reliable versions. Then clear best_app_versions cache. Score-based scheduling: for need-reliable jobs, it will pick the fastest version, then give a score bonus if that version happens to be reliable. When get back a successful result from client: increase daily quota When get back an error result from client: impose scale probation decrease daily quota if not aborted Validator: when handling a WU, create a vector of HOST_APP_VERSION parallel to vector of RESULT. Pass it to assign_credit_set(). Make copies of originals so we can update only modified ones update HOST_APP_VERSION error rates Transitioner: decrease quota on timeout svn path=/trunk/boinc/; revision=21181	2010-04-15 03:13:56 +00:00
David Anderson	e05a479f42	- scheduler and validator: distinguish between 1) peak FLOPS (based on benchmarks or GPU attributes). This does not change over time. It's not adjusted on the basis of statistics. It's not affected by wu.rsc_fpops_est. It can be compared across projects. versus 2) projected FLOPS: the scheduler's best guess as to what will satisfy X * elapsed_time = wu.rsc_fpops_est; this is used to make server-side runtime estimates, and it's sent to the client and used for its runtime estimates. It may be based on the (host, app version) elapsed time average. My checkin [21153] mistakenly confounded these two. Notes: 1) app_plan() now must return both peak and projected FLOPS. 2) result.flops_estimate stores peak FLOPS 3) the <flops> field in app_info.xml files should be projected FLOPS. But its accuracy is not important; it's not used once the server has statistics for the (host, app version) svn path=/trunk/boinc/; revision=21164	2010-04-10 05:49:51 +00:00
David Anderson	1d765245ed	- scheduler: sweeping changes to the way job runtimes are estimated: see http://boinc.berkeley.edu/trac/wiki/RuntimeEstimation svn path=/trunk/boinc/; revision=21153	2010-04-08 23:14:47 +00:00
David Anderson	212fb765e9	- validator: detect jobs that used GPU app but fell back to CPU (SETI@home does this if GPU initialization fails). Treat these like CPU apps for credit purposes. svn path=/trunk/boinc/; revision=21130	2010-04-06 23:48:35 +00:00
David Anderson	e276aa5ed6	- server: make the -d 4 feature work with FCGI svn path=/trunk/boinc/; revision=21109	2010-04-05 23:12:02 +00:00
David Anderson	2536797068	- validator: remove update_credit_per_cpu_sec(). Irrelevant. TODO: remove related code - validator: update wu.canonical_credit correctly. However, this field should be deprecated. - validator: check for error return from assign_credit_set(). svn path=/trunk/boinc/; revision=21096	2010-04-05 20:03:54 +00:00
David Anderson	a2a661993b	- validator: -d 4 means -d 3 plus print all DB queries (todo: do this for all daemons) - validator: change cmdline args from -foo to --foo (todo: do this for all daemons) - validator: pass max_granted_credit to assign_credit_set() svn path=/trunk/boinc/; revision=21093	2010-04-05 18:59:16 +00:00
David Anderson	54dce55e98	- backend: fix scaling problem that was producing xe15 size credits. This had messed up the beta DB, which I had to clean up. Added a cap (1e5) to prevent this in the future. svn path=/trunk/boinc/; revision=21064	2010-04-02 23:18:47 +00:00
David Anderson	78d11a263b	- backend: improved messages for app version credit updates svn path=/trunk/boinc/; revision=21063	2010-04-02 21:45:43 +00:00
David Anderson	19f7d66b53	- backend programs: change the way PFC and elapsed-time statistics are written to the DB. The incremental approach was bogus. New approach: host_app_version: write directly; R/W interval is tiny app_version: maintain an explicit list of update samples for both PFC and credit. When the validator flushes its app_version cache, do careful updates. Note: when using double fields in careful updates, you can't test for equality. Use abs(new-old) < 1e-N svn path=/trunk/boinc/; revision=21057	2010-04-02 19:10:37 +00:00
David Anderson	38bd1c8def	- validator: improved log messages - fix some compiler warnings svn path=/trunk/boinc/; revision=21053	2010-04-01 22:51:19 +00:00
David Anderson	fb851311e0	- server: various changes; see http://boinc.berkeley.edu/trac/wiki/CreditNew Projects will need to update DB and recompile all back-end programs. Summary: - new way of computing credit - "reliable host" mechanism is per app version - "host punishment" mechanism is per app version - adjustment of wu.rsc_fpops_est provides the equivalent of per app version DCF - max jobs in progress is now per app - max jobs per RPC is now per app TODO: - reliable mechanism: - populate and use host_app_version.error_rate - populate host_app_version.turnaround - host punishment: - populate host_app_version.max_jobs_per_day - populate host_app_version.n_jobs_today - use app.max_jobs_per_day_init - job limits: - use app.max_jobs_in_progress, max_gpu_jobs_in_progress - use app.max_jobs_per_rpc - adjust wu.rsc_fpops_est - remove old credit stuff fpops_cumulative, credit_multiplier credit computation in scheduler - AVERAGE class: use the Knuth algorithm (Wikipedia) svn path=/trunk/boinc/; revision=21021	2010-03-29 22:28:20 +00:00
David Anderson	3fb7c8f13f	- server code: moved everything related to credit-granting to credit.cpp, where it can be used by trickle handlers as well as by validators. svn path=/trunk/boinc/; revision=18831	2009-08-12 16:26:43 +00:00
David Anderson	3eeefc0048	- server code cleanup svn path=/trunk/boinc/; revision=18830	2009-08-12 16:01:46 +00:00
David Anderson	f6d3e8a477	svn path=/trunk/boinc/; revision=18829	2009-08-11 15:17:37 +00:00
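
Commit 5035007b90 describes a "low average" that weights each claimed-credit sample by the sum of the other samples. The sketch below is one reading of that sentence, not the code from the commit: the function name `low_average`, the Python rendering, and the handling of empty or all-zero input are assumptions made for illustration. It assumes non-negative samples.

```python
def low_average(samples):
    """Weighted mean in which each sample's weight is the sum of the OTHER samples.

    Large outliers get small weights (the others sum to little relative to them),
    so the result sits below the plain arithmetic mean when samples differ.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")  # assumption: empty input is an error
    if n == 1:
        return samples[0]
    total = sum(samples)
    if total == 0:
        return 0.0  # assumption: all-zero claims average to zero
    # weight_i = total - x_i; the weights sum to (n - 1) * total
    weighted = sum(x * (total - x) for x in samples)
    return weighted / ((n - 1) * total)


if __name__ == "__main__":
    print(low_average([10.0, 30.0]))         # two samples: reduces to the harmonic mean
    print(low_average([10.0, 10.0, 100.0]))  # the high outlier is pulled down
```

With two samples the formula reduces to the harmonic mean, 2ab/(a+b), which is why the commit message calls the approach more conservative than a plain average.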

21 Commits
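
The TODO in commit 5035007b90 proposes limiting jobs in progress per app version to ndevices*(max(1, consecutive_valid - N)), where N is the host-scaling threshold. The sketch below only restates that arithmetic; the function and parameter names are hypothetical, the strict "greater than" comparison is one reading of "above a threshold", and the log does not say whether the limit was ever implemented.

```python
def above_threshold(consecutive_valid, threshold):
    """True if the count of consecutive valid results is strictly above the threshold.

    Assumption: "above a threshold" in the commit message means strictly greater.
    """
    return consecutive_valid > threshold


def jobs_in_progress_limit(ndevices, consecutive_valid, scaling_threshold):
    """Proposed limit from the TODO: ndevices * max(1, consecutive_valid - N)."""
    return ndevices * max(1, consecutive_valid - scaling_threshold)


if __name__ == "__main__":
    # A host with 2 devices and few valid results is held to one job per device.
    print(jobs_in_progress_limit(2, 3, 10))
    # Once past the threshold, the limit grows with each further valid result.
    print(jobs_in_progress_limit(2, 15, 10))
```

Under this formula a host that has not yet built up a run of valid results can hold at most one job per device, which is the protection against cherry-picking that the message mentions.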