- whether a host is "reliable" for an app version
- whether a host is eligible for single replication for an app version
- whether to use host scaling
In each case, the answer is yes if the number of
consecutive valid results is above a threshold.
This replaces existing "error rate" and "scale probation" mechanisms.
TODO: the # of consecutive valid results should also determine
a limit on jobs in progress for an app version.
Namely, if N is the threshold for host scaling, the limit should be
ndevices*(max(1, consecutive_valid - N))
The client currently doesn't supply enough
app version info to do this.
It could be approximated; that would give some protection
against cherry-picking.
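A minimal sketch of that limit, assuming hypothetical names for the inputs
(ndevices, consecutive_valid, and the host-scaling threshold N):

    // Hypothetical sketch: evaluates ndevices * max(1, consecutive_valid - N).
    inline int jobs_in_progress_limit(int ndevices, int consecutive_valid, int N) {
        int surplus = consecutive_valid - N;
        if (surplus < 1) surplus = 1;    // never below one job per device
        return ndevices * surplus;
    }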
- credit: more conservative formulas for combining claimed credit
among replicas.
If there are normal replicas, we use a "low average"
that weights each sample by the sum of the other samples.
Otherwise we use the min (not the average) of the approximate samples.
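A rough sketch of the "low average", assuming the claimed credits are already
collected in a vector (names are illustrative, not the actual BOINC code):

    #include <algorithm>
    #include <numeric>
    #include <vector>

    // Weight each claimed credit by the sum of the other claims,
    // so low claims count for more than high ones.
    double low_average(const std::vector<double>& claims) {
        if (claims.empty()) return 0;
        double sum = std::accumulate(claims.begin(), claims.end(), 0.0);
        double num = 0, denom = 0;
        for (double x : claims) {
            double w = sum - x;              // sum of the other samples
            num += w * x;
            denom += w;
        }
        return denom > 0 ? num / denom : sum / claims.size();
    }

    // With only approximate samples, take the min instead:
    //   double granted = *std::min_element(claims.begin(), claims.end());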
NOTE: a DB update is required
svn path=/trunk/boinc/; revision=21230
- daily quota mechanism
- reliable mechanism (accelerated retries)
- "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
host_scale_check set.
- validator: do scale probation on invalid results
(needed in addition to the error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options
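A small sketch of option matching that accepts both spellings
(the helper name is illustrative, not the actual BOINC code):

    #include <cstring>

    // Return true if arg matches "-name" or "--name".
    bool matches_option(const char* arg, const char* name) {
        if (*arg != '-') return false;
        arg++;
        if (*arg == '-') arg++;    // tolerate a second leading dash
        return strcmp(arg, name) == 0;
    }

    // e.g. matches_option(argv[i], "app") accepts both -app and --app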
Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
a host will have separate quotas for each one.
That means it could error out on 100 jobs for cuda_fermi,
and when its quota goes to zero,
error out on 100 jobs for cuda23, etc.
This is intentional; there may be cases where one version
works but not the others.
- host.error_rate and host.max_results_day are deprecated
TODO:
- the values in the app table for limits on jobs in progress etc.
should override those in config.xml, not the other way around.
Implementation notes:
scheduler:
process_request():
read all host_app_versions for the host at the start;
compute "reliable" and "trusted" for each one;
write the modified records at the end
get_app_version():
add "reliable_only" arg; if set, use only reliable versions
skip over-quota versions
Multi-pass scheduling: if we have at least one reliable version,
do a pass for jobs that need reliable hosts,
and use only reliable versions.
Then clear best_app_versions cache.
Score-based scheduling: for need-reliable jobs,
it will pick the fastest version,
then give a score bonus if that version happens to be reliable.
When a successful result comes back from the client:
increase the daily quota
When an error result comes back from the client:
impose scale probation
decrease the daily quota unless the result was aborted
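A hedged sketch of that quota bookkeeping; the struct mirrors host_app_version,
and the exact increment and penalty are assumptions.

    // Illustrative only: adjust the per-(host, app version) daily quota
    // as results come back.  The real scheduler logic may differ.
    struct HostAppVersionSketch {
        int max_jobs_per_day;
        int n_jobs_today;
    };

    void on_success(HostAppVersionSketch& hav, int quota_ceiling) {
        if (hav.max_jobs_per_day < quota_ceiling) {
            hav.max_jobs_per_day++;         // reward a valid result
        }
    }

    void on_error(HostAppVersionSketch& hav, bool aborted) {
        // scale probation is imposed separately; here only shrink the quota
        if (!aborted && hav.max_jobs_per_day > 1) {
            hav.max_jobs_per_day /= 2;      // assumed penalty
        }
    }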
Validator:
when handling a WU, create a vector of HOST_APP_VERSION
parallel to the vector of RESULTs.
Pass it to assign_credit_set().
Make copies of originals so we can update only modified ones
update HOST_APP_VERSION error rates
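A sketch of that layout, with stand-in types and assumed helpers
(the real assign_credit_set() call is abridged):

    #include <vector>

    // Minimal stand-ins for the DB types; illustrative only.
    struct RESULT { int id; int hostid; };
    struct HOST_APP_VERSION { int host_id; int app_version_id; double error_rate; };

    HOST_APP_VERSION lookup_host_app_version(const RESULT&);                        // assumed
    void assign_credit_set(std::vector<RESULT>&, std::vector<HOST_APP_VERSION>&);   // abridged
    void update_host_app_version(const HOST_APP_VERSION&);                          // assumed

    // havs[i] corresponds to results[i]; keep copies of the originals
    // so only modified records are written back.
    void handle_wu_sketch(std::vector<RESULT>& results) {
        std::vector<HOST_APP_VERSION> havs;
        havs.reserve(results.size());
        for (RESULT& r : results) {
            havs.push_back(lookup_host_app_version(r));
        }
        std::vector<HOST_APP_VERSION> havs_orig = havs;
        assign_credit_set(results, havs);            // may adjust error rates
        for (size_t i = 0; i < havs.size(); i++) {
            if (havs[i].error_rate != havs_orig[i].error_rate) {
                update_host_app_version(havs[i]);    // write back only modified ones
            }
        }
    }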
Transitioner:
decrease quota on timeout
svn path=/trunk/boinc/; revision=21181
see http://boinc.berkeley.edu/trac/wiki/CreditNew
Projects will need to update DB and recompile all back-end programs.
Summary:
- new way of computing credit
- "reliable host" mechanism is per app version
- "host punishment" mechanism is per app version
- adjustment of wu.rsc_fpops_est provides the
equivalent of per-app-version DCF (see the sketch after this list)
- max jobs in progress is now per app
- max jobs per RPC is now per app
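A hedged sketch of the per-app-version DCF idea: keep a per-(host, app version)
correction factor and apply it to wu.rsc_fpops_est when estimating runtime.
All names here are illustrative assumptions, not the shipped code.

    // Illustrative only: a per-app-version duration correction plays
    // the role the global host DCF used to play.
    struct AppVersionStatsSketch {
        double correction;      // running ratio of actual to estimated work
        double flops_estimate;  // projected speed for this app version
    };

    double estimated_runtime(double rsc_fpops_est, const AppVersionStatsSketch& avs) {
        // scale the static estimate by the observed correction factor
        return rsc_fpops_est * avs.correction / avs.flops_estimate;
    }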
TODO:
- reliable mechanism:
- populate and use host_app_version.error_rate
- populate host_app_version.turnaround
- host punishment:
- populate host_app_version.max_jobs_per_day
- populate host_app_version.n_jobs_today
- use app.max_jobs_per_day_init
- job limits:
- use app.max_jobs_in_progress, max_gpu_jobs_in_progress
- use app.max_jobs_per_rpc
- adjust wu.rsc_fpops_est
- remove old credit stuff
fpops_cumulative, credit_multiplier
credit computation in scheduler
- AVERAGE class: use Knuth's online mean/variance algorithm (see Wikipedia)
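For reference, a minimal sketch of that algorithm (Welford's formulation of
Knuth's online update); the member names are assumptions, not the actual
AVERAGE class.

    // Online update of mean and variance, one sample at a time.
    struct AverageSketch {
        double n = 0, mean = 0, m2 = 0;   // m2 = sum of squared deviations
        void update(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        double variance() const { return n > 1 ? m2 / (n - 1) : 0; }
    };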
svn path=/trunk/boinc/; revision=21021
Triggering the work generator is now done via the DB
instead of flat files.
Since only E@h uses locality scheduling,
I kept the DB changes in a separate file (db/schema_locality.sql).
There's a new field in the workunit table;
that update is required (handled in db_update.php)
- manager: compile fix
svn path=/trunk/boinc/; revision=20807
elapsed_time: the elapsed time (runtime) as reported by the client
flops_estimate: the app's estimated FLOPS as reported by app_plan()
app_version_id: the DB ID of the app_version used
(or -1 if anonymous platform)
TODO: show these in the web interfaces,
and use them where appropriate
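One natural use of the new fields, as a sketch rather than the shipped code:
a result's peak FLOP count is its runtime times the app version's projected FLOPS.

    // Sketch only: combine the two new result fields.
    double peak_flop_count(double elapsed_time, double flops_estimate) {
        return elapsed_time * flops_estimate;   // seconds * FLOP/s = FLOPs
    }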
svn path=/trunk/boinc/; revision=19002
Old:
1) check deadline based on wu.delay_bound
2) in add_result_to_reply(), potentially modify wu.delay_bound,
e.g. because of retry acceleration
problem: reducing the delay bound may cause a deadline miss
New:
1) new function get_delay_bound_range()
(called from wu_is_infeasible_fast())
returns optimistic and pessimistic delay bounds.
Retry acceleration logic is here.
2) check deadline based on optimistic bound;
if that fails, check based on pessimistic bound.
Set wu.delay_bound to the one that worked.
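A hedged sketch of the two-step check; the get_delay_bound_range() signature
shown here and the deadline_missed() helper are assumptions for illustration.

    // Assumed declarations standing in for the real scheduler code.
    void get_delay_bound_range(int& opt_delay, int& pess_delay);
    bool deadline_missed(double delay_bound);   // true if the host can't make it

    // Try the optimistic bound first; fall back to the pessimistic one.
    bool check_deadline(double& wu_delay_bound) {
        int opt, pess;
        get_delay_bound_range(opt, pess);
        if (!deadline_missed(opt)) {
            wu_delay_bound = opt;
        } else if (!deadline_missed(pess)) {
            wu_delay_bound = pess;
        } else {
            return false;   // infeasible with either bound
        }
        return true;
    }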
Notes:
- get_delay_bound_range() needs result priority and report deadline,
and it's called before we read the full result.
So add these items to WORK_ITEM and WU_RESULT.
- get_delay_bound_range() could be customized for
project-specific deadline policy.
- add_result_to_reply() was becoming a toxic waste dump.
Deadline-related stuff should have been factored out in any case.
svn path=/trunk/boinc/; revision=18946