boinc

Commit Graph

Author	SHA1	Message	Date
David Anderson	e80e54fd4d	- user web: add "Application info" link in host page, linking to new page showing host_app_versions for this host - scheduler: message tweaks svn path=/trunk/boinc/; revision=21690	2010-06-03 20:26:02 +00:00
David Anderson	89fab4ece5	- back end: change "daily result quota" mechanism. Old: config.xml specifies an initial daily quota (say, 100). Each host_app_version starts out with this quota. On the return of a SUCCESS result, the quota is doubled, up to the initial value. On the return of an error result, or a timeout, the quota is decremented down to 1. Problem: Doesn't accommodate hosts that can do more than 100 jobs/day. New: similar, but - on validation of a job, daily quota is incremented. - on invalidation of a job, daily quota is decremented. - on return of an error result, or a timeout, daily quota is min'd with initial quota, then decremented. Notes: - This allows a host to have an unboundedly large quota as long as it continues to return more valid than invalid results. - Even with this change, hosts that return SUCCESS but invalid results will continue to get the initial daily quota. It would be desirable to reduce their quota to 1. svn path=/trunk/boinc/; revision=21675	2010-06-02 00:11:01 +00:00
David Anderson	64def3d588	- scheduler: fix bug that caused resent jobs with anonymous platform to have zero FPOPS est and bound svn path=/trunk/boinc/; revision=21671	2010-06-01 19:56:54 +00:00
David Anderson	5035007b90	- back end: new way of deciding: - whether host is "reliable" for an app version - whether host is eligible for single replication for an app version - whether to use host scaling In each case, the answer is yes if the number of consecutive valid results is above a threshold. This replaces existing "error rate" and "scale probation" mechanisms. TODO: the # of consecutive valid results should also determine a limit on jobs in progress for an app version. Namely, if N is the threshold for host scaling, the limit should be ndevices*(max(1, consecutive_valid - N)) The client currently doesn't supply enough app version info to do this. It could be approximated; that would give some protection against cherry-picking. - credit: more conservative formulas for combining claimed credit among replicas. If there are normal replicas, we use a "low average" that weights each sample by the sum of the other samples. Otherwise we use the min (not the average) of the approximate samples. NOTE: a DB update is required svn path=/trunk/boinc/; revision=21230	2010-04-21 19:33:20 +00:00
David Anderson	b2451544e1	- server: change the following from per-host to per-(host, app version): - daily quota mechanism - reliable mechanism (accelerated retries) - "trusted" mechanism (adaptive replication) - scheduler: enforce host scale probation only for apps with host_scale_check set. - validator: do scale probation on invalid results (need this in addition to error and timeout cases) - feeder: update app version scales every 10 min, not 10 sec - back-end apps: support --foo as well as -foo for options Notes: - If you have, say, cuda, cuda23 and cuda_fermi plan classes, a host will have separate quotas for each one. That means it could error out on 100 jobs for cuda_fermi, and when its quota goes to zero, error out on 100 jobs for cuda23, etc. This is intentional; there may be cases where one version works but not the others. - host.error_rate and host.max_results_day are deprecated TODO: - the values in the app table for limits on jobs in progress etc. should override rather than config.xml. Implementation notes: scheduler: process_request(): read all host_app_versions for host at start; Compute "reliable" and "trusted" for each one. write modified records at end get_app_version(): add "reliable_only" arg; if set, use only reliable versions skip over-quota versions Multi-pass scheduling: if have at least one reliable version, do a pass for jobs that need reliable, and use only reliable versions. Then clear best_app_versions cache. Score-based scheduling: for need-reliable jobs, it will pick the fastest version, then give a score bonus if that version happens to be reliable. When get back a successful result from client: increase daily quota When get back an error result from client: impose scale probation decrease daily quota if not aborted Validator: when handling a WU, create a vector of HOST_APP_VERSION parallel to vector of RESULT. Pass it to assign_credit_set(). Make copies of originals so we can update only modified ones update HOST_APP_VERSION error rates Transitioner: decrease quota on timeout svn path=/trunk/boinc/; revision=21181	2010-04-15 03:13:56 +00:00
David Anderson	da7e82fe15	- scheduler and back end: add new fields to result table: elapsed_time: the elapsed time (runtime) as reported by client flops_estimate: the app's estimated FLOPS as reported by app_plan() app_version_id: the DB ID of the app_version used (or -1 if anonymous platform) TODO: show these in the web interfaces, and use them where appropriate svn path=/trunk/boinc/; revision=19002	2009-09-03 20:26:31 +00:00
David Anderson	9e9f2a9878	- scheduler: code cleanup svn path=/trunk/boinc/; revision=18896	2009-08-21 19:14:15 +00:00
David Anderson	b300519444	svn path=/trunk/boinc/; revision=18825	2009-08-10 04:49:02 +00:00
Rytis Slatkevičius	f239587bdb	Sched: config option not to store stderr_out if exit_status==0 (to save on DB size). With help from Nicolas Alvarez. svn path=/trunk/boinc/; revision=18528	2009-06-30 18:00:58 +00:00
David Anderson	10f9e11ee6	- lib: created a new file for declaring "replacements" for functions like strlcpy() etc. config.h is included here rather than in str_util.h svn path=/trunk/boinc/; revision=18437	2009-06-16 20:54:44 +00:00
David Anderson	8a765c0f4a	- file deleter: detect cases where the upload/download dir doesn't exist, and treat it as a recoverable error (i.e., retry). The file deleter may run on a host that NFS-mounts the upload/download dirs, and NFS mounts can fail. - scheduler: include WU#ID in log msgs for handled results svn path=/trunk/boinc/; revision=18351	2009-06-10 17:42:18 +00:00
David Anderson	c2fda4db09	- scheduler: add <report_max> config parameter; limits the # of completed results handled per scheduler RPC. This may be needed to avoid crashes due to memory allocation failure (each reported result uses about 128KB memory). - web: In showing result lists, include "Validate error" results in the "Invalid" category. (Previously they didn't appear in any category) svn path=/trunk/boinc/; revision=18104	2009-05-14 19:01:40 +00:00
David Anderson	dcc3bbe36f	- scheduler: slight code cleanup svn path=/trunk/boinc/; revision=17395	2009-02-26 03:03:35 +00:00
David Anderson	85a8e6a772	- scheduler: remove the config flag <have_cuda_apps>, and add <cuda_multiplier>. The latter is used in calculating max jobs/day for a host; namely, it's host.max_results_day * (NCPUS + NCUDA*cuda_multiplier). Set it to 10 or so if you have CUDA apps. - scheduler: don't overload effective_ncpus(); instead, add two new functions, max_results_day_multiplier() and max_wus_in_progress_multiplier() - scheduler: don't reduce max_results_day if we get an aborted job (it might have been aborted by the project; not appropriate to punish host in this case) svn path=/trunk/boinc/; revision=16959	2009-01-20 00:54:16 +00:00
David Anderson	91e120b3f4	- scheduler: improve message formatting; add <debug_locality> flag for locality scheduling messages svn path=/trunk/boinc/; revision=16921	2009-01-15 20:23:20 +00:00
David Anderson	312ffba708	- API: remove BOINC_OPTIONS::worker_thread_stack_size - web: check whether to show profile in separate function from displaying profile; eliminate double headers - scheduler: finish purge of redundant arguments svn path=/trunk/boinc/; revision=16726	2008-12-19 18:14:02 +00:00
David Anderson	98cfb8d3b0	- rename .C files to .cpp so that Doxygen will work svn path=/trunk/boinc/; revision=16069	2008-09-26 18:20:24 +00:00
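
The "New" daily-quota rule described in commit 89fab4ece5 can be read as a small state update per (host, app version). The following is a minimal sketch of that reading, not the BOINC implementation: the names `JobEvent` and `update_daily_quota` are hypothetical, and the floor of 1 is an assumption carried over from the old mechanism, since the commit message does not state a lower bound for the new one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical event names for illustration; these are not BOINC identifiers.
enum class JobEvent { Validated, Invalidated, ErrorOrTimeout };

// Returns the updated daily quota for one (host, app version) pair.
// Assumption: the quota never drops below 1 (taken from the old mechanism).
int update_daily_quota(int quota, int initial_quota, JobEvent ev) {
    assert(initial_quota >= 1);
    switch (ev) {
    case JobEvent::Validated:
        // No upper bound: the quota grows while valid results keep arriving.
        return quota + 1;
    case JobEvent::Invalidated:
        return std::max(1, quota - 1);
    case JobEvent::ErrorOrTimeout:
        // First clamp to the initial quota, then decrement.
        return std::max(1, std::min(quota, initial_quota) - 1);
    }
    return quota;
}
```

Under this reading, a host whose quota has grown to 250 with an initial quota of 100 falls to 99 after a single error or timeout, whereas a single invalid result would only lower it to 249.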

17 Commits
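
The per-host job limit quoted in commit 85a8e6a772, host.max_results_day * (NCPUS + NCUDA*cuda_multiplier), can be checked with a few lines of arithmetic. This is a sketch under assumptions: the function name `max_jobs_per_day` and its parameter names are hypothetical and only mirror the terms of the formula in the commit message.

```cpp
#include <cassert>

// Hypothetical helper mirroring the formula in the commit message:
//   max_results_day * (NCPUS + NCUDA * cuda_multiplier)
double max_jobs_per_day(int max_results_day, int ncpus, int ncuda,
                        double cuda_multiplier) {
    assert(max_results_day >= 0 && ncpus >= 0 && ncuda >= 0);
    return max_results_day * (ncpus + ncuda * cuda_multiplier);
}
```

For example, with max_results_day = 100, 4 CPUs, 1 CUDA device and the suggested multiplier of 10, the formula gives 100 * (4 + 1*10) = 1400 jobs per day.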