boinc

Commit Graph

Author	SHA1	Message	Date
David Anderson	d02ff6e1c5	- fix typo svn path=/trunk/boinc/; revision=26063	2012-08-28 06:33:53 +00:00
David Anderson	9ccb8fa38d	- scheduler: add support for limited locality scheduling - API: remove support for PPM files svn path=/trunk/boinc/; revision=26062	2012-08-27 17:00:43 +00:00
David Anderson	32da1a7e37	- server: add support for having a mixture of CPU-intensive and non-CPU-intensive applications. An app can be specified as non-CPU-intensive in project.xml, and this attribute can be set or cleared using the admin web interface. Note: support for this was added to the client in 2011, but we didn't add server-side support at that time. This change is in 6.12 and later clients. svn path=/trunk/boinc/; revision=26060	2012-08-25 04:09:24 +00:00
David Anderson	b029e352c9	- scheduler: if sending GPU description to pre-7.0 client, call it CUDA instead of NVIDIA svn path=/trunk/boinc/; revision=26042	2012-08-17 06:10:25 +00:00
David Anderson	0d42a4aa5c	- file upload handler: add an #ifdef for disabling locking of files while writing to them. It's not clear to me that this locking is beneficial, and it may be causing filesystem problems at WCG - volunteer storage stuff svn path=/trunk/boinc/; revision=26021	2012-08-15 21:27:38 +00:00
David Anderson	6e816094bd	- volunteer data storage: intermediate checkin svn path=/trunk/boinc/; revision=25890	2012-07-25 21:41:32 +00:00
David Anderson	78f74661aa	- distributed storage: move chunk_size to VDA_FILE. Add some missing code. svn path=/trunk/boinc/; revision=25854	2012-07-07 19:44:48 +00:00
David Anderson	1776a244ae	- web: when showing a batch, recompute and update its fraction done - feeder: don't enumerate results for WUs with nonzero error_mask - scheduler: in slow_check(), make sure the WU error_mask is still zero svn path=/trunk/boinc/; revision=25822	2012-06-29 06:53:48 +00:00
Bernd Machenschalk	8b5b765bb7	scheduler: get app_version info for validator items svn path=/trunk/boinc/; revision=25658	2012-05-09 08:04:21 +00:00
David Anderson	4a50b2b2e2	- wrapper: compute final CPU time correctly for multi-process apps - storage stuff svn path=/trunk/boinc/; revision=25356	2012-02-29 20:58:45 +00:00
David Anderson	127e905e0d	- storage stuff. Getting there. svn path=/trunk/boinc/; revision=25355	2012-02-29 07:22:59 +00:00
David Anderson	516e5ad798	- storage stuff svn path=/trunk/boinc/; revision=25354	2012-02-29 01:11:28 +00:00
David Anderson	a8f883d2fa	- server: split out the "antique file deletion" feature of file_deleter.cpp into a separate program, since it blocks normal file deletion while it's running. From Bernd. - storage stuff svn path=/trunk/boinc/; revision=25321	2012-02-24 03:09:56 +00:00
David Anderson	2ed1cfbbb2	- scheduler and create_work: fix bugs that caused targeted jobs to be sent to non-targeted hosts. The feeder was erroneously putting targeted jobs in the shared mem cache. Changes: - The feeder now enumerates only jobs for which workunit.transitioner_flags is zero. NOTE: this field is nonzero iff the job is assigned. - create_work: when creating an assigned job, set workunit.transitioner_flags appropriately svn path=/trunk/boinc/; revision=25314	2012-02-22 22:13:08 +00:00
David Anderson	540a16e2f0	- transitioner: fix bug that caused an invalid SQL query svn path=/trunk/boinc/; revision=25197	2012-02-04 00:18:37 +00:00
David Anderson	130d6ed4f0	- server: revamp the "assigned job" mechanism. This now supports two main use cases: 1) there's a job that you want to run once on all hosts, present and future (or all hosts belonging to a user, or to a team). The job is never transitioned, validated, or assimilated. 2) There's a normal job for which you want to use only hosts belonging to a specific user (e.g. cluster or cloud hosts). This restriction can be made either when the job is created, or on the fly, e.g. as part of a scheme for accelerating batch completion. For the latter purpose we now provide a function restrict_wu_to_user(DB_WORKUNIT&, int userid); The job goes through the standard transitioner/validator/assimilator path. These cases are enabled by the config flags <enable_assignment_multi/> and <enable_assignment/>, respectively. Assignments of type 2) are no longer stored in shared mem, so there is no limit on their number. There is no longer a rule that assigned job names must contain "asgn". NOTE: this requires a database update. svn path=/trunk/boinc/; revision=25169	2012-01-30 22:39:13 +00:00
David Anderson	10c79a7166	- scheduler: initialize COPROC_ATI::version to zero; avoid sending spurious "update driver" messages svn path=/trunk/boinc/; revision=25131	2012-01-23 21:59:12 +00:00
David Anderson	c05444ad1e	- GUI RPC: switching to the new XML parser (which won't parse a double as an int) revealed a type mismatch in FILE_TRANSFER::next_request_time between client and server. svn path=/trunk/boinc/; revision=25125	2012-01-23 05:03:52 +00:00
David Anderson	dd16170fc1	- scheduler: the p_fpops value reported by clients can't be trusted. Some credit cheats (e.g. with credit_by_runtime) can be done by reporting a huge value. Fix this by capping the value at 1.1 times the 95th percentile of host.p_fpops, taken over active hosts. svn path=/trunk/boinc/; revision=25017	2012-01-09 17:35:48 +00:00
David Anderson	22a911516c	- server: more fixes to DB to handle unsigned result IDs svn path=/trunk/boinc/; revision=24563	2011-11-09 17:27:50 +00:00
David Anderson	7c201eba3f	- DB: use %u when writing result IDs in SQL queries; this is to support SETI@home, which ran out of result IDs and changed the DB field type to int unsigned. Note: eventually I'll make this change official and change the .h types as well. - web: put <apps_selected> tags around <app_id> elements in project-specific prefs. svn path=/trunk/boinc/; revision=24555	2011-11-09 07:41:49 +00:00
David Anderson	e279b59913	- Updates Linux notifications to use current libnotify. - Fix build problems on Mac OS X using autotools - Consistently use #if HAVE_X for platform checks, rather than #ifdef HAVE_X or #if defined(HAVE_X) - In Unix build, make lots of compiler checks standard - Fix some compile warnings From Matt Arsenault. Note: there are now lots of compile warnings in clientgui/ on Unix, mostly in WxWidgets code svn path=/trunk/boinc/; revision=24303	2011-09-27 19:45:27 +00:00
David Anderson	a886e0be7c	- transitioner: fix bug related to new runtime_outlier field svn path=/trunk/boinc/; revision=24228	2011-09-16 20:42:45 +00:00
David Anderson	e49f945908	- Validator: allow project-specific code to mark a result as a "runtime outlier", i.e. its runtime does not correspond to the job's rsc_fpops_est. Runtime outliers are not counted in the statistics for elapsed time, turnaround time, and peak FLOPs count. This is intended for applications like SETI@home, some of whose jobs finish more or less instantly (this happens if the data contains a lot of interference). If a host happens to get a bunch of these short jobs, its statistics will get skewed: in essence, the server will think that the host is extremely fast, and will send it too many jobs. svn path=/trunk/boinc/; revision=24225	2011-09-16 16:43:15 +00:00
David Anderson	176b0a4327	- validator: add a --credit_from_runtime option. This assigns credit proportional to runtime*p_fpops. To prevent cheating, p_fpops is capped at the 95th percentile value among active hosts, and runtime is capped at a specified limit. This option supports apps, like LHC's CERNvm app, that run for a certain amount of time and then exit. The CreditNew system doesn't work for such apps. - trickle_credit: To prevent cheating, cap p_fpops at the 95th percentile value among active hosts, and require a limit on runtime. - require that trickle handlers supply an initialization function svn path=/trunk/boinc/; revision=24182	2011-09-13 21:01:42 +00:00
David Anderson	b80f1525f6	- feeder: change the DB query to skip jobs for deprecated apps. Otherwise, if you have a deprecated app with >= 200 jobs (200 is the query's limit) it could always get jobs for that app, and never put anything into the cache. svn path=/trunk/boinc/; revision=24142	2011-09-07 19:57:46 +00:00
David Anderson	c5c5975b44	- Improve interface of XML_PARSER. Add parsed_tag and is_tag to the class, so that parsing functions don't need to declare them and pass them around. - Complete the task of using XML_PARSER as the argument to all parsing functions. (Internally, many of these functions still use the old XML parser; that's the next step.) svn path=/trunk/boinc/; revision=23978	2011-08-10 17:11:08 +00:00
David Anderson	93add14614	- backend: use new XML parser for input template files (so that they don't have to be 1 element/line) and also allow optional <input_template> root element - fix bug in WORKUNIT DB interface svn path=/trunk/boinc/; revision=23648	2011-06-07 04:12:49 +00:00
David Anderson	436415cfe1	- scheduler, back end: add "homogeneous app version" feature. Lets you specify, on a per-app basis, that all instances should be done using the same app version. This is for validation in the presence of GPUs. - scheduler: code cleanup - Instead of adding a bunch of non-DB fields to RESULT, use a derived class SCHED_DB_RESULT. - Instead of storing a pointer to BEST_APP_VERSION in RESULT, store the structure itself. This simplifies the memory allocation situation. - client: condition "Got server request to delete file" messages on <file_xfer_debug> svn path=/trunk/boinc/; revision=23636	2011-06-06 03:40:42 +00:00
Bernd Machenschalk	4fa5f4bd8c	Einstein@home extensions: - protect malloc.h - validator: allow to update 'random' result field - assimilator: add global variables results_prefix and transcripts_prefix svn path=/trunk/boinc/; revision=23241	2011-03-18 08:20:11 +00:00
David Anderson	53a7307305	- scheduler: fix nasty bug introduced in [23040] that caused no jobs to be sent. svn path=/trunk/boinc/; revision=23096	2011-02-23 21:22:45 +00:00
David Anderson	5421335dbb	- transitioner: fix bug that could cause file deletion to not be done for some WUs - back end: fix the way "report grace period" is implemented old: result.report_deadline (i.e. what's in the DB) and the deadline sent to the client are the same. Some confusing and incorrect logic in the transitioner tries to provide the desired semantics. new: result.report_deadline is the deadline sent to the client, plus the grace period. No logic in the transitioner is needed. svn path=/trunk/boinc/; revision=23040	2011-02-15 22:07:14 +00:00
David Anderson	5911a059dd	- compile fix svn path=/trunk/boinc/; revision=22390	2010-09-20 17:16:44 +00:00
David Anderson	2e00bb3084	- scheduler: fix crashing bug when client reports a large # (1000+) of results (256KB not enough for query in this case) svn path=/trunk/boinc/; revision=22389	2010-09-19 03:42:51 +00:00
David Anderson	bf56e80bce	- tweaks svn path=/trunk/boinc/; revision=22306	2010-08-29 08:43:40 +00:00
David Anderson	7c51512cbf	- transitioner: the format string for a DB query had %.15d instead of %.15e. That produced a messed-up query that assigned garbage values to: host_app_version.turnaround_var host_app_version.turnaround_q host_app_version.max_jobs_per_day host_app_version.consecutive_valid To repair these: - set turnaround_var and turnaround_q to zero - if max_jobs_per_day is outside of (0..config.daily_result_quota) set it to config.daily_result_quota - if consecutive_valid is outside (0..1000), set it to zero I added a script, html/ops/repair_21812.php, that does this; if you ran server code between [21181] and [21812], run this script. - scheduler/transitioner: add <debug_quota> log flag - changed the build system to always use -Wall (if we'd done this before, this bug wouldn't have happened) - fixed a bunch of other compile warnings svn path=/trunk/boinc/; revision=21812	2010-06-25 18:54:37 +00:00
David Anderson	4147249de2	- server: delete old credit stuff - user web: show host link in user result list. Fixes #999 svn path=/trunk/boinc/; revision=21735	2010-06-12 22:08:15 +00:00
David Anderson	8b836a391b	- database: remove unused fields from app table svn path=/trunk/boinc/; revision=21728	2010-06-11 03:50:47 +00:00
David Anderson	89fab4ece5	- back end: change "daily result quota" mechanism. Old: config.xml specifies an initial daily quota (say, 100). Each host_app_version starts out with this quota. On the return of a SUCCESS result, the quota is doubled, up to the initial value. On the return of an error result, or a timeout, the quota is decremented down to 1. Problem: Doesn't accommodate hosts that can do more than 100 jobs/day. New: similar, but - on validation of a job, daily quota is incremented. - on invalidation of a job, daily quota is decremented. - on return of an error result, or a timeout, daily quota is min'd with initial quota, then decremented. Notes: - This allows a host to have an unboundedly large quota as long as it continues to return more valid than invalid results. - Even with this change, hosts that return SUCCESS but invalid results will continue to get the initial daily quota. It would be desirable to reduce their quota to 1. svn path=/trunk/boinc/; revision=21675	2010-06-02 00:11:01 +00:00
David Anderson	7daae1d0c7	- client: when emerging from a bandwidth-quota network suspension, add a random delay of 0..1 hr to existing transfers, to avoid a DDOS effect svn path=/trunk/boinc/; revision=21415	2010-05-07 20:08:59 +00:00
David Anderson	ef0019d8c3	- validator: bug fixes: bad formula for low_average(); failure to reread app_versions because of 1e6/1e-6 typo svn path=/trunk/boinc/; revision=21302	2010-04-26 23:12:40 +00:00
David Anderson	5035007b90	- back end: new way of deciding: - whether host is "reliable" for an app version - whether host is eligible for single replication for an app version - whether to use host scaling In each case, the answer is yes if the number of consecutive valid results is above a threshold. This replaces existing "error rate" and "scale probation" mechanisms. TODO: the # of consecutive valid results should also determine a limit on jobs in progress for an app version. Namely, if N is the threshold for host scaling, the limit should be ndevices*(max(1, consecutive_valid - N)) The client currently doesn't supply enough app version info to do this. It could be approximated; that would give some protection against cherry-picking. - credit: more conservative formulas for combining claimed credit among replicas. If there are normal replicas, we use a "low average" that weights each sample by the sum of the other samples. Otherwise we use the min (not the average) of the approximate samples. NOTE: a DB update is required svn path=/trunk/boinc/; revision=21230	2010-04-21 19:33:20 +00:00
David Anderson	61195cb59d	- validator: fix bug where host.total_credit not incremented svn path=/trunk/boinc/; revision=21211	2010-04-19 21:46:45 +00:00
David Anderson	02717af2f3	- bug fixes svn path=/trunk/boinc/; revision=21187	2010-04-15 21:58:44 +00:00
David Anderson	b2451544e1	- server: change the following from per-host to per-(host, app version): - daily quota mechanism - reliable mechanism (accelerated retries) - "trusted" mechanism (adaptive replication) - scheduler: enforce host scale probation only for apps with host_scale_check set. - validator: do scale probation on invalid results (need this in addition to error and timeout cases) - feeder: update app version scales every 10 min, not 10 sec - back-end apps: support --foo as well as -foo for options Notes: - If you have, say, cuda, cuda23 and cuda_fermi plan classes, a host will have separate quotas for each one. That means it could error out on 100 jobs for cuda_fermi, and when its quota goes to zero, error out on 100 jobs for cuda23, etc. This is intentional; there may be cases where one version works but not the others. - host.error_rate and host.max_results_day are deprecated TODO: - the values in the app table for limits on jobs in progress etc. should override those in config.xml. Implementation notes: scheduler: process_request(): read all host_app_versions for host at start; Compute "reliable" and "trusted" for each one. write modified records at end get_app_version(): add "reliable_only" arg; if set, use only reliable versions skip over-quota versions Multi-pass scheduling: if have at least one reliable version, do a pass for jobs that need reliable, and use only reliable versions. Then clear best_app_versions cache. Score-based scheduling: for need-reliable jobs, it will pick the fastest version, then give a score bonus if that version happens to be reliable. When getting back a successful result from client: increase daily quota When getting back an error result from client: impose scale probation decrease daily quota if not aborted Validator: when handling a WU, create a vector of HOST_APP_VERSION parallel to vector of RESULT. Pass it to assign_credit_set(). Make copies of originals so we can update only modified ones update HOST_APP_VERSION error rates Transitioner: decrease quota on timeout svn path=/trunk/boinc/; revision=21181	2010-04-15 03:13:56 +00:00
David Anderson	fb851311e0	- server: various changes; see http://boinc.berkeley.edu/trac/wiki/CreditNew Projects will need to update DB and recompile all back-end programs. Summary: - new way of computing credit - "reliable host" mechanism is per app version - "host punishment" mechanism is per app version - adjustment of wu.rsc_fpops_est provides the equivalent of per app version DCF - max jobs in progress is now per app - max jobs per RPC is now per app TODO: - reliable mechanism: - populate and use host_app_version.error_rate - populate host_app_version.turnaround - host punishment: - populate host_app_version.max_jobs_per_day - populate host_app_version.n_jobs_today - use app.max_jobs_per_day_init - job limits: - use app.max_jobs_in_progress, max_gpu_jobs_in_progress - use app.max_jobs_per_rpc - adjust wu.rsc_fpops_est - remove old credit stuff fpops_cumulative, credit_multiplier credit computation in scheduler - AVERAGE class: use the Knuth algorithm (Wikipedia) svn path=/trunk/boinc/; revision=21021	2010-03-29 22:28:20 +00:00
David Anderson	295d4b54ea	- server: major improvements to locality scheduling from Einstein@home. Triggering the work generator is now done via the DB instead of flat files. Since only E@h uses locality scheduling, I kept the DB changes in a separate file (db/schema_locality.sql). There's a new field in the workunit table, and that's a required update (in db_update.php) - manager: compile fix svn path=/trunk/boinc/; revision=20807	2010-03-05 22:55:16 +00:00
David Anderson	53aa10570a	- feeder: fix crashing bug svn path=/branches/server_stable/; revision=19508	2009-11-06 23:31:26 +00:00
David Anderson	da7e82fe15	- scheduler and back end: add new fields to result table: elapsed_time: the elapsed time (runtime) as reported by client flops_estimate: the app's estimated FLOPS as reported by app_plan() app_version_id: the DB ID of the app_version used (or -1 if anonymous platform) TODO: show these in the web interfaces, and use them where appropriate svn path=/trunk/boinc/; revision=19002	2009-09-03 20:26:31 +00:00
David Anderson	8b701fc73f	- scheduler: fix messed-up deadline check logic. Old: 1) check deadline based on wu.delay_bound 2) in add_result_to_reply(), potentially modify wu.delay_bound, e.g. because of retry acceleration problem: reducing delay bound may cause deadline miss New: 1) new function get_delay_bound_range() (called from wu_is_infeasible_fast()) returns optimistic and pessimistic delay bounds. Retry acceleration logic is here. 2) check deadline based on optimistic bound; if that fails, check based on pessimistic bound. Set wu.delay_bound to the one that worked. Notes: - get_delay_bound_range() needs result priority and report deadline, and it's called before we read the full result. So add these items to WORK_ITEM and WU_RESULT. - get_delay_bound_range() could be customized for project-specific deadline policy. - add_result_to_reply() was becoming a toxic waste dump. Deadline-related stuff should have been factored out in any case. svn path=/trunk/boinc/; revision=18946	2009-08-31 19:35:46 +00:00
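The p_fpops cap from dd16170fc1 (revision 25017) can be illustrated with a short sketch. This is not BOINC's scheduler code: the function names are hypothetical, and the nearest-rank percentile method is an assumption, since the commit message does not say how the 95th percentile is computed.

```python
import math


def percentile(values, q):
    """Nearest-rank q-th percentile (assumed method; BOINC's may differ)."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # 1-based nearest rank, clamped so that q=0 still returns the smallest value
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def capped_p_fpops(reported, active_host_p_fpops, factor=1.1):
    """Cap a client-reported p_fpops at factor * 95th percentile over active hosts."""
    cap = factor * percentile(active_host_p_fpops, 95)
    return min(reported, cap)


hosts = [i * 1e9 for i in range(1, 101)]   # hypothetical active hosts: 1..100 GFLOPS
print(capped_p_fpops(1e15, hosts))         # an inflated claim is clamped to the cap
print(capped_p_fpops(2e9, hosts))          # an ordinary claim passes through unchanged
```

Taking a percentile rather than the maximum means a handful of hosts reporting absurd values cannot raise the cap for everyone else.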
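The --credit_from_runtime option in 176b0a4327 (revision 24182) grants credit proportional to runtime*p_fpops, with both factors capped. A minimal sketch of that formula follows; the function name is hypothetical, and `scale` (credit per floating-point operation) is left as a parameter because the commit does not state the constant.

```python
def credit_from_runtime(runtime_s, p_fpops, p_fpops_cap, max_runtime_s, scale):
    """Credit proportional to runtime * p_fpops, with both factors capped.

    p_fpops_cap:   e.g. the 95th-percentile p_fpops among active hosts
    max_runtime_s: the project-specified runtime limit
    scale:         credit per floating-point operation (placeholder value)
    """
    runtime = min(runtime_s, max_runtime_s)
    fpops = min(p_fpops, p_fpops_cap)
    return runtime * fpops * scale


# A host claiming 50 GFLOPS and 2 hours, against a 10 GFLOPS cap and a 1 hour limit:
credit = credit_from_runtime(7200, 50e9, 10e9, 3600, scale=1e-12)
```

Capping both factors is what makes the option usable for fixed-duration apps such as the CERNvm case: neither an inflated benchmark nor an artificially long run can raise the claim past the limits.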
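The new report-grace-period semantics in 5421335dbb (revision 23040) reduce the timeout test to a single comparison. The sketch below assumes times are Unix seconds; the function names are hypothetical, and `delay_bound` stands for the workunit's delay bound.

```python
def deadlines(sent_time, delay_bound, grace_period):
    """Return (deadline sent to the client, result.report_deadline stored in the DB).

    The stored value already includes the grace period, so the timeout
    check below needs no extra grace-period logic.
    """
    client_deadline = sent_time + delay_bound
    return client_deadline, client_deadline + grace_period


def timed_out(now, stored_report_deadline):
    return now > stored_report_deadline


client_deadline, db_deadline = deadlines(sent_time=1000, delay_bound=86400, grace_period=3600)
```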
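The repair rules listed in 7c51512cbf (revision 21812) amount to simple per-row logic. The actual fix is the PHP script html/ops/repair_21812.php; this Python sketch only mirrors the rules as the commit message states them. `HostAppVersion` is a hypothetical stand-in for a host_app_version row, and treating the ranges "(0..config.daily_result_quota)" and "(0..1000)" as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HostAppVersion:
    """Hypothetical stand-in for the affected host_app_version columns."""
    turnaround_var: float
    turnaround_q: float
    max_jobs_per_day: int
    consecutive_valid: int


def repair(hav, daily_result_quota):
    """Apply the revision-21812 repair rules to one row, in place."""
    hav.turnaround_var = 0.0
    hav.turnaround_q = 0.0
    # Range bounds are assumed inclusive; the commit message does not say.
    if not 0 <= hav.max_jobs_per_day <= daily_result_quota:
        hav.max_jobs_per_day = daily_result_quota
    if not 0 <= hav.consecutive_valid <= 1000:
        hav.consecutive_valid = 0
    return hav
```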
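The revised daily quota rules in 89fab4ece5 (revision 21675) can be modeled as a small state machine. This is a sketch, not BOINC's implementation; the class name is hypothetical, and the floor of 1 is an assumption carried over from the old mechanism, since the new rules do not state a lower bound.

```python
class DailyQuota:
    """Daily job quota under the revision-21675 rules (illustrative sketch)."""

    def __init__(self, initial):
        self.initial = initial   # the initial daily quota from config.xml
        self.quota = initial

    def on_valid(self):
        self.quota += 1          # no upper bound: reliable hosts keep growing

    def on_invalid(self):
        self.quota = max(1, self.quota - 1)   # floor of 1 is assumed

    def on_error_or_timeout(self):
        # min'd with the initial quota, then decremented
        self.quota = max(1, min(self.quota, self.initial) - 1)


q = DailyQuota(100)
for _ in range(50):
    q.on_valid()
```

The `min` in `on_error_or_timeout` is what makes an error expensive: a host whose quota has grown to 150 drops straight back below the initial value.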
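The client change in 7daae1d0c7 (revision 21415) is a standard jitter technique: when many clients leave a bandwidth-quota suspension at the same moment, giving each transfer an independent random delay spreads their restarts over the interval. A minimal sketch with hypothetical names and transfer IDs:

```python
import random


def resume_delays(transfer_ids, max_delay_s=3600.0, rng=None):
    """Assign each pending transfer an independent random delay in [0, max_delay_s]."""
    rng = rng or random.Random()
    return {tid: rng.uniform(0.0, max_delay_s) for tid in transfer_ids}


delays = resume_delays(["upload_1", "upload_2", "download_7"], rng=random.Random(42))
```

Passing a seeded `random.Random` makes the delays reproducible in tests. In production the default, unseeded generator is the point: every client must draw different values.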
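The credit-combining rule and the proposed jobs-in-progress limit from 5035007b90 (revision 21230) can be written out directly. In the "low average" each sample is weighted by the sum of the other samples, so a large claim gets a small weight and the result never exceeds the plain mean. The function names are hypothetical, and claims are assumed to be nonnegative.

```python
def low_average(samples):
    """Average where each sample is weighted by the sum of the *other* samples."""
    if not samples:
        raise ValueError("need at least one sample")
    total = sum(samples)
    weights = [total - x for x in samples]
    weight_sum = sum(weights)            # equals (n - 1) * total
    if weight_sum == 0:                  # a single sample, or all samples zero
        return samples[0] if len(samples) == 1 else 0.0
    return sum(x * w for x, w in zip(samples, weights)) / weight_sum


def combine_claims(normal, approximate):
    """Low average of the normal replicas if any exist, else the min of the approximate ones."""
    return low_average(normal) if normal else min(approximate)


def max_jobs_in_progress(ndevices, consecutive_valid, n_threshold):
    """Limit proposed in the commit's TODO: ndevices * max(1, consecutive_valid - N)."""
    return ndevices * max(1, consecutive_valid - n_threshold)
```

For claims of 10 and 30 the weights are 30 and 10, giving (10·30 + 30·10)/40 = 15, compared with a plain mean of 20.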
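fb851311e0 (revision 21021) switches the AVERAGE class to "the Knuth algorithm", which is usually taken to mean the online mean/variance update credited to Welford and presented in Knuth's TAOCP vol. 2. The sketch below shows that update and also skips samples flagged as runtime outliers, which is what e49f945908 (revision 24225) describes. The class name and the `runtime_outlier` parameter are hypothetical, and whether BOINC's AVERAGE class tracks exactly these quantities is an assumption.

```python
import math


class RunningStats:
    """Running mean and sample variance via the Welford/Knuth online update."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, x, runtime_outlier=False):
        if runtime_outlier:   # outliers are left out of the statistics
            return
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)   # uses the already-updated mean

    def variance(self):
        """Sample variance; 0.0 until at least two samples have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def stddev(self):
        return math.sqrt(self.variance())


s = RunningStats()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    s.update(x)
s.update(1000.0, runtime_outlier=True)   # ignored
```

Unlike summing x and x² separately, this update avoids catastrophic cancellation when the variance is small relative to the mean. That is why it is preferred for long-lived running statistics.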
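The new deadline check in 8b701fc73f (revision 18946) tries the optimistic delay bound first and falls back to the pessimistic one, keeping whichever passes. The sketch below shows only that control flow. `get_delay_bound_range()` itself is not reproduced; the feasibility predicate and the example numbers are hypothetical, and the commit does not say which bound is the larger one.

```python
from typing import Callable, Optional


def pick_delay_bound(optimistic: float, pessimistic: float,
                     meets_deadline: Callable[[float], bool]) -> Optional[float]:
    """Return the first bound that passes the deadline check, or None if neither does.

    The caller would store the returned bound in wu.delay_bound.
    """
    for bound in (optimistic, pessimistic):
        if meets_deadline(bound):
            return bound
    return None


# Hypothetical check: the host's estimated completion time must fit within the bound.
est_completion_s = 2 * 86400
chosen = pick_delay_bound(3 * 86400, 7 * 86400, lambda b: est_completion_s <= b)
```

Keeping the bound selection in one function, rather than adjusting wu.delay_bound after the check as the old add_result_to_reply() did, guarantees that the bound sent is one that was actually checked.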

54 Commits