boinc

Commit Graph

Author	SHA1	Message	Date
Bernd Machenschalk	34c823a9ab	Merge branch 'EinsteinAtHome' into 'master' This is meant not to break anything, just add some (optional) logging and features needed for Einstein@Home. Please contact me before changing or removing any of this. Conflicts: sched/db_dump.cpp sched/file_deleter.cpp sched/validator.cpp	2014-05-26 14:42:36 +02:00
Bernd Machenschalk	2f6d140c56	validator: added options -min_wu_id and -max_wu_id to validator	2014-05-23 12:06:00 +02:00
David Anderson	de6540cbc0	scheduler: if a result was aborted by user, don't count it as an error	2014-05-22 23:54:56 -07:00
David Anderson	b17455816d	db_dump: include badges in XML stats export I did this by including list of badges in the tables.xml file, and writing the list of badge assignments to 2 new files, badge_user.gz (for users) and badge_team.gz (for teams). I considered including the badges within the <user> and <team> elements. However, this would require enumerating the badges for a particular user within the enumeration of users, which doesn't work; only one enumeration can be active at a time. Plus it would be less efficient, and db_dump already takes a half hour on a big project.	2014-05-18 19:19:05 -07:00
David Anderson	fec574f4e8	create_work: increase the efficiency of bulk job creation The job submission RPC handler (PHP) originally ran the create_work program once per job. This took about 1.5 minutes to create 1000 jobs. Recently I changed this so that create_work only is run once; it does one SQL insert per job. Disappointingly, this was only slightly faster: 1 min per 1000 jobs. This commit changes create_work to create multiple jobs per SQL insert (as many as will fit in a 1 MB query, which is the default limit). This speeds things up by a factor of 100: 1000 jobs in 0.5 sec.	2014-04-10 23:53:19 -07:00
David Anderson	fc7c75b200	server: parse peak memory/disk info from client, store in DB, display in web The latest client reports the peak working set size, swap size, and disk usage for completed jobs. Add fields to the results table to store these. Parse them in scheduler request messages, and write to the DB. Display them in the result web page. This data can be used to improve (or even automate) the job estimates for memory and disk usage.	2014-04-02 19:35:59 -07:00
David Anderson	0c430ce1fa	Add support for multi-size apps See http://boinc.berkeley.edu/trac/wiki/MultiSize The components of this include: - DB changes: add size_class to workunit and result n_size_classes to app; >1 means multi-size - size_regulator daemon program: change results states from INACTIVE to UNSENT carefully - size_census program; writes quantile info in flat files - transitioner: when creating results for multi-size apps, set server state to INACTIVE - sched shmem (feeder): read quantile info from flat files, store in shared memory - scheduler (score-based scheduling): for multi-size apps, add component to score function for size class. - show_shmem: show result size class - make_work (and other callers of count_unsent_results()): count both INACTIVE and UNSENT - create_work: add --size_class cmdline option Also: - if get MySQL errors in upgrade, don't rewrite db_version	2013-04-25 00:27:35 -07:00
David Anderson	2ded3ff67d	- fix typo in GUI RPC - check in some code for multi-user job prioritization	2013-03-04 15:23:39 +01:00
David Anderson	68f9880615	- client: remove "device" entry from CUDA_DEVICE_PROP, and change types of mem-size fields from int to double. These fields are size_t in NVIDIA's version of this; however, cuDeviceGetAttribute() returns them as int, so I don't see where this makes any difference. - client: fix bug in handling of <no_rsc_apps> element. - scheduler: message tweaks. Note: [foo] means that the message is enabled by <debug_foo>. svn path=/trunk/boinc/; revision=25849	2012-07-05 20:24:17 +00:00
David Anderson	759c23ed27	- server: create a harness for testing validator code. If you link your functions (init_result(), compare_results(), cleanup_result()) with validate_test.cpp, you'll get a program that you can run as validate_test file1 file2 and it will compare the two files (this works only for validators that expect 1 file per result). I added a makefile, sched/makefile_validator_test, that you can use for this. - server: shuffle code so that the above doesn't need to link MySQL libraries - client: if we fetch a master file and it contains no scheduler URLs, show a message of class INTERNAL_ERROR - client/scheduler: make CUDA_DEVICE_PROP.totalGlobalMem a double, and remove dtotalGlobalMem. Although NVIDIA reports RAM size as a size_t, there's no reason to store it as an integer after that. svn path=/trunk/boinc/; revision=25542	2012-04-10 00:32:35 +00:00
David Anderson	86f50ba080	- admin web: when resetting app statistics, clear elapsed time stats as well as PFC stats svn path=/trunk/boinc/; revision=25530	2012-04-05 11:01:38 +00:00
David Anderson	4a50b2b2e2	- wrapper: compute final CPU time correctly for multi-process apps - storage stuff svn path=/trunk/boinc/; revision=25356	2012-02-29 20:58:45 +00:00
David Anderson	516e5ad798	- storage stuff svn path=/trunk/boinc/; revision=25354	2012-02-29 01:11:28 +00:00
David Anderson	ce52c9cf3e	- storage stuff svn path=/trunk/boinc/; revision=25341	2012-02-24 22:55:11 +00:00
David Anderson	a8f883d2fa	- server: split out the "antique file deletion" feature of file_deleter.cpp into a separate program, since it blocks normal file deletion while it's running. From Bernd. - storage stuff svn path=/trunk/boinc/; revision=25321	2012-02-24 03:09:56 +00:00
David Anderson	2ed1cfbbb2	- scheduler and create_work: fix bugs that caused targeted jobs to be sent to non-targeted hosts. The feeder was erroneously putting targeted jobs in the shared mem cache. Changes: - The feeder only enumerates jobs for which workunit.transitioner_flags is zero. NOTE: this field is nonzero iff the job is assigned. - create_work: when creating an assigned job, set workunit.transitioner_flags appropriately svn path=/trunk/boinc/; revision=25314	2012-02-22 22:13:08 +00:00
David Anderson	c4d1229830	- scheduler: in version selection, when deciding which version is fastest, we multiply projected FLOPS by a normal random var with mean 1 and stddev 0.1. Make the stddev configurable; in particular it can be zero. svn path=/trunk/boinc/; revision=25311	2012-02-22 19:51:09 +00:00
David Anderson	1b8d6b098d	- storage stuff (work in progress) - small code shuffle svn path=/trunk/boinc/; revision=25274	2012-02-16 23:59:26 +00:00
David Anderson	480e28b54c	- web: fix the user search feature - scheduler: parse d_project_share - scheduler: if vbox and vbox_mt are both available, use vbox for a 1-CPU machine svn path=/trunk/boinc/; revision=25176	2012-02-01 03:30:14 +00:00
David Anderson	130d6ed4f0	- server: revamp the "assigned job" mechanism. This now supports two main use cases: 1) there's a job that you want to run once on all hosts, present and future (or all hosts belonging to a user, or to a team). The job is never transitioned, validated, or assimilated. 2) There's a normal job for which you want to use only hosts belonging to a specific user (e.g. cluster or cloud hosts). This restriction can be made either when the job is created, or on the fly, e.g. as part of a scheme for accelerating batch completion. For the latter purposes we now provide a function restrict_wu_to_user(DB_WORKUNIT&, int userid); The job goes through the standard transitioner/validator/assimilator path. These cases are enabled by config flags <enable_assignment_multi/> <enable_assignment/> respectively. Assignments of type 2) are no longer stored in shared mem, so there is no limit on their number. There is no longer a rule that assigned job names must contain "asgn". NOTE: this requires a database update. svn path=/trunk/boinc/; revision=25169	2012-01-30 22:39:13 +00:00
David Anderson	10c79a7166	- scheduler: initialize COPROC_ATI::version to zero; avoid sending spurious "update driver" messages svn path=/trunk/boinc/; revision=25131	2012-01-23 21:59:12 +00:00
David Anderson	c05444ad1e	- GUI RPC: switching to the new XML parser (which won't parse a double as an int) revealed a type mismatch in FILE_TRANSFER::next_request_time between client and server. svn path=/trunk/boinc/; revision=25125	2012-01-23 05:03:52 +00:00
David Anderson	dd16170fc1	- scheduler: the p_fpops value reported by clients can't be trusted. Some credit cheats (e.g. with credit_by_runtime) can be done by reporting a huge value. Fix this by capping the value at 1.1 times the 95th percentile of host.p_fpops, taken over active hosts. svn path=/trunk/boinc/; revision=25017	2012-01-09 17:35:48 +00:00
David Anderson	e8657adfd2	- scheduler: change vbox_mt app plan function to use 1, 2 or 3 CPUs depending on how many the host has, and whether CPU VM extensions are present (this reflects the requirements of CernVM). svn path=/trunk/boinc/; revision=25009	2012-01-08 01:28:39 +00:00
David Anderson	95ebb112c2	- client: for VBox apps, check stderr for "ERR_CPU_VM_EXTENSIONS_DISABLED". If found, set HOST_INFO::p_vm_extensions_disabled, and pass this to the scheduler. - scheduler (VBox app plan function) if a host has p_vm_extensions_disabled set, don't send it multicore VBox jobs. Note: if you have a host with VM extensions, and they're disabled in the BIOS, and you enable them, you can remove the <p_vm_extensions_disabled> line from client_state.xml and you'll be eligible to get multicore VM jobs again. svn path=/trunk/boinc/; revision=24944	2011-12-30 09:43:58 +00:00
David Anderson	8877aa5183	- web: in GPU model list page, look for plan classes containing "nvidia" as well as "cuda". svn path=/trunk/boinc/; revision=24614	2011-11-16 19:47:40 +00:00
David Anderson	e49f945908	- Validator: allow project-specific code to mark a result as a "runtime outlier", i.e. its runtime does not correspond to the job's rsc_fpops_est. Runtime outliers are not counted in the statistics for elapsed time, turnaround time, and peak FLOPs count. This is intended for applications like SETI@home, some of whose jobs finish more or less instantly (this happens if the data contains a lot of interference). If a host happens to get a bunch of these short jobs, its statistics will get skewed: in essence, the server will think that the host is extremely fast, and will send it too many jobs. svn path=/trunk/boinc/; revision=24225	2011-09-16 16:43:15 +00:00
David Anderson	176b0a4327	- validator: add a --credit_from_runtime option. This assigns credit proportional to runtime*p_fpops. To prevent cheating, p_fpops is capped at the 95th percentile value among active hosts, and runtime is capped at a specified limit. This option supports apps, like LHC's CERNvm app, that run for a certain amount of time and then exit. The CreditNew system doesn't work for such apps. - trickle_credit: To prevent cheating, cap p_fpops at the 95th percentile value among active hosts, and require a limit on runtime. - require that trickle handlers supply an initialization function svn path=/trunk/boinc/; revision=24182	2011-09-13 21:01:42 +00:00
David Anderson	7c81d72378	- web: fix warnings in forum pages - scheduler: when using elapsed time stats to predict runtime, cap the estimated FLOPS at twice the peak FLOPS; otherwise, if a host has received a lot of very short jobs recently, it will get a too-high FLOPS estimate and will exceed the rsc_fpops_bound limit. svn path=/trunk/boinc/; revision=24128	2011-09-05 17:29:53 +00:00
David Anderson	c5c5975b44	- Improve interface of XML_PARSER. Add parsed_tag and is_tag to the class, so that parsing functions don't need to declare them and pass them around. - Complete the task of using XML_PARSER as the argument to all parsing functions. (Internally, many of these functions still use the old XML parser; that's the next step.) svn path=/trunk/boinc/; revision=23978	2011-08-10 17:11:08 +00:00
David Anderson	271699ea0a	- server: fix typo svn path=/trunk/boinc/; revision=23904	2011-07-30 22:42:05 +00:00
David Anderson	6e5acbbe60	- web: remote job submission: - add fields to batch table, extend APIs accordingly - require that example web interface run on BOINC server (this makes many things easier; an actual remote interface would require a bit more work) svn path=/trunk/boinc/; revision=23881	2011-07-27 06:20:48 +00:00
David Anderson	83965db576	- web: more remote job submission code. Not finished. svn path=/trunk/boinc/; revision=23871	2011-07-25 21:45:53 +00:00
David Anderson	2177a6bd95	- server: restore fpops/intops_cumulative to RESULT (structure, not table) for AQUA - client, Windows: when wake up from hibernation, get the time before printing log msg svn path=/trunk/boinc/; revision=23784	2011-06-29 23:00:39 +00:00
David Anderson	4403df42d8	- client: add <type> element to <exclude_gpu> config option, in case of multiple GPU types svn path=/trunk/boinc/; revision=23777	2011-06-25 05:13:56 +00:00
David Anderson	436415cfe1	- scheduler, back end: add "homogeneous app version" feature. Lets you specify, on a per-app basis, that all instances should be done using the same app version. This is for validation in the presence of GPUs. - scheduler: code cleanup - Instead of adding a bunch of non-DB fields to RESULT, used a derived class SCHED_DB_RESULT. - Instead of storing a pointer to BEST_APP_VERSION in RESULT, store the structure itself. This simplifies the memory allocation situation. - client: condition "Got server request to delete file" messages on <file_xfer_debug> svn path=/trunk/boinc/; revision=23636	2011-06-06 03:40:42 +00:00
David Anderson	b6140088e3	- update_versions: trim XML strings svn path=/trunk/boinc/; revision=23569	2011-05-21 06:22:15 +00:00
David Anderson	53a7307305	- scheduler: fix nasty bug introduced in [23040] that caused no jobs to be sent. svn path=/trunk/boinc/; revision=23096	2011-02-23 21:22:45 +00:00
David Anderson	5421335dbb	- transitioner: fix bug that could cause file deletion to not be done for some WUs - back end: fix the way "report grace period" is implemented old: result.report_deadline (i.e. what's in the DB) and the deadline sent to the client are the same. Some confusing and incorrect logic in the transitioner tries to provide the desired semantics. new: result.report_deadline is the deadline sent to the client, plus the grace period. No logic in the transitioner is needed. svn path=/trunk/boinc/; revision=23040	2011-02-15 22:07:14 +00:00
David Anderson	3355b66241	- client and scheduler: a client host may have multiple VM systems installed. TODO: check for VirtualBox on Mac, Linux svn path=/trunk/boinc/; revision=22704	2010-11-17 23:19:07 +00:00
Rom Walton	1564a49816	- sched: Parse the detected virtual machine software from the scheduler request so it can be used in plan classes. db/ boinc_db.h sched/ sched_types.cpp svn path=/trunk/boinc/; revision=22703	2010-11-17 20:52:01 +00:00
David Anderson	ae7866b251	- scheduler: restore scaling of daily quota by # processors and/or config.gpu_multiplier - client: msg tweak svn path=/trunk/boinc/; revision=21753	2010-06-15 22:21:57 +00:00
David Anderson	4147249de2	- server: delete old credit stuff - user web: show host link in user result list. Fixes #999 svn path=/trunk/boinc/; revision=21735	2010-06-12 22:08:15 +00:00
David Anderson	8b836a391b	- database: remove unused fields from app table svn path=/trunk/boinc/; revision=21728	2010-06-11 03:50:47 +00:00
David Anderson	c4df1f3104	svn path=/trunk/boinc/; revision=21232	2010-04-21 20:11:41 +00:00
David Anderson	5035007b90	- back end: new way of deciding: - whether host is "reliable" for an app version - whether host is eligible for single replication for an app version - whether to use host scaling In each case, the answer is yes if the number of consecutive valid results is above a threshold. This replaces existing "error rate" and "scale probation" mechanisms. TODO: the # of consecutive valid results should also determine a limit on jobs in progress for an app version. Namely, if N is the threshold for host scaling, the limit should be ndevices*(max(1, consecutive_valid - N)) The client currently doesn't supply enough app version info to do this. It could be approximated; that would give some protection against cherry-picking. - credit: more conservative formulas for combining claimed credit among replicas. If there are normal replicas, we use a "low average" that weights each sample by the sum of the other samples. Otherwise we use the min (not the average) of the approximate samples. NOTE: a DB update is required svn path=/trunk/boinc/; revision=21230	2010-04-21 19:33:20 +00:00
David Anderson	021edb02c2	- back end programs: improve log msgs svn path=/trunk/boinc/; revision=21193	2010-04-16 18:07:08 +00:00
David Anderson	b2451544e1	- server: change the following from per-host to per-(host, app version): - daily quota mechanism - reliable mechanism (accelerated retries) - "trusted" mechanism (adaptive replication) - scheduler: enforce host scale probation only for apps with host_scale_check set. - validator: do scale probation on invalid results (need this in addition to error and timeout cases) - feeder: update app version scales every 10 min, not 10 sec - back-end apps: support --foo as well as -foo for options Notes: - If you have, say, cuda, cuda23 and cuda_fermi plan classes, a host will have separate quotas for each one. That means it could error out on 100 jobs for cuda_fermi, and when its quota goes to zero, error out on 100 jobs for cuda23, etc. This is intentional; there may be cases where one version works but not the others. - host.error_rate and host.max_results_day are deprecated TODO: - the values in the app table for limits on jobs in progress etc. should override rather than config.xml. Implementation notes: scheduler: process_request(): read all host_app_versions for host at start; Compute "reliable" and "trusted" for each one. write modified records at end get_app_version(): add "reliable_only" arg; if set, use only reliable versions skip over-quota versions Multi-pass scheduling: if have at least one reliable version, do a pass for jobs that need reliable, and use only reliable versions. Then clear best_app_versions cache. Score-based scheduling: for need-reliable jobs, it will pick the fastest version, then give a score bonus if that version happens to be reliable. When get back a successful result from client: increase daily quota When get back an error result from client: impose scale probation decrease daily quota if not aborted Validator: when handling a WU, create a vector of HOST_APP_VERSION parallel to vector of RESULT. Pass it to assign_credit_set(). Make copies of originals so we can update only modified ones update HOST_APP_VERSION error rates Transitioner: decrease quota on timeout svn path=/trunk/boinc/; revision=21181	2010-04-15 03:13:56 +00:00
David Anderson	2536797068	- validator: remove update_credit_per_cpu_sec(). Irrelevant. TODO: remove related code - validator: update wu.canonical_credit correctly. However, this field should be deprecated. - validator: check for error return from assign_credit_set(). svn path=/trunk/boinc/; revision=21096	2010-04-05 20:03:54 +00:00
David Anderson	19f7d66b53	- backend programs: change the way PFC and elapsed-time statistics are written to the DB. The incremental approach was bogus. New approach: host_app_version: write directly; R/W interval is tiny app_version: maintain an explicit list of update samples for both PFC and credit. When the validator flushes its app_version cache, do careful updates. Note: when using double fields in careful updates, you can't test for equality. Use abs(new-old) < 1e-N svn path=/trunk/boinc/; revision=21057	2010-04-02 19:10:37 +00:00
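
Commit dd16170fc1 above describes capping a client-reported p_fpops at 1.1 times the 95th percentile of host.p_fpops over active hosts. The following is a minimal sketch of that rule only, not the scheduler's actual code. The function names, the nearest-rank percentile method, and the way the list of active-host values is obtained are all assumptions made for illustration; the commit message does not specify them.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values (nearest-rank method, an assumption)."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Nearest rank: smallest 1-based rank covering at least pct percent of samples.
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def capped_p_fpops(reported, active_host_fpops, pct=95.0, slack=1.1):
    """Clamp a client-reported p_fpops to slack * (pct-th percentile of the fleet).

    reported          -- value sent by the client (untrusted)
    active_host_fpops -- p_fpops values of active hosts (hypothetical input list)
    """
    cap = slack * percentile_nearest_rank(active_host_fpops, pct)
    return min(reported, cap)


if __name__ == "__main__":
    # Hypothetical fleet of 20 hosts reporting 1..20 GFLOPS.
    fleet = [i * 1e9 for i in range(1, 21)]
    print(capped_p_fpops(3e9, fleet))    # an ordinary host is left unchanged
    print(capped_p_fpops(1e15, fleet))   # an implausibly large value is clamped
```

With a cap of this kind, an honest host below the threshold is unaffected, while an inflated value can raise runtime-based credit by only a bounded amount.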


220 Commits