boinc

Commit Graph

Author	SHA1	Message	Date
David Anderson	dd16170fc1	- scheduler: the p_fpops value reported by clients can't be trusted. Some credit cheats (e.g. with credit_by_runtime) can be done by reporting a huge value. Fix this by capping the value at 1.1 times the 95th percentile of host.p_fpops, taken over active hosts. svn path=/trunk/boinc/; revision=25017	2012-01-09 17:35:48 +00:00
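The cap described in dd16170fc1 can be sketched roughly as follows. This is an illustration only, not the scheduler's code: the function names and the nearest-rank percentile method are assumptions; only the 1.1 factor and the 95th percentile come from the commit message.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100).
    The nearest-rank method is an assumption, not taken from BOINC."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def capped_p_fpops(reported, active_host_fpops, factor=1.1, pct=95):
    """Clamp a client-reported p_fpops to factor * percentile of active hosts."""
    cap = factor * percentile(active_host_fpops, pct)
    return min(reported, cap)

# Hypothetical population: 100 active hosts at 1..100 GFLOPS.
active = [1e9 * i for i in range(1, 101)]
honest = capped_p_fpops(2e9, active)      # below the cap, left unchanged
cheater = capped_p_fpops(1e15, active)    # clamped to 1.1 * 95 GFLOPS
```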
David Anderson	e8657adfd2	- scheduler: change vbox_mt app plan function to use 1, 2 or 3 CPUs depending on how many the host has, and whether CPU VM extensions are present (this reflects the requirements of CernVM). svn path=/trunk/boinc/; revision=25009	2012-01-08 01:28:39 +00:00
David Anderson	0777ab174a	- scheduler: if using homogeneous app version and a WU is committed to a superseded or deprecated app version, use it anyway. The current app version may not validate against the old one. svn path=/trunk/boinc/; revision=24823	2011-12-17 22:11:26 +00:00
David Anderson	2ac9fe8566	- client/scheduler: If the file "client_opaque.txt" exists on the client, include its contents in scheduler request messages. On the scheduler, parse this into SCHEDULER_REQUEST::client_opaque, where it can be used by the customizable scheduler functions. svn path=/trunk/boinc/; revision=24586	2011-11-14 06:27:36 +00:00
David Anderson	c2cdaacf89	- scheduler: fix bugs that broke work fetch for anonymous platform; don't send irrelevant messages to anon platform clients svn path=/trunk/boinc/; revision=24326	2011-10-03 23:43:53 +00:00
David Anderson	7c81d72378	- web: fix warnings in forum pages - scheduler: when using elapsed time stats to predict runtime, cap the estimated FLOPS at twice the peak FLOPS; otherwise, if a host has received a lot of very short jobs recently, it will get a too-high FLOPS estimate and will exceed the rsc_fpops_bound limit. svn path=/trunk/boinc/; revision=24128	2011-09-05 17:29:53 +00:00
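A small sketch of why the cap in 7c81d72378 matters. It assumes (this is not stated in the commit) that a job's runtime limit is derived as rsc_fpops_bound divided by the FLOPS estimate; the function names are hypothetical.

```python
def capped_flops_estimate(stats_flops, peak_flops):
    """Cap a statistics-based FLOPS estimate at twice the host's peak FLOPS."""
    return min(stats_flops, 2.0 * peak_flops)

def runtime_limit(rsc_fpops_bound, flops):
    """Assumed model: seconds a job may run before it is treated as over its bound."""
    return rsc_fpops_bound / flops

# A host with 4 GFLOPS peak whose recent very short jobs suggest 50 GFLOPS.
uncapped_limit = runtime_limit(8e12, 50e9)                              # 160 s
capped_limit = runtime_limit(8e12, capped_flops_estimate(50e9, 4e9))    # 1000 s
```

With the inflated estimate the job would be allowed only 160 seconds; with the cap it gets 1000 seconds.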
David Anderson	c5c5975b44	- Improve interface of XML_PARSER. Add parsed_tag and is_tag to the class, so that parsing functions don't need to declare them and pass them around. - Complete the task of using XML_PARSER as the argument to all parsing functions. (Internally, many of these functions still use the old XML parser; that's the next step.) svn path=/trunk/boinc/; revision=23978	2011-08-10 17:11:08 +00:00
David Anderson	27e05a3da9	- server: some stuff to prepare for distributed storage - don't create result records for uploads and downloads. Just create a msg_to_client record. - the scheduler handles file-transfer results specially; it makes a vector of them, then calls a project-supplied function handle_file_xfer_results() - change the interface and implementation of put_file and get_file - client write project sched priority in GUI RPC replies, but not to the state file svn path=/trunk/boinc/; revision=23857	2011-07-19 20:52:41 +00:00
David Anderson	436415cfe1	- scheduler, back end: add "homogeneous app version" feature. Lets you specify, on a per-app basis, that all instances should be done using the same app version. This is for validation in the presence of GPUs. - scheduler: code cleanup - Instead of adding a bunch of non-DB fields to RESULT, used a derived class SCHED_DB_RESULT. - Instead of storing a pointer to BEST_APP_VERSION in RESULT, store the structure itself. This simplifies the memory allocation situation. - client: condition "Got server request to delete file" messages on <file_xfer_debug> svn path=/trunk/boinc/; revision=23636	2011-06-06 03:40:42 +00:00
David Anderson	a7828abdda	- scheduler: removed unused destructors in COPROC that caused scheduler to crash (not sure why) svn path=/trunk/boinc/; revision=23312	2011-04-01 21:21:11 +00:00
David Anderson	eeab2aee92	- simulator work - fix some indentation svn path=/trunk/boinc/; revision=22891	2011-01-07 20:23:22 +00:00
David Anderson	18f2e90929	- client: work fetch: if the chosen project is currently uploading a file, and an upload started in the last 5 min, don't fetch work from it. The goal is to merge the 2 scheduler RPCs (fetch work, report completed tasks) into a single RPC. Note: this may result in idleness in some cases. - scheduler: if client doesn't handle plan class (pre-5.10), check plan-class app versions anyway, but only use if it's a single-CPU app. This allows single-CPU app versions with specific requirements (like SSE) to be issued to old clients. From Bernd Machenschalk svn path=/trunk/boinc/; revision=22841	2010-12-13 22:58:15 +00:00
David Anderson	f8e2d07cf9	- scheduler: add vbox32 and vbox64 plan classes for VirtualBox apps. svn path=/trunk/boinc/; revision=22778	2010-11-30 19:36:07 +00:00
David Anderson	be14996a1e	- scheduler: deal correctly with jobs that need > 2GB RAM. Such jobs fail on 32-bit machines, even if they have sufficient RAM, because 32-bit OSs don't support address spaces > 2GB. In general, we want to support the following scenario: - an app has a mixture of small (< 2GB) and big (> 2GB) jobs. - there are app versions for both 32b and 64b platforms - one of the 32b versions is faster than the 64b version (say, it's a 32b GPU app) Goals: If the client is 32b, send it only small jobs, using the fast 32b version if possible If the client is 64b and has sufficient RAM, send it large jobs using the 64b version; send it small jobs using the fast 32b version if possible, else the 64b version Solution: extend get_app_version() so that it detects big jobs, and uses only 64b versions for them. Add a "for_64b_jobs" field to BEST_APP_VERSION so that we maintain a separate memoized set of BEST_APP_VERSIONs for big jobs. - client: don't set report_results_immediately inappropriately svn path=/trunk/boinc/; revision=22440	2010-10-01 19:54:09 +00:00
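The selection rules stated in be14996a1e can be sketched as below. This is a simplified model, not get_app_version() itself: the AppVersion type, the field names and the "pick the fastest usable version" rule are assumptions made for illustration.

```python
from dataclasses import dataclass

TWO_GB = 2 * 1024**3

@dataclass
class AppVersion:          # hypothetical stand-in for an app version record
    name: str
    is_64bit: bool
    flops: float

def pick_app_version(versions, job_ram_bytes, client_is_64bit, host_ram_bytes):
    """Return the fastest usable version, or None if the job can't be sent."""
    big_job = job_ram_bytes > TWO_GB
    if big_job and not client_is_64bit:
        return None                      # 32-bit clients get only small jobs
    if job_ram_bytes > host_ram_bytes:
        return None                      # host lacks the RAM
    usable = [
        v for v in versions
        if (client_is_64bit or not v.is_64bit)   # 32-bit client: 32-bit versions only
        and (v.is_64bit or not big_job)          # big jobs: 64-bit versions only
    ]
    return max(usable, key=lambda v: v.flops, default=None)

versions = [AppVersion("gpu32", False, 100e9), AppVersion("cpu64", True, 5e9)]
small, big, ram = 512 * 1024**2, 3 * 1024**3, 8 * 1024**3
```

In the real scheduler the results are memoized separately for big jobs via the "for_64b_jobs" field; the sketch omits that caching.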
David Anderson	3dffe0a8bc	- API: remove deprecated stuff related to: 1) old-style apps with graphics in main program. No one should be using these anymore. 2) writing init_data.xml in boinc_finish(). This was used by deprecated "compound app" scheme - scheduler: if request reports results that were previously reported, that's evidence that the previous reply was not received by client. It may have contained results. So set a "resend lost results" flag. svn path=/trunk/boinc/; revision=22203	2010-08-11 22:02:41 +00:00
David Anderson	23de5a887f	- client/scheduler: tweak translatable messages svn path=/trunk/boinc/; revision=22129	2010-08-04 18:41:24 +00:00
David Anderson	6b8a569d6d	- client/scheduler: fix a group of bugs related to the new mechanism where the client tells the scheduler which app versions its queued jobs use (this is needed, e.g., to enforce per-app or per-resource job limits). In this mechanism, the client sends an array of <app_version>s, and each <other_result> includes an index into this array. - The wrong index was being sent (client). - If an <app_version> had a non-existent app name (e.g. because that app had been deprecated) it wasn't getting put in the array, invalidating array indices Furthermore, an erroneous message was being sent to the user Fix: if parse error for <app_version>, put it in the array anyway, but with cav.app = NULL, meaning that it's a place-holder. Send a message to user only if anon platform. - manager: increase notice buffers to 64K svn path=/trunk/boinc/; revision=22052	2010-07-23 17:43:20 +00:00
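The index problem fixed in 6b8a569d6d is easy to reproduce in miniature. The sketch below uses plain dictionaries with hypothetical keys rather than the real request structures; it only shows why an unparseable <app_version> must still occupy a slot.

```python
def parse_client_app_versions(entries, known_apps):
    """Keep one slot per <app_version>, even if its app name is unknown.
    An unknown app becomes a placeholder with app = None, so indices
    sent in <other_result> still line up."""
    parsed = []
    for entry in entries:
        app = entry["app_name"] if entry["app_name"] in known_apps else None
        parsed.append({"app": app, "version_num": entry["version_num"]})
    return parsed

def app_for_result(other_result, parsed):
    """Look up the app of a queued job by its index into the array."""
    return parsed[other_result["app_version"]]["app"]

entries = [
    {"app_name": "old_app", "version_num": 100},   # deprecated on the server
    {"app_name": "new_app", "version_num": 200},
]
parsed = parse_client_app_versions(entries, known_apps={"new_app"})
```

Had the first entry been dropped instead, index 1 would point past the end of the array.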
David Anderson	0f613d61d8	- scheduler and client: fix the "allow multiple clients" feature. This feature lets you run the BOINC client as a job on grid systems that handle only 1-CPU jobs; it disables various mechanisms that prevent multiple clients per host (which is normally a bad thing). Old: - Run the client with a --allow_multiple_clients flag. This tells it not to use a mutex that prevents multiple clients per host. - Run the project with the <multiple_clients_per_host> config flag. This suppresses two mechanisms: - (avoid duplicate host records) on a scheduler request with no host ID, looks for a host with same domain name, OS type, and mem size, and assumes the request is from that host - (job retry) If we get a request that doesn't have a host ID but does have a host CPID, mark its in-progress results as over NOTE: I CAN'T REMEMBER WHY WE SUPPRESS THIS; MARK S, DO YOU REMEMBER? Problem: if the grid clients attach to a project that doesn't use <multiple_clients_per_host>, bad things happen. E.g., if there are several requests at about the same time, most of them will fail with "another RPC already in progress" errors. If a project does include this flag, it loses protection from duplicate host records. New: - If the client is run with --allow_multiple_clients flag, it passes a <allow_multiple_clients> element in scheduler requests. - The scheduler skips the duplicate-host check on requests that include this flag. - There is no more <multiple_clients_per_host> scheduler option. Note: if a project using the old mechanism upgrades to this change, it will need to use new clients for its grid deployment. svn path=/trunk/boinc/; revision=21839	2010-06-29 16:37:28 +00:00
David Anderson	f849faea5e	- scheduler: bug fixes for jobs-in-progress limits - client: msg tweak svn path=/trunk/boinc/; revision=21692	2010-06-04 16:57:33 +00:00
David Anderson	cf7fb29227	- scheduler: add fine-grained "max jobs in progress" control. You can now specify limits for specific apps, and/or for the project as a whole. Within each of these, you can specify limits on CPU jobs, GPU jobs, or total jobs. In the case of CPU and GPU limits, you can specify whether the limit should be scaled by the number of devices. Note: the enforcement of this is done in get_app_version(), since per-resource-type limits may dictate what app versions we can use for a particular job. svn path=/trunk/boinc/; revision=21674	2010-06-01 23:41:07 +00:00
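A minimal sketch of one such limit from cf7fb29227, assuming a limit of 0 means "no limit" and that scaling multiplies the limit by the device count. The class and field names are hypothetical and do not reflect the actual config.xml syntax.

```python
from dataclasses import dataclass

@dataclass
class JobLimit:
    max_jobs: int             # assumed convention: 0 means unlimited
    per_device: bool = False  # scale the limit by the number of devices

    def allows(self, in_progress, ndevices=1):
        """True if one more job may be sent given jobs already in progress."""
        if self.max_jobs == 0:
            return True
        limit = self.max_jobs * ndevices if self.per_device else self.max_jobs
        return in_progress < limit

gpu_limit = JobLimit(max_jobs=2, per_device=True)   # 2 jobs per GPU
total_limit = JobLimit(max_jobs=3)                  # 3 jobs overall
```

A project-wide check would apply the total, CPU and GPU limits together, once for the project and once for the specific app.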
David Anderson	d45d3b488f	- server: code cleanup svn path=/trunk/boinc/; revision=21664	2010-06-01 03:45:49 +00:00
David Anderson	ca239d913a	- scheduler: fix memory leak (free BEST_APP_VERSION objects) svn path=/trunk/boinc/; revision=21597	2010-05-21 21:49:54 +00:00
David Anderson	40eebe00af	- client/scheduler: in COPROCS, instead of having a vector of pointers to dynamically allocated COPROC-derived objects, just have the objects themselves. Dynamic allocation should be avoided at all costs. svn path=/trunk/boinc/; revision=21564	2010-05-18 19:22:34 +00:00
David Anderson	63dcfabe0e	- scheduler: changeset 21148 broke the scheduler. We store pointers to BEST_APP_VERSION in both APP_VERSION and RESULT. We can't then fiddle with the vector that these point into. Switch back to using a vector of pointers. This restores the memory leak, which I'll deal with later. svn path=/trunk/boinc/; revision=21494	2010-05-12 21:07:39 +00:00
David Anderson	b2451544e1	- server: change the following from per-host to per-(host, app version): - daily quota mechanism - reliable mechanism (accelerated retries) - "trusted" mechanism (adaptive replication) - scheduler: enforce host scale probation only for apps with host_scale_check set. - validator: do scale probation on invalid results (need this in addition to error and timeout cases) - feeder: update app version scales every 10 min, not 10 sec - back-end apps: support --foo as well as -foo for options Notes: - If you have, say, cuda, cuda23 and cuda_fermi plan classes, a host will have separate quotas for each one. That means it could error out on 100 jobs for cuda_fermi, and when its quota goes to zero, error out on 100 jobs for cuda23, etc. This is intentional; there may be cases where one version works but not the others. - host.error_rate and host.max_results_day are deprecated TODO: - the values in the app table for limits on jobs in progress etc. should override rather than config.xml. Implementation notes: scheduler: process_request(): read all host_app_versions for host at start; Compute "reliable" and "trusted" for each one. write modified records at end get_app_version(): add "reliable_only" arg; if set, use only reliable versions skip over-quota versions Multi-pass scheduling: if have at least one reliable version, do a pass for jobs that need reliable, and use only reliable versions. Then clear best_app_versions cache. Score-based scheduling: for need-reliable jobs, it will pick the fastest version, then give a score bonus if that version happens to be reliable. When get back a successful result from client: increase daily quota When get back an error result from client: impose scale probation decrease daily quota if not aborted Validator: when handling a WU, create a vector of HOST_APP_VERSION parallel to vector of RESULT. Pass it to assign_credit_set(). Make copies of originals so we can update only modified ones update HOST_APP_VERSION error rates Transitioner: decrease quota on timeout svn path=/trunk/boinc/; revision=21181	2010-04-15 03:13:56 +00:00
David Anderson	e05a479f42	- scheduler and validator: distinguish between 1) peak FLOPS (based on benchmarks or GPU attributes). This does not change over time. It's not adjusted on the basis of statistics. It's not affected by wu.rsc_fpops_est. It can be compared across projects. versus 2) projected FLOPS: the scheduler's best guess as to what will satisfy X * elapsed_time = wu.rsc_fpops_est; this is used to make server-side runtime estimates, and it's sent to the client and used for its runtime estimates. It may be based on the (host, app version) elapsed time average. My checkin [21153] mistakenly confounded these two. Notes: 1) app_plan() now must return both peak and projected FLOPS. 2) result.flops_estimate stores peak FLOPS 3) the <flops> field in app_info.xml files should be projected FLOPS. But its accuracy is not important; it's not used once the server has statistics for the (host, app version) svn path=/trunk/boinc/; revision=21164	2010-04-10 05:49:51 +00:00
David Anderson	1d765245ed	- scheduler: sweeping changes to the way job runtimes are estimated: see http://boinc.berkeley.edu/trac/wiki/RuntimeEstimation svn path=/trunk/boinc/; revision=21153	2010-04-08 23:14:47 +00:00
David Anderson	85e06afe4b	- scheduler: app_plan() no longer has to guess how efficiently an app version will run on a particular host. - scheduler: fix memory leak: BEST_APP_VERSIONs weren't being freed svn path=/trunk/boinc/; revision=21148	2010-04-08 18:27:27 +00:00
David Anderson	38bd1c8def	- validator: improved log messages - fix some compiler warnings svn path=/trunk/boinc/; revision=21053	2010-04-01 22:51:19 +00:00
David Anderson	737952dbb5	- server: client version numbers are represented as 10000*major + 100*minor + release, rather than 100*major + minor. Sometimes you need release-level resolution. This affects: - app_version.min_core_version - config: min_core_client_version_announced - config: min_core_client_version Projects using these must multiply them by 100. svn path=/trunk/boinc/; revision=20149	2010-01-13 17:28:59 +00:00
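The encoding change in 737952dbb5 (10000*major + 100*minor + release, replacing 100*major + minor) works out as follows. The helper names are hypothetical; the arithmetic is taken from the commit message.

```python
def encode_version(major, minor, release):
    """New representation: 10000*major + 100*minor + release."""
    return 10000 * major + 100 * minor + release

def decode_version(n):
    """Split an encoded version back into (major, minor, release)."""
    return n // 10000, (n // 100) % 100, n % 100

def upgrade_old_value(old):
    """Old values were 100*major + minor; multiplying by 100 converts them."""
    return old * 100

new_value = encode_version(6, 10, 18)   # client 6.10.18
migrated = upgrade_old_value(610)       # old config value for 6.10
```

So a project that previously set min_core_client_version to 610 would now write 61000.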
David Anderson	ee343cea02	- client: small tweak to work fetch: if project has crazy DCF, don't automatically request 1 sec; only request work if there's a shortfall. - intermediate checkin for notices stuff svn path=/trunk/boinc/; revision=20145	2010-01-12 21:53:40 +00:00
David Anderson	a151ad6cb3	- client/scheduler: deal with situation where GPU has enough RAM to run job, but when we actually run the job not enough GPU RAM is free, so the application fails. This can cause a large number of jobs to fail. Solution: - app_plan() can specify the GPU RAM requirements of an app version. This is passed to the client in a new field <gpu_ram> of the <app_version> element. - prior to starting or restarting a GPU app, the client checks the amount of free RAM on the particular GPU. If it's not enough for the app version, the client doesn't start it, and arranges for the scheduler to ignore it for 5 minutes (by which point there might be more free GPU RAM) Notes: 1) this change will have effect only when both client and scheduler are updated. 2) the check is done in enforce_schedule(), rather than schedule_cpus(), because only at that point have we assigned a specific GPU to the job. 3) there's another case to deal with: a GPU app's malloc of GPU RAM fails in the middle of the job. Currently the job fails. I plan to add an API call boinc_temporary_exit(x) so that the job can exit and potentially restart in x seconds. (In principle this mechanism is sufficient for all cases, but it could lead to a lot of starting/exiting, so the current change is worthwhile). svn path=/trunk/boinc/; revision=19864	2009-12-11 22:45:59 +00:00
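The client-side check from a151ad6cb3 might look roughly like this. It is a sketch under assumptions: jobs are modeled as dictionaries, and the "gpu_ram" and "defer_until" keys and the function name are hypothetical; only the 5-minute delay and the free-RAM comparison come from the commit message.

```python
GPU_RAM_RETRY_DELAY = 300.0   # 5 minutes, per the commit message

def try_start_gpu_job(job, free_gpu_ram, now):
    """Return True if the job may start on the GPU assigned to it.
    If free GPU RAM is insufficient, defer the job for 5 minutes."""
    if now < job.get("defer_until", 0.0):
        return False                              # still being ignored
    if job["gpu_ram"] > free_gpu_ram:
        job["defer_until"] = now + GPU_RAM_RETRY_DELAY
        return False
    return True

job = {"gpu_ram": 512e6}
first = try_start_gpu_job(job, free_gpu_ram=256e6, now=1000.0)   # deferred
second = try_start_gpu_job(job, free_gpu_ram=1e9, now=1100.0)    # too soon
third = try_start_gpu_job(job, free_gpu_ram=1e9, now=1300.0)     # starts
```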
David Anderson	71c7e7a74b	- client/scheduler/web: add per-project preferences for whether to accept CPU, NVIDIA and ATI jobs. These prefs are shown only where relevant: e.g., only for processor types for which the project has app versions, and if it has versions for only one type, no pref is shown. These prefs affect both client and scheduler. The client won't ask for work for a device blocked by prefs, and the scheduler won't send it. This replaces earlier optional project-specific prefs for "no CPU jobs" and "no GPU jobs". (However, these prefs continue to be honored on the server side). - client: if NVIDIA driver is unknown, say that rather than 0 svn path=/trunk/boinc/; revision=19194	2009-09-28 04:24:18 +00:00
David Anderson	a49ba8c2e9	- scheduler: if request is anon platform, write list of client's app versions to log svn path=/trunk/boinc/; revision=18923	2009-08-26 18:21:36 +00:00
David Anderson	eafb410cf8	- scheduler: simplify and fix the way that app_plan() conveys messages to the user. app_plan() now generates the messages directly rather than returning integer error codes. svn path=/trunk/boinc/; revision=18899	2009-08-21 20:38:39 +00:00
David Anderson	9e9f2a9878	- scheduler: code cleanup svn path=/trunk/boinc/; revision=18896	2009-08-21 19:14:15 +00:00
David Anderson	073e6ded2c	- client and scheduler: lay the groundwork for "fractional coproc jobs", e.g. the Milkyway@home ATI app, of which we can typically run 2 or 3 instances at once on a GPU. Changes include: - In APP_VERSION, don't use a COPROCS to represent the GPU requirements; just use doubles ncudas and natis. - sufficient_coprocs() etc. are no longer members of COPROCS - in HOST_USAGE, ncudas and natis are doubles - in scheduler request, req_instances is now a double This checkin doesn't include the job scheduling logic, i.e. assigning jobs to GPUs. That will follow. svn path=/trunk/boinc/; revision=18868	2009-08-19 18:41:47 +00:00
David Anderson	12d4b978be	- scheduler: if client request uses a weak authenticator, don't modify user preferences or CPID. - client: fix bug that shows ATI version incorrectly - database: host.posts has been repurposed as a salt (or seqno) for a new type of weak authenticator that won't depend on password - web code: modify forum_preferences.posts instead of host.posts. (actually, the former isn't used either, we just do a select count(*); should fix this at some point). svn path=/trunk/boinc/; revision=18865	2009-08-18 20:44:12 +00:00
David Anderson	7278ab1787	- scheduler: add support for ATI GPUs svn path=/trunk/boinc/; revision=18851	2009-08-17 17:07:38 +00:00
David Anderson	a525453b5e	- code shuffling svn path=/trunk/boinc/; revision=18826	2009-08-10 04:56:46 +00:00

40 Commits