boinc

Commit Graph

Author	SHA1	Message	Date
David Anderson	73b990b4b0	client: fix bug that sometimes prevented work fetch when GPU exclusions used	2013-06-16 20:10:17 -07:00
David Anderson	02fcc45ec4	client: fix work fetch bugs that caused incorrect GPU fetches	2013-06-10 10:36:05 -07:00
David Anderson	424b8c4034	client: fix work-fetch bug that can cause idle GPUs when use exclusions Round-robin simulation, among other things, creates a bitmap "sim_excluded_instances" of instances that are idle because of CPU exclusions. There was a problem in how this was computed; in the situation where there are fewer jobs than GPU instances it could fail to set any bits, so no work fetch would happen. My solution is a bit of a kludge, but should work in most cases. The long-term solution is to treat GPU instances separately, eliminating the need for GPU exclusions.	2013-06-08 16:25:53 -07:00
David Anderson	2e23bfedaa	- client, work fetch policy. Change policy for projects w/ GPU exclusions	2013-03-07 11:28:43 +01:00
David Anderson	a63ebbc13e	- client: change work fetch policy to work better with GPU exclusions - scale amount of work request by (# non-excluded instances)/#instances - change policy: old: don't fetch work if #jobs > #non-excluded instances new: don't fetch work if # of instance-seconds used in RR sim > work_buf_min * (#non-exluded instances)/#instances	2013-03-07 11:28:42 +01:00
David Anderson	3c73f40809	- client: the logic for work fetch in the presence of GPU exclusions (especially per-app exclusions) was incomplete and buggy. Changes: - make bitmaps of included instances per (app, resource type) - in round-robin simulation, we keep track of used instances (so that we know if there are instances that are idle because of exclusions). Do this based on app-level exclusions (previously it was done based on project-wide exclusions, which didn't include app-level exclusions). - compute RSC_PROJECT_WORK_FETCH::non_excluded_instances as the logical OR of the per-app masks. I.e. if you exclude an instance for all apps separately, it's the same as excluding it for the project as a whole. (Note: this bitmap is used for only 1 purpose: if we have idle instances, don't request work from a project for which those instances are excluded.) - define RSC_PROJECT_WORK_FETCH::ncoprocs_excluded as the # of instances excluded for any app, not the # excluded for all apps. This quantity is used in work fetch to make sure we don't unboundedly fetch jobs that turn out not to have a GPU to run on due to exclusions.	2013-03-05 13:42:00 +01:00
David Anderson	9cf10b400a	- GUI RPC: expose TIME_STATS info (e.g. on_frac) in the binding of the get_state() RPC - client: move client_start_time and previous_uptime from CLIENT_STATE to TIME_STATS, so that these are also visible in GUI RPC - scheduler RPC: move uptime and previous_uptime into <time_stats> - client: condition an RR simulation message on <rrsim_detail> - boinccmd: show TIME_STATS info in --get_state	2013-03-01 16:08:52 +01:00
David Anderson	777f1f11e8	- client: change work fetch policy to avoid starving GPUs in situations where GPU exclusions are used. - client: fix bug in round-robin simulation when GPU exclusions are used. Note: this fixes a major problem (starvation) with project-level GPU exclusion. However, project-level GPU exclusion interferes with most of the client's scheduling policies. E.g., round-robin simulation doesn't take GPU exclusion into account, and the resulting completion estimates and device shortfalls can be wrong by an order of magnitude. The only way I can see to fix this would be to model each GPU instance as a separate resource, and to associate each job with a particular GPU instance. This would be a sweeping change in both client and server.	2013-03-01 15:31:41 +01:00
David Anderson	4fea52c6f2	- client: if a project has excluded GPUs of a given type, allow it to fetch work of that type if the # of runnable jobs it <= the # of non-excluded instances (rather than 0). svn path=/trunk/boinc/; revision=26045	2012-08-18 23:26:10 +00:00
David Anderson	f8c1665722	- client: keep track of the fraction of time that 1) a network connection is available and 2) network communication is allowed and 3) CPU computation is allowed - If an app version is marked as needs_network, use the above fraction in estimating its rate of progress - replace "core client" with "client" in comments. - scheduler: message tweaks svn path=/trunk/boinc/; revision=25803	2012-06-26 20:30:56 +00:00
David Anderson	bbfbef0fe8	- client: code cleanup. Move RESULT and PROJECT to separate files svn path=/trunk/boinc/; revision=25621	2012-04-30 21:00:28 +00:00
David Anderson	9d25481174	- scheduler: fix bug that tried to open plan class spec file on each request. - client: when showing how much work a scheduler request returned, scale by availability (as is done to show the amount of the request) - client in account manager request, <not_started_dur> and <in_progress_dur> are in wall time, not run time (i.e. scale them by availability) Note: there's some confusion in the code between runtime and wall time, where in general wall time = runtime / availability. New convention: let's use "runtime" for the former, and "duration" for the latter. svn path=/trunk/boinc/; revision=25597	2012-04-25 04:10:29 +00:00
David Anderson	bc35060726	- client: when contacting a project for reasons other than work fetch (e.g. to report completed jobs) only request work if it's the project we would have chosen if we were fetching work. - client: the way in which project priorities were adjusted in work fetch to reflected currently queued work was wrong. - client: fix bug in the way project priorities are adjusted in RR simulator - client emulator: if there are results in the state file with states DOWNLOADING or UPLOADING, change them to DOWNLOADED or UPLOADED. Otherwise they're stuck. svn path=/trunk/boinc/; revision=24737	2011-12-06 04:21:27 +00:00
David Anderson	312c44415d	- client: condition RR sim negative FLOPs message on rr_simulation. svn path=/trunk/boinc/; revision=24540	2011-11-07 18:53:37 +00:00
David Anderson	98ba6807ab	svn path=/trunk/boinc/; revision=24537	2011-11-07 05:12:02 +00:00
David Anderson	7b28215032	- client: reimplement the round-robin simulator to reduce its runtime from O(N^2) to O(N), where N is the number of runnable jobs (which can be in the thousands). This will make the client emulator run a lot faster, and will reduce the client CPU overhead a bit. - API: change boinc_get_opencl_ids() so that it returns a BOINC error code (< -100) if the app_init.xml is missing or bad (i.e. we're running standalone), and an OpenCL error code (> -100) if an OpenCL call failed. svn path=/trunk/boinc/; revision=24469	2011-10-24 17:53:09 +00:00
David Anderson	b95ac02c5b	- client: change the way project priorities are computed, so that they do what they're supposed to (i.e. enforce resource shares) - client: change log flag <debt_debug> to <priority_debug> - client simulator: update REC even with large delta-t. - client simulator: handle "no new work" apps correctly svn path=/trunk/boinc/; revision=24429	2011-10-19 06:37:03 +00:00
David Anderson	5c0d5d371e	- client: compute project scheduling priority more efficiently - client: if an app version can't be used because the GPUs it needs are all excluded, mark it and all its results as "coproc missing" so that they won't be looked at in scheduling logic. svn path=/trunk/boinc/; revision=24317	2011-10-03 06:18:58 +00:00
David Anderson	b7f1aa0226	- client: fix a bug reported by Jacob Klein, where work fetch didn't work right in the presence of multiple GPUs and <exclude_gpu> config options. For example: suppose: - you have 2 GPUs and 2 projects - Project A is excluded from GPU 1 - you have lots of jobs for project A Then the client won't try to fetch jobs from project B. The problem had 2 parts: a) round-robin simulation wasn't taking GPU exclusions into account. In the above example, it would think that both GPUs had jobs. I fixed this by computing the # of GPUs from each project is excluded, and using this in the RR simulation. b) Once this was done, I needed to make the client request GPU jobs from project B rather than project A. I did this with following policy: If a project has excluded GPUs of a given type, and has a runnable job of that type, don't ask it for more work of that type. Notes: - the policy in b) is crude, and it means that work-buffer preferences are ignored in some cases. - neither a) nor b) takes into account app-level exclusions. I could fix both of these with a lot of work, but I'd rather move to a model in which dissimilar GPUs are modeled as different resources, which would remove the need for the <exclude_gpu> mechanism in the first place. - web: remove extraneous ) at end of button tooltips svn path=/trunk/boinc/; revision=24312	2011-10-01 16:23:28 +00:00
David Anderson	e279b59913	- Updates Linux notifications to use current libnotify. - Fix build problems on Mac OS X using autotools - Consistently use #if HAVE_X for platform checks, rather than #ifdef HAVE_X or #if defined(HAVE_X) - In Unix build, make lots of compiler checks standard - Fix some compile warnings From Matt Arsenault. Note: there are now lots of compile warnings in clientgui/ on Unix, mostly in WxWidgets code svn path=/trunk/boinc/; revision=24303	2011-09-27 19:45:27 +00:00
David Anderson	e0956b06df	- minor code shuffle svn path=/trunk/boinc/; revision=24222	2011-09-15 17:12:18 +00:00
David Anderson	be1d379f6a	- client: message tweak svn path=/trunk/boinc/; revision=24162	2011-09-12 17:22:36 +00:00
David Anderson	f81cb82b8e	- client: make RR simulation more accurate by simulating time-slicing explicitly. Also simulate changes in project REC and hence in scheduling priority. - client: add a log flag "rrsim_detail" that prints time-slice-level info. svn path=/trunk/boinc/; revision=24161	2011-09-12 17:01:54 +00:00
David Anderson	7b9e20ee78	- client: make round-robin simulator match what the job scheduler now does: give lowest priority to projects with zero resource share. svn path=/trunk/boinc/; revision=23963	2011-08-08 19:07:54 +00:00
David Anderson	a21abed078	- client: fix typo that caused a lot of spurious "project has XXXXXX deadline misses" messages - fix compile warnings svn path=/trunk/boinc/; revision=23816	2011-07-07 23:58:23 +00:00
David Anderson	94e8c48220	- client: change --detach_phase_two (??) to --detach_console - eliminate compiler warnings (e.g. shadowed vars) in various places, mostly in client svn path=/trunk/boinc/; revision=23710	2011-06-12 20:58:43 +00:00
David Anderson	3b906a191c	- client: generalize the GPU framework so that - new GPU types can be added easily - users can specify GPUs in cc_config.xml, referred to by app_info.xml, and they will be scheduled by BOINC and passed --device N options Note: the parsing of cc_config.xml is not done yet. - RPC protocols (account manager and scheduler) can now specify GPU types in separate elements rather than embedding them in tag names e.g. <no_rsc>NVIDIA</no_rsc> rather than <no_cuda/> - client: in account manager replies, parse elements of the form <no_rsc>NAME</no_rsc> indicating the GPUs of type NAME should not be used. This allows account managers to control GPU types not hardwired into the client. Note: <no_cuda/> and <no_ati/> will continue to be supported. - scheduler RPC reply: add <no_rsc_apps>NAME</no_rsc_apps> (NAME = GPU name) to indicate that the project has no jobs for the indicated GPU type. <no_cuda_apps> etc. are still supported - client/lib: remove set_debts() GUI RPC - client/scheduler RPC remove <cuda_backoff> etc. (superceded by no_app) Exception: <ip_result> elements in sched request still have <ncudas> and <natis>. Fix this later. Implementation notes: - client/lib: change "CUDA" to "NVIDIA" in type/variable names, and in XML Continue to recognize "CUDA" for compatibility - host_info.coprocs no longer used within the client; use a global var (COPROCS coprocs) instead. COPROCS now has an array of COPROCs; GPUs types are identified by the array index. Index zero means CPU. - a bunch of other resource-specific structs (like RSC_WORK_FETCH) are now stored in arrays, with same indices as COPROCS (i.e. index 0 is CPU) - COPROCS still has COPROC_NVIDIA and COPROC_ATI structs to hold vendor-specific info - APP_VERSION now has a struct GPU_USAGE to describe its GPU usage svn path=/trunk/boinc/; revision=23253	2011-03-25 03:44:09 +00:00
David Anderson	0685bd508e	- client: fix inaccuracy in RR simulation reported by Bill Barber. The problem arises when there are jobs of projects with widely differing resource shares, and results in an overestimation of saturated time. Old: at the start of simulation, call WORK_FETCH::compute_shares() to get resources of runnable projects. Use these throughout the simulation. Problem: suppose you have 2 runnable projects; P1 has large RS, P2 has small RS. P1's jobs finish quickly. P2's jobs then are running alone, but their FLOPS is scaled (incorrectly) by P2's small RS. Solution: recompute relative CPU resource share within the simulation loop, and compute it over the projects that have actives jobs in the simulation. svn path=/trunk/boinc/; revision=23162	2011-03-03 20:32:54 +00:00
David Anderson	795e89dbf5	- client: eliminate unnecessary CPU reschedules. Currently we do a reschedule any time a job checkpoints, in case there's a job that has finished a time slice but hasn't checkpointed yet. Instead: flag such jobs, and trigger a reschedule on checkpoint only for flagged jobs. - client: fix instability in job scheduling that happens if a job's estimated completion time in RR sim is close to its deadline. It can alternate between making and missing deadline, causing the scheduler to alternate rapidly between jobs. Solution: if RR sim has marked a job as deadline miss any time in the last (CPU scheduling period), treat it as a deadline miss. svn path=/trunk/boinc/; revision=22928	2011-01-19 16:46:55 +00:00
David Anderson	717c45a2db	- client: use std::deque instead of std::vector for RR sim's pending-job lists. Erasing head of vector is slow. - lib: allow GPU peak FLOPS to be specified in XML (for simulator) - simulator work - client: old work fetch policy: projects may need enough jobs for all device instances, not just resource_share*ninst. E.g. a project that has only CPU jobs in a CPU/GPU client - client: with REC scheduling, don't ask for work for secondary resources if project has negative priority. - client: in RR sim, make sure we saturate devices if possible. Otherwise we may report a shortfall incorrectly svn path=/trunk/boinc/; revision=22894	2011-01-12 00:47:51 +00:00
David Anderson	5c2636b743	- client: fix scheduling bug. The round-robin simulation wasn't handling multithread jobs correctly. For example, given two 3-CPU jobs, it would model running them together on a 4-CPU host. This doesn't correspond with the CPU scheduler, which runs only 1 at a time. So the simulator would say that there are no idle CPUs when in fact there are, and no new CPU jobs would be fetched. svn path=/trunk/boinc/; revision=22801	2010-12-02 17:26:03 +00:00
David Anderson	8d9cf013c5	- client: account manager RPC: Additions to request message: <not_started_dur>X</not_started_dur> <in_progress_dur>X</in_progress_dur> The estimated remaining duration of unstarted and in-progress tasks Additions to reply message, within <project>, optional: <suspend>0\|1</suspend> suspend or resume project (overrides local state) <abort_not_started>0\|1</abort_not_started> if set, abort unstarted jobs svn path=/trunk/boinc/; revision=22698	2010-11-17 20:04:58 +00:00
David Anderson	de944b928e	- admin web: fix bugs in manage_app_versions page - client: message tweak svn path=/trunk/boinc/; revision=22633	2010-11-05 23:23:28 +00:00
David Anderson	fdf15fb3af	- client: maintain "gpu_active_frac" in addition to "active_frac" (which really means CPU active) svn path=/trunk/boinc/; revision=22283	2010-08-23 05:00:22 +00:00
David Anderson	2b33429f18	- scheduler: fix bug in single-replication decision (from Rytis) svn path=/trunk/boinc/; revision=21576	2010-05-18 22:32:05 +00:00
David Anderson	40eebe00af	- client/scheduler: in COPROCS, instead of having a vector of pointers to dynamically allocated COPROC-derived objects, just have the objects themselves. Dynamic allocation should be avoided at all costs. svn path=/trunk/boinc/; revision=21564	2010-05-18 19:22:34 +00:00
Rom Walton	9cb3e6ffc7	- client & lib: bring header inclusion up-to-date for the CC to begin hunting down a memory leak. client/ <Various Files> lib/ <Various Files> svn path=/trunk/boinc/; revision=21457	2010-05-11 19:10:29 +00:00
David Anderson	7db608660f	- client: standardize debug messages. Messages enabled by <foo_debug> are prefixed by "[foo]" svn path=/trunk/boinc/; revision=21335	2010-04-29 20:32:51 +00:00
David Anderson	b7d48765a8	- client: if have coproc jobs but coproc is missing, skip those jobs in RR sim. Otherwise we add stuff to uninitialized data structures, and a crash can result. - client: initialize the above data structures anyway svn path=/trunk/boinc/; revision=20753	2010-02-28 04:32:10 +00:00
David Anderson	f716dcf7ae	- client: if a project has zero resource share, treat it as a "backup project": fetch work from it only if there is an idle instance and no other projects have work. svn path=/trunk/boinc/; revision=20286	2010-01-28 05:21:14 +00:00
David Anderson	b5124fe729	- client: brute-force attempt at eliminating domino-effect preemption: if job A is unstarted and EDF, and there's a job B that is later in the list, is started, has the same app version, and has the same arrival time, move A after B. - client: remove the "temp_dcf" mechanism, which had the same goal but didn't work. - client: in computing overall debt for a project, subtract a term that reflects pending work. This should reduce repeated fetches from the same project. - client simulator: tweaks svn path=/trunk/boinc/; revision=20223	2010-01-21 00:14:56 +00:00
David Anderson	fe7d8b34f3	- client simulator: done for now svn path=/trunk/boinc/; revision=20204	2010-01-20 06:35:57 +00:00
David Anderson	d6b6f8d5db	- client (Mac): append /usr/local/cuda/lib to LD_LIBRARY_PATH and DYLD_LIBRARY_PATH - client simulator: compile fixes svn path=/trunk/boinc/; revision=20117	2010-01-09 16:41:17 +00:00
David Anderson	37aae854f3	- client: scheduling problem: - a project overestimates job FLOP counts - the client starts jobs in EDF mode - as job progresses and fraction done increases, its completion time estimate decreases until it's no longer a deadline miss. - job gets preempted by other job from that project; you end up with lots of partly completed jobs. Solution (I hope): if an app version has running jobs, compute a "temp DCF" for the app version, which is the min of dynamic/static estimates for its jobs. Apply this scaling factor to completion time estimates for unstarted jobs in RR simulation - client: the estimation of remaining time of running jobs was wrong (how did this bug survive so long?) svn path=/trunk/boinc/; revision=20077	2010-01-06 06:01:23 +00:00
David Anderson	876522c6aa	- client: add logic to work fetch so that each project will have enough jobs to use its share of resource instances. This avoids situations where e.g. on a 2-CPU system a project has 75% resource share and 1 CPU job, and its STD increases without bound. Did a general cleanup of the logic for computing work request sizes (seconds and instances). svn path=/trunk/boinc/; revision=20036	2009-12-24 20:40:27 +00:00
David Anderson	e9a4debf9c	- client: scheduling tweak. Old: if a project has RR sim deadline misses, select jobs to run high-priority on the basis of: 1) deadline (earliest first) 2) estimated time to completion (least first) This ignores whether jobs missed their deadline in RR sim, so it may choose to run a job that's actually in no danger of missing its deadline over one that is. New: choose only jobs that miss their deadline in RR sim svn path=/trunk/boinc/; revision=19826	2009-12-08 20:39:46 +00:00
David Anderson	4d96415576	- client: fix bug introduced in [19035] that causes wrong nidle instances (and resulting work fetch problems) - Unix build: don't touch svn_version.sh if it hasn't changed, to avoid remake of sched/ (from Gabor Gombas) svn path=/trunk/boinc/; revision=19096	2009-09-18 19:26:34 +00:00
David Anderson	f5a6f862bf	- client: fix bug in RR simulation: start only enough jobs to fill CPUs per project, not all the CPU jobs at once. I'm not sure how much difference this makes, but this is how it's supposed to work. - client: if app_info.xml doesn't specify flops, use an estimate that takes GPUs into account. - client: if it's been more than 2 weeks since time stats update, don't decay on_frac at all. svn path=/trunk/boinc/; revision=19035	2009-09-09 22:18:02 +00:00
David Anderson	c3fe504e1d	- client: add ATI support to job scheduling and work fetch svn path=/trunk/boinc/; revision=18850	2009-08-17 16:50:40 +00:00
David Anderson	0a523d5f3f	svn path=/trunk/boinc/; revision=18843	2009-08-14 17:10:52 +00:00

1 2

92 Commits