Currently we do a reschedule any time a job checkpoints,
in case there's a job that has finished a time slice
but hasn't checkpointed yet.
Instead: flag such jobs, and trigger a reschedule
on checkpoint only for flagged jobs.
- client: fix instability in job scheduling that happens
if a job's estimated completion time in RR sim is close to its deadline.
It can alternate between making and missing its deadline,
causing the scheduler to alternate rapidly between jobs.
Solution: if RR sim has marked a job as a deadline miss
at any time in the last CPU scheduling period,
treat it as a deadline miss.
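A minimal sketch of this hysteresis; the field and constant names
(last_rr_sim_miss_time, CPU_SCHED_PERIOD) are illustrative, not the
actual client code:

    // Sketch: RR sim records when it last marked the job a deadline
    // miss; the job keeps being treated as a miss for one CPU
    // scheduling period after that, avoiding rapid alternation.
    const double CPU_SCHED_PERIOD = 60;   // seconds; assumed value

    struct RESULT {
        double last_rr_sim_miss_time;     // set by RR sim (illustrative)
    };

    bool treat_as_deadline_miss(const RESULT& r, double now) {
        return now - r.last_rr_sim_miss_time < CPU_SCHED_PERIOD;
    }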
svn path=/trunk/boinc/; revision=22928
for RR sim's pending-job lists.
Erasing the head of a vector is slow.
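For illustration only (the replacement container isn't named in the
truncated line above); this just shows why head erasure on a vector
is costly and what a constant-time alternative looks like:

    #include <deque>
    #include <vector>

    void example() {
        std::vector<int> v = {1, 2, 3};
        std::deque<int> d = {1, 2, 3};
        v.erase(v.begin());   // O(n): shifts every remaining element left
        d.pop_front();        // O(1): deque supports cheap head removal
    }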
- lib: allow GPU peak FLOPS to be specified in XML (for simulator)
- simulator work
- client: old work fetch policy: projects may need enough jobs
for all device instances, not just resource_share*ninst.
E.g., a project that has only CPU jobs in a CPU/GPU client.
- client: with REC scheduling, don't ask for work for
secondary resources if project has negative priority.
- client: in RR sim, make sure we saturate devices if possible.
Otherwise we may report a shortfall incorrectly
svn path=/trunk/boinc/; revision=22894
The round-robin simulation wasn't handling multithread jobs correctly.
For example, given two 3-CPU jobs,
it would model running them together on a 4-CPU host.
This doesn't match the CPU scheduler,
which runs only one of them at a time.
So the simulator would say that there are no idle CPUs
when in fact there are, and no new CPU jobs would be fetched.
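A sketch of the corrected admission check, with illustrative names
(avg_ncpus, ncpus_used):

    // Sketch: in the simulation, start a multithread job only if its
    // CPU usage fits in the CPUs still free, mirroring the CPU scheduler.
    struct SIM_JOB { double avg_ncpus; };

    bool can_start(const SIM_JOB& job, double ncpus, double ncpus_used) {
        return ncpus_used + job.avg_ncpus <= ncpus;
    }
    // With ncpus = 4: after one 3-CPU job starts, ncpus_used = 3,
    // so a second 3-CPU job (3 + 3 > 4) isn't started concurrently.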
svn path=/trunk/boinc/; revision=22801
Additions to request message:
<not_started_dur>X</not_started_dur>
<in_progress_dur>X</in_progress_dur>
The estimated remaining duration of unstarted
and in-progress tasks
Additions to reply message, within <project>, optional:
<suspend>0|1</suspend>
suspend or resume project (overrides local state)
<abort_not_started>0|1</abort_not_started>
if set, abort unstarted jobs
svn path=/trunk/boinc/; revision=22698
pointers to dynamically allocated COPROC-derived objects;
just have the objects themselves.
Dynamic allocation should be avoided at all costs.
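The shape of the change, sketched with simplified types (the real
COPROC classes have many more fields):

    // Before (sketch): std::vector<COPROC*> coprocs;  // new/delete needed
    // After (sketch): the derived objects are plain members.
    struct COPROC { int count = 0; };
    struct COPROC_CUDA : COPROC { /* ... */ };
    struct COPROC_ATI : COPROC { /* ... */ };

    struct COPROCS {
        COPROC_CUDA cuda;   // embedded directly; no dynamic allocation
        COPROC_ATI ati;
    };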
svn path=/trunk/boinc/; revision=21564
skip those jobs in RR sim.
Otherwise we write to uninitialized data structures,
and a crash can result.
- client: initialize the above data structures anyway
svn path=/trunk/boinc/; revision=20753
treat it as a "backup project":
fetch work from it only if there is an idle instance
and no other projects have work.
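Sketched as a work-fetch guard; how a backup project is flagged isn't
stated here (the entry's first line is truncated above), so the
is_backup field is an assumption:

    #include <vector>

    struct PROJECT {
        bool is_backup;   // how this is determined is assumed here
        bool has_work;
    };

    // Sketch: fetch from a backup project only if a device instance is
    // idle and no non-backup project can supply work.
    bool fetch_from_backup(const std::vector<PROJECT>& projects,
                           bool instance_idle) {
        if (!instance_idle) return false;
        for (const auto& p : projects) {
            if (!p.is_backup && p.has_work) return false;
        }
        return true;
    }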
svn path=/trunk/boinc/; revision=20286
if job A is unstarted and in EDF mode,
and a job B later in the list is started,
has the same app version,
and has the same arrival time,
move A after B.
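One way to implement the reordering just described (field names are
illustrative; whether A moves after the first or last matching B isn't
specified above, so this picks the last):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct JOB {
        bool started, edf;
        int app_version_id;
        double arrival_time;
    };

    void reorder(std::vector<JOB>& jobs) {
        for (std::size_t i = 0; i < jobs.size(); i++) {
            if (jobs[i].started || !jobs[i].edf) continue;
            // look for a later started job B matching A's app version
            // and arrival time
            for (std::size_t j = jobs.size(); j-- > i + 1; ) {
                if (jobs[j].started
                    && jobs[j].app_version_id == jobs[i].app_version_id
                    && jobs[j].arrival_time == jobs[i].arrival_time
                ) {
                    // move A (slot i) to just after B (slot j)
                    std::rotate(jobs.begin() + i, jobs.begin() + i + 1,
                                jobs.begin() + j + 1);
                    i--;   // re-examine the job shifted into slot i
                    break;
                }
            }
        }
    }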
- client: remove the "temp_dcf" mechanism,
which had the same goal but didn't work.
- client: in computing overall debt for a project,
subtract a term that reflects pending work.
This should reduce repeated fetches from the same project.
- client simulator: tweaks
svn path=/trunk/boinc/; revision=20223
- a project overestimates job FLOP counts
- the client starts jobs in EDF mode
- as a job progresses and its fraction done increases,
its completion time estimate decreases until
it's no longer a deadline miss.
- a job gets preempted by another job from that project;
you end up with lots of partly completed jobs.
Solution (I hope): if an app version has running jobs,
compute a "temp DCF" for the app version,
which is the min of dynamic/static estimates for its jobs.
Apply this scaling factor to completion time estimates
for unstarted jobs in RR simulation
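A sketch of the temp DCF computation under those definitions
(dynamic_estimate and static_estimate are illustrative names;
both are assumed positive):

    #include <algorithm>
    #include <vector>

    struct JOB {
        bool running;
        double dynamic_estimate;   // based on fraction done
        double static_estimate;    // based on FLOP count
    };

    // Per app version: temp DCF is the smallest dynamic/static ratio
    // among its running jobs (1.0 if none are running).
    double temp_dcf(const std::vector<JOB>& jobs) {
        double dcf = 1.0;
        for (const auto& j : jobs) {
            if (!j.running) continue;
            dcf = std::min(dcf, j.dynamic_estimate / j.static_estimate);
        }
        return dcf;
    }
    // RR sim estimate for an unstarted job: dcf * static_estimate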
- client: the estimation of remaining time of running jobs was wrong
(how did this bug survive so long?)
svn path=/trunk/boinc/; revision=20077
will have enough jobs to use its share of resource instances.
This avoids situations where, e.g., on a 2-CPU system
a project has a 75% resource share and 1 CPU job,
and its STD increases without bound.
Did a general cleanup of the logic for computing
work request sizes (seconds and instances).
svn path=/trunk/boinc/; revision=20036
Old: if a project has RR sim deadline misses,
select jobs to run high-priority on the basis of:
1) deadline (earliest first)
2) estimated time to completion (least first)
This ignores whether jobs missed their deadline in RR sim,
so it may choose to run a job that's actually in no
danger of missing its deadline over one that is.
New: choose only jobs that miss their deadline in RR sim
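Sketched selection under the new rule (the rr_sim_misses_deadline
flag is an illustrative name for whatever RR sim records):

    #include <algorithm>
    #include <vector>

    struct JOB {
        bool rr_sim_misses_deadline;   // set during RR simulation
        double deadline;
    };

    // Keep only jobs RR sim flagged as deadline misses,
    // earliest deadline first.
    std::vector<JOB> edf_candidates(const std::vector<JOB>& jobs) {
        std::vector<JOB> out;
        for (const auto& j : jobs) {
            if (j.rr_sim_misses_deadline) out.push_back(j);
        }
        std::sort(out.begin(), out.end(),
                  [](const JOB& a, const JOB& b) {
                      return a.deadline < b.deadline;
                  });
        return out;
    }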
svn path=/trunk/boinc/; revision=19826
start only enough jobs to fill CPUs per project,
not all the CPU jobs at once.
I'm not sure how much difference this makes,
but this is how it's supposed to work.
- client: if app_info.xml doesn't specify flops,
use an estimate that takes GPUs into account.
- client: if it's been more than 2 weeks since time stats update,
don't decay on_frac at all.
svn path=/trunk/boinc/; revision=19035
runs GPU jobs in a seemingly random order,
or preempts GPU jobs needlessly.
The change has two parts:
1) sort the "results" vector by received_time,
so that the RR simulation processes GPU jobs FIFO.
2) in the CPU scheduler (earliest_deadline_result()),
instead of choosing the earliest-deadline GPU job that
misses its deadline,
pick the earliest-deadline GPU job from a project that
has a deadline miss for that GPU type
(this is what's done in the CPU case).
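Part 1) is essentially a sort; a sketch (RESULT is reduced here to the
one field involved):

    #include <algorithm>
    #include <vector>

    struct RESULT { double received_time; };

    // Sort by arrival so the RR simulation processes jobs FIFO.
    void sort_results_fifo(std::vector<RESULT*>& results) {
        std::sort(results.begin(), results.end(),
                  [](const RESULT* a, const RESULT* b) {
                      return a->received_time < b->received_time;
                  });
    }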
- client: fix bug where if you have an exclusive app,
then remove it from cc_config.xml and do "update config",
it doesn't go away.
Need to clear the list before parsing.
svn path=/trunk/boinc/; revision=18842
We need to estimate 2 different delays for each resource type:
1) "saturated time": the time the resource will be fully utilized
(new name for the old "estimated delay").
This is used to compute work requests.
2) "busy time": the time a new job would have to wait
to start using this resource.
This is passed to the scheduler and used for a crude deadline check.
Note: this is ill-defined; a single number doesn't suffice.
But as a very rough estimate, I'll use the sum of
(J.duration * J.ninstances)/ninstances
over all jobs that miss their deadline under RR sim.
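The estimate, transcribed directly into code (field names are
illustrative):

    #include <vector>

    struct JOB {
        bool misses_deadline;   // per RR sim
        double duration;        // estimated remaining duration
        double ninstances;      // instances of this resource the job uses
    };

    // Rough busy-time estimate: sum of J.duration * J.ninstances over
    // deadline-miss jobs, divided by the resource's instance count.
    double busy_time(const std::vector<JOB>& jobs, double ninstances) {
        double sum = 0;
        for (const auto& j : jobs) {
            if (j.misses_deadline) sum += j.duration * j.ninstances;
        }
        return sum / ninstances;
    }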
svn path=/trunk/boinc/; revision=18629
(passed to server for crude deadline check) is computed.
Old: estimated delay is the interval for which the resource
is fully used (i.e., all instances busy).
Problem: this may cause unnecessary project starvation.
Example: a 1-CPU machine has a month-long CPDN job
with a 1-year deadline (it's not in deadline trouble).
Then the CPU estimated delay will be 1 month,
and the client won't get any work from projects
with deadlines shorter than 1 month.
New: estimated delay is the latest time at which the
resource is fully used and is being used by at least 1 job
that is projected to miss its deadline under RR.
Note: this isn't precise, but I don't think we can improve it
much without getting a lot more complex.
svn path=/trunk/boinc/; revision=18607
- client: show times correctly in rr_sim debug msgs
- client: in "requesting new tasks" msg,
say what resources we're requesting (if there's more than CPU)
- client: estimated delay was possibly being calculated incorrectly
because of roundoff error
svn path=/trunk/boinc/; revision=18269
- first schedule jobs projected to miss deadline in EDF order
- then schedule remaining jobs in FIFO order
This is intended to reduce the number of preemptions of coproc jobs,
and hence (since they are always preempted by quit)
to reduce the wasted time due to checkpoint gaps.
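A sketch of the two-phase ordering (field names are illustrative):

    #include <algorithm>
    #include <vector>

    struct JOB {
        bool projected_deadline_miss;   // per RR sim
        double deadline;
        double received_time;
    };

    // Deadline-miss jobs first, in EDF order; the rest FIFO.
    void order_jobs(std::vector<JOB>& jobs) {
        auto mid = std::stable_partition(jobs.begin(), jobs.end(),
            [](const JOB& j) { return j.projected_deadline_miss; });
        std::sort(jobs.begin(), mid,
            [](const JOB& a, const JOB& b) {
                return a.deadline < b.deadline;
            });
        std::sort(mid, jobs.end(),
            [](const JOB& a, const JOB& b) {
                return a.received_time < b.received_time;
            });
    }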
- client: the CPU scheduling policy made use of the number
of deadline misses in various places.
This should include only the deadline misses of CPU jobs.
So move "deadlines_missed" from RR_SIM_STATUS and PROJECT
to RSC_PROJECT_WORK_FETCH so that we have separate counts
for CPU and coproc jobs, and use the count for CPU jobs.
- GUI RPC: removed the rr_sim_deadlines_missed field
from project descriptor.
This is no longer meaningful, and it didn't seem to be used anywhere.
svn path=/trunk/boinc/; revision=17785
instead of host_info.fpops as the FLOPS estimate for non-GPU apps.
I don't see why this would make any difference
(these two are equal for non-GPU apps)
but people have reported that this change improves estimates.
svn path=/trunk/boinc/; revision=17624
project, it must have no runnable jobs for ANY resource.
- client: work-fetch bug fix: when setting requests in the
shortfall case, don't request anything if project is backed off
or overworked for the resource.
svn path=/trunk/boinc/; revision=17338
worked in the presence of coprocessors.
The simulator maintained per-project queues of pending jobs.
When a job finished (in the simulation), it would start
one or more jobs from that project's pending queue.
The problem: this could cause "holes" in the scheduling of GPUs,
and produce an erroneous nonzero shortfall for GPUs,
leading to infinite work fetch.
The solution: maintain a separate (per-resource, not per-project)
queue of pending coprocessor jobs.
When a coprocessor job finishes,
start pending jobs from the queue for that resource.
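A sketch of the per-resource queue; the free-instance bookkeeping is
reduced to a counter here:

    #include <deque>
    #include <map>
    #include <string>

    struct JOB { /* ... */ };

    // One pending queue per resource type, not per project, so a
    // finishing GPU job is backfilled immediately and no scheduling
    // "holes" (and spurious shortfalls) appear.
    std::map<std::string, std::deque<JOB*>> pending;

    void on_sim_job_finished(const std::string& resource, int free_instances) {
        auto& q = pending[resource];
        while (free_instances > 0 && !q.empty()) {
            JOB* next = q.front();
            q.pop_front();
            free_instances--;
            (void) next;   // start 'next' in the simulation here
        }
    }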
Another change: the simulator did strict reservation of coprocessors.
If there are 2 instances of CUDA,
and a 1-instance job is running in the simulation,
it wouldn't start an additional 2-instance job.
This also can cause erroneous nonzero shortfalls.
So instead, schedule coprocessors like CPUs, i.e. saturate them.
This can cause distorted completion time estimates,
but it's better than infinite work fetch.
svn path=/trunk/boinc/; revision=17093
There are two mechanisms to prevent the scheduler from
sending jobs that won't finish by their deadline.
Simple mechanism:
The client sends the interval x for which CPUs are projected
to be saturated.
Given a job with estimated duration y,
the scheduler doesn't send it if x + y exceeds the delay bound.
If it does send it, x is incremented by y.
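The Simple mechanism's check, transcribed into a sketch:

    // x: client-reported saturated interval for the resource;
    // y: the candidate job's estimated duration.
    bool send_job(double& x, double y, double delay_bound) {
        if (x + y > delay_bound) return false;   // would miss its deadline
        x += y;   // account for the job we're sending
        return true;
    }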
Complex mechanism:
Client sends workload description.
Scheduler does EDF simulation, sees if deadlines are missed.
The only project using this AFAIK is BOINC alpha test.
Neither of these mechanisms takes coprocessors into account,
and as a result jobs could be sent that are doomed to
miss their deadline.
This checkin adds coprocessor awareness to the Simple mechanism.
Changes:
Client:
compute estimated delay (i.e. time until non-saturation)
for coprocessors as well as CPU.
Send them in scheduler request as part of coproc descriptor.
Scheduler:
Keep track of estimated delays separately for different resources
- client: fixed bug that computed CPU estimated delay incorrectly
- client: the work request (req_secs) for a resource is the min
of the project's share and the shortfall.
svn path=/trunk/boinc/; revision=17086
for projects with no active results.
This is now wrong because coproc apps might have pending results.
Also remove the nidle_cpus > 0 conditional that increments CPU shortfall;
I think this is vestigial code.
svn path=/trunk/boinc/; revision=16646