RAM to run the job, but when we actually run the job
not enough GPU RAM is free, so the application fails.
This can cause a large number of jobs to fail.
Solution:
- app_plan() can specify the GPU RAM requirements of an app version.
This is passed to the client in a new field
<gpu_ram> of the <app_version> element.
- prior to starting or restarting a GPU app, the client
checks the amount of free RAM on the particular GPU.
If it's not enough for the app version,
the client doesn't start it,
and arranges for the scheduler to ignore it for 5 minutes
(by which point there might be more free GPU RAM).
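A minimal sketch of the client-side check (the names here are illustrative, not the actual BOINC identifiers):

    // Hedged sketch: before (re)starting a GPU job, compare the app
    // version's declared requirement (from <gpu_ram>) with the free RAM
    // on the GPU instance the job was assigned.
    struct GPU_JOB_SKETCH {
        double gpu_ram_required;   // from <gpu_ram> in <app_version>
        int device_num;            // GPU instance chosen in enforce_schedule()
    };

    // If this returns false, the client defers the job and asks the
    // scheduler to ignore it for ~300 seconds.
    bool can_start_gpu_job(const GPU_JOB_SKETCH& job, const double* free_gpu_ram) {
        return job.gpu_ram_required <= free_gpu_ram[job.device_num];
    }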
Notes:
1) this change will take effect only when
both client and scheduler are updated.
2) the check is done in enforce_schedule(),
rather than schedule_cpus(),
because only at that point
have we assigned a specific GPU to the job.
3) there's another case to deal with:
a GPU app's malloc of GPU RAM fails in the middle of the job.
Currently the job fails.
I plan to add an API call boinc_temporary_exit(x) so
that the job can exit and potentially restart in x seconds.
(In principle this mechanism is sufficient for all cases,
but it could lead to a lot of starting/exiting,
so the current change is worthwhile).
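A rough sketch of how an app might use the planned call once it exists (the header and exact signature are assumptions at this point):

    #include <cuda_runtime.h>   // cudaMalloc, cudaSuccess
    #include "boinc_api.h"      // assumed home of the planned boinc_temporary_exit()

    void* alloc_gpu_buffer(size_t nbytes) {
        void* p = NULL;
        if (cudaMalloc(&p, nbytes) != cudaSuccess) {
            // GPU RAM allocation failed mid-job: exit now and ask to be
            // restarted in 300 seconds instead of failing the job.
            boinc_temporary_exit(300);
        }
        return p;
    }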
svn path=/trunk/boinc/; revision=19864
<ignore_cuda_dev>n</ignore_cuda_dev>
<ignore_ati_dev>n</ignore_ati_dev>
to ignore (not use) specific NVIDIA or ATI GPUs.
You can ignore more than one.
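For example, assuming these flags go in the <options> section of cc_config.xml (the placement is my assumption), a host could skip NVIDIA devices 0 and 2 and ATI device 1:

    <cc_config>
        <options>
            <ignore_cuda_dev>0</ignore_cuda_dev>
            <ignore_cuda_dev>2</ignore_cuda_dev>
            <ignore_ati_dev>1</ignore_ati_dev>
        </options>
    </cc_config>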
svn path=/trunk/boinc/; revision=19566
Make them both peak FLOPS,
according to the formula supplied by the manufacturer.
The impact on the client is minor:
- the startup message describing the GPU
- the weight of the resource type in computing long-term debt
On the server, I changed the example app_plan() function
to assume that app FLOPS is 20% of peak FLOPS
(that's about what it is for SETI@home)
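As a sketch of what this means in practice (the per-multiprocessor constants are assumptions for older NVIDIA parts; the real formula is whatever the manufacturer specifies, presumably in lib/coproc):

    // Peak FLOPS = multiprocessors x cores per MP x FLOPs per clock x clock rate
    double nvidia_peak_flops(int mp_count, double clock_hz) {
        const double cores_per_mp    = 8;   // assumption: compute capability 1.x
        const double flops_per_clock = 2;   // assumption: one multiply-add per clock
        return mp_count * cores_per_mp * flops_per_clock * clock_hz;
    }

    // The example app_plan() then estimates app FLOPS as 20% of peak:
    //   double app_flops = 0.2 * nvidia_peak_flops(mp_count, clock_hz);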
svn path=/trunk/boinc/; revision=19310
for certain periods (e.g. when Remote Desktop is used on Windows).
- add is_usable() member function to COPROC.
Currently this just calls the respective (CUDA or CAL)
initialization function.
We need to check whether this works and/or causes problems.
- in enforce_schedule(), check whether usability has changed
for each GPU type.
If we've gone from usable to unusable,
flag all jobs for that GPU as coproc_missing
(so they won't get run, and will quit if they're running).
If we've gone from unusable to usable, clear the flag.
This should deal with all cases except where
the client is started up with GPUs unusable.
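A rough sketch of the per-GPU-type check (COPROC::is_usable() is the new call; the usable member and flag_coproc_missing() helper are illustrative assumptions):

    void check_gpu_usability(COPROC& cp) {
        bool now_usable = cp.is_usable();   // runs the CUDA or CAL init function
        if (cp.usable && !now_usable) {
            // usable -> unusable: mark this GPU type's jobs coproc_missing,
            // so they won't be run and will quit if running
            flag_coproc_missing(cp, true);
        } else if (!cp.usable && now_usable) {
            // unusable -> usable: clear the flag
            flag_coproc_missing(cp, false);
        }
        cp.usable = now_usable;
    }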
- scheduler: more query optimizations for locality scheduling
(from Oliver Bock)
svn path=/trunk/boinc/; revision=19301
start only enough jobs to fill CPUs per project,
not all the CPU jobs at once.
I'm not sure how much difference this makes,
but this is how it's supposed to work.
- client: if app_info.xml doesn't specify flops,
use an estimate that takes GPUs into account (see the sketch below).
- client: if it's been more than 2 weeks since time stats update,
don't decay on_frac at all.
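Regarding the flops-estimate item above, a hedged sketch of the idea (the 0.2 derating is borrowed from the 20%-of-peak figure mentioned under revision 19310 above and is only illustrative here):

    // Fallback estimate when app_info.xml has no <flops>: fold GPU peak
    // speed into the number instead of assuming a CPU-only speed.
    double estimate_flops(double avg_ncpus, double p_fpops,
                          double ncudas, double cuda_peak_flops) {
        double x = avg_ncpus * p_fpops;        // CPU contribution
        x += ncudas * cuda_peak_flops * 0.2;   // assumed GPU derating
        return x;
    }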
svn path=/trunk/boinc/; revision=19035
is running a graphics application.
Change the semantics of the "don't use GPU while computer in use" pref
to "don't use a GPU that's running a graphics app while
computer is in use".
This will increase GPU utilization on multi-GPU systems.
svn path=/trunk/boinc/; revision=18942
- different data structure for keeping track of coproc usage;
instead of COPROC having per-instance pointers to ACTIVE_TASK,
ACTIVE_TASK now has an array of device number indices
for each instance that it's using.
- in enforce_schedule(), we call a new function assign_coprocs()
that decides what coproc instances each job will use,
and prunes jobs for which we can't get an assignment.
This function embodies lots of subtlety.
- coproc_cmdline() no longer deals with reserving instances;
it just has to generate the --device X cmdline
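A hedged sketch of the shape of this change (struct and function bodies are illustrative; the real code is assign_coprocs() in enforce_schedule() and coproc_cmdline()):

    #include <string>
    #include <vector>

    // ACTIVE_TASK now records, per coproc instance it uses, the device
    // number assigned by assign_coprocs().
    struct ACTIVE_TASK_SKETCH {
        std::vector<int> coproc_indices;   // e.g. {1} = "use GPU device 1"
    };

    // coproc_cmdline() no longer reserves instances; it just emits the flag(s).
    std::string coproc_cmdline(const ACTIVE_TASK_SKETCH& at) {
        std::string s;
        for (int d : at.coproc_indices) s += " --device " + std::to_string(d);
        return s;
    }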
svn path=/trunk/boinc/; revision=18880
e.g. the Milkyway@home ATI app, of which we can typically run
2 or 3 instances at once on a GPU.
Changes include:
- In APP_VERSION, don't use a COPROCS to represent the GPU
requirements; just use doubles ncudas and natis.
- sufficient_coprocs() etc. are no longer members of COPROCS
- in HOST_USAGE, ncudas and natis are doubles
- in scheduler request, req_instances is now a double
This checkin doesn't include the job scheduling logic,
i.e. assigning jobs to GPUs. That will follow.
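For illustration (the values are made up), fractional usage for a Milkyway-style ATI app would look like:

    // With ncudas/natis as doubles, an app version can claim a fraction of
    // a GPU; 0.5 means two such jobs can share one GPU at a time.
    struct HOST_USAGE_SKETCH {
        double avg_ncpus;
        double ncudas;
        double natis;
    };

    HOST_USAGE_SKETCH milkyway_ati_usage() {
        HOST_USAGE_SKETCH hu;
        hu.avg_ncpus = 0.05;   // illustrative CPU fraction
        hu.ncudas    = 0;
        hu.natis     = 0.5;    // half an ATI GPU per job
        return hu;
    }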
svn path=/trunk/boinc/; revision=18868
old: find the fastest GPU, and pretend that the others are the same.
Problem: other GPUs might be less capable,
and not able to handle jobs sent by the server.
new: find the most "capable" GPU, use others that are equivalent,
don't use those that are not.
"Capable" is defined by
- compute capability (i.e., hardware version)
- driver version
- memory size
- FLOPs
in that priority order.
See comments in lib/coproc.h
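A sketch of that ordering (struct and field names are illustrative; the authoritative version is the comparison in lib/coproc.h):

    struct GPU_SKETCH {
        int    compute_capability;   // hardware version
        int    driver_version;
        double ram;                  // bytes
        double flops;
    };

    // True if a is strictly more capable than b, using the priority order above.
    bool more_capable(const GPU_SKETCH& a, const GPU_SKETCH& b) {
        if (a.compute_capability != b.compute_capability)
            return a.compute_capability > b.compute_capability;
        if (a.driver_version != b.driver_version)
            return a.driver_version > b.driver_version;
        if (a.ram != b.ram) return a.ram > b.ram;
        return a.flops > b.flops;
    }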
svn path=/trunk/boinc/; revision=17855
(say what kind of job and why we're scheduling it)
- client: log messages describing GPUs: one line per GPU; fixes #879
svn path=/trunk/boinc/; revision=17847
and passing them the corresponding --device N cmdline args.
This fixes a bug introduced in 17402 (Feb 26)
that broke the --device feature,
presumably causing problems on systems with multiple GPUs.
svn path=/trunk/boinc/; revision=17549
(app versions don't have a <coprocs> element around their coproc elements,
maybe an oversight, but let's stick with it).
Anyway, I think it's working now.
- lib: remove "owner" array from COPROC.
This was used in client to keep track of assignment of
coprocessors to tasks, but we got rid of the reserve/free scheme.
NOTE: this breaks the mechanism for passing --device N to apps;
I'll have to do this another way. Stay tuned.
svn path=/trunk/boinc/; revision=17543
There are two mechanisms to prevent the scheduler from
sending jobs that won't finish by their deadline.
Simple mechanism:
The client sends the interval x for which CPUs are projected
to be saturated.
Given a job with estimated duration y,
the scheduler doesn't send it if x + y exceeds the delay bound.
If it does send it, x is incremented by y.
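In code terms, the Simple mechanism's send decision amounts to (illustrative names):

    // x: seconds until the resource is projected to become unsaturated
    // y: estimated duration of the candidate job
    // Returns true if the job can still finish within its delay bound;
    // after a send, the caller does x += y.
    bool simple_deadline_check(double x, double y, double delay_bound) {
        return x + y <= delay_bound;
    }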
Complex mechanism:
Client sends workload description.
Scheduler does EDF simulation, sees if deadlines are missed.
The only project using this AFAIK is BOINC alpha test.
Neither of these mechanisms takes coprocessors into account,
and as a result jobs could be sent that are doomed to
miss their deadline.
This checkin adds coprocessor awareness to the Simple mechanism.
Changes:
Client:
compute estimated delay (i.e. time until non-saturation)
for coprocessors as well as CPU.
Send them in scheduler request as part of coproc descriptor.
Scheduler:
Keep track of estimated delays separately for different resources
- client: fixed bug that computed CPU estimated delay incorrectly
- client: the work request (req_secs) for a resource is the min
of the project's share and the shortfall.
svn path=/trunk/boinc/; revision=17086
put a textual summary of them in host.serialnum (currently unused)
- web: show coprocs on host detail page
- db_dump: include coproc info in host XML
svn path=/trunk/boinc/; revision=16697
a modified boinc version.
- Added new header "boinc_fcgi.h" to be used instead of "fcgi_stdio.h".
This header defines I/O functions in the namespace FCGI rather than using
redefined functions the way "fcgi_stdio.h" does. This was causing a lot
of headaches when both <cstdio> and "fcgi_stdio.h" were included. Using
overloaded functions fixes this problem, except when the only difference
between functions is the return type (for example ::fopen() returns FILE*
and FCGI::fopen() returns FCGI_FILE*). See the usage sketch below.
- Fixed some missing "#ifdef _WIN32" blocks in filesys.C
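A hedged usage sketch of the new header (file names are made up; this assumes the usual stdio calls are wrapped in the FCGI namespace, per the description above):

    #include <cstdio>
    #include "boinc_fcgi.h"

    void example() {
        // FCGI I/O lives in a namespace instead of macro-redefining stdio
        // names, so <cstdio> and FCGI can coexist in one translation unit.
        FCGI_FILE* f = FCGI::fopen("state.xml", "r");
        FILE*      g = ::fopen("log.txt", "a");
        if (f) FCGI::fclose(f);
        if (g) ::fclose(g);
    }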
svn path=/trunk/boinc/; revision=15984
should be 2.0. This avoids crashes related to data structure
changes in the Runtime.
coprocs/CUDA/mswin/Win32/Debug/bin/
cudart.dll
coprocs/CUDA/mswin/Win32/Release/bin/
cudart.dll
coprocs/CUDA/mswin/Win32/ReleaseSigned/bin/
cudart.dll
coprocs/CUDA/mswin/x64/Debug/bin/
cudart.dll
coprocs/CUDA/mswin/x64/Release/bin/
cudart.dll
coprocs/CUDA/mswin/x64/ReleaseSigned/bin/
cudart.dll
lib/
coproc.C, .h
svn path=/trunk/boinc/; revision=15925
Old: when checking whether an app can be run,
check for sufficient coprocessors relative to
the current coprocessor usage.
Bug: if there are 2 CUDA jobs,
the scheduler will decide to run both.
enforce_schedule() will only be able to run one,
and the other CPU will be idle.
New: include coprocessor usage (along with RAM and CPUs)
in the check, and do a simulated reservation.
In the above scenario, the scheduler will select
one CUDA app and one non-CUDA app.
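A rough sketch of the simulated reservation (structure and names are illustrative):

    #include <vector>

    struct JOB_SKETCH {
        double ncpus;
        double ram;
        double ncudas;   // coproc instances needed
    };

    // While choosing jobs to run, tentatively reserve CPUs, RAM, and coproc
    // instances; skip any job whose coprocs are already spoken for.
    std::vector<int> pick_runnable(const std::vector<JOB_SKETCH>& jobs,
                                   double cpus_free, double ram_free,
                                   double cudas_free) {
        std::vector<int> chosen;
        for (int i = 0; i < (int)jobs.size(); i++) {
            const JOB_SKETCH& j = jobs[i];
            if (j.ncpus > cpus_free || j.ram > ram_free || j.ncudas > cudas_free) {
                continue;   // e.g. a second CUDA job when only one GPU is free
            }
            cpus_free  -= j.ncpus;
            ram_free   -= j.ram;
            cudas_free -= j.ncudas;
            chosen.push_back(i);
        }
        return chosen;
    }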
svn path=/trunk/boinc/; revision=15904
- scheduler: fix bug in adaptive replication:
if we send an unreplicated job to an untrusted host,
set both wu.target_nresults and wu.min_quorum to app.target_nresults.
svn path=/trunk/boinc/; revision=15762