There are now 3 flags for job dispatch logging:
<debug_send/>: info about work request, jobs sent, other high-level stuff
<debug_send_scan/>: info about scans through job cache
<debug_send_job/>: info about individual jobs (e.g. reason for not sending)
In SCHED_SHMEM::no_work() we "lock" the job by setting its state to our PID.
When checking the job in send_job_for_app() we need to
accept this state as well as STATE_PRESENT. From Jack Yang.
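A minimal sketch of the check, assuming the slot's state field holds either WR_STATE_PRESENT or the PID of the scheduler process that locked it (the constants and struct below are simplified stand-ins, not the actual BOINC definitions):

    #include <unistd.h>

    #define WR_STATE_EMPTY   0
    #define WR_STATE_PRESENT 1
    // Any other value is the PID of the scheduler process holding the job.

    struct WU_RESULT { int state; /* ... */ };

    // Accept a job we locked ourselves as well as one that's simply present.
    bool job_usable(const WU_RESULT& wr) {
        return wr.state == WR_STATE_PRESENT || wr.state == (int)getpid();
    }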
Old: each scheduler process holds a semaphore
while scanning the shared-mem job array.
On machines with many CPUs
there seems to be contention for this semaphore,
causing slow scheduler response and possibly connection failures.
New: don't hold the semaphore while scanning the array.
Instead, if we find a job that passes quick_check(),
acquire the semaphore and recheck that the job is still present
in the array and still passes quick_check().
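A sketch of the pattern under simplified assumptions: a pthread mutex stands in for the shared-mem semaphore, and quick_check()/send_job() are stand-ins for the real scheduler routines.

    #include <unistd.h>
    #include <pthread.h>

    #define WR_STATE_PRESENT 1

    struct WU_RESULT { int state; /* ... */ };

    static pthread_mutex_t sema = PTHREAD_MUTEX_INITIALIZER; // stand-in
    static const int nslots = 100;
    static WU_RESULT wu_results[nslots];

    bool quick_check(WU_RESULT&);   // cheap feasibility test (elsewhere)
    void send_job(WU_RESULT&);      // add the job to the reply (elsewhere)

    void scan_array() {
        for (int i = 0; i < nslots; i++) {
            WU_RESULT& wr = wu_results[i];
            // Optimistic pass: no lock held during the cheap check.
            if (wr.state != WR_STATE_PRESENT || !quick_check(wr)) continue;

            pthread_mutex_lock(&sema);
            // Recheck under the lock: another scheduler instance may have
            // taken the job between our check and the lock acquisition.
            if (wr.state == WR_STATE_PRESENT && quick_check(wr)) {
                wr.state = (int)getpid();   // claim the slot
                pthread_mutex_unlock(&sema);
                send_job(wr);
            } else {
                pthread_mutex_unlock(&sema);
            }
        }
    }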
- client: show messages if app_config.xml has unrecognized tags
- don't use devices for which work is not being requested
- obey wu_is_infeasible_custom()
(e.g. don't send SETI@home VLAR jobs to GPUs)
- scheduler: add <debug_array_detail> log flag for slot-level messages
- admin web: show and allow control of app.beta
In LLS array pass, skip file-on-host check if host
doesn't have any sticky files.
TODO: it should actually be "any sticky files for this app".
But we currently don't have any way to know that.
svn path=/trunk/boinc/; revision=26108
We were failing to mark the cache entries as free.
- API: initialize GPU device # to -1;
If the client doesn't give us a device number, something is wrong
and it's better not to start computing.
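A sketch of the idea (the struct is a hypothetical stand-in for the API's init data, not the actual type): default the device number to -1 and treat that as "unset".

    // Hypothetical stand-in; only the sentinel-value idea matters.
    struct GPU_INIT_DATA {
        int device_num;
        GPU_INIT_DATA(): device_num(-1) {}   // -1 == "no device assigned"
    };

    bool ok_to_start(const GPU_INIT_DATA& d) {
        // If the client never filled in a device number, something is wrong
        // upstream; better to refuse than to silently grab device 0.
        return d.device_num >= 0;
    }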
svn path=/trunk/boinc/; revision=26079
The old support for non-CPU-intensive (NCI) apps covered some cases
(but not all) and wasn't finished.
New logic: if the project has an NCI app then:
- make a list of NCI apps for which the client doesn't have
a job in progress.
- try to send one job for each of these apps
- do this even if no work is being requested.
- don't send jobs for NCI apps by other mechanisms
NOTE: the client logic isn't quite right for mixed NCI projects.
If there's no job for a given NCI app,
the client should do a scheduler RPC.
This isn't critical so we won't do this now.
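A sketch of the per-request NCI pass described above (names are approximations, not the actual scheduler functions):

    #include <vector>

    struct APP { int id; bool non_cpu_intensive; };

    std::vector<APP> project_apps;                // all apps of this project
    bool client_has_job_in_progress(int app_id);  // from the request's
                                                  // in-progress job list
    bool send_job_for_app(APP& app);              // try to dispatch one job

    // Run on every scheduler request, even one that asks for no work:
    // send one job for each NCI app the client has no job for.
    void send_nci_jobs() {
        for (APP& app : project_apps) {
            if (!app.non_cpu_intensive) continue;
            if (client_has_job_in_progress(app.id)) continue;
            send_job_for_app(app);
        }
    }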
svn path=/trunk/boinc/; revision=26068
cmdline tool for remote job submission (not done)
- remote job submission: support the 4 file modes described
in the documentation (not done)
svn path=/trunk/boinc/; revision=26067
- feeder: don't enumerate results for WUs with nonzero error_mask
- scheduler: in slow_check(), make sure the WU error_mask is still zero
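A sketch of the slow_check() side (simplified; the struct and lookup_id() below are stand-ins for BOINC's DB classes): reread the WU, since an instance may have errored out after the feeder enumerated it.

    struct DB_WORKUNIT {
        int id;
        int error_mask;
        int lookup_id(int id);   // stand-in: fetch the row, 0 on success
    };

    bool wu_still_error_free(int wu_id) {
        DB_WORKUNIT wu;
        if (wu.lookup_id(wu_id)) return false;   // DB error: don't send
        return wu.error_mask == 0;               // must still be zero
    }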
svn path=/trunk/boinc/; revision=25822
Fixed a bug that allowed targeted jobs
to be sent to non-targeted hosts.
The feeder was erroneously putting targeted jobs
in the shared mem cache.
Changes:
- The feeder only enumerates jobs for which
workunit.transitioner_flags is zero.
NOTE: this field is nonzero iff the job is assigned.
- create_work: when creating an assigned job,
set workunit.transitioner_flags appropriately
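A sketch of the two sides (the clause and flag value below are stand-ins, not the exact schema constants):

    // Feeder: restrict the enumeration so only non-assigned jobs
    // (transitioner_flags == 0) enter the shared-mem cache.
    const char* feeder_extra_clause = "and workunit.transitioner_flags = 0";

    // create_work: flag assigned jobs so the feeder's clause skips them.
    const int TRANSITION_FLAG_ASSIGNED = 1;      // stand-in value

    void mark_job_assigned(int& wu_transitioner_flags) {
        wu_transitioner_flags = TRANSITION_FLAG_ASSIGNED;
    }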
svn path=/trunk/boinc/; revision=25314
In the inner loop of scan_work_array() there are two WORKUNITs:
- the one that's part of wu_result (in the shared-mem array)
- a temp copy.
quick_check() may modify the temp copy in host-specific ways
(e.g., adjusting rsc_fpops_est or delay_bound).
This is the one we pass to add_result_to_reply().
When we reread hr_class and app_version_id from the DB,
update both structs.
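A sketch of the sync step (simplified struct; the point is that the reread values must land in both copies):

    struct WORKUNIT { int hr_class; long app_version_id; /* ... */ };

    void sync_after_reread(
        WORKUNIT& cached_wu,   // inside the shared-mem wu_result
        WORKUNIT& wu,          // temp copy passed to add_result_to_reply()
        int db_hr_class, long db_app_version_id
    ) {
        // Update both, or the cached copy goes stale for the next
        // scheduler instance that scans this slot.
        cached_wu.hr_class = wu.hr_class = db_hr_class;
        cached_wu.app_version_id = wu.app_version_id = db_app_version_id;
    }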
svn path=/trunk/boinc/; revision=24493
(reported by Kevin Reed).
The problem: cache inconsistency.
If there are 2 results for the same WU in shared mem,
and 2 scheduler instances get them around the same time,
they can send them with different app versions.
We already fixed this problem for HR by
1) rereading the relevant WU fields while deciding
whether to send the result
2) doing a "careful update" of the WU field using a where clause
to make sure it wasn't modified in the (short) interval
since rereading it.
I fixed the HAV problem in the same way,
and merged the two mechanisms to combine the DB queries.
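A sketch of the "careful update" pattern, assuming a helper that returns the number of rows affected (db_exec_affected() is a stand-in for BOINC's DB layer):

    #include <cstdio>

    int db_exec_affected(const char* query);  // stand-in: rows affected,
                                              // -1 on error

    // Succeeds only if hr_class still holds the value we just reread;
    // otherwise another scheduler instance modified it first and we
    // must not send this result.
    bool careful_update_hr_class(int wu_id, int old_hr, int new_hr) {
        char q[256];
        snprintf(q, sizeof(q),
            "update workunit set hr_class=%d where id=%d and hr_class=%d",
            new_hr, wu_id, old_hr);
        return db_exec_affected(q) == 1;
    }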
Also:
- The rereads are done in slow_check() (see below).
- The careful updates are done in update_wu_on_send(),
and this is called *before* doing careful updates on result fields.
That way, if the WU updates fail, we don't have orphaned results.
- already_sent_to_different_platform_careful() (sic)
no longer does DB stuff, so it's merged with
already_send_to_different_hr_class() (better name)
NOTE: slow_check() is used in array scheduling only.
Score-based scheduling uses other code,
in which this bug is not yet fixed.
Locality scheduling doesn't support HR or HAV at all.
This should be unified.
svn path=/trunk/boinc/; revision=24484
This lets you specify, on a per-app basis,
that all instances should be done using the same app version.
This is for validation in the presence of GPUs.
- scheduler: code cleanup
- Instead of adding a bunch of non-DB fields to RESULT,
used a derived class SCHED_DB_RESULT.
- Instead of storing a pointer to BEST_APP_VERSION in RESULT,
store the structure itself.
This simplifies the memory allocation situation.
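A sketch of the resulting layout (fields elided):

    struct BEST_APP_VERSION { /* chosen version, projected FLOPS, ... */ };

    struct RESULT {            // mirrors the DB table only
        int id;
        // ... DB fields ...
    };

    struct SCHED_DB_RESULT : RESULT {
        BEST_APP_VERSION bav;  // by value: no pointer lifetime to manage
                               // when results are copied around
        // ... other scheduler-only, non-DB fields ...
    };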
- client: condition "Got server request to delete file" messages
on <file_xfer_debug>
svn path=/trunk/boinc/; revision=23636
If no work can be sent for preference-related reasons
(no CPU, no GPU, selected apps),
send a message, not a notice.
Assume the user knows what they're doing
and doesn't want to be nagged.
- scheduler: check for the existence of an app version
before checking for user selected-app prefs.
This prevents sending a "no jobs available for selected apps"
message when no app versions exist for non-selected apps.
- scheduler: use "tasks" instead of "work" in user messages
svn path=/trunk/boinc/; revision=23168
Old: various redundant and/or misleading messages were sent.
New:
- if host w/ no GPU contacts a GPU-only project,
send high-pri message saying they need a GPU
- if host w/ GPU has driver too old for all versions,
send high-pri message saying to update driver
- if host w/ GPU has driver too old for some versions,
send low-pri message saying to update driver
- if host has GPU but too little RAM for any app,
send low-pri message saying so
- scheduler: revamp GPU plan class functions
svn path=/trunk/boinc/; revision=21760
Made the following mechanisms per-(host, app version)
rather than per-host:
- daily quota mechanism
- reliable mechanism (accelerated retries)
- "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
host_scale_check set.
- validator: do scale probation on invalid results
(need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options
Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
a host will have separate quotas for each one.
That means it could error out on 100 jobs for cuda_fermi,
and when its quota goes to zero,
error out on 100 jobs for cuda23, etc.
This is intentional; there may be cases where one version
works but not the others.
- host.error_rate and host.max_results_day are deprecated
TODO:
- the values in the app table for limits on jobs in progress etc.
should override those in config.xml, not the other way around.
Implementation notes:
scheduler:
process_request():
read all host_app_versions for the host at the start;
compute "reliable" and "trusted" for each one;
write modified records at the end
get_app_version():
add "reliable_only" arg; if set, use only reliable versions
skip over-quota versions
Multi-pass scheduling: if have at least one reliable version,
do a pass for jobs that need reliable,
and use only reliable versions.
Then clear best_app_versions cache.
Score-based scheduling: for need-reliable jobs,
it will pick the fastest version,
then give a score bonus if that version happens to be reliable.
When we get back a successful result from a client:
increase daily quota
When we get back an error result from a client:
impose scale probation
decrease daily quota if not aborted
Validator:
when handling a WU, create a vector of HOST_APP_VERSION
parallel to vector of RESULT.
Pass it to assign_credit_set().
Make copies of originals so we can update only modified ones
update HOST_APP_VERSION error rates
Transitioner:
decrease quota on timeout
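A sketch of the quota/probation bookkeeping above (field names and the halving policy are illustrative, not the exact implementation):

    struct HOST_APP_VERSION {
        double max_jobs_per_day;
        bool scale_probation;     // if set, don't trust host scaling
    };

    void on_success(HOST_APP_VERSION& hav, double daily_quota_cap) {
        if (hav.max_jobs_per_day < daily_quota_cap) hav.max_jobs_per_day++;
    }

    void on_error(HOST_APP_VERSION& hav, bool aborted) {
        hav.scale_probation = true;            // impose scale probation
        if (!aborted && hav.max_jobs_per_day > 1) {
            hav.max_jobs_per_day /= 2;         // shrink daily quota
        }
    }

    void on_timeout(HOST_APP_VERSION& hav) {   // called by the transitioner
        if (hav.max_jobs_per_day > 1) hav.max_jobs_per_day /= 2;
    }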
svn path=/trunk/boinc/; revision=21181
- scheduler: when calculating the scheduler runtime,
don't include the time spent reading the request message from the client.
That can be misleadingly long.
svn path=/trunk/boinc/; revision=20781
Old:
1) check deadline based on wu.delay_bound
2) in add_result_to_reply(), potentially modify wu.delay_bound,
e.g. because of retry acceleration
problem: reducing delay bound may cause deadline miss
New:
1) new function get_delay_bound_range()
(called from wu_is_infeasible_fast())
returns optimistic and pessimistic delay bounds.
Retry acceleration logic is here.
2) check deadline based on optimistic bound;
if that fails, check based on pessimistic bound.
Set wu.delay_bound to the one that worked.
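A sketch of the two-bound check (simplified signatures; deadline_ok() stands in for the usual completion-time estimate against the deadline):

    struct WORKUNIT { int delay_bound; /* ... */ };

    // Returns optimistic (retry-accelerated) and pessimistic bounds;
    // needs result priority and report deadline, hence their addition
    // to WORK_ITEM and WU_RESULT (see Notes below).
    void get_delay_bound_range(
        int priority, int report_deadline, int& opt, int& pess
    );

    bool deadline_ok(int delay_bound);   // can this host finish in time?

    bool check_deadline(WORKUNIT& wu, int priority, int report_deadline) {
        int opt, pess;
        get_delay_bound_range(priority, report_deadline, opt, pess);
        if (deadline_ok(opt))  { wu.delay_bound = opt;  return true; }
        if (deadline_ok(pess)) { wu.delay_bound = pess; return true; }
        return false;                    // infeasible either way
    }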
Notes:
- get_delay_bound_range() needs result priority and report deadline,
and it's called before we read the full result.
So add these items to WORK_ITEM and WU_RESULT.
- get_delay_bound_range() could be customized for
project-specific deadline policy.
- add_result_to_reply() was becoming a toxic waste dump.
Deadline-related stuff should have been factored out in any case.
svn path=/trunk/boinc/; revision=18946
The limit on jobs in progress is now
max_wus_in_progress * NCPUS
+ max_wus_in_progress_gpu * NGPUS
where NCPUS and NGPUS reflect prefs and are capped.
Furthermore: if the client reports plan class for in-progress jobs
(see checkin of 31 May 2009)
then these limits are enforced separately;
i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
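A sketch of the separated check, assuming the capped NCPUS/NGPUS are already computed from prefs:

    bool cpu_limit_ok(int in_progress_cpu, int max_wus_in_progress,
                      int ncpus) {
        return in_progress_cpu < max_wus_in_progress * ncpus;
    }

    bool gpu_limit_ok(int in_progress_gpu, int max_wus_in_progress_gpu,
                      int ngpus) {
        return in_progress_gpu < max_wus_in_progress_gpu * ngpus;
    }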
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
(NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
but work_needed() is initially false
svn path=/trunk/boinc/; revision=18255
(app versions don't have a <coprocs> element around coproc elements;
maybe an oversight, but let's stick with it).
Anyway, I think it's working now.
- lib: remove "owner" array from COPROC.
This was used in client to keep track of assignment of
coprocessors to tasks, but we got rid of the reserve/free scheme.
NOTE: this breaks the mechanism for passing --device N to apps;
I'll have to do this another way. Stay tuned.
svn path=/trunk/boinc/; revision=17543
- web: check whether to show profile in separate function
from displaying profile; eliminate double headers
- scheduler: finish purge of redundant arguments
svn path=/trunk/boinc/; revision=16726
Base job completion-time estimates on the FLOPS estimate
for the selected APP_VERSION, rather than on the CPU benchmarks.
Otherwise estimates are wrong for GPU or multi-thread apps.
- scheduler: start switching to having SCHED_REQUEST and
SCHED_REPLY be globals, instead of passing them around as args;
to be continued.
svn path=/trunk/boinc/; revision=16691
- scheduler: fix egregious bug where wu_is_infeasible_fast() result
is ignored, and we send jobs to hosts that can't handle them.
- scheduler: don't check for disk space in work_needed();
do it in check_disk(), which generates a message to user.
- scheduler: add -debug_log flag, which sends stderr to
"debug_log" rather than scheduler_log.txt (for debugging)
svn path=/trunk/boinc/; revision=16578
If set the "effective NCPUS" (which is used to scale
daily_result_quota and max_wus_in_progress)
is max'd with the # of CUDA GPUs.
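A sketch of the rule (the boolean flag is a stand-in for the config option, whose name isn't given here):

    #include <algorithm>

    int effective_ncpus(int ncpus_capped, int ncuda_gpus, bool option_set) {
        // With the option set, quota scaling uses whichever is larger:
        // the (capped) CPU count or the CUDA GPU count.
        return option_set ? std::max(ncpus_capped, ncuda_gpus)
                          : ncpus_capped;
    }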
svn path=/trunk/boinc/; revision=16246