boinc/sched
David Anderson a151ad6cb3 - client/scheduler: deal with the situation where a GPU has enough
    RAM to run a job, but when we actually run the job
    not enough GPU RAM is free, so the application fails.
    This can cause a large number of jobs to fail.
    Solution:
    - app_plan() can specify the GPU RAM requirements of an app version.
        This is passed to the client in a new field
        <gpu_ram> of the <app_version> element.
    - prior to starting or restarting a GPU app, the client
        checks the amount of free RAM on the particular GPU.
        If it's not enough for the app version,
        the client doesn't start it,
        and arranges for the scheduler to ignore it for 5 minutes
        (by which point there might be more free GPU RAM).
    Notes:
    1) this change will have effect only when
        both client and scheduler are updated.
    2) the check is done in enforce_schedule(),
        rather than schedule_cpus(),
        because only at that point
        have we assigned a specific GPU to the job.
    3) there's another case to deal with:
        a GPU app's malloc of GPU RAM fails in the middle of the job.
        Currently the job fails.
        I plan to add an API call boinc_temporary_exit(x) so
        that the job can exit and potentially restart in x seconds.
        (In principle this mechanism is sufficient for all cases,
        but it could lead to a lot of starting/exiting,
        so the current change is worthwhile).
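
The client-side check described under "Solution" can be pictured with a minimal sketch. This is not the code from this commit: the struct names, the `defer_until` field, the `ok_to_start()` helper and the `GPU_RAM_DEFER_SECS` constant are all hypothetical, and only the `gpu_ram` requirement and the 5-minute deferral come from the commit message. It assumes the free RAM on the assigned GPU has already been queried by the caller.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the client's data structures.
struct AppVersion {
    double gpu_ram;      // bytes required, as received in <gpu_ram>; 0 = not specified
};

struct GpuJob {
    AppVersion avp;
    double defer_until;  // client-side scheduling ignores the job before this time
};

// The "5 minutes" mentioned in the commit message, in seconds.
const double GPU_RAM_DEFER_SECS = 300;

// Decide whether a GPU job may be started (or restarted) right now.
// free_gpu_ram: free bytes on the specific GPU assigned to the job.
// now: current time in seconds.
// Returns true if the job may start; otherwise marks it as deferred.
bool ok_to_start(GpuJob& job, double free_gpu_ram, double now) {
    assert(free_gpu_ram >= 0);

    // Still inside an earlier deferral window: do not even re-check.
    if (now < job.defer_until) return false;

    // Not enough free GPU RAM for this app version: defer instead of
    // starting a job that would be likely to fail.
    if (job.avp.gpu_ram > 0 && free_gpu_ram < job.avp.gpu_ram) {
        job.defer_until = now + GPU_RAM_DEFER_SECS;
        return false;
    }
    return true;
}
```

In this sketch an app version that does not report `gpu_ram` (value 0) is never deferred, which would match note 1: the check only has an effect once both sides are updated.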

svn path=/trunk/boinc/; revision=19864
2009-12-11 22:45:59 +00:00
Makefile.am - unix: build fixes 2009-12-01 17:04:28 +00:00
assimilate_handler.h - client: error if a <file_info> in app_info.xml has any URLs 2009-07-09 20:18:56 +00:00
assimilator.cpp - client, Mac: don't do res_init(). It causes a crash. 2009-05-07 13:54:51 +00:00
assimilator.py - python assimilator fix 2009-06-16 19:38:35 +00:00
census.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
credit.cpp - server code: moved everything related to credit-granting to credit.cpp, 2009-08-12 16:26:43 +00:00
credit.h - compile fixes 2009-08-13 03:35:26 +00:00
credit_test.cpp - credit tweaks 2009-11-15 18:15:35 +00:00
db_dump.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
db_dump_spec.xml *** empty log message *** 2006-03-06 21:40:07 +00:00
db_purge.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
delete_file.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
edf_sim.cpp - scheduler: improve message formatting; add <debug_locality> flag 2009-01-15 20:23:20 +00:00
edf_sim.h - changed some comments for Doxygen 2008-10-04 23:44:24 +00:00
feeder.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
file_deleter.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
file_upload_handler.cpp - scheduler/file upload handler: ignore zero-length cmdline args. 2009-10-12 16:44:26 +00:00
get_file.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
handle_request.cpp - scheduler: compute no_jobs_available correctly 2009-11-12 21:30:33 +00:00
handle_request.h - code shuffling 2009-08-10 04:56:46 +00:00
hr.cpp - rename .C files to .cpp so that Doxygen will work 2008-09-26 18:20:24 +00:00
hr.h - added copyright and license info to .C, .cpp, .h files 2008-08-06 18:36:30 +00:00
hr_info.cpp - Added checks for net/*.h, arpa/*.h, netinet/*.h and code to figure out 2009-02-26 00:23:23 +00:00
hr_info.h - added copyright and license info to .C, .cpp, .h files 2008-08-06 18:36:30 +00:00
make_work.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
message_handler.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
pymw_assimilator.py - PyMW assimilator fixes from Jeremy 2009-07-01 23:58:04 +00:00
request_file_list.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
sample_assimilator.cpp - client: don't print error message if output file with <copy_file> 2009-09-17 21:06:11 +00:00
sample_bitwise_validator.cpp - sample bitwise validator: make it work for binary files 2009-05-01 18:25:17 +00:00
sample_db_dump_spec.xml *** empty log message *** 2004-06-21 05:03:56 +00:00
sample_dummy_assimilator.cpp - rename .C files to .cpp so that Doxygen will work 2008-09-26 18:20:24 +00:00
sample_hr_info.txt - added sample host-distribution file for HR 2007-07-06 18:19:10 +00:00
sample_trivial_validator.cpp - rename .C files to .cpp so that Doxygen will work 2008-09-26 18:20:24 +00:00
sample_work_generator.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
sched_array.cpp - scheduler: compute no_jobs_available correctly 2009-11-12 21:30:33 +00:00
sched_array.h - scheduler: add new config option <max_wus_in_progress_gpus>. 2009-06-01 22:15:14 +00:00
sched_assign.cpp svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
sched_assign.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_config.cpp - Unix build fix 2009-09-21 16:15:28 +00:00
sched_config.h Sched: config option not to store stderr_out if exit_status==0 (to save on DB size). With help from Nicolas Alvarez. 2009-06-30 18:00:58 +00:00
sched_customize.cpp - client/scheduler: deal with situation where GPU has enough 2009-12-11 22:45:59 +00:00
sched_customize.h - client/scheduler: deal with situation where GPU has enough 2009-12-11 22:45:59 +00:00
sched_driver.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
sched_hr.cpp svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
sched_hr.h - scheduler: move app-version selection and score-based scheduling 2009-03-19 16:35:35 +00:00
sched_locality.cpp - scheduler: maintain WORK_REQ::no_jobs_available correctly 2009-11-09 23:25:04 +00:00
sched_locality.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_main.cpp - scheduler/file upload handler: ignore zero-length cmdline args. 2009-10-12 16:44:26 +00:00
sched_main.h - code shuffling 2009-08-10 04:56:46 +00:00
sched_msgs.cpp - scheduler: improve message formatting; add <debug_locality> flag 2009-01-15 20:23:20 +00:00
sched_msgs.h - STILL WORK TO BE DONE TO GET locale STUFF INSTALLED PROPERLY!!! 2009-01-13 23:06:02 +00:00
sched_resend.cpp - scheduler: code cleanup 2009-08-21 19:14:15 +00:00
sched_resend.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_result.cpp - scheduler and back end: add new fields to result table: 2009-09-03 20:26:31 +00:00
sched_result.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_score.cpp - scheduler: compute no_jobs_available correctly 2009-11-12 21:30:33 +00:00
sched_score.h svn path=/trunk/boinc/; revision=18765 2009-07-29 18:34:27 +00:00
sched_send.cpp - scheduler: compute no_jobs_available correctly 2009-11-12 21:30:33 +00:00
sched_send.h - scheduler: fix messed-up deadline check logic. 2009-08-31 19:35:46 +00:00
sched_shmem.cpp - client: no network activity if running CPU benchmarks 2009-10-23 21:57:58 +00:00
sched_shmem.h - client: no network activity if running CPU benchmarks 2009-10-23 21:57:58 +00:00
sched_timezone.cpp svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
sched_timezone.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_types.cpp - client/scheduler: deal with situation where GPU has enough 2009-12-11 22:45:59 +00:00
sched_types.h - client/scheduler: deal with situation where GPU has enough 2009-12-11 22:45:59 +00:00
sched_util.cpp svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
sched_util.h svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
sched_version.cpp - client/scheduler/web: add per-project preferences for whether 2009-09-28 04:24:18 +00:00
sched_version.h - scheduler: move app-version selection and score-based scheduling 2009-03-19 16:35:35 +00:00
send_file.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
show_shmem.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
single_job_assimilator.cpp - client, Mac: don't do res_init(). It causes a crash. 2009-05-07 13:54:51 +00:00
start - tweak to start 2009-09-28 16:19:20 +00:00
testasm.py - server: improve the Python assimilator framework; 2009-06-12 03:06:01 +00:00
time_stats_log.cpp - code shuffling 2009-08-10 04:56:46 +00:00
time_stats_log.h - code shuffling 2009-08-10 04:56:46 +00:00
transitioner.cpp - transitioner: fix to 15 Sept checkin 2009-09-18 15:59:40 +00:00
trickle_handler.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
update_stats.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00
validate_util.cpp - server code: moved everything related to credit-granting to credit.cpp, 2009-08-12 16:26:43 +00:00
validate_util.h - server code: moved everything related to credit-granting to credit.cpp, 2009-08-12 16:26:43 +00:00
validate_util2.cpp - rename .C files to .cpp so that Doxygen will work 2008-09-26 18:20:24 +00:00
validate_util2.h - added copyright and license info to .C, .cpp, .h files 2008-08-06 18:36:30 +00:00
validator.cpp - create_work function and script: 2009-09-16 03:10:22 +00:00
validator.h - validator: add a global variable WORKUNIT* g_wup; 2008-08-21 20:58:32 +00:00
wu_check.cpp - server programs: add --help and --version cmdline options to all. 2009-09-17 17:56:59 +00:00