boinc

Commit Graph

Author	SHA1	Message	Date
David Anderson	79c6225fc2	- configure: work with "gold" linker	2013-03-05 13:33:27 +01:00
David Anderson	83211fd1c6	client: kill lingering apps - client: if an app's finish file has existed for 10 seconds, kill it; it must be hung in boinc_finish(). This behavior has been seen with LHC@home and maybe other projects.	2013-03-01 15:56:12 +01:00
David Anderson	aa289f0916	- A bunch of tweaks from Steffen Moller, e.g. using MAXPATHLEN svn path=/trunk/boinc/; revision=26133	2012-09-21 03:52:24 +00:00
David Anderson	72368a6b20	- A first attempt to fix the bug where apps die with exit(1) (whereas they didn't do this w/ older clients). On Windows, the client uses TerminateProcess(h, 1) to kill processes; the 1 is the exit code the process will appear to have. So instead, add a "will_restart" bool arg to the various kill functions, and if set use 0 (= STATUS_SUCCESS), otherwise use EXIT_ABORTED_BY_CLIENT. Note: in principle this shouldn't make any difference for quitting tasks, since handle_exited_app() checks for task state QUIT_PENDING and ignores the exit code in that case. The only place I can see where it would make any difference is when we kill a process because it hasn't been handling queued shared-memory messages for 180 seconds. - client: add more info to the message about an exited app - client: function return values (ERR_) are different from process exit codes (EXIT_). But in many places we were using return values as exit codes. Fix these. Also, break out the different types of limits a job can exceed (time, disk, memory) into difference exit codes. svn path=/trunk/boinc/; revision=25601	2012-04-26 05:28:45 +00:00
David Anderson	bf393ad913	- client: if a job calls boinc_temporary_exit() 100 times, abort it. Otherwise it could keep doing it forever (e.g. if there's not ever enough available GPU RAM) svn path=/trunk/boinc/; revision=25483	2012-03-23 21:09:44 +00:00
David Anderson	fc8191220f	- client: change timeout for job quit/abort from 60 back to 15 (time between sending app a quit/abort message and, if not exited yet, killing it) - client: if app has reported an "other PID" (e.g., vboxwrapper reports the VBoxHeadless PID) then include it (along with descendants) in the list of processes we kill when killing the job. svn path=/trunk/boinc/; revision=25470	2012-03-21 20:30:14 +00:00
David Anderson	ad232b2869	- client: report completed results if a time-of-day network suspend is scheduled within the next 30 minutes svn path=/trunk/boinc/; revision=25465	2012-03-20 19:37:04 +00:00
Rom Walton	25142dda02	- VBOX: Give the VM process a short priority boost when responding to a quit request. On older XP machines it might speed up the memory dump to disk. - client: Increase the quit request timeout from 10 seconds to 60 seconds for machines running VMs and slow disk drives. It should give the VM enough time to gracefully shutdown and not give boinc reason to kill the wrapper. client/ app.h samples/vboxwrapper/ vbox.cpp, .h vboxwrapper.cpp svn path=/trunk/boinc/; revision=25433	2012-03-16 01:04:43 +00:00
David Anderson	7c3bc68a05	- API, client, and Manager: add an optional "reason" argument to boinc_temporary_exit(), explaining why the app is exiting. Convey this to the client, and then to the Manager, and display it there and in the log. clientgui/ MainDocument.cpp lib/ gui_rpc_client_ops.cpp gui_rpc_client.h api/ boinc_api.cpp,h client/ client_types.cpp,h app.h app_control.cpp svn path=/trunk/boinc/; revision=25315	2012-02-22 22:56:05 +00:00
David Anderson	541f6dd1f3	- client: bug fix for async file ops: set up files in slot dir when starting an app, whether or not it's the first time svn path=/trunk/boinc/; revision=25221	2012-02-08 21:14:34 +00:00
David Anderson	4adba7ee4e	- client: first pass at async file copy feature. When a large file is copied from a project dir to a slot dir, it's copied in chunks, interleaved with other polling activities such as GUI RPCs. That way the manager doesn't freeze while large copies (e.g. VM images) are happening svn path=/trunk/boinc/; revision=25192	2012-02-03 18:33:39 +00:00
David Anderson	81b29b0cc9	- API: fix queueing problem for graphics-related messages (web graphics URL and remote desktop addr) - GUI RPC and API: change "remote_desktop_connection" to "remote_desktop_addr" everywhere. It's an address, not a connection. - vboxwrapper: log message cleanup svn path=/trunk/boinc/; revision=25044	2012-01-13 19:00:16 +00:00
Rom Walton	3bc326db3e	- client: Add plumbing to support passing the remote desktop connection information to the manager - MGR: Add a "Show VM Console" button for those tasks which report a remote desktop port number. client/ app.cpp, .h app_control.cpp clientgui/ Events.h MainDocument.cpp, .h ViewWork.cpp, .h lib/ gui_rpc_client.h gui_rpc_client_ops.cpp svn path=/trunk/boinc/; revision=25036	2012-01-12 22:05:25 +00:00
David Anderson	b003b8e290	- add support for APP::needs_network flag. If set, don't run jobs for that app while network is suspended. - client: parse this flag and maintain in state file; do a job reschedule when network suspend state changes - GUI RPC: add RESULT::network_wait flag; if set, this job is waiting for network access to be allowed - Manager: display the above in task info - add support for "web graphics URL" (see above) - client: parse message containing URL on graphics_reply channel and store in ACTIVE_TASK::web_graphics_url - GUI RPC: add RESULT::web_graphics_url - Manager: if web graphics URL is present, Show Graphics opens a browser - remove some vestigial code for pre-V6 graphics svn path=/trunk/boinc/; revision=24899	2011-12-26 03:30:32 +00:00
David Anderson	1d38837788	- client: call xp.skip_unexpected() if get unexpected tag, to avoid showing multiple error messages - client simulator: bug fixes and tweaks svn path=/trunk/boinc/; revision=24408	2011-10-17 20:46:06 +00:00
David Anderson	c7e505dc81	- client: fix memory leak when reading stderr of completed job. This caused 128KB + size of stderr loss for each job. - client: print error message if reading stderr fails (e.g. because of malloc failure) svn path=/trunk/boinc/; revision=24336	2011-10-05 22:16:02 +00:00
David Anderson	c61103ac26	- client: make the attributes of GUI RPCs (network, authentication) explicit rather than determined by position in a list. - client: add a new "read-only" attribute for GUI RPCs. This is in preparation for handling GUI RPCs in separate threads. - client: remove code to support pre-V6 graphics. svn path=/trunk/boinc/; revision=24232	2011-09-18 21:06:49 +00:00
David Anderson	d8f20bceea	- vboxwrapper: report network usage to the client - client: include the above in enforcing network quota preferences svn path=/trunk/boinc/; revision=24227	2011-09-16 19:16:12 +00:00
David Anderson	8b5344e922	- client: finish next-to-last checkin svn path=/trunk/boinc/; revision=24157	2011-09-11 17:26:31 +00:00
David Anderson	4e946854c1	- client/API/vboxwrapper: add a mechanism so that apps can report sub-processes that are not descendants (e.g., virtual machines) These processes are then counted as part of the app, not as "non-BOINC CPU time". This fixes a bug where processing was incorrectly suspended because CPU usage by VM apps exceeded the "CPU usage limit" pref. Implementation: - the PIDs of the processes in question are passed from app to client via shared-memory, in the app_status channel. A new variant of boinc_report_app_status() supports this. - the VBox wrapper queries the PID of the VM, and reports it in this way. - procinfo_app() includes a new argument: a list of PIDs that are part of the app, although not ancestrally related to the main process. - in the client, ACTIVE_TASK now includes a vector "other_pids". If this is nonempty, it's passed to procinfo_app(). svn path=/trunk/boinc/; revision=24123	2011-09-02 20:47:05 +00:00
David Anderson	609d5665cc	- client: pass XML_PARSER& rather than MIOFILE& to parse functions. Preparatory to using new-style XML parsing everywhere. svn path=/trunk/boinc/; revision=23975	2011-08-09 21:44:14 +00:00
David Anderson	15c3ff7d31	- client: if an app version has nonempty file_prefix, copy all its input and output files svn path=/trunk/boinc/; revision=23925	2011-08-03 19:47:26 +00:00
David Anderson	5a8fd0afc7	- client: add optional <file_prefix> to APP_VERSION>. If present, "file_prefix/" is prepended to the logical names of input and output files of jobs using that app version. I.e. for Vbox wrapper based app versions, file_prefix is "share", so that I/O files are put in a "share" subdirectory of the slot dir. - update_versions: add support for <dont_throttle> <file_prefix>x</file_prefix> in version.xml svn path=/trunk/boinc/; revision=23924	2011-08-03 18:14:45 +00:00
David Anderson	c1bf16f7f3	- client: we were assuming that if we ask a task to exit and its main process exits, everything is OK. That's not necessarily the case - buggy apps may have subprocesses that the main process fails to kill. Solution: when we request a task to exit or abort, make a list of the descendants. When the main process exits, kill any remaining descendants. Also: we weren't checking for the ABORT_PENDING case in the process exit logic. This may explain the 5/15 second delay in detaching or resetting a project with running tasks svn path=/trunk/boinc/; revision=23738	2011-06-17 04:18:28 +00:00
David Anderson	fa459d780c	- client: fix bugs in runtime estimation of jobs that have run before but are not currently running. Old: - We maintain the most recent fraction_done in state file. But for apps that checkpoint seldom or never, this is not the relevant value, and frac done may go down when the app runs. - fraction_done_elapsed_time is not initialized, and can have garbage values for jobs that haven't run yet. New: - Record, in the state file, the values of fraction_done and fraction_done_elapsed_time at the most recent checkpoint. When the client starts up, use these values. svn path=/trunk/boinc/; revision=23455	2011-04-26 17:02:09 +00:00
David Anderson	b89ea98838	- client: when estimating job runtime based on fraction done, use the elapsed time when fraction done was last reported, not current elapsed time. Fix problem where est time remaining increases linearly, then abruptly decreases when new frac done is reported. From Bruce Allen. svn path=/trunk/boinc/; revision=23373	2011-04-18 16:32:57 +00:00
David Anderson	28bad727c1	- client: when exclusive app mechanism is used (CPU or GPU) wait for 30 secs after excl app exits before restarting computation svn path=/trunk/boinc/; revision=23048	2011-02-16 20:41:19 +00:00
David Anderson	795e89dbf5	- client: eliminate unnecessary CPU reschedules. Currently we do a reschedule any time a job checkpoints, in case there's a job that has finished a time slice but hasn't checkpointed yet. Instead: flag such jobs, and trigger a reschedule on checkpoint only for flagged jobs. - client: fix instability in job scheduling that happens if a job's estimated completion time in RR sim is close to its deadline. It can alternate between making and missing deadline, causing the scheduler to alternate rapidly between jobs. Solution: if RR sim has marked a job as deadline miss any time in the last (CPU scheduling period), treat it as a deadline miss. svn path=/trunk/boinc/; revision=22928	2011-01-19 16:46:55 +00:00
David Anderson	6478b3e05d	- client: implement more scheduler changes that use recent estimated credit (REC) instead of debt. These changes are enabled by #define USE_REC in work_fetch.h. If this is commented out (the default) the client uses debt-based scheduling, same as before. TODO: work-fetch policy changes - client simulator: various fixes: - compute idle and wasted fraction based on all processing resources, not just CPU - compute job completion times based on FLOPS, not CPU seconds - compute and use project->no_X_apps etc. svn path=/trunk/boinc/; revision=22741	2010-11-23 19:39:47 +00:00
David Anderson	8d9cf013c5	- client: account manager RPC: Additions to request message: <not_started_dur>X</not_started_dur> <in_progress_dur>X</in_progress_dur> The estimated remaining duration of unstarted and in-progress tasks Additions to reply message, within <project>, optional: <suspend>0\|1</suspend> suspend or resume project (overrides local state) <abort_not_started>0\|1</abort_not_started> if set, abort unstarted jobs svn path=/trunk/boinc/; revision=22698	2010-11-17 20:04:58 +00:00
David Anderson	082603f927	compile fix svn path=/trunk/boinc/; revision=22410	2010-09-24 20:37:45 +00:00
David Anderson	fcbb8a286e	- client simulator: major remodel and upgrade. Insteady of using its own XML input files, the simulator now takes a client_state.xml file as input. The simulator generates a synthetic workload based on the projects, apps, app versions, WUs, and result it finds there. This means that a user seeing aberrant behavior can just send their client_state.xml file and (hopefully) we can use the simulator to repro. The simulator now can model GPUs. As of this checkin, the simulator compiles but doesn't work. There should be no change in the actual client. svn path=/trunk/boinc/; revision=22409	2010-09-24 20:02:42 +00:00
David Anderson	07b2830d93	- client: fix bug in accounting of elapsed time and CPU time svn path=/trunk/boinc/; revision=21635	2010-05-25 18:48:53 +00:00
David Anderson	515abee470	- client/manager: keep track of "GPU suspended reason". Report it to the manager (it was already in CC_STATUS, but not populated) - manager: fix system tray icon popup text svn path=/trunk/boinc/; revision=21481	2010-05-12 18:14:30 +00:00
David Anderson	cc8ea9de3c	- client: Win: kill runaway apps using TerminateProcess() rather than TerminateProcessById(). The latter doesn't work in protected mode. - client: pid_handle => process_handle. misnomer svn path=/trunk/boinc/; revision=21272	2010-04-23 22:31:08 +00:00
David Anderson	b0cb81159f	- client: when looking for new file xfers to start, favor those that are partially done - client: fix crashing bug if a project is detached while an RSS feed fetch for it is in progress - code cleanup: switch from /// back to // for comments (so much for doxygen) svn path=/trunk/boinc/; revision=21041	2010-04-01 05:54:29 +00:00
David Anderson	b415b07785	- client: revisit the domino-effect preemption problem. Removed my changes of 19 Jan 2010, which didn't work. Added new mechanism: keep track of whether a job J has ever run in EDF. If so, and if another job of the same project and resource type as J is marked as deadline miss, then mark J as deadline miss, so that it won't get preempted. - web: change "result" to "task" in server status page - admin web: show server stable SVN revision, not trunk svn path=/trunk/boinc/; revision=20805	2010-03-05 21:13:53 +00:00
David Anderson	2a12a8fb8b	- client: if suspending apps because of CPU benchmarks, leave them in memory svn path=/trunk/boinc/; revision=20765	2010-03-02 01:24:03 +00:00
David Anderson	20d6b06f5f	- client: initial checkin for "don't compute if CPU usage above X" svn path=/trunk/boinc/; revision=20192	2010-01-19 00:03:38 +00:00
David Anderson	37aae854f3	- client: scheduling problem: - a project overestimates job FLOP counts - the client starts jobs in EDF mode - as job progresses and fraction done increases, its completion time estimate decreases until it's no longer a deadline miss. - job gets preempted by other job from that project; you end up with lots of partly completed jobs. Solution (I hope): if an app version has running jobs, compute a "temp DCF" for the app version, which is the min of dynamic/static estimates for its jobs. Apply this scaling factor to completion time estimates for unstarted jobs in RR simulation - client: the estimation of remaining time of running jobs was wrong (how did this bug survive so long?) svn path=/trunk/boinc/; revision=20077	2010-01-06 06:01:23 +00:00
David Anderson	37ea627866	- Win compile fixes. Also, needed to provide a replacement for strptime() on Win. WTF? svn path=/trunk/boinc/; revision=20003	2009-12-21 19:20:28 +00:00
David Anderson	a3f80676b7	- API and client: add an API function boinc_temporary_exit(dt). This exits the app with status zero and no finish file, so the client will restart it. It creates a file "temporary_exit" containing dt. The (new) client reads this file and will postpone scheduling the job again for dt seconds. Old clients will treat it as a premature exit, and potentially try to reschedule the job immediately. This function is intended for GPU applications that fail to allocate GPU RAM, presumably because a non-GPU application has it allocated. We don't want the job to fail, and we want to wait for a while before trying the allocation again. svn path=/trunk/boinc/; revision=19879	2009-12-13 05:16:40 +00:00
David Anderson	8182ccd031	- client/manager: first whack at a "snooze GPU" button svn path=/trunk/boinc/; revision=19853	2009-12-10 23:26:35 +00:00
David Anderson	e057c552d8	- client: add <exclusive_gpu_app> option: suspend GPU usage when particular apps are running svn path=/trunk/boinc/; revision=19573	2009-11-13 17:49:18 +00:00
David Anderson	86ee2f5753	- client: fix bug that caused unstarted coproc jobs to preempt ones already running. The problem: we considered a job as started if it has an ACTIVE_TASK. However, we were creating ACTIVE_TASKS for jobs before deciding to run them, because we needed a place to store the coproc reservations. This caused the above bug, and also had the undesirable effect of creating slot directories before they're needed. Solution: store coprocessor reservations in RESULT rather than ACTIVE_TASK. svn path=/trunk/boinc/; revision=19129	2009-09-22 21:02:06 +00:00
David Anderson	f1360e5971	- client: finish the implementation of fractional coproc jobs. - different data structure for keeping track of coproc usage; instead of COPROC having per-instance pointers to ACTIVE_TASK, ACTIVE_TASK now has an array of device number indices for each instance that it's using. - in enforce_schedule(), we call a new function assign_coprocs() that decides what coproc instances each job will use, and prunes jobs for which we can't get an assignment. This function embodies lots of subtlety. - coproc_cmdline() no longer deals with reserving instances; it just has to generate the --device X cmdline svn path=/trunk/boinc/; revision=18880	2009-08-19 23:21:55 +00:00
David Anderson	1dba786d7b	- API: add boinc_elapsed_time() to get elapsed time since start of episode; add APP_INIT_DATA::starting_elapsed_time to get elapsed time from previous episodes svn path=/trunk/boinc/; revision=18535	2009-07-01 17:35:56 +00:00
David Anderson	575565dc22	- client: fixed nasty bug that caused GPU jobs to crash on startup when they're preempting another GPU job. The problem was as follows: - job A is chosen to preempt job B - we tell job B to quit, and initialize job A but don't start it; however, we set if scheduler state to SCHEDULED (rather than UNINITIALIZED) - job B exits, and we start job A. Since its state is not UNITIALIZED, we don't set up its slot dir. - job A runs in an empty slot dir, doesn't find its files, and bombs out. - client: add <slot_debug> option (prints messages about allocation of slots, creating/removing files in slot dirs). svn path=/trunk/boinc/; revision=18217	2009-05-28 19:26:27 +00:00
David Anderson	af93af28f7	- client: eliminate the need to write the state file on each checkpoint. Instead, write the info into a file in the slot directory, and check for these files on startup. This should reduce the overhead of state-file writing on machines with lots of cores. There will still be a flurry of writes each time a job finishes, but reducing that overhead would be a larger job. - client: make sure we write the state file after a failed RPC svn path=/trunk/boinc/; revision=17814	2009-04-15 06:22:53 +00:00
David Anderson	ed3e3b0063	- client: fix bug where if a GPU job is running, and a 2nd GPU job with an earlier deadline arrives, neither job is executed ever. Reorganized things so that scheduling of GPU jobs is done independently of CPU jobs. The policy for GPU jobs: - always EDF - jobs are always removed from memory, regardless of checkpoint (GPU memory is not paged, so it's bad to leave an idle app in memory) svn path=/trunk/boinc/; revision=17402	2009-02-26 21:36:41 +00:00

1 2 3 4 5

210 Commits