Commit Graph

226 Commits

Author SHA1 Message Date
David Anderson 53a550fef5 client, GUI RPC: maintain and report progress rate
"Progress rate" is the average increase in fraction done
per second of elapsed time.

Also remove unnecessary destructors in GUI RPC code
2015-06-22 00:09:15 -07:00
David Anderson daf5ddd580 client: fix bugs in task cleanup
There was at least one case where we weren't cleaning up
subsidiary processes (e.g. VMs) when a task's main process exited.

Fix this by consolidating task cleanup (shared mem and subsidiary processes)
in ACTIVE_TASK::cleanup_task().
This gets called when a task's main process exits.
2014-07-31 15:42:56 -07:00
David Anderson 55a17c3b41 client: fix logic for cleaning up subsidiary processes
"Subsidiary processes" are
1) descendants
2) "other PIDs" as reported by the app, e.g. VMs which are not descendants
We were failing to clean up these processes in some cases.

- Add a function ACTIVE_TASK::kill_exited_app() for killing the
  subsidiary processes of a task whose main process has already exited.
  At this point we can't enumerate its current descendants;
  but we have the list of descendants from the last time
  we computed memory usage (within the last 10 sec).
  So kill these, and kill the other PIDs
- call this function when appropriate:
  - too many temporary exits
  - too many premature exits
  - main process has exited in response to abort or quit message
    (the existing code failed to kill other PIDs)
- rename ACTIVE_TASK::kill_task() to kill_running_task()
  to emphasize its intended use.

Also remove code that, in case of secure install on Windows,
didn't try to kill any subsidiary processes at all;
there used to be a permission problem in doing so, now there isn't.
2014-07-23 12:41:57 -07:00
David Anderson f15f6d2ba0 API/client/vboxwrapper: show notice if need Vbox upgrade
Vboxwrapper detects known buggy versions of Vbox and calls
boinc_temporary_exit().
The "Incompatible version" message appears in the task status
in the BOINC Manager, where some users may never see it.
It needs to appear as a notice, telling the user to upgrade VBox.

To do this, I added an optional argument to boinc_temporary_exit()
saying that the message should be delivered as a notice.
This is conveyed to the client by adding
a line containing "notice" to the temp exit file.
I changed the client and vboxwrapper to use this.
2014-05-28 11:05:56 -07:00
David Anderson e5810f3061 client/server: change implementation of "exact fraction done".
My last commit did this using a new API call.
But this would require rebuilding apps any time you want to change it;
too much work.
So instead make it an attribute of apps,
which you can set via the admin web interface.

Corresponding changes to client.
2014-05-04 00:02:32 -07:00
David Anderson 77c4dd7b32 API/client: let apps say that fraction done is precise
Currently the duration estimate for a task is a combination of
- a static estimate, based on wu.rsc_fpops_est and the estimated FLOPS
- a dynamic estimate, based on fraction done (FD) and elapsed time
The weighting of the dynamic estimate is FD^2;
the assumption is that fraction done is imprecise and improves
toward the end of a task.

This isn't ideal for apps that can supply accurate FD.

Solution: add a new API function
boinc_fraction_done_exact().
This notifies the client that the FD is accurate,
and that it should use only the dynamic estimate.
(New clients will do this; old clients will use the FD as they currently do).
2014-05-02 23:11:34 -07:00
David Anderson 72d1369342 client: code shuffle; move GPU scheduling code to new file 2014-05-01 23:53:55 -07:00
David Anderson f8ee2e51fe client: keep track of a job's network usage, if it reports it.
If a job reports its network usage (via boinc_network_usage()),
keep track of this across episodes of the job, and report it to the server
(some projects may want to give credit for network usage).
2014-04-30 00:21:29 -07:00
David Anderson b1a6fa39fc Client: keep track of job's peak WSS, swap size, and disk usage; send to server
Also fixed a bug where, if a job was aborted while not running,
its final CPU and elapsed time weren't copied from ACTIVE_TASK to RESULT,
hence not sent to scheduler
2014-04-02 00:56:15 -07:00
David Anderson 9d056d60cb client: <cpu_sched> shouldn't show suspend/resume msgs for CPU throttling 2014-01-22 11:08:09 -08:00
David Anderson 45dfb684a6 Client: don't allow more than 1000 slot dirs.
There was a report of a situation where the client created an unbounded number of slot dirs.
Not sure why this happened, but may as well impose a limit.
2013-10-23 21:37:24 -07:00
David Anderson 35f489d36f Client: debug sub-second CPU throttling 2013-09-20 23:18:33 -07:00
David Anderson ebde7809ce client: preliminary implementation (commented out) of sub-second throttling 2013-09-20 14:30:04 -07:00
David Anderson 6b5285ba04 client: more MAXPATHLEN fixes 2013-08-22 17:06:09 -07:00
David Anderson 519a0bcbef API: add test harness for the runtime system
- If you run the client with --run_test_app,
  runs "test_app" in the current directory and interacts with it
  (and does nothing else).
  It can suspend/resume it with arbitrary timing;
  this is controlled in run_test_app() (app_start.cpp).
- example app: add --critical_section option.
  This lets you test the runtime system for apps that do
  most of their work in a critical section (like GPU apps).
- Add some logging messages (conditioned by DEBUG_BOINC_API)
  to the runtime system.
- boinc_finish() waits for the timer thread to write final messages;
  make sure it doesn't do anything else
  (like suspend the worker thread) during this period
2013-07-04 16:00:10 -07:00
David Anderson 24e8133e4b - tabs -> spaces 2013-04-02 17:23:37 -07:00
David Anderson 79c6225fc2 - configure: work with "gold" linker 2013-03-05 13:33:27 +01:00
David Anderson 83211fd1c6 client: kill lingering apps
- client: if an app's finish file has existed for 10 seconds, kill it;
    it must be hung in boinc_finish().
    This behavior has been seen with LHC@home and maybe other projects.
2013-03-01 15:56:12 +01:00
David Anderson aa289f0916 - A bunch of tweaks from Steffen Moller, e.g. using MAXPATHLEN
svn path=/trunk/boinc/; revision=26133
2012-09-21 03:52:24 +00:00
David Anderson 72368a6b20 - A first attempt to fix the bug where apps die with exit(1)
(whereas they didn't do this w/ older clients).
    On Windows, the client uses TerminateProcess(h, 1) to kill processes;
    the 1 is the exit code the process will appear to have.

    So instead, add a "will_restart" bool arg to the various kill
    functions, and if set use 0 (= STATUS_SUCCESS),
    otherwise use EXIT_ABORTED_BY_CLIENT.

    Note: in principle this shouldn't make any difference
    for quitting tasks,
    since handle_exited_app() checks for task state QUIT_PENDING
    and ignores the exit code in that case.
    The only place I can see where it would make any difference
    is when we kill a process because it hasn't been handling
    queued shared-memory messages for 180 seconds.

- client: add more info to the message about an exited app

- client: function return values (ERR_*) are different from
    process exit codes (EXIT_*).
    But in many places we were using return values as exit codes.
    Fix these.
    Also, break out the different types of limits a job can exceed
    (time, disk, memory) into different exit codes.


svn path=/trunk/boinc/; revision=25601
2012-04-26 05:28:45 +00:00
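The "will_restart" choice of exit code from commit 72368a6b20 can be sketched as follows. The constant names are prefixed with SKETCH_ because they are stand-ins, and the numeric value given for the abort code is an assumption; the real constants are defined in BOINC's headers.

```cpp
// Stand-in constants for illustration only.
const int SKETCH_STATUS_SUCCESS = 0;
const int SKETCH_EXIT_ABORTED_BY_CLIENT = 194;  // ASSUMED value

// Exit code to pass when killing a process (e.g. to TerminateProcess()
// on Windows): 0 if the task will be restarted later, so the kill does
// not look like an app failure; otherwise the "aborted by client" code.
int kill_exit_code(bool will_restart) {
    return will_restart
        ? SKETCH_STATUS_SUCCESS
        : SKETCH_EXIT_ABORTED_BY_CLIENT;
}
```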
David Anderson bf393ad913 - client: if a job calls boinc_temporary_exit() 100 times, abort it.
Otherwise it could keep doing it forever
    (e.g. if there's not ever enough available GPU RAM)


svn path=/trunk/boinc/; revision=25483
2012-03-23 21:09:44 +00:00
David Anderson fc8191220f - client: change timeout for job quit/abort from 60 back to 15
(time between sending app a quit/abort message
    and, if not exited yet, killing it)
- client: if app has reported an "other PID"
    (e.g., vboxwrapper reports the VBoxHeadless PID)
    then include it (along with descendants) in the
    list of processes we kill when killing the job.


svn path=/trunk/boinc/; revision=25470
2012-03-21 20:30:14 +00:00
David Anderson ad232b2869 - client: report completed results if a time-of-day network suspend is
scheduled within the next 30 minutes


svn path=/trunk/boinc/; revision=25465
2012-03-20 19:37:04 +00:00
Rom Walton 25142dda02 - VBOX: Give the VM process a short priority boost when responding
to a quit request.  On older XP machines it might speed up the memory
        dump to disk.
    - client: Increase the quit request timeout from 10 seconds to 60 seconds for
        machines running VMs and slow disk drives.  It should give the VM enough
        time to shut down gracefully and not give boinc reason to kill the wrapper.

    client/
        app.h
    samples/vboxwrapper/
        vbox.cpp, .h
        vboxwrapper.cpp

svn path=/trunk/boinc/; revision=25433
2012-03-16 01:04:43 +00:00
David Anderson 7c3bc68a05 - API, client, and Manager: add an optional "reason" argument to
boinc_temporary_exit(),
        explaining why the app is exiting.
        Convey this to the client, and then to the Manager,
        and display it there and in the log.

    clientgui/
        MainDocument.cpp
    lib/
        gui_rpc_client_ops.cpp
        gui_rpc_client.h
    api/
        boinc_api.cpp,h
    client/
        client_types.cpp,h
        app.h
        app_control.cpp

svn path=/trunk/boinc/; revision=25315
2012-02-22 22:56:05 +00:00
David Anderson 541f6dd1f3 - client: bug fix for async file ops:
set up files in slot dir when starting an app,
    whether or not it's the first time

svn path=/trunk/boinc/; revision=25221
2012-02-08 21:14:34 +00:00
David Anderson 4adba7ee4e - client: first pass at async file copy feature.
When a large file is copied from a project dir to a slot dir,
    it's copied in chunks,
    interleaved with other polling activities such as GUI RPCs.
    That way the manager doesn't freeze while large copies
    (e.g. VM images) are happening


svn path=/trunk/boinc/; revision=25192
2012-02-03 18:33:39 +00:00
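The chunked copy described in commit 4adba7ee4e could be structured along these lines. This is a minimal sketch, not the client's actual implementation: the class name, its interface and the use of C++ file streams are all assumptions.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical chunked file copy: each call to poll() copies at most
// one chunk, so the caller can interleave it with other work.
class ChunkedCopy {
public:
    ChunkedCopy(const std::string& from, const std::string& to,
                std::size_t chunk_size)
        : in_(from, std::ios::binary),
          out_(to, std::ios::binary | std::ios::trunc),
          buf_(chunk_size), done_(false), error_(false) {
        if (!in_ || !out_ || chunk_size == 0) {
            done_ = true;
            error_ = true;
        }
    }

    // Copy one chunk. Returns true if more work remains.
    bool poll() {
        if (done_) return false;
        in_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
        std::streamsize n = in_.gcount();
        if (n > 0) {
            out_.write(buf_.data(), n);
            if (!out_) {
                error_ = true;
                done_ = true;
                return false;
            }
        }
        if (in_.eof()) {
            out_.close();       // flush the final chunk to disk
            done_ = true;
        } else if (!in_) {
            error_ = true;      // read failed before end of file
            done_ = true;
        }
        return !done_;
    }

    bool done() const { return done_; }
    bool error() const { return error_; }

private:
    std::ifstream in_;
    std::ofstream out_;
    std::vector<char> buf_;
    bool done_;
    bool error_;
};
```

A polling loop would call poll() once per iteration, alongside its other duties such as GUI RPC handling, until done() returns true.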
David Anderson 81b29b0cc9 - API: fix queueing problem for graphics-related messages
(web graphics URL and remote desktop addr)
- GUI RPC and API:
    change "remote_desktop_connection" to "remote_desktop_addr" everywhere.
    It's an address, not a connection.
- vboxwrapper: log message cleanup


svn path=/trunk/boinc/; revision=25044
2012-01-13 19:00:16 +00:00
Rom Walton 3bc326db3e - client: Add plumbing to support passing the remote desktop
connection information to the manager
    - MGR: Add a "Show VM Console" button for those tasks which
        report a remote desktop port number.

    client/
        app.cpp, .h
        app_control.cpp
    clientgui/
        Events.h
        MainDocument.cpp, .h
        ViewWork.cpp, .h
    lib/
        gui_rpc_client.h
        gui_rpc_client_ops.cpp

svn path=/trunk/boinc/; revision=25036
2012-01-12 22:05:25 +00:00
David Anderson b003b8e290 - add support for APP::needs_network flag.
If set, don't run jobs for that app while network is suspended.
    - client: parse this flag and maintain in state file;
        do a job reschedule when network suspend state changes
    - GUI RPC: add RESULT::network_wait flag;
        if set, this job is waiting for network access to be allowed
    - Manager: display the above in task info
- add support for "web graphics URL" (see above)
    - client: parse message containing URL on graphics_reply channel
        and store in ACTIVE_TASK::web_graphics_url
    - GUI RPC: add RESULT::web_graphics_url
    - Manager: if web graphics URL is present, Show Graphics opens a browser
- remove some vestigial code for pre-V6 graphics

svn path=/trunk/boinc/; revision=24899
2011-12-26 03:30:32 +00:00
David Anderson 1d38837788 - client: call xp.skip_unexpected() if we get an unexpected tag,
to avoid showing multiple error messages
- client simulator: bug fixes and tweaks


svn path=/trunk/boinc/; revision=24408
2011-10-17 20:46:06 +00:00
David Anderson c7e505dc81 - client: fix memory leak when reading stderr of completed job.
This caused 128KB + size of stderr loss for each job.
- client: print error message if reading stderr fails
    (e.g. because of malloc failure)


svn path=/trunk/boinc/; revision=24336
2011-10-05 22:16:02 +00:00
David Anderson c61103ac26 - client: make the attributes of GUI RPCs (network, authentication)
explicit rather than determined by position in a list.
- client: add a new "read-only" attribute for GUI RPCs.
    This is in preparation for handling GUI RPCs in separate threads.
- client: remove code to support pre-V6 graphics.


svn path=/trunk/boinc/; revision=24232
2011-09-18 21:06:49 +00:00
David Anderson d8f20bceea - vboxwrapper: report network usage to the client
- client: include the above in enforcing network quota preferences


svn path=/trunk/boinc/; revision=24227
2011-09-16 19:16:12 +00:00
David Anderson 8b5344e922 - client: finish next-to-last checkin
svn path=/trunk/boinc/; revision=24157
2011-09-11 17:26:31 +00:00
David Anderson 4e946854c1 - client/API/vboxwrapper:
add a mechanism so that apps can report sub-processes
    that are not descendants (e.g., virtual machines)
    These processes are then counted as part of the app,
    not as "non-BOINC CPU time".
    This fixes a bug where processing was incorrectly suspended
    because CPU usage by VM apps exceeded the "CPU usage limit" pref.

    Implementation:
    - the PIDs of the processes in question
        are passed from app to client via shared-memory,
        in the app_status channel.
        A new variant of boinc_report_app_status() supports this.
    - the VBox wrapper queries the PID of the VM,
        and reports it in this way.
    - procinfo_app() includes a new argument: a list of PIDs
        that are part of the app, although not ancestrally
        related to the main process.
    - in the client, ACTIVE_TASK now includes a vector "other_pids".
        If this is nonempty, it's passed to procinfo_app().


svn path=/trunk/boinc/; revision=24123
2011-09-02 20:47:05 +00:00
David Anderson 609d5665cc - client: pass XML_PARSER& rather than MIOFILE& to parse functions.
Preparatory to using new-style XML parsing everywhere.


svn path=/trunk/boinc/; revision=23975
2011-08-09 21:44:14 +00:00
David Anderson 15c3ff7d31 - client: if an app version has nonempty file_prefix,
copy all its input and output files

svn path=/trunk/boinc/; revision=23925
2011-08-03 19:47:26 +00:00
David Anderson 5a8fd0afc7 - client: add optional <file_prefix> to <APP_VERSION>.
If present, "file_prefix/" is prepended to the logical names
    of input and output files of jobs using that app version.
    E.g. for Vbox wrapper based app versions, file_prefix is "share",
    so that I/O files are put in a "share" subdirectory of the slot dir.
- update_versions: add support for
    <dont_throttle>
    <file_prefix>x</file_prefix>
    in version.xml


svn path=/trunk/boinc/; revision=23924
2011-08-03 18:14:45 +00:00
David Anderson c1bf16f7f3 - client: we were assuming that if we ask a task to exit
and its main process exits, everything is OK.
    That's not necessarily the case - buggy apps may have
    subprocesses that the main process fails to kill.

    Solution: when we request a task to exit or abort,
    make a list of the descendants.
    When the main process exits, kill any remaining descendants.
    
    Also: we weren't checking for the ABORT_PENDING case
    in the process exit logic.
    This may explain the 5/15 second delay in detaching or
    resetting a project with running tasks


svn path=/trunk/boinc/; revision=23738
2011-06-17 04:18:28 +00:00
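The approach in commit c1bf16f7f3 (record the descendants when an exit is requested, then kill whichever ones survive the main process) can be sketched over a snapshot of the process table. The types and function names are hypothetical; a real client would obtain the table from the operating system and actually kill the PIDs returned.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical process-table snapshot: PID -> parent PID.
typedef std::map<int, int> ProcTable;

// All descendants of root (children, grandchildren, ...), excluding root.
std::set<int> descendants(const ProcTable& procs, int root) {
    std::set<int> result;
    std::vector<int> frontier(1, root);
    while (!frontier.empty()) {
        int parent = frontier.back();
        frontier.pop_back();
        for (const auto& p : procs) {
            // p.first is a PID, p.second is its parent's PID
            if (p.second == parent && p.first != root
                && result.insert(p.first).second) {
                frontier.push_back(p.first);
            }
        }
    }
    return result;
}

// PIDs saved when the exit was requested that are still alive
// after the main process has exited; these are the ones to kill.
std::vector<int> leftovers(const std::set<int>& saved,
                           const std::set<int>& alive) {
    std::vector<int> out;
    for (int pid : saved) {
        if (alive.count(pid)) out.push_back(pid);
    }
    return out;
}
```

The list has to be saved before the main process exits: once it is gone, its surviving children can no longer be found by walking down from it.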
David Anderson fa459d780c - client: fix bugs in runtime estimation of jobs that
have run before but are not currently running.
    Old:
    - We maintain the most recent fraction_done in state file.
        But for apps that checkpoint seldom or never,
        this is not the relevant value,
        and frac done may go down when the app runs.
    - fraction_done_elapsed_time is not initialized,
        and can have garbage values for jobs that haven't run yet.
    New:
    - Record, in the state file, the values of
        fraction_done and fraction_done_elapsed_time
        at the most recent checkpoint.
        When the client starts up, use these values.


svn path=/trunk/boinc/; revision=23455
2011-04-26 17:02:09 +00:00
David Anderson b89ea98838 - client: when estimating job runtime based on fraction done,
use the elapsed time when fraction done was last reported,
    not current elapsed time.
    Fix problem where est time remaining increases linearly,
    then abruptly decreases when new frac done is reported.
    From Bruce Allen.


svn path=/trunk/boinc/; revision=23373
2011-04-18 16:32:57 +00:00
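The effect of the fix in commit b89ea98838 can be seen by comparing the two ways of estimating remaining time. Both functions below are illustrative sketches with hypothetical names; they return 0 when no fraction done has been reported, where a real client would fall back on another estimate.

```cpp
// Old behavior: project the total from the *current* elapsed time.
// While fraction done stays unchanged between reports, this grows
// linearly with elapsed time.
double remaining_old(double fd, double elapsed_now) {
    if (fd <= 0) return 0;
    return elapsed_now / fd - elapsed_now;
}

// New behavior: project the total from the elapsed time at which
// fraction done was last reported, then subtract the current elapsed time.
double remaining_new(double fd, double fd_elapsed, double elapsed_now) {
    if (fd <= 0) return 0;
    double r = fd_elapsed / fd - elapsed_now;
    return r > 0 ? r : 0;
}
```

Suppose an app reports FD = 0.5 at 100 s and nothing more for the next 50 s. At 150 s the old estimate has risen from 100 s to 150 s remaining, whereas the new one has fallen from 100 s to 50 s.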
David Anderson 28bad727c1 - client: when exclusive app mechanism is used (CPU or GPU)
wait for 30 secs after excl app exits
    before restarting computation


svn path=/trunk/boinc/; revision=23048
2011-02-16 20:41:19 +00:00
David Anderson 795e89dbf5 - client: eliminate unnecessary CPU reschedules.
Currently we do a reschedule any time a job checkpoints,
    in case there's a job that has finished a time slice
    but hasn't checkpointed yet.
    Instead: flag such jobs, and trigger a reschedule
    on checkpoint only for flagged jobs.
- client: fix instability in job scheduling that happens
    if a job's estimated completion time in RR sim is close to its deadline.
    It can alternate between making and missing deadline,
    causing the scheduler to alternate rapidly between jobs.
    Solution: if RR sim has marked a job as deadline miss
    any time in the last (CPU scheduling period),
    treat it as a deadline miss.


svn path=/trunk/boinc/; revision=22928
2011-01-19 16:46:55 +00:00
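The deadline-miss stabilization in the second item of commit 795e89dbf5 amounts to remembering when the round-robin (RR) simulation last flagged a job. A minimal sketch, with hypothetical structure and function names:

```cpp
// Hypothetical per-job scheduling state.
struct JobSchedState {
    // Time at which RR simulation last marked this job as a
    // deadline miss; negative means "never".
    double last_rr_miss_time;
    JobSchedState() : last_rr_miss_time(-1) {}
};

// Record the outcome of one RR simulation for this job.
void record_rr_result(JobSchedState& j, bool missed, double now) {
    if (missed) j.last_rr_miss_time = now;
}

// Treat the job as a deadline miss if RR simulation flagged it
// at any time within the last CPU scheduling period.
bool treat_as_deadline_miss(const JobSchedState& j, double now,
                            double sched_period) {
    if (j.last_rr_miss_time < 0) return false;
    return now - j.last_rr_miss_time <= sched_period;
}
```

A job whose simulated completion time hovers around its deadline then keeps its "miss" status for a full scheduling period instead of flipping on every simulation.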
David Anderson 6478b3e05d - client: implement more scheduler changes that use
recent estimated credit (REC) instead of debt.
    These changes are enabled by
        #define USE_REC
    in work_fetch.h.
    If this is commented out (the default) the client uses
    debt-based scheduling, same as before.
    TODO: work-fetch policy changes
- client simulator: various fixes:
    - compute idle and wasted fraction based on all processing resources,
        not just CPU
    - compute job completion times based on FLOPS, not CPU seconds
    - compute and use project->no_X_apps
    etc.


svn path=/trunk/boinc/; revision=22741
2010-11-23 19:39:47 +00:00
David Anderson 8d9cf013c5 - client: account manager RPC:
Additions to request message:
        <not_started_dur>X</not_started_dur>
        <in_progress_dur>X</in_progress_dur>
        The estimated remaining duration of unstarted
        and in-progress tasks
    Additions to reply message, within <project>, optional:
        <suspend>0|1</suspend>
            suspend or resume project (overrides local state)
        <abort_not_started>0|1</abort_not_started>
            if set, abort unstarted jobs


svn path=/trunk/boinc/; revision=22698
2010-11-17 20:04:58 +00:00
David Anderson 082603f927 compile fix
svn path=/trunk/boinc/; revision=22410
2010-09-24 20:37:45 +00:00
David Anderson fcbb8a286e - client simulator: major remodel and upgrade.
Instead of using its own XML input files,
    the simulator now takes a client_state.xml file as input.
    The simulator generates a synthetic workload based on the
    projects, apps, app versions, WUs, and results it finds there.

    This means that a user seeing aberrant behavior
    can just send their client_state.xml file
    and (hopefully) we can use the simulator to repro.

    The simulator now can model GPUs.

    As of this checkin, the simulator compiles but doesn't work.
    There should be no change in the actual client.


svn path=/trunk/boinc/; revision=22409
2010-09-24 20:02:42 +00:00
David Anderson 07b2830d93 - client: fix bug in accounting of elapsed time and CPU time
svn path=/trunk/boinc/; revision=21635
2010-05-25 18:48:53 +00:00
David Anderson 515abee470 - client/manager: keep track of "GPU suspended reason".
Report it to the manager
    (it was already in CC_STATUS, but not populated)
- manager: fix system tray icon popup text

svn path=/trunk/boinc/; revision=21481
2010-05-12 18:14:30 +00:00