A while back we added a mechanism intended to defer work-request RPCs
while file uploads are happening,
with the goal of reporting completed tasks sooner
and reducing the number of RPCs.
There were 2 bugs in this mechanism.
First, the decision of whether an upload is active was flawed;
if several uploads were active and 1 finished,
it would act like all had finished.
Second, when WORK_FETCH::choose_project.cpp() picks a project,
it sets p->sched_rpc_pending to RPC_REASON_NEED_WORK.
If we then decide not to request work because an upload
is active, we need to clear this field.
Otherwise scheduler_rpc_poll() will do an RPC to it,
piggybacking a work request and bypassing the upload check.
(usually in a static variable called "last_time")
of the last time we did something,
and we only do it again when now - last_time exceeds some interval.
Example: sending heartbeat messages to apps.
Problem: if the system clock is decreased by X,
we won't do any of these actions are time X,
making it appear that the client is frozen.
Solution: when we detect that the system clock has decreased,
set a global var "clock_change" for 1 iteration of the polling loop,
and disable these time checks if clock_change is set.
(whereas they didn't do this w/ older clients).
On Windows, the client uses TerminateProcess(h, 1) to kill processes;
the 1 is the exit code the process will appear to have.
So instead, add a "will_restart" bool arg to the various kill
functions, and if set use 0 (= STATUS_SUCCESS),
otherwise use EXIT_ABORTED_BY_CLIENT.
Note: in principle this shouldn't make any difference
for quitting tasks,
since handle_exited_app() checks for task state QUIT_PENDING
and ignores the exit code in that case.
The only place I can see where it would make any difference
is when we kill a process because it hasn't been handling
queued shared-memory messages for 180 seconds.
- client: add more info to the message about an exited app
- client: function return values (ERR_*) are different from
process exit codes (EXIT_*).
But in many places we were using return values as exit codes.
Fix these.
Also, break out the different types of limits a job can exceed
(time, disk, memory) into difference exit codes.
svn path=/trunk/boinc/; revision=25601
File verify is done in 4 places:
- after a download finishes
- transition result to DOWNLOADED
- if project->verify_files_on_app_start, on app start
Use asynchrony only in the first 2 cases,
since the async logic is set up to mark the file as PRESENT
when done, not to restart a task
svn path=/trunk/boinc/; revision=25219
as described here: http://boinc.berkeley.edu/trac/wiki/ClientDataModel
Compatibility:
clients that upgrade to this version should see nothing unusual.
Clients that downgrade from this version to a previous version
should see all projects reset
(i.e. tasks disappear and then get re-downloaded).
- manager: always show whether a file transfer is upload or download
- client: don't scale work requests by resource share
svn path=/trunk/boinc/; revision=23862
Insteady of using its own XML input files,
the simulator now takes a client_state.xml file as input.
The simulator generates a synthetic workload based on the
projects, apps, app versions, WUs, and result it finds there.
This means that a user seeing aberrant behavior
can just send their client_state.xml file
and (hopefully) we can use the simulator to repro.
The simulator now can model GPUs.
As of this checkin, the simulator compiles but doesn't work.
There should be no change in the actual client.
svn path=/trunk/boinc/; revision=22409
(otherwise it doesn't work for coproc or multi-proc apps)
- client: in estimate of job completion time,
weight the estimate based on fraction done more heavily
(quadratic rather than linear)
svn path=/trunk/boinc/; revision=16603