Currently we do a reschedule any time a job checkpoints,
in case there's a job that has finished a time slice
but hasn't checkpointed yet.
Instead: flag such jobs, and trigger a reschedule
on checkpoint only for flagged jobs.
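A minimal C++ sketch of the flag-based approach described above. The struct and function names (`RunningJob`, `mark_slice_expired()`, `should_reschedule_on_checkpoint()`) are hypothetical and are not taken from the BOINC client source. The sketch shows two things. First, an unflagged job's checkpoint no longer triggers a reschedule. Second, a flagged job's checkpoint triggers one exactly once.

```cpp
#include <cassert>

// Hypothetical per-job state (illustrative only, not BOINC's ACTIVE_TASK).
struct RunningJob {
    // Set when the job's time slice has ended but it has not checkpointed,
    // so it was left running instead of being preempted.
    bool reschedule_on_checkpoint = false;
};

// Hypothetical hook: the job's time slice is over, but preempting it now
// would lose the work done since its last checkpoint.
void mark_slice_expired(RunningJob& job) {
    job.reschedule_on_checkpoint = true;
}

// Hypothetical hook, called whenever a job checkpoints.
// Returns true if the client should rerun the CPU scheduler.
bool should_reschedule_on_checkpoint(RunningJob& job) {
    if (!job.reschedule_on_checkpoint) {
        return false;  // unflagged job: its checkpoint no longer forces a reschedule
    }
    job.reschedule_on_checkpoint = false;  // consume the flag
    return true;
}
```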
- client: fix instability in job scheduling that happens
if a job's estimated completion time in RR sim is close to its deadline.
It can alternate between making and missing deadline,
causing the scheduler to alternate rapidly between jobs.
Solution: if RR sim has marked a job as deadline miss
any time in the last (CPU scheduling period),
treat it as a deadline miss.
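The hysteresis rule above can be sketched as follows. This assumes the client records the time of the most recent RR-sim pass that flagged the job. `JobDeadlineState`, `record_rr_sim_result()` and `treat_as_deadline_miss()` are illustrative names. `sched_period` stands in for the CPU scheduling period, in seconds.

```cpp
#include <cassert>

// Hypothetical per-job record; names are illustrative, not BOINC's.
struct JobDeadlineState {
    double last_rr_sim_miss_time = -1.0;  // < 0 means "never flagged"
};

// Record the outcome of one round-robin simulation pass at time `now`.
// A pass that predicts no miss does NOT clear the earlier mark.
void record_rr_sim_result(JobDeadlineState& s, bool rr_sim_says_miss, double now) {
    if (rr_sim_says_miss) s.last_rr_sim_miss_time = now;
}

// Treat the job as a deadline miss if RR sim flagged it at any time within
// the last `sched_period` seconds, not only in the latest pass. This damps
// the flip-flopping that occurs when the projected completion time is
// close to the deadline.
bool treat_as_deadline_miss(const JobDeadlineState& s, double now, double sched_period) {
    assert(sched_period > 0);
    if (s.last_rr_sim_miss_time < 0) return false;
    return now - s.last_rr_sim_miss_time < sched_period;
}
```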
svn path=/trunk/boinc/; revision=22928
My change of 1 Oct ([22440]) required that such jobs
be processed with 64-bit apps,
on the assumption that 32-bit apps have a 2 GB user address space limit.
However, it turns out this limit applies only to Windows
(kernel and user mode share the 4 GB address space; each gets half).
On Linux, the split is 3 GB user / 1 GB kernel.
On Mac OS X, user mode and kernel mode have separate address spaces,
each of them 4 GB.
svn path=/trunk/boinc/; revision=22599
don't include it in non-BOINC CPU time.
Otherwise the presence of such a process could
prevent BOINC from running apps.
(Windows only - will do Unix/Mac later)
svn path=/trunk/boinc/; revision=22422
Old: when a job finished, we cleared the backoffs for the
resources it used. The idea was to get more jobs
immediately in the case where the client was at
a jobs-in-progress limit.
Problem: this resulted in an RPC immediately,
typically before the output files were uploaded.
So the client is still at the limit, and doesn't get jobs.
New: clear the backoffs at the point when output files
have been uploaded and the job is ready to report.
- client: change range in resource backoff from (0, x) to (0.5*x, 1.5*x)
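Below is a sketch of the new backoff range, assuming a uniformly distributed multiplier. `randomized_backoff()` is a hypothetical helper; only the (0.5*x, 1.5*x) range comes from the entry above. Compared with the old (0, x) range, the mean delay doubles from x/2 to x. The delay can also no longer be arbitrarily close to zero.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: returns a randomized backoff delay (seconds) for a
// nominal backoff interval x. The multiplier is drawn uniformly from
// [0.5, 1.5), so the result lies between 0.5*x and 1.5*x with mean x.
double randomized_backoff(double x, std::mt19937& rng) {
    assert(x > 0);
    std::uniform_real_distribution<double> multiplier(0.5, 1.5);
    return multiplier(rng) * x;
}
```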
svn path=/trunk/boinc/; revision=22411
Instead of using its own XML input files,
the simulator now takes a client_state.xml file as input.
The simulator generates a synthetic workload based on the
projects, apps, app versions, WUs, and results it finds there.
This means that a user seeing aberrant behavior
can just send their client_state.xml file
and (hopefully) we can use the simulator to repro.
The simulator now can model GPUs.
As of this checkin, the simulator compiles but doesn't work.
There should be no change in the actual client.
svn path=/trunk/boinc/; revision=22409
allow for the possibility that suspended BOINC apps
aren't really suspended
(e.g. multithread apps that don't use boinc_init_parallel())
- client: message tweak
svn path=/trunk/boinc/; revision=22388
and exclusive GPU apps
- client: fix bug that caused GPU apps to not be
suspended or resumed immediately after
exclusive GPU app transition
- client: in log message, instead of saying
"fetching tasks for GPU", say which kind of GPU
svn path=/trunk/boinc/; revision=22298
Report it to the manager
(it was already in CC_STATUS, but not populated)
- manager: fix system tray icon popup text
svn path=/trunk/boinc/; revision=21481
so that we can look for memory leaks.
- client: enable bandwidth quota limit only if both
#MB and #days are nonzero.
- scheduler: when resending work, don't send more than
the client is requesting
- scheduler: restore Cobblestone factor to 100
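Here is a minimal sketch of the bandwidth-quota rule in the list above. It assumes the limit is expressed as megabytes per period of days. The names `bandwidth_quota_enabled()`, `transfer_allowed()`, `limit_mb`, `period_days` and `used_mb` are illustrative. How the client actually tracks usage over the period is not shown.

```cpp
#include <cassert>

// Hypothetical helper: the transfer quota applies only when BOTH the
// megabyte limit and the period length are nonzero. If either is zero,
// the quota is treated as disabled, not as "zero bytes allowed".
bool bandwidth_quota_enabled(double limit_mb, double period_days) {
    return limit_mb != 0 && period_days != 0;
}

// Hypothetical helper: may another transfer start, given `used_mb`
// already transferred in the current period?
bool transfer_allowed(double used_mb, double limit_mb, double period_days) {
    if (!bandwidth_quota_enabled(limit_mb, period_days)) return true;
    return used_mb < limit_mb;
}
```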
svn path=/trunk/boinc/; revision=21460
rather than TerminateProcessById().
The latter doesn't work in protected mode.
- client: rename pid_handle to process_handle (the old name was a misnomer).
svn path=/trunk/boinc/; revision=21272
Removed my changes of 19 Jan 2010, which didn't work.
Added new mechanism: keep track of whether a job J has ever run in EDF.
If so, and if another job of the same project and resource type as J
is marked as deadline miss, then mark J as deadline miss,
so that it won't get preempted.
- web: change "result" to "task" in server status page
- admin web: show server stable SVN revision, not trunk
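The deadline-miss propagation in the first item above can be sketched as a pass over the job list that runs after the RR simulation. The `Job` fields and `propagate_deadline_miss()` are hypothetical names. The quadratic loop is chosen for clarity, not efficiency.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical job record; field names are illustrative.
struct Job {
    std::string project;
    std::string resource;        // e.g. "CPU" or a GPU type
    bool deadline_miss = false;  // set by the round-robin simulation
    bool ever_ran_edf = false;   // set the first time the job runs in EDF mode
};

// A job that has ever run in EDF inherits deadline-miss status from any
// other job of the same project and resource type, so it won't be
// preempted in favor of that sibling.
void propagate_deadline_miss(std::vector<Job>& jobs) {
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        if (!jobs[i].ever_ran_edf || jobs[i].deadline_miss) continue;
        for (std::size_t k = 0; k < jobs.size(); ++k) {
            if (k != i && jobs[k].deadline_miss &&
                jobs[k].project == jobs[i].project &&
                jobs[k].resource == jobs[i].resource) {
                jobs[i].deadline_miss = true;
                break;
            }
        }
    }
}
```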
svn path=/trunk/boinc/; revision=20805
always remove it from memory, even if it hasn't checkpointed.
Otherwise we'll typically run another GPU job right away,
and it will bomb out or revert to CPU mode because it
can't allocate video RAM.
svn path=/trunk/boinc/; revision=18503
when they're preempting another GPU job.
The problem was as follows:
- job A is chosen to preempt job B
- we tell job B to quit, and initialize job A but don't start it;
however, we set its scheduler state to SCHEDULED
(rather than UNINITIALIZED)
- job B exits, and we start job A.
Since its state is not UNINITIALIZED, we don't set up its slot dir.
- job A runs in an empty slot dir, doesn't find its files, and bombs out.
- client: add <slot_debug> option (prints messages about
allocation of slots, creating/removing files in slot dirs).
svn path=/trunk/boinc/; revision=18217
Instead, write the info into a file in the slot directory,
and check for these files on startup.
This should reduce the overhead of state-file writing
on machines with lots of cores.
There will still be a flurry of writes each time a job finishes,
but reducing that overhead would be a larger job.
- client: make sure we write the state file after a failed RPC
svn path=/trunk/boinc/; revision=17814
clear the project's backoff for its resource type.
This fixes a problem where a project has a "max jobs in progress"
limit, and we're backed off because of that.
We'll now fetch work immediately instead of waiting 24 hrs.
svn path=/trunk/boinc/; revision=17665
and a 2nd GPU job with an earlier deadline arrives,
neither job is ever executed.
Reorganized things so that scheduling of GPU jobs is
done independently of CPU jobs.
The policy for GPU jobs:
- always EDF
- jobs are always removed from memory, regardless of checkpoint
(GPU memory is not paged, so it's bad to leave an idle app in memory)
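Below is a sketch of the GPU policy listed above. It assumes a fixed number of identical GPUs and one GPU per job. `GpuJob` and `schedule_gpu_jobs()` are hypothetical. GPU jobs are ordered purely by deadline. Any job not selected is removed from memory instead of being left suspended.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical GPU job record.
struct GpuJob {
    std::string name;
    double deadline = 0.0;   // absolute time; smaller = more urgent
    bool running = false;
    bool in_memory = false;
};

// Schedules GPU jobs independently of CPU jobs:
// - always earliest deadline first;
// - a job that is not selected is removed from memory (not just suspended),
//   whether or not it has checkpointed, since GPU memory is not paged.
void schedule_gpu_jobs(std::vector<GpuJob>& jobs, std::size_t ngpus) {
    std::stable_sort(jobs.begin(), jobs.end(),
        [](const GpuJob& a, const GpuJob& b) { return a.deadline < b.deadline; });
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        bool selected = i < ngpus;
        jobs[i].running = selected;
        jobs[i].in_memory = selected;  // preempted GPU jobs leave memory
    }
}
```

With one GPU, a running job and a newly arrived job with an earlier deadline resolve cleanly. The new job runs, and the old one is removed from memory.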
svn path=/trunk/boinc/; revision=17402
- client: abort runaway jobs based on elapsed time instead of CPU time.
Specifically, abort jobs for which
elapsed time > WU.rsc_fpops_bound / app_version.flops
This policy works for
1) GPU jobs (which may use little CPU time)
2) jobs that, because of bugs, use little CPU time
(e.g., because they're sleeping)
whereas the old policy didn't.
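Applied literally, the limit above is a single division. The helper below is hypothetical; its `rsc_fpops_bound` and `flops` parameters follow the names in the entry. Consider a workunit with rsc_fpops_bound = 1e14 running on an app version estimated at 1e10 FLOPS. It would be aborted once its elapsed time exceeds 1e14 / 1e10 = 10,000 seconds, however much or little CPU time it has used.

```cpp
#include <cassert>

// Hypothetical helper: true if a job has run longer (wall-clock) than
// its workunit's FLOP bound allows at the app version's estimated speed.
//   rsc_fpops_bound: upper bound on the job's floating-point operations
//   flops:           estimated FLOPS of the app version on this host
bool exceeded_elapsed_time_limit(double elapsed_secs, double rsc_fpops_bound, double flops) {
    assert(flops > 0);
    double max_elapsed = rsc_fpops_bound / flops;
    return elapsed_secs > max_elapsed;
}
```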
svn path=/trunk/boinc/; revision=17399
even if it doesn't use a coprocessor.
- scheduler: added an "nci" (non CPU intensive) plan class
to sched_plan.cpp. It declares the use of 1% of a CPU.
The above two changes are intended to allow the QCN app to
run at above_idle priority, which it needs in order to do 500 Hz polling.
- API: the std::string version of boinc_resolve_filename()
acts the same as the char[] version.
svn path=/trunk/boinc/; revision=16985