runs GPU jobs in a seemingly random order,
or preempts GPU jobs needlessly.
The change has two parts:
1) sort the "results" vector by received_time,
so that the RR simulation processes GPU jobs FIFO.
2) in the CPU scheduler (earliest_deadline_result())
instead of choosing the earliest-deadline GPU job that
misses its deadline,
pick the earliest-deadline GPU job from a project that
has a deadline miss for that GPU type
(this is what's done in the CPU case)
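A minimal sketch of part 1, assuming a stripped-down Result struct
(hypothetical; the real client record has many more fields) and a
hypothetical helper name. It only shows the ordering step, not the RR
simulation itself:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, stripped-down stand-in for the client's result record;
// only the fields needed for this illustration are included.
struct Result {
    std::string name;
    double received_time;  // time at which the job arrived, in seconds
};

// Order jobs by arrival time so that later passes see them FIFO.
// stable_sort keeps the existing relative order of jobs that
// arrived at the same instant.
void sort_by_received_time(std::vector<Result>& results) {
    std::stable_sort(results.begin(), results.end(),
        [](const Result& a, const Result& b) {
            return a.received_time < b.received_time;
        });
}
```

A stable sort is used here so that jobs with equal received_time do not
get shuffled between scheduling passes.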
- client: fix bug where if you have an exclusive app,
then remove it from cc_config.xml and do "update config",
it doesn't go away.
Need to clear the list before parsing.
svn path=/trunk/boinc/; revision=18842
New approach: take the "ordered_schedule_results" list,
add running jobs that haven't finished their time slice,
and order the result appropriately.
Then run jobs in order until CPUs are filled.
Simpler and clearer than the old way.
svn path=/trunk/boinc/; revision=17992
and a 2nd GPU job with an earlier deadline arrives,
neither job is ever executed.
Reorganized things so that scheduling of GPU jobs is
done independently of CPU jobs.
The policy for GPU jobs:
- always EDF
- jobs are always removed from memory, regardless of checkpoint
(GPU memory is not paged, so it's bad to leave an idle app in memory)
svn path=/trunk/boinc/; revision=17402
- client: abort runaway jobs based on elapsed time instead of CPU time.
Specifically, abort jobs for which
elapsed time > WU.rsc_fpops_bound / app_version.flops
This policy works for
1) GPU jobs (which may use little CPU time)
2) jobs that run but because of bugs use little CPU time
(e.g., because they're sleeping)
whereas the old policy didn't.
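The check can be sketched as follows. The function name is hypothetical;
the parameters mirror the fields named above. Treating a nonpositive
flops estimate as "don't abort" is an assumption of this sketch, not
something stated in the change:

```cpp
// Hypothetical helper: returns true if the job has been running longer
// than its FLOP bound allows at the app version's estimated speed.
bool exceeds_elapsed_time_limit(
    double elapsed_time,     // wall-clock seconds the job has been running
    double rsc_fpops_bound,  // WU.rsc_fpops_bound: upper bound on FLOPs
    double flops             // app_version.flops: estimated FLOPs per second
) {
    // Assumption: without a usable speed estimate, never abort.
    if (flops <= 0) return false;
    return elapsed_time > rsc_fpops_bound / flops;
}
```

Because the test uses elapsed time, a job that sleeps or runs mostly on
the GPU still hits the limit even if its CPU time stays near zero.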
svn path=/trunk/boinc/; revision=17399
stop accumulating debt if it's at or around zero.
This prevents other projects from being driven unboundedly negative.
- client: if the number of overworked projects exceeds the number
of device instances, clear debts; this indicates that an earlier
client was buggy and produced bad debt values.
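A rough sketch of the sanity check. The struct, the function name and
the overworked_threshold parameter are all hypothetical; in particular,
defining "overworked" as "debt below some negative threshold" is an
assumption of this sketch:

```cpp
#include <vector>

// Hypothetical per-project debt record for one resource type.
struct ProjectDebt {
    double debt;
};

// If more projects are overworked than there are device instances,
// the debt values are assumed to be bogus and are all reset to zero.
// Returns true if the debts were cleared.
bool clear_debts_if_inconsistent(
    std::vector<ProjectDebt>& projects,
    int ninstances,              // number of device instances
    double overworked_threshold  // assumption: debt below this = overworked
) {
    int noverworked = 0;
    for (const ProjectDebt& p : projects) {
        if (p.debt < overworked_threshold) ++noverworked;
    }
    if (noverworked <= ninstances) return false;
    for (ProjectDebt& p : projects) {
        p.debt = 0.0;
    }
    return true;
}
```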
svn path=/trunk/boinc/; revision=17325
This fixes a bug that can cause debts to NEVER get updated.
- client: added "abort_jobs_on_exit" feature
(available by --abort_jobs_on_exit cmdline
or <abort_jobs_on_exit> in cc_config.xml).
If set, when the client is exited by user request
(this includes signals on Unix)
it marks all pending jobs as aborted,
and does a scheduler RPC to all projects with jobs.
When these are completed the client exits.
This is useful when BOINC is being used on grids
where it is wiped clean after each run.
svn path=/trunk/boinc/; revision=17300
ignore intervals longer than 10 secs;
that could only happen if the client or host was suspended/hibernated.
- client: in adjust_debts(), ignore intervals longer than
2*work fetch period, not 2*CPU sched period.
adjust_debts() is called from work fetch.
svn path=/trunk/boinc/; revision=17154
- scheduler: fix bug in adaptive replication:
if we send an unreplicated job to an untrusted host,
set both wu.target_nresults and wu.min_quorum to app.target_nresults.
svn path=/trunk/boinc/; revision=15762
and clear all timeout variables.
This should fix the situation where, say:
1) the user sets the system clock forward by a year;
2) all projects get their min_rpc_time set;
3) the user sets the system clock back to the correct time.
Previously, BOINC would not do anything for a year.
Note: a restart of BOINC is required to fix things.
It would be harder to do this on the fly.
svn path=/trunk/boinc/; revision=15314
in <app_version>s from the server,
keep track of the number of free instances of each type of coproc,
and don't run an app that needs more than are available.
(not quite working yet)
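The bookkeeping can be pictured roughly like this. This is a sketch
only: the CoprocCounts type and both function names are hypothetical,
and the real client structures differ:

```cpp
#include <map>
#include <string>

// Hypothetical bookkeeping: coprocessor type name -> instance count.
using CoprocCounts = std::map<std::string, int>;

// True if every coprocessor type the app needs has enough free instances.
// A type that is absent from free_counts counts as zero free instances.
bool enough_free(const CoprocCounts& free_counts, const CoprocCounts& needed) {
    for (const auto& req : needed) {
        auto it = free_counts.find(req.first);
        int avail = (it == free_counts.end()) ? 0 : it->second;
        if (req.second > avail) return false;
    }
    return true;
}

// Reserve the needed instances if they are all available.
// Leaves free_counts untouched and returns false otherwise.
bool reserve(CoprocCounts& free_counts, const CoprocCounts& needed) {
    if (!enough_free(free_counts, needed)) return false;
    for (const auto& req : needed) {
        free_counts[req.first] -= req.second;
    }
    return true;
}
```

Checking all requirements before decrementing anything keeps the counts
consistent when an app needs several coprocessor types and only some of
them are available.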
svn path=/trunk/boinc/; revision=14992
- update_versions: use __ (not :) as separator for plan class
- client: add plan_class to APP_VERSION;
an app version is now identified by platform/version/plan_class
- client CPU scheduler: don't assume apps use 1 CPU
- client: add avg_ncpus, max_cpus, flops, cmdline to RESULT
- scheduler: implement app planning scheme
Other changes:
- client: if symlink() fails, make an XML soft link instead
(for Unix running off a FAT32 FS)
- client: don't accept nonpositive resource share from AMS
- daemons and DB: check for error returns from enumerations,
and exit if so. Thus, if the MySQL server goes down,
all the daemons will soon exit.
The cron script will restart them every 5 min,
so when the DB server comes back up so will the project.
- web: show empty max CPU % as ---
- API: get rid of all_threads_cpu_time option (always the case now)
svn path=/trunk/boinc/; revision=14966
Specifies an amount of time to delay starting apps
(e.g. so that BOINC doesn't slow down boot process)
Note: mechanisms that start BOINC at boot time
need to figure out how to set this flag.
svn path=/trunk/boinc/; revision=14445
(wasn't implemented anyway)
- client: added <report_results_immediately> configuration flag;
causes results to be reported as soon as done.
Needed for some WCG machines that are reformatted often.
Should NOT be used in general, since it increases server load.
svn path=/trunk/boinc/; revision=14280
- move client sandbox-specific code to a new file, sandbox.C
- remove g_use_sandbox from util.C; move to MainDocument.cpp (manager)
and sandbox.C (client)
- don't declare check_security() in util.h; it's not in util.C
- don't call remove_project_owned_file_or_dir() in
boinc_delete_file_aux() or boinc_rmdir();
rather, at the points in the client that delete
dirs that are usually owned by boinc_projects,
call remove_project_owned_file_or_dir() first,
then clean_out_dir().
- rename boinc_exec() to switcher_exec() and move it to sandbox.C
Note: this change was sparked by needing to remove a call to getgrnam()
from boinclib, to avoid requiring the same version of glibc
on both compile and target hosts
svn path=/trunk/boinc/; revision=13784
there's a single GUI_HTTP object,
and it works only if used sequentially,
i.e. an op is started only after the previous one ends.
This breaks if a GUI RPC triggers an op while
a project-list fetch (initiated by the client itself) is in progress.
Or if two managers are connected at the same time,
and both do HTTP ops.
The solution: have a separate GUI_HTTP object for each GUI_RPC_CONN,
and an additional one for use by the client itself.
svn path=/trunk/boinc/; revision=13692
(deciding which app to use, implementing blanking interval, etc.)
This logic is all now in the screensaver itself.
- GUI RPC: removed get/set screensaver mode RPCs
- API: added a "backwards_compatible_graphics" flag to BOINC_OPTIONS.
V6 apps should set this.
If set, the runtime library checks for graphics messages
from the client, and launches/kills the graphics app (if any).
The app will then work graphically with pre-V6 clients.
- removed some old files
svn path=/trunk/boinc/; revision=13651
that lets you suspend computation after a specified period of idleness.
This is necessary to allow some machines to go into low-power mode
when they're not being used.
- Change the wording of some existing prefs; for example, changed
"Do work while computer is in use?" to
"Suspend work while computer is in use".
The former is confusing - if you say yes, BOINC may in fact
NOT do work while the computer is in use,
due to other factors (time of day, etc.)
TODO: HOST_INFO::users_idle() should be changed so that it
returns the idle time
(rather than telling you whether we've been idle for X)
svn path=/trunk/boinc/; revision=13193
but don't do anything other than handle GUI RPCs.
After 50 secs, print an "about to exit" message.
After 60 seconds, exit
svn path=/trunk/boinc/; revision=13162
This causes the core client to exit immediately before or after
running a job,
letting you examine the contents of the slot directory.
- scheduler: changed max # of CPUs used in daily_result_quota
limit from 4 to 8, and made it a compile-time parameter
- feeder/scheduler: make the number of work items in shared
memory configurable (in config.xml).
The element is <shmem_work_items>
- feeder: make the size of the work item query configurable
(<feeder_query_size>)
- feeder: remove code related to removing infeasible results
from shared mem.
This mechanism was never needed,
and I think a timeout would accomplish the same effect.
client/
app.C
app_start.C
client_state.C,h
cs_cmdline.C
sched/
feeder.C
sched_array.C
sched_config.C,h
sched_send.C
sched_shmem.C,h
sched_util.C
show_shmem.C
svn path=/trunk/boinc/; revision=12771
1) client wakes up from hibernate
2) one or more network ops start (e.g. because backoff expired)
3) ops fail because DNS system isn't up yet
4) connect to reference site fails too
5) user sees "please create physical connection",
even though there's been a physical connection the whole time.
Solution:
- keep track of "last wakeup time": the last time the
time of day (measured in poll_slow_events()) increased
by more than 10 times the polling interval.
This must be either coming out of hibernation,
or the user resetting the system clock.
- When a network operation fails, try to contact the reference site
only if it's more than 30 seconds after the last wakeup time.
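The two rules above can be sketched as follows. The WakeupTracker class
and its member names are hypothetical; in the client the check lives in
poll_slow_events(), and the polling interval is whatever that loop uses:

```cpp
// Hypothetical tracker for the "last wakeup time" heuristic.
class WakeupTracker {
public:
    explicit WakeupTracker(double polling_interval)
        : polling_interval_(polling_interval) {}

    // Call once per poll with the current time of day in seconds.
    // A jump of more than 10 polling intervals since the previous poll
    // is treated as a wakeup (hibernation or a clock reset).
    void poll(double now) {
        if (have_polled_ && now - last_poll_time_ > 10 * polling_interval_) {
            last_wakeup_time = now;
        }
        last_poll_time_ = now;
        have_polled_ = true;
    }

    // After a failed network op, contact the reference site only if
    // more than 30 seconds have passed since the last wakeup.
    bool should_contact_reference_site(double now) const {
        return now - last_wakeup_time > 30;
    }

    double last_wakeup_time = 0;

private:
    double polling_interval_;
    double last_poll_time_ = 0;
    bool have_polled_ = false;
};
```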
client/
client_state.C,h
net_stats.C
svn path=/trunk/boinc/; revision=12665
if the initial sched request failed,
the manager would show "communicating" for 60 sec,
then time out and show "failed to attach".
But the project would actually be attached.
This was due to a logic error,
but I fixed it in a more fundamental way:
by considering an attach to be complete immediately,
without waiting for a successful scheduler RPC.
The wait was originally there to ensure that the URL and account key were valid.
But when using the BOINC Manager, we've already verified
both of these before doing the attach project RPC.
When using boinc_cmd, you now have to check for messages
indicating a bad URL or account key.
I changed things to print these messages on every sched RPC.
Implementation: the notion of "tentative project" no longer exists.
client/
client_state.C,h
client_types.C,h
cs_account.C
cs_benchmarks.C
cs_scheduler.C
gui_rpc_server_ops.C
scheduler_op.C
sim.C
sim_util.C
svn path=/trunk/boinc/; revision=12663