- a project overestimates job FLOP counts
- the client starts jobs in EDF mode
- as job progresses and fraction done increases,
its completion time estimate decreases until
it's no longer a deadline miss.
- job gets preempted by other job from that project;
you end up with lots of partly completed jobs.
Solution (I hope): if an app version has running jobs,
compute a "temp DCF" for the app version,
which is the min of dynamic/static estimates for its jobs.
Apply this scaling factor to completion time estimates
for unstarted jobs in RR simulation
- client: the estimation of remaining time of running jobs was wrong
(how did this bug survive so long?)
svn path=/trunk/boinc/; revision=20077
This exits the app with status zero and no finish file,
so the client will restart it.
It creates a file "temporary_exit" containing dt.
The (new) client reads this file and will postpone
scheduling the job again for dt seconds.
Old clients will treat it as a premature exit,
and potentially try to reschedule the job immediately.
This function is intended for GPU applications that
fail to allocate GPU RAM,
presumably because a non-GPU application has it allocated.
We don't want the job to fail,
and we want to wait for a while before trying the allocation again.
svn path=/trunk/boinc/; revision=19879
ones already running.
The problem: we considered a job as started if it has an ACTIVE_TASK.
However, we were creating ACTIVE_TASKS for jobs before deciding
to run them, because we needed a place to store the coproc reservations.
This caused the above bug, and also had the undesirable effect
of creating slot directories before they're needed.
Solution: store coprocessor reservations in RESULT
rather than ACTIVE_TASK.
svn path=/trunk/boinc/; revision=19129
- different data structure for keeping track of coproc usage;
instead of COPROC having per-instance pointers to ACTIVE_TASK,
ACTIVE_TASK now has an array of device number indices
for each instance that it's using.
- in enforce_schedule(), we call a new function assign_coprocs()
that decides what coproc instances each job will use,
and prunes jobs for which we can't get an assignment.
This function embodies lots of subtlety.
- coproc_cmdline() no longer deals with reserving instances;
it just has to generate the --device X cmdline
svn path=/trunk/boinc/; revision=18880
when they're preempting another GPU job.
The problem was as follows:
- job A is chosen to preempt job B
- we tell job B to quit, and initialize job A but don't start it;
however, we set if scheduler state to SCHEDULED
(rather than UNINITIALIZED)
- job B exits, and we start job A.
Since its state is not UNITIALIZED, we don't set up its slot dir.
- job A runs in an empty slot dir, doesn't find its files, and bombs out.
- client: add <slot_debug> option (prints messages about
allocation of slots, creating/removing files in slot dirs).
svn path=/trunk/boinc/; revision=18217
Instead, write the info into a file in the slot directory,
and check for these files on startup.
This should reduce the overhead of state-file writing
on machines with lots of cores.
There will still be a flurry of writes each time a job finishes,
but reducing that overhead would be a larger job.
- client: make sure we write the state file after a failed RPC
svn path=/trunk/boinc/; revision=17814
and a 2nd GPU job with an earlier deadline arrives,
neither job is executed ever.
Reorganized things so that scheduling of GPU jobs is
done independently of CPU jobs.
The policy for GPU jobs:
- always EDF
- jobs are always removed from memory, regardless of checkpoint
(GPU memory is not paged, so it's bad to leave an idle app in memory)
svn path=/trunk/boinc/; revision=17402
- client: abort runaway jobs based on elapsed time instead of CPU time.
Specifically, abort jobs for which
elapsed time > WU.rsc_fpops_bound / app_version.flops
This policy works for
1) GPU jobs (which may use little CPU time)
2) jobs that run but because of bugs use little CPU time
(e.g., because they're sleeping)
whereas the old policy didn't.
svn path=/trunk/boinc/; revision=17399
which of those files to include
- Modified MAC address check to work on some non-Linux unixes.
(mac_address.cpp)
- Added suggested change to "already attached to project" checking.
(ProjectInfoPage.cpp)
- changed includes of standard c header files to their c++ equivalents
(i.e. replaced <stdio.h> with <cstdio>) for namespace protection.
- replaced "using namespace std;" with more explicit "using std::function" in
several files.
- Fixed bug in checking whether the os is OS/2 and added conditional OS_OS2
to the build environment. (boinc_platform.m4,configure.ac)
- Changed build environment to not use -nostandardlibs unless we are using
G++ and static linkage is specified. (configure.ac)
- Added makefiles and package building files for solaris CSW package manager.
- Fixed bug with attempting to find login name using logname. (configure.ac)
- Added ifdef HAVE_* protection around some include files commonly found in
sys.
- Added support for unified binary for x86_64/i686-pc-solaris.
(cs_platforms.cpp)
- generate_host_cpid() now uses MAC address on non-linux unix.
(hostinfo_network.cpp)
- Macro BOINC_SET_COMPILE_FLAGS now doesn't check gcc only flags on non-gcc
compilers. (boinc_set_compile_flags.m4)
- Library compiles no longer depend upon the library extension or require
the library to be prefixed with lib.
- More fixes for fcgi builds.
- Added declaration of "struct ether_addr" and ether_ntoa(). Have not yet
implemented ether_ntoa() for machines that don't have it, or where it is
buggy. (unix_util.h)
- Added FCGI::perror() which calls FCGI_perror(). (boinc_fcgi.{h,cpp})
- Fixed library Makefiles so that all required headers get installed.
svn path=/trunk/boinc/; revision=17388
1) it uses a coprocessor
2) it has checkpointed since the client started
3) it's being preempted because of a user action
(suspend job, project, or all processing)
or user preference (time of day, computer in use)
- scheduler: if shared mem seg doesn't exist,
report it and don't crash
svn path=/trunk/boinc/; revision=16992
as the basis for estimating job completion times.
This should improve estimates for GPU apps,
and prevent the DCF from getting messed up.
svn path=/trunk/boinc/; revision=16598
<exclusive_app>foo.exe</exclusive_app>
in your cc_config.xml, BOINC will suspend computing
whenever foo.exe is running (e.g., a game).
Eventually we might want to put the interface in preferences
instead of cc_config.xml
svn path=/trunk/boinc/; revision=16087
- client: don't leak process handles when abort jobs
- client: if an app exits or we kill it, always destroy the shmem segment.
- web: more HTML 4.01 Transitional conformity changes
svn path=/trunk/boinc/; revision=15865
- scheduler: fix bug in adaptive replication:
if send an unreplicated job to untrusted host,
set both wu.target_nresults and wu.min_quorum to app.target_nresults.
svn path=/trunk/boinc/; revision=15762
and it doesn't do so within 10 seconds, kill it.
This deals with the situation where the app is ignoring messages
(e.g. because it forgot to end a critical section).
- client: if either the FP or int benchmark runs less than
3 CPU seconds (out of 10 seconds of wall time) ignore the benchmark.
This is an effort to deal with a problem where (for unknown reasons)
the int benchmark runs for a tiny amount of CPU time,
leading to an absurdly large result
- Manager: don't prepend "[error]" to MSG_INTERNAL_ERROR messages;
the client already does this.
THESE ARE ALL BUG FIXES
svn path=/trunk/boinc/; revision=15128
- update_versions: use __ (not :) as separator for plan class
- client: add plan_class to APP_VERSION;
an app version is now identified by platform/version/plan_class
- client CPU scheduler: don't assume apps use 1 CPU
- client: add avg_ncpus, max_cpus, flops, cmdline to RESULT
- scheduler: implement app planning scheme
Other changes:
- client: if symlink() fails, make a XML soft link instead
(for Unix running off a FAT32 FS)
- client: don't accept nonpositive resource share from AMS
- daemons and DB: check for error returns from enumerations,
and exit if so. Thus, if the MySQL server goes down,
all the daemons will soon exit.
The cron script will restart them every 5 min,
so when the DB server comes back up so will the project.
- web: show empty max CPU % as ---
- API: get rid of all_threads_cpu_time option (always the case now)
svn path=/trunk/boinc/; revision=14966
This is for debugging apps (currently works only in Unix).
What it does: when running an app,
the client does everything except actually fork/exec the app,
i.e. it sets up the slot dir, creates shared mem segment etc.
It then continues as if the app were actually running,
and you can then manually run your app under a debugger
in the slot directory.
Note: the client won't notice the termination of your app.
- API, Unix: in situations where the timer thread wants to exit
(e.g. it notices a missing heartbeat).
don't directly call boinc_exit(),
since this touches data structures that the worker thread
may be using concurrently.
Instead, set a flag telling the worker thread to call boinc_exit()
(which it will do from its signal handler)
This is an attempt to fix problems reported by Bernd;
I haven't tested it.
- scheduler: add config flag for uploading usage data
- web: show account key and weak account key on user page
- added some code for multithread support (not finished)
api/
boinc_api.C
svn path=/trunk/boinc/; revision=14542
removed references to "graphics thread"
removed HANDLE timer_quit_event
removed enable_heartbeat/disable_heartbeat messages
(not sure what the ideas was, but no longer exists)
removed heartbeat_active flag (use options.check_heartbeat instead)
read heartbeat-channel messages even if heartbeat disabled
(since we use that channel for WSS messages too)
- client: remove ACTIVE_TASK::thread_handle (not used)
svn path=/trunk/boinc/; revision=14323
error. 0xC0000142 means STATUS_DLL_INIT_FAILED which can happen
when an application attempts to create a new process while the OS
is shutting down, and when the desktop heap is fully utilized.
This will keep an app from erroring out during Vista's shutdown
sequence. Only a reboot can fix the desktop heap.
client/
app.h
app_control.C
svn path=/trunk/boinc/; revision=14080
(deciding which app to use, implementing blanking interval, etc.)
This logic is all now in the screensaver itself.
- GUI RPC: removed get/set screensaver mode RPCs
- API: added a "backwards_compatible_graphics" flag to BOINC_OPTIONS.
V6 apps should set this.
If set, the runtime library checks for graphics messages
from the client, and launches/kills the graphics app (if any).
The app will then work graphically with pre-V6 clients.
- removed some old files
svn path=/trunk/boinc/; revision=13651
between completing a result and reporting it.
- back end: added <httpd_user> config option:
Web server user name (used by file deleter)
- back end: don't report unparsed XML in config.xml as an error
client/
app.h
work_fetch.C
lib/
shmem.C
sched/
file_deleter.C
file_upload_handler.C
sched_config.C,h
show_shmem.C
svn path=/trunk/boinc/; revision=13039
These let the Manager run the graphics app.
Graphics apps have physical name *v6graphics*
- Separated ACTIVE_TASK::write() and ACTIVE_TASK::write_gui().
These need to write largely disjoint set of items.
- code cleanup: remove a zillion "else"s in parsing code
- code cleanup: change a zillion match_tag(buf, "<foo/>"
to parse_bool(buf, "foo")
client/
app.C,h
client_state.C
client_types.C,h
lib/
gui_rpc_client.h
gui_rpc_client_ops.C
sched/
server_types.C
svn path=/trunk/boinc/; revision=12938