When a large file is copied from a project dir to a slot dir,
it's copied in chunks,
interleaved with other polling activities such as GUI RPCs.
That way the manager doesn't freeze while large copies
(e.g. VM images) are in progress.
svn path=/trunk/boinc/; revision=25192
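A minimal sketch of the chunked copy described in revision 25192 above.
ChunkedCopy, its members, and the chunk size are hypothetical names,
not the client's actual code; the only point is that each poll copies
at most one chunk and then returns control to the caller.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper (not the client's real class): copies a file one
// chunk per call so the caller can do other work between chunks.
struct ChunkedCopy {
    std::FILE* in = nullptr;
    std::FILE* out = nullptr;
    std::vector<char> buf;

    bool open(const char* src, const char* dst, std::size_t chunk_size) {
        assert(chunk_size > 0);
        buf.resize(chunk_size);
        in = std::fopen(src, "rb");
        if (!in) return false;
        out = std::fopen(dst, "wb");
        if (!out) { close(); return false; }
        return true;
    }

    // Copy at most one chunk.
    // Returns 1 if more data may remain, 0 when done, -1 on error.
    int step() {
        std::size_t n = std::fread(buf.data(), 1, buf.size(), in);
        if (n > 0 && std::fwrite(buf.data(), 1, n, out) != n) return -1;
        if (n < buf.size()) return std::ferror(in) ? -1 : 0;
        return 1;
    }

    void close() {
        if (in) { std::fclose(in); in = nullptr; }
        if (out) { std::fclose(out); out = nullptr; }
    }
};
```

A polling loop would call step() once per pass, between other
activities such as GUI RPC handling, and call close() once step()
returns 0 or -1.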
connection information to the manager
- MGR: Add a "Show VM Console" button for those tasks which
report a remote desktop port number.
client/
app.cpp, .h
app_control.cpp
clientgui/
Events.h
MainDocument.cpp, .h
ViewWork.cpp, .h
lib/
gui_rpc_client.h
gui_rpc_client_ops.cpp
svn path=/trunk/boinc/; revision=25036
If set, don't run jobs for that app while network is suspended.
- client: parse this flag and maintain in state file;
do a job reschedule when network suspend state changes
- GUI RPC: add RESULT::network_wait flag;
if set, this job is waiting for network access to be allowed
- Manager: display the above in task info
- add support for "web graphics URL" (see above)
- client: parse message containing URL on graphics_reply channel
and store in ACTIVE_TASK::web_graphics_url
- GUI RPC: add RESULT::web_graphics_url
- Manager: if web graphics URL is present, Show Graphics opens a browser
- remove some vestigial code for pre-V6 graphics
svn path=/trunk/boinc/; revision=24899
This caused 128KB + size of stderr loss for each job.
- client: print error message if reading stderr fails
(e.g. because of malloc failure)
svn path=/trunk/boinc/; revision=24336
explicit rather than determined by position in a list.
- client: add a new "read-only" attribute for GUI RPCs.
This is in preparation for handling GUI RPCs in separate threads.
- client: remove code to support pre-V6 graphics.
svn path=/trunk/boinc/; revision=24232
add a mechanism so that apps can report sub-processes
that are not descendants (e.g., virtual machines)
These processes are then counted as part of the app,
not as "non-BOINC CPU time".
This fixes a bug where processing was incorrectly suspended
because CPU usage by VM apps exceeded the "CPU usage limit" pref.
Implementation:
- the PIDs of the processes in question
are passed from app to client via shared-memory,
in the app_status channel.
A new variant of boinc_report_app_status() supports this.
- the VBox wrapper queries the PID of the VM,
and reports it in this way.
- procinfo_app() includes a new argument: a list of PIDs
that are part of the app, although not ancestrally
related to the main process.
- in the client, ACTIVE_TASK now includes a vector "other_pids".
If this is nonempty, it's passed to procinfo_app().
svn path=/trunk/boinc/; revision=24123
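The accounting change in revision 24123 above can be sketched roughly
as follows. ProcSample and app_cpu_time() are illustrative names; the
real procinfo_app() signature is not shown in this log. Counting the
descendants of the reported PIDs as well is an assumption.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Illustrative process-table entry (not the client's real structure).
struct ProcSample {
    int pid;
    int parent_pid;
    double cpu_time;
};

// Sum CPU time over the main process, its descendants, and any extra
// PIDs the app reported (e.g. a VM that is not a descendant).
double app_cpu_time(
    const std::vector<ProcSample>& procs,
    int main_pid,
    const std::vector<int>& other_pids
) {
    assert(main_pid > 0);
    std::set<int> members(other_pids.begin(), other_pids.end());
    members.insert(main_pid);

    // Repeatedly add children of known members until nothing changes.
    bool grew = true;
    while (grew) {
        grew = false;
        for (const ProcSample& p : procs) {
            if (members.count(p.parent_pid) && !members.count(p.pid)) {
                members.insert(p.pid);
                grew = true;
            }
        }
    }

    double total = 0;
    for (const ProcSample& p : procs) {
        if (members.count(p.pid)) total += p.cpu_time;
    }
    return total;
}
```

Anything not counted here would fall under "non-BOINC CPU time",
which is what triggered the incorrect suspension for VM apps.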
If present, "file_prefix/" is prepended to the logical names
of input and output files of jobs using that app version.
E.g. for VBox wrapper based app versions, file_prefix is "share",
so that I/O files are put in a "share" subdirectory of the slot dir.
- update_versions: add support for
<dont_throttle>
<file_prefix>x</file_prefix>
in version.xml
svn path=/trunk/boinc/; revision=23924
and its main process exits, everything is OK.
That's not necessarily the case - buggy apps may have
subprocesses that the main process fails to kill.
Solution: when we request a task to exit or abort,
make a list of the descendants.
When the main process exits, kill any remaining descendants.
Also: we weren't checking for the ABORT_PENDING case
in the process exit logic.
This may explain the 5/15 second delay in detaching or
resetting a project with running tasks.
svn path=/trunk/boinc/; revision=23738
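A rough sketch of the descendant cleanup in revision 23738 above.
ProcEntry, descendants_of() and leftover_to_kill() are hypothetical
names working on a snapshot of the process table; the actual kill
call is left out, and the client's real code may differ.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Illustrative process-table entry.
struct ProcEntry {
    int pid;
    int parent_pid;
};

// Called when we ask the task to exit or abort:
// collect all descendants of root_pid from the current snapshot.
std::vector<int> descendants_of(
    const std::vector<ProcEntry>& table, int root_pid
) {
    assert(root_pid > 0);
    std::set<int> found;
    bool grew = true;
    while (grew) {
        grew = false;
        for (const ProcEntry& p : table) {
            bool parent_known =
                p.parent_pid == root_pid || found.count(p.parent_pid) > 0;
            if (parent_known && !found.count(p.pid)) {
                found.insert(p.pid);
                grew = true;
            }
        }
    }
    return std::vector<int>(found.begin(), found.end());
}

// Called when the main process has exited:
// return the saved descendants that are still in the process table.
std::vector<int> leftover_to_kill(
    const std::vector<int>& saved, const std::vector<ProcEntry>& table_now
) {
    std::set<int> alive;
    for (const ProcEntry& p : table_now) alive.insert(p.pid);
    std::vector<int> out;
    for (int pid : saved) {
        if (alive.count(pid)) out.push_back(pid);
    }
    return out;
}
```

The list has to be made before the main process exits, because
orphaned children are typically re-parented and can no longer be
found by walking down from the main PID.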
have run before but are not currently running.
Old:
- We maintain the most recent fraction_done in state file.
But for apps that checkpoint seldom or never,
this is not the relevant value,
and frac done may go down when the app runs.
- fraction_done_elapsed_time is not initialized,
and can have garbage values for jobs that haven't run yet.
New:
- Record, in the state file, the values of
fraction_done and fraction_done_elapsed_time
at the most recent checkpoint.
When the client starts up, use these values.
svn path=/trunk/boinc/; revision=23455
use the elapsed time when fraction done was last reported,
not current elapsed time.
Fix problem where est time remaining increases linearly,
then abruptly decreases when new frac done is reported.
From Bruce Allen.
svn path=/trunk/boinc/; revision=23373
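The fix in revision 23373 above, sketched under assumptions: the log
does not give the formula, so est_remaining() below is a guess at its
shape, using the field names mentioned in this log.

```cpp
#include <cassert>

// Assumed shape of the estimate (the exact formula is not in this log).
// fraction_done_elapsed_time: elapsed time when fraction_done was last
// reported; elapsed_time: elapsed time now.
double est_remaining(
    double fraction_done,
    double fraction_done_elapsed_time,
    double elapsed_time
) {
    assert(fraction_done > 0 && fraction_done <= 1);
    // Project total run time from the moment the fraction was reported,
    // not from the current time.
    double est_total = fraction_done_elapsed_time / fraction_done;
    double remaining = est_total - elapsed_time;
    return remaining > 0 ? remaining : 0;
}
```

With elapsed_time in the numerator instead, the estimate would be
elapsed_time * (1/fraction_done - 1), which grows linearly between
reports and then drops when a new fraction arrives.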
Currently we do a reschedule any time a job checkpoints,
in case there's a job that has finished a time slice
but hasn't checkpointed yet.
Instead: flag such jobs, and trigger a reschedule
on checkpoint only for flagged jobs.
- client: fix instability in job scheduling that happens
if a job's estimated completion time in RR sim is close to its deadline.
It can alternate between making and missing deadline,
causing the scheduler to alternate rapidly between jobs.
Solution: if RR sim has marked a job as deadline miss
at any time in the last CPU scheduling period,
treat it as a deadline miss.
svn path=/trunk/boinc/; revision=22928
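A sketch of the deadline-miss damping in revision 22928 above.
JobSchedState and both functions are hypothetical names; times are
in seconds, and the comparison at the window boundary is an assumption.

```cpp
#include <cassert>

// Per-job record of when RR sim last marked the job as a deadline miss.
struct JobSchedState {
    double last_rr_miss_time = -1;  // negative means "never"
};

// Call after each RR simulation.
void record_rr_result(JobSchedState& j, bool missed, double now) {
    if (missed) j.last_rr_miss_time = now;
}

// A job counts as a deadline miss if RR sim marked it as one
// at any time within the last scheduling period.
bool treat_as_deadline_miss(
    const JobSchedState& j, double now, double sched_period
) {
    assert(sched_period >= 0);
    return j.last_rr_miss_time >= 0
        && now - j.last_rr_miss_time <= sched_period;
}
```

A single "makes deadline" result no longer clears the flag, so a job
near the boundary stops flipping between the two states.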
recent estimated credit (REC) instead of debt.
These changes are enabled by
#define USE_REC
in work_fetch.h.
If this is commented out (the default) the client uses
debt-based scheduling, same as before.
TODO: work-fetch policy changes
- client simulator: various fixes:
- compute idle and wasted fraction based on all processing resources,
not just CPU
- compute job completion times based on FLOPS, not CPU seconds
- compute and use project->no_X_apps
etc.
svn path=/trunk/boinc/; revision=22741
Additions to request message:
<not_started_dur>X</not_started_dur>
<in_progress_dur>X</in_progress_dur>
The estimated remaining duration of unstarted
and in-progress tasks
Additions to reply message, within <project>, optional:
<suspend>0|1</suspend>
suspend or resume project (overrides local state)
<abort_not_started>0|1</abort_not_started>
if set, abort unstarted jobs
svn path=/trunk/boinc/; revision=22698
Instead of using its own XML input files,
the simulator now takes a client_state.xml file as input.
The simulator generates a synthetic workload based on the
projects, apps, app versions, WUs, and results it finds there.
This means that a user seeing aberrant behavior
can just send their client_state.xml file
and (hopefully) we can use the simulator to repro.
The simulator now can model GPUs.
As of this checkin, the simulator compiles but doesn't work.
There should be no change in the actual client.
svn path=/trunk/boinc/; revision=22409
Report it to the manager
(it was already in CC_STATUS, but not populated)
- manager: fix system tray icon popup text
svn path=/trunk/boinc/; revision=21481
rather than TerminateProcessById().
The latter doesn't work in protected mode.
- client: rename pid_handle => process_handle (the old name was a misnomer)
svn path=/trunk/boinc/; revision=21272
favor those that are partially done
- client: fix crashing bug if a project is detached
while an RSS feed fetch for it is in progress
- code cleanup: switch from /// back to // for comments
(so much for doxygen)
svn path=/trunk/boinc/; revision=21041
Removed my changes of 19 Jan 2010, which didn't work.
Added new mechanism: keep track of whether a job J has ever run in EDF.
If so, and if another job of the same project and resource type as J
is marked as deadline miss, then mark J as deadline miss,
so that it won't get preempted.
- web: change "result" to "task" in server status page
- admin web: show server stable SVN revision, not trunk
svn path=/trunk/boinc/; revision=20805
- a project overestimates job FLOP counts
- the client starts jobs in EDF mode
- as job progresses and fraction done increases,
its completion time estimate decreases until
it's no longer a deadline miss.
- job gets preempted by other job from that project;
you end up with lots of partly completed jobs.
Solution (I hope): if an app version has running jobs,
compute a "temp DCF" for the app version,
which is the min of dynamic/static estimates for its jobs.
Apply this scaling factor to completion time estimates
for unstarted jobs in RR simulation
- client: the estimation of remaining time of running jobs was wrong
(how did this bug survive so long?)
svn path=/trunk/boinc/; revision=20077
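The "temp DCF" from revision 20077 above could look roughly like this.
RunningJob, temp_dcf() and scaled_estimate() are illustrative names.
Returning 1 when the app version has no running jobs is an assumption,
as is the exact meaning of the two per-job estimates.

```cpp
#include <cassert>
#include <vector>

// Illustrative: two duration estimates for one running job.
struct RunningJob {
    double dynamic_est;  // based on progress so far
    double static_est;   // based on the project's FLOP count
};

// Min over running jobs of dynamic/static estimate.
double temp_dcf(const std::vector<RunningJob>& jobs) {
    double dcf = 1.0;  // assumed default when nothing is running
    bool first = true;
    for (const RunningJob& j : jobs) {
        assert(j.static_est > 0);
        double ratio = j.dynamic_est / j.static_est;
        if (first || ratio < dcf) {
            dcf = ratio;
            first = false;
        }
    }
    return dcf;
}

// Completion time estimate for an unstarted job in RR simulation.
double scaled_estimate(double static_est, double dcf) {
    return static_est * dcf;
}
```

If the project overestimates FLOP counts by 2x, running jobs yield a
ratio near 0.5, and unstarted jobs of that app version are no longer
simulated as deadline misses on the inflated estimate alone.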
This exits the app with status zero and no finish file,
so the client will restart it.
It creates a file "temporary_exit" containing dt.
The (new) client reads this file and will postpone
scheduling the job again for dt seconds.
Old clients will treat it as a premature exit,
and potentially try to reschedule the job immediately.
This function is intended for GPU applications that
fail to allocate GPU RAM,
presumably because a non-GPU application has it allocated.
We don't want the job to fail,
and we want to wait for a while before trying the allocation again.
svn path=/trunk/boinc/; revision=19879
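A sketch of the file handshake in revision 19879 above. The helper
names are hypothetical, and the file is assumed to hold just dt as a
decimal integer; the log says only that it contains dt.

```cpp
#include <cassert>
#include <cstdio>

// App side (hypothetical helper): record the requested delay.
// path would be the "temporary_exit" file in the slot directory.
bool write_temp_exit_file(const char* path, int dt) {
    assert(dt >= 0);
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;
    bool ok = std::fprintf(f, "%d\n", dt) > 0;
    return std::fclose(f) == 0 && ok;
}

// Client side (hypothetical helper): returns true and sets dt
// if the file exists and parses; false means a premature exit.
bool read_temp_exit_file(const char* path, int& dt) {
    std::FILE* f = std::fopen(path, "r");
    if (!f) return false;
    int n = std::fscanf(f, "%d", &dt);
    std::fclose(f);
    return n == 1;
}

// Earliest time at which the client should schedule the job again.
double earliest_restart(double now, int dt) {
    return now + dt;
}
```

An old client never looks for the file, so it sees only the zero exit
status without a finish file and may restart the job right away.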
ones already running.
The problem: we considered a job as started if it has an ACTIVE_TASK.
However, we were creating ACTIVE_TASKS for jobs before deciding
to run them, because we needed a place to store the coproc reservations.
This caused the above bug, and also had the undesirable effect
of creating slot directories before they're needed.
Solution: store coprocessor reservations in RESULT
rather than ACTIVE_TASK.
svn path=/trunk/boinc/; revision=19129
- different data structure for keeping track of coproc usage;
instead of COPROC having per-instance pointers to ACTIVE_TASK,
ACTIVE_TASK now has an array of device number indices
for each instance that it's using.
- in enforce_schedule(), we call a new function assign_coprocs()
that decides what coproc instances each job will use,
and prunes jobs for which we can't get an assignment.
This function embodies lots of subtlety.
- coproc_cmdline() no longer deals with reserving instances;
it just has to generate the --device X cmdline
svn path=/trunk/boinc/; revision=18880
when they're preempting another GPU job.
The problem was as follows:
- job A is chosen to preempt job B
- we tell job B to quit, and initialize job A but don't start it;
however, we set its scheduler state to SCHEDULED
(rather than UNINITIALIZED)
- job B exits, and we start job A.
Since its state is not UNINITIALIZED, we don't set up its slot dir.
- job A runs in an empty slot dir, doesn't find its files, and bombs out.
- client: add <slot_debug> option (prints messages about
allocation of slots, creating/removing files in slot dirs).
svn path=/trunk/boinc/; revision=18217
Instead, write the info into a file in the slot directory,
and check for these files on startup.
This should reduce the overhead of state-file writing
on machines with lots of cores.
There will still be a flurry of writes each time a job finishes,
but reducing that overhead would be a larger job.
- client: make sure we write the state file after a failed RPC
svn path=/trunk/boinc/; revision=17814
and a 2nd GPU job with an earlier deadline arrives,
neither job is ever executed.
Reorganized things so that scheduling of GPU jobs is
done independently of CPU jobs.
The policy for GPU jobs:
- always EDF
- jobs are always removed from memory, regardless of checkpoint
(GPU memory is not paged, so it's bad to leave an idle app in memory)
svn path=/trunk/boinc/; revision=17402
- client: abort runaway jobs based on elapsed time instead of CPU time.
Specifically, abort jobs for which
elapsed time > WU.rsc_fpops_bound / app_version.flops
This policy works for
1) GPU jobs (which may use little CPU time)
2) jobs that run but because of bugs use little CPU time
(e.g., because they're sleeping)
whereas the old policy didn't.
svn path=/trunk/boinc/; revision=17399
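The runaway-job test in revision 17399 above, as a one-function
sketch. The function name is made up; the parameters stand for
WU.rsc_fpops_bound and app_version.flops from the message.

```cpp
#include <cassert>

// True if the job has run longer than its FLOP bound allows
// at the app version's estimated speed.
bool exceeds_elapsed_time_limit(
    double elapsed_time,     // seconds the job has been running
    double rsc_fpops_bound,  // WU.rsc_fpops_bound
    double flops             // app_version.flops
) {
    assert(flops > 0);
    return elapsed_time > rsc_fpops_bound / flops;
}
```

E.g. a bound of 1e12 FLOPs at 1e9 FLOPS gives a limit of 1000 seconds
of elapsed time, however little CPU time the job has used.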
which of those files to include
- Modified MAC address check to work on some non-Linux unixes.
(mac_address.cpp)
- Added suggested change to "already attached to project" checking.
(ProjectInfoPage.cpp)
- changed includes of standard c header files to their c++ equivalents
(i.e. replaced <stdio.h> with <cstdio>) for namespace protection.
- replaced "using namespace std;" with more explicit "using std::function" in
several files.
- Fixed bug in checking whether the os is OS/2 and added conditional OS_OS2
to the build environment. (boinc_platform.m4,configure.ac)
- Changed build environment to not use -nostandardlibs unless we are using
G++ and static linkage is specified. (configure.ac)
- Added makefiles and package building files for solaris CSW package manager.
- Fixed bug with attempting to find login name using logname. (configure.ac)
- Added ifdef HAVE_* protection around some include files commonly found in
sys.
- Added support for unified binary for x86_64/i686-pc-solaris.
(cs_platforms.cpp)
- generate_host_cpid() now uses MAC address on non-Linux unixes.
(hostinfo_network.cpp)
- Macro BOINC_SET_COMPILE_FLAGS now doesn't check gcc only flags on non-gcc
compilers. (boinc_set_compile_flags.m4)
- Library compiles no longer depend upon the library extension or require
the library to be prefixed with lib.
- More fixes for fcgi builds.
- Added declaration of "struct ether_addr" and ether_ntoa(). Have not yet
implemented ether_ntoa() for machines that don't have it, or where it is
buggy. (unix_util.h)
- Added FCGI::perror() which calls FCGI_perror(). (boinc_fcgi.{h,cpp})
- Fixed library Makefiles so that all required headers get installed.
svn path=/trunk/boinc/; revision=17388
1) it uses a coprocessor
2) it has checkpointed since the client started
3) it's being preempted because of a user action
(suspend job, project, or all processing)
or user preference (time of day, computer in use)
- scheduler: if shared mem seg doesn't exist,
report it and don't crash
svn path=/trunk/boinc/; revision=16992
as the basis for estimating job completion times.
This should improve estimates for GPU apps,
and prevent the DCF from getting messed up.
svn path=/trunk/boinc/; revision=16598
<exclusive_app>foo.exe</exclusive_app>
in your cc_config.xml, BOINC will suspend computing
whenever foo.exe is running (e.g., a game).
Eventually we might want to put the interface in preferences
instead of cc_config.xml
svn path=/trunk/boinc/; revision=16087
- client: don't leak process handles when aborting jobs
- client: if an app exits or we kill it, always destroy the shmem segment.
- web: more HTML 4.01 Transitional conformity changes
svn path=/trunk/boinc/; revision=15865
- scheduler: fix bug in adaptive replication:
if we send an unreplicated job to an untrusted host,
set both wu.target_nresults and wu.min_quorum to app.target_nresults.
svn path=/trunk/boinc/; revision=15762