file an output file for a SUCCESSFUL result. Failure to
find an output file for an UNSUCCESSFUL result is OK, and
just logged at level DEBUG.
svn path=/trunk/boinc/; revision=7184
[1] WU error flag set
[2] WU already has canonical result
[3] (report_deadline - current_time) < 25% of WU delay bound
If any of these conditions is true, set the report deadline to the
current time and set the WU transition time to the current time.
The transitioner will then 'do the right thing'.
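A minimal Python sketch of the three conditions above, for illustration only. The function names and the dict fields (`should_expire_now`, `expire`, `report_deadline`, `transition_time`) are hypothetical and are not taken from the BOINC sources.

```python
def should_expire_now(error_flag, has_canonical_result,
                      report_deadline, now, delay_bound):
    """Return True if any of conditions [1]-[3] holds."""
    if error_flag:                # [1] WU error flag set
        return True
    if has_canonical_result:      # [2] WU already has canonical result
        return True
    # [3] less than 25% of the WU delay bound remains before the deadline
    return (report_deadline - now) < 0.25 * delay_bound


def expire(result, wu, now):
    """Set report deadline and WU transition time to the current time."""
    result["report_deadline"] = now
    wu["transition_time"] = now
```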
svn path=/trunk/boinc/; revision=6871
(1) Put core client version number into wreq BEFORE searching for
an app version. Problem is that reply.wreq.core_client_version was only being set in
send_work(), which was too late for the resend_lost_work() part
of the code. You might want to move all the initialization of reply.wreq
out of send_work(). The core client version is needed to see if the
app is compatible with it when calling get_app_version().
(2) In retransmitting lost work, do NOT set the deadline to new
values. Else the result will never time out! But DO reset
the sent_time, to indicate that result was resent.
transitioner:
In the transitioner, make the next WU transition time be the min
of deadlines of the in progress results, NOT the min of the
sent_time+delay bound. Unless a project wants to do dynamic
adjustment of delay bounds for in progress results this should be OK.
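The transitioner rule above could be sketched roughly as follows. This is an illustration in Python, not the transitioner code; the dict keys `report_deadline` and `in_progress` are assumed names.

```python
def next_transition_time(results):
    """Earliest report deadline among in-progress results, or None if none."""
    deadlines = [r["report_deadline"] for r in results if r["in_progress"]]
    return min(deadlines, default=None)
```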
CPDN people: I don't think this does any harm for trickles but
you might want to give it a quick look to be 100% sure.
svn path=/trunk/boinc/; revision=6870
values. Else the result will never time out! (David, would it be OK to
simply modify the sent_time but NOT the deadline? This would make it easy
to see in the database that the result is being resent.)
svn path=/trunk/boinc/; revision=6868
scheduler.
Problem is that reply.wreq.core_client_version was only being set in
send_work(), which was too late for the resend_lost_work() part
of the code. You might want to move all the initialization of reply.wreq
out of send_work(). The core client version is needed to see if the
app is compatible with it when calling get_app_version().
svn path=/trunk/boinc/; revision=6867
than 24 hours away, to prevent thrashing. But this delayed
reissuing of new results. For example, suppose two results were
issued at hours 17 and 18, and both timed out (no reply). At
time 17+deadline the first would time out and a new result
would be issued. But then instead of setting the transition
time to 18+deadline it would be set to 18+deadline+1 day.
To prevent thrashing I have fixed this so that if a transition
time is in the past, I advance it by TWICE the amount it is late,
but never less than 1 minute or more than 1 day.
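A small Python sketch of the back-off rule just described. It assumes the advance is added to the old transition time; the note does not say whether it is measured from the old time or from the current time. The function name is hypothetical.

```python
MIN_ADVANCE = 60           # 1 minute, in seconds
MAX_ADVANCE = 24 * 3600    # 1 day, in seconds


def advance_late_transition(transition_time, now):
    """Push a past-due transition time forward by twice its lateness."""
    late = now - transition_time
    if late <= 0:
        return transition_time    # not in the past: leave unchanged
    # Twice the lateness, clamped to the range [1 minute, 1 day].
    advance = min(max(2 * late, MIN_ADVANCE), MAX_ADVANCE)
    return transition_time + advance
```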
- Ops pages: show unsent/in-progress results in purple. For
unsent results show create time rather than deadline.
svn path=/trunk/boinc/; revision=6637
lower case and upper case
locality scheduler: when searching for new work using advertised
files, retry ten times before starting a deterministic search.
We should probably modify this to try ALL advertised files in
a random order before moving onto a deterministic search.
svn path=/trunk/boinc/; revision=6482
lib advertise data file when new result is created.
code organization: create new lib function boinc_touch_file()
from code that was in locality scheduler module.
svn path=/trunk/boinc/; revision=6456
running for a given host. This works by creating a file called
CGI_<HOSTID> in the cgi-bin/ directory, and using Posix advisory file
locking. I have been testing this code for three days and am seeing
*some* invocations of this. David, I'll send details to the dev mailing
list or talk with you about it later this week. Note: this code probably
can be removed in the future, when the core client problems are fixed.
Also note: I don't know if this is compatible with the fast cgi sched.
svn path=/trunk/boinc/; revision=6172
it to delay until a random time falling within the first hour of the following
day. Previously the host would be told to delay one hour, which could lead to
as many as 24 retries in a one-day period.
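The new delay could be computed roughly like this. A Python sketch only: it assumes the day boundary is midnight UTC and that times are Unix epoch seconds, neither of which is stated above. The function name is hypothetical.

```python
import random

DAY = 24 * 3600
HOUR = 3600


def delay_until_next_day(now, rng=random):
    """Seconds to wait so the next contact lands in the first hour of tomorrow."""
    next_midnight = (int(now) // DAY + 1) * DAY
    # rng.random() is in [0, 1), so the offset is in [0, HOUR).
    return (next_midnight - now) + rng.random() * HOUR
```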
svn path=/trunk/boinc/; revision=6132
assume that they are inadequate for ALL WU. Without this we will
execute an expensive deterministic search over all WU, looking
for one that is appropriate. This could be a config option if
desired, or one might add in an extra search step to find WU with
appropriate resources. But for now this is the cleanest.
svn path=/trunk/boinc/; revision=5999
reliable to use flock/lockf/fcntl file locking with buffered
IO. This is because the stream libraries might unexpectedly
open/close/dup file descriptors on you. So I have modified
the file write/append functions to use raw IO rather than
buffered IO. In doing this I also found and fixed some small
bugs. There is no guarantee that one can mix flock/lockf/fcntl
file locking so I have settled on fcntl since it is POSIX and
gives the most control.
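To illustrate the idea of pairing fcntl-style locks with raw (unbuffered) IO, here is a Python sketch for POSIX systems. It is not the BOINC code: `append_locked` is a hypothetical helper, and Python's `fcntl.lockf()` is used because it wraps the fcntl() locking calls.

```python
import fcntl
import os


def append_locked(path, data):
    """Append bytes with os-level IO while holding an exclusive fcntl lock.

    Returns the number of bytes written, or None if the lock is held
    by another process.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking exclusive lock on the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return None
        # os.write() goes straight to the descriptor, no stdio buffering.
        return os.write(fd, data)
    finally:
        os.close(fd)    # closing the descriptor also releases the lock
```

Because the lock and the write use the same descriptor, no stream library can open, close or dup a descriptor behind the caller's back.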
svn path=/trunk/boinc/; revision=5891
append mode then fseek() then write to the file, the fseek() HAS
NO EFFECT. This is documented ANSI C. So I have eliminated the
fseek. We now check that the file size corresponds exactly to the
claimed offset of the data. If they do not agree then return a
transient error to force the host to ask again for the file length
and re-transmit data.
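The size check described above might look like this in outline. A Python sketch under assumptions: `write_chunk` and the string return values are hypothetical, and the real handler reports errors differently.

```python
import os


def write_chunk(path, claimed_offset, data):
    """Append data only if the file is exactly claimed_offset bytes long."""
    size = os.path.getsize(path) if os.path.exists(path) else 0
    if size != claimed_offset:
        # Client and server disagree: make the client ask for the
        # file length again and re-transmit.
        return "transient_error"
    # In append mode every write goes to the end of the file,
    # so no seek is attempted.
    with open(path, "ab") as f:
        f.write(data)
    return "ok"
```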
svn path=/trunk/boinc/; revision=5866
file length, check first that the file is not already
open (locked) by another file_upload_handler. If the
file IS open (locked), then do NOT hand back the file length.
Instead return a transient error. This will prevent
transmission of upload data starting at the wrong offset.
- To help understand when/why multiple file_upload_handlers
are trying to write to the same file, set default log level
to DEBUG. Also log messages at level CRITICAL if there is
an attempt to write to a locked file. We may want to change
this level to DEBUG in the future, if this turns out to be
'normal' TCP buffering of data between host and server.
svn path=/trunk/boinc/; revision=5851
from trying to upload the SAME file, use lockf() to place an advisory lock on
the file. David, I probably should have discussed this with you first, but it's
too early in the morning. Please revert if this is a mistake!
svn path=/trunk/boinc/; revision=5837
of logging. This is intended to help debug file uploading
problems, where apache kills the file upload handler because
something is going wrong.
svn path=/trunk/boinc/; revision=5824
E@H specific is now included (but protected by
#ifdef EINSTEIN_AT_HOME
to make it simpler for me to maintain consistency with BOINC cvs.
- Added project-specific unacceptable_os() function for rejecting hosts.
- Transitioner and scheduler now initialize host.max_results_day correctly
in database under all circumstances.
- Browser requests are now correctly identified (REQUEST_METHOD=="GET") and
properly redirected. This was broken. David, please see comment near
one of the probable_user_browser=true in handle_request.C. I think
something is wrong here (or I am missing the point!).
- More info about requests is logged
- If the scheduler hangs (incoming request incomplete) it will normally be
killed by Apache after a timeout. But this happens silently. So I now
install a signal handler and catch this SIGTERM. In this case an
error message is logged and all open files are flushed before exit(1)ing.
- If IO is passed through files, check that request length and content length
agree and log a message if they do NOT.
- active_frac not correctly reported by 4.19 and earlier core clients.
Adjust for this in estimating wallclock execution times.
- Added a small block into validator code to attach a debugger.
svn path=/trunk/boinc/; revision=5688
core clients. Fix strips newlines from messages sent to clients
<= 4.19. NOTE: stripping may ALSO be needed for more recent
clients. But it would be better to fix the clients so that
embedded newlines in messages are respected.
svn path=/trunk/boinc/; revision=5543
- For locality scheduler, if anonymous platform lacks app, don't do deterministic
search for work!
- For locality scheduler, remove 'unsent' constraint from initial query so that
existing index in result table can be used to perform a more efficient search.
- Send multi-message replies to core clients > 4.19
- Change 'no work available' message to 'no work sent' since this is often due
to constraints at the client end, NOT lack of work at project end.
svn path=/trunk/boinc/; revision=5492
the daily_result_quota constraint was not being enforced.
Normally this constraint is enforced in the work_needed()
function. However note that the critical send_work() function
NEVER checks work_needed() [DAVID, perhaps it should?] before
calling send_work_locality() or scan_work_array(). Then, when
send_work_locality() was called, it would in turn call
send_old_work() immediately, WITHOUT checking to see if
work_needed() was TRUE. This allowed the daily_result_quota
constraint to be broken.
Possible fixes included:
test work_needed() before calling send_old_work()
test work_needed() WITHIN send_old_work()
test work_needed() within possibly_send_result()
test work_needed() within wu_is_infeasible()
Conclusion: work is ONLY sent by the function
possibly_send_result() which is called in two places in
sched_locality.C: once in send_results_for_file() and once in
send_old_work(). The first of these DOES check the value of
work_needed(). The second does NOT. So I added a check of
work_needed() within send_old_work(). I also added
another check of work_needed() at the top of
send_results_for_file() BEFORE any DB access is done. It might be
better to put this test of work_needed() lower down (within
possibly_send_result()) or higher up (where send_old_work()
is called). I am not sure. David, I'd appreciate your advice.
svn path=/trunk/boinc/; revision=5482
4 CPUS.
Improved error messages if users are being denied work because of
lack of CPU. The message reports back their on fraction, active
fraction, and resource share fraction, as percentages.
svn path=/trunk/boinc/; revision=5466
scheduler, and fails to get work because there are N secs
of pending work, then send a delay request of min(3600, N/5) secs.
Otherwise the same host was coming back every hour, without being able
to get additional work.
Implemented by adding a method set_delay() to
SCHEDULER_REQUEST. This sets the delay to the maximum of the
previous requested delay or the current requested delay. The
delay is NEVER set longer than two days.
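The set_delay() behaviour described above amounts to the following. A Python sketch for illustration: the class name `DelayHolder` and the attribute `request_delay` are assumptions, not the actual structure.

```python
TWO_DAYS = 2 * 24 * 3600    # cap, in seconds


class DelayHolder:
    """Keeps the largest delay requested so far, capped at two days."""

    def __init__(self):
        self.request_delay = 0

    def set_delay(self, delay):
        # Never shorten a delay that was already requested,
        # and never exceed the two-day cap.
        self.request_delay = min(max(self.request_delay, delay), TWO_DAYS)
```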
svn path=/trunk/boinc/; revision=5437
- Ignore CPU limitations and resource share entirely, IF
a host:
(1) has no work for this project
(2) has no results in this sched reply
This ensures that any host that wants to do work will at least
get *something*. It liberalizes slightly David A's approach
from 14 Feb 2005. Eliminate use_time_stats from wreq structure.
- Scheduler changes (locality scheduling only):
- Improve return value info for some functions.
- Modify send_old_work() to accept a t_min < t < t_max time range
- New sched locality algorithm to send work to hosts with no files.
Send oldest result in the time range A < t < B where
B = locality_scheduling_timeout/2
A = B - rand*locality_scheduling_timeout/2
Here rand is a uniformly distributed random number in [0,1].
- When an unsent result is older than locality_scheduling_timeout, no
longer send it to the FIRST host that requests work. Instead send
it to the first host which has a connection speed > 100kb/s.
- Fix file deletion. Previously we were deleting files from hosts
when they got no work for that file. But this might have been
because the work was infeasible (cpu time). Now delete files
from host ONLY if there is no work remaining for that file.
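The time range A < t < B from the locality algorithm above can be written out as follows. A Python sketch only: the function name is hypothetical, and the units of t are whatever locality_scheduling_timeout uses.

```python
import random


def send_window(locality_scheduling_timeout, rand=None):
    """Return (A, B) with B = timeout/2 and A = B - rand*timeout/2."""
    if rand is None:
        rand = random.random()    # random number in [0, 1)
    b = locality_scheduling_timeout / 2
    a = b - rand * locality_scheduling_timeout / 2
    return a, b
```

With rand near 0 the window shrinks to nothing just below B; with rand near 1 it covers the whole range from 0 to B.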
svn path=/trunk/boinc/; revision=5434
a work request, do not search for or send further work. This is the same
way that disk space limits are handled. This is necessary since otherwise
a host with small memory will endlessly trigger the WU generator, churning
out infeasible WUs.
Added boolean arg to host_has_file() following David A's advice. This
eliminates the 'expensive' copy of a large data structure. The bool arg
makes host_has_file() skip the final WU in the vector in hunting for a file.
Better log message for setting coredump size.
Added RCSID tag to sched_timezone.C
svn path=/trunk/boinc/; revision=5397
is disabled by default. Having this
is really useful if the scheduler is crashing some of the time. You
can load the core dump file into a debugger to see where things are
breaking. To use this, edit sched/main.C by hand and set
#define DUMP_CORE_ON_SEGV 1
svn path=/trunk/boinc/; revision=5385
- Address David's comment of Feb 2. Now properly reduce the
disk size resource requirements of a WU being sent if the
file is already on the host, or already included in a previous
WU being sent. DAVID: please check that reply_copy.wus.pop_back()
is right.
- For this, define a function host_has_file(). This can also
be used in the future for more intelligent file deletion
schemes.
- Make warnings to upgrade old clients have low priority until
3 days before deadline. Then high priority.
- Fix sign error in messages sent to users about insufficient
disk space.
- Move extract_filename() from sched_locality.C to sched_util.C
- Pretty up the ordered list of URLs printed for a given host.
- I've even tested these changes before committing them!
svn path=/trunk/boinc/; revision=5382
<min_core_client_version_announced> N </min_core_client_version_announced>
<min_core_client_upgrade_deadline> M </min_core_client_upgrade_deadline>
This is used to warn users in advance if a new minimum core client is going
to be required. Users have until time 'M' (Unix epoch time(2) format)
to upgrade. Not yet tested.
svn path=/trunk/boinc/; revision=5370
computing timezone differences, not taking into account the fact that
UTC+11 hours and UTC-11 hours are only 2 hours apart. Duh.
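For the record, the wrap-around distance between two UTC offsets can be computed like this. A Python sketch with a hypothetical function name; offsets are in seconds.

```python
DAY = 24 * 3600


def timezone_distance(offset_a, offset_b):
    """Shortest distance between two UTC offsets, going around the clock."""
    diff = abs(offset_a - offset_b) % DAY
    return min(diff, DAY - diff)
```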
svn path=/trunk/boinc/; revision=5357
config.xml. Use the boolean tag <cache_md5_info> to enable it.
This prevents the work generation library from having to go back and
continuously regenerate the md5 sums of your input data files. Note
that reading these from disk can be expensive if you have many such files
that are large and that you re-use. See check-in notes from 30/31 Dec 2004
for some details.
svn path=/trunk/boinc/; revision=5281
no work was sent to hosts, and available space<0 OR if available space>0
but work was infeasible because the disk bound requirements of the work
exceeded the available space.
Added a new config.xml boolean element called 'choose_download_url_by_timezone'
This requires that projects provide a 2-column file in the project root named
'download_servers'. An example is:
3600 http://einstein.aei.mpg.de
-21600 http://einstein.phys.uwm.edu
The first column is offset from UTC in seconds, and the second column is the URL
of the download server. When enabled, the scheduler will replace the download
path for data and executables by a list of download URLs, ordered by proximity
to the host's timezone. The download path must start with the
BOINC default download/ and the different download servers must have identical
file paths under download/, in other words they must be mirrored.
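A sketch of how the 'download_servers' file could be read and the URLs ordered by timezone proximity. This is illustrative Python, not the scheduler code; the function names are hypothetical, and proximity is assumed to be the wrap-around distance between UTC offsets.

```python
DAY = 24 * 3600


def parse_download_servers(text):
    """Parse lines of '<offset from UTC in seconds> <URL>'."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue    # skip blank or malformed lines
        servers.append((int(fields[0]), fields[1]))
    return servers


def order_by_timezone(servers, host_offset):
    """Return the URLs sorted by closeness to the host's UTC offset."""
    def distance(entry):
        diff = abs(entry[0] - host_offset) % DAY
        return min(diff, DAY - diff)
    return [url for _, url in sorted(servers, key=distance)]
```

For the two-server example above, a host at UTC+0 would get the aei.mpg.de server first, and a host at UTC-5 would get the uwm.edu server first.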
svn path=/trunk/boinc/; revision=5275
< 4.62 to be interchanged with their priority. So if the message was
supposed to be 'No work available' with priority 'low', the actual
svn path=/trunk/boinc/; revision=5273
disk space is < 0, delete files which have <sticky> and <report_on_rpc>
set. Note that (1) this deletion simply removes the <sticky> tag, so
file won't be deleted until after all WU that depend upon it are
completed and (2) the mechanism to determine which file to delete
could be improved. TODO: improve messages to hosts which have no file
space and ALSO have no files to delete.
- scheduler changes: locality scheduling. Clean up code which makes a
deterministic search of results to delete. Data file names cannot
contain the "~" character!
- scheduler changes: added a simple debugging mechanism for scheduler
replies. If you touch a file named 'debug_sched' in the project
root, then files called sched_reply_HOSTID_RPCNO will be created
under cgi-bin/ which contain the scheduler replies. You can turn on
this mechanism for some time to study the scheduler replies.
In a little while I will add a similar debugging feature which also
prints the corresponding scheduler requests.
svn path=/trunk/boinc/; revision=5247
(a) make DB queries more efficient using name>'FILE__' and name<'FILE__~' rather than
name like 'FILE__%'
(b) Set 'no remaining work for this file' flag correctly by making a DB scan if needed.
One can show that this is the 'cheapest' reliable place to put this scan.
(c) Modify deterministic algorithm for finding unsent results so that instead of
starting with FILE="" and scanning forward over all files, it starts at a random
place in file space, scans cyclically to the end, and then from "" to the start
point.
(d) Satisfy work request if possible. Don't terminate sending work until none left that
is feasible, or request satisfied.
(e) If a new file is needed, first pick file associated with unsent results which are more
than 2 hours old. Note: need to make this a user-configurable option, and add some
random +- slack.
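The range query from item (a) can be tried out with a toy table. This Python sketch uses sqlite3 purely for illustration (the project database is not sqlite); the table layout and function name are assumptions.

```python
import sqlite3


def results_for_file(conn, filename):
    """Select results named '<filename>__...' using a range, not LIKE."""
    lo = filename + "__"
    hi = filename + "__~"    # '~' sorts after letters, digits and '_'
    rows = conn.execute(
        "SELECT name FROM result WHERE name > ? AND name < ? ORDER BY name",
        (lo, hi),
    )
    return [row[0] for row in rows]
```

A plain range comparison on name can use an ordinary index on that column. It relies on '~' sorting after every character used in names, which is why names must not contain '~'.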
For the record, here is the current locality scheduler logic.
I will update the docs once this is a bit better tested and
stable.
(1) If there is (one) unsent result which is older than
    config.locality_scheduling_send_timeout (7 days) and is
    feasible for the host, send it.
(2) If we did send a result in the previous step, then send any
    additional results that are feasible for the same input file.
(3) If additional results are needed, step through input files on
    the host. For each, if there are results that are feasible for
    the host, send them. If there are no results that are feasible
    for the host, delete the input file from the host.
(4) If additional results are needed, and there is (one) unsent
    result which is older than 2 hours and is feasible for the
    host, send it.
(5) If we did send a result in the previous step, then send any
    additional results that are feasible for the same input file.
(6) If additional results are needed, select an input file name at
    random from the current input file working set advertised by
    the WU generator. If there are results for this input file
    that are feasible for this host, send them.
(7) If additional results are needed, carry out an expensive,
    deterministic search for ANY results that are feasible for the
    host. This search starts from a random filename advertised by
    the WU generator, but continues cyclically to cover ALL results
    for ALL files. If a feasible result is found, send it. Then
    send any additional results that use the same input file. If
    there are no feasible results for the host, we are finished:
    exit.
(8) If additional results are needed, return to step 4 above.
svn path=/trunk/boinc/; revision=5129
actually not impossible. Consider the following scenario: WU A
has result 1 and WU B has result 2. These are both sent to a
host. Some time later, result 1 fails and the transitioner
creates a new result, result 3 for WU A. Then the host requests
a new result. The maximum result already sent to the host is 2.
The next unsent result (sorted by ID) is #3. But since it is
for WU A, and since the host has already gotten a result for WU
A, it's infeasible. So I think this is only wacky if
!one_wu_per_result_per_host.
- David, I simplified the inner part of send_results_for_file()
somewhat. I can't see the need/use for the bool
in_working_set argument. If I have really screwed the pooch
please revert.
svn path=/trunk/boinc/; revision=5106
Trivial bug, FPE on n % 0 when host has no files.
Hard bug, in the deterministic search to find a new result that can
be sent, the upwards search on name must be done not by comparing
RESULT name to FILENAME, but instead by comparing result name to the
maximal lexical resultname that can be constructed from the
filename, which is filename_ZZZ...Z where Z==0xff.
svn path=/trunk/boinc/; revision=5085
flag a file as over unless the WU generator has already indicated that
no further work can be remaining. Search code for 'David' to find some
comments.
svn path=/trunk/boinc/; revision=5077
log information from different scheduler requests running
in parallel don't collide in the log file and appear
intermingled. Very useful when doing verbose debugging.
svn path=/trunk/boinc/; revision=5069