Add a basic signaling mechanism to 'scheduling locality' so that the

scheduler notifies the project when it has no results remaining for a given
file, and gives the project a brief interval to try to make more WU for
that file.  Likewise, the project can tell the scheduler if there are
no more WU that it can add for a given file.

svn path=/trunk/boinc/; revision=5034
Bruce Allen 2005-01-08 20:55:49 +00:00
parent b298ed30ab
commit 881c670742
2 changed files with 135 additions and 0 deletions


@@ -22333,3 +22333,63 @@ David 7 Jan 2005
team_repair.php (new)
user/
explain_state.php
Bruce 8 Jan 2005
In order to coordinate better with David, I am doing a slightly
premature checkin of modifications to scheduling locality. The
basic idea is to provide a simple notification mechanism to the
project, so that if no work is currently available for a given
data file, there is an opportunity to make such work. This is
controlled by an additional tag in config.xml, of the form:
<locality_scheduling_signal> N </locality_scheduling_signal>
where N is some number of seconds. Once the tag is honored, its
absence will make the locality scheduler behave as before.
NOTE: THE TAG IS NOT READ YET; THE FEATURE IS CURRENTLY ENABLED BY
DEFAULT WITH N=5 sec (see TODO below).
The modification to behavior happens in send_results_for_file().
This is the function that queries the database to see if there are
unsent results available for a given (large) data file on the
host. Previously, if no such results were found, the scheduler
gave up and tried sending other results. With this modification,
if N is nonzero and no results are found, the scheduler
touches a file with the same name as the host's data file, in the
directory
PROJECT_ROOT/locality_scheduling/need_work/
The scheduler then sleeps for N seconds, and makes one additional
attempt to find suitable unsent results. The idea is that in this
interval, the project has an opportunity to make additional WU for
this file, which the transitioner can convert to unsent
results. [Note: the transaction for the first query is committed
before the sleep(N), and a new transaction is started
afterwards, so there is no 'sleep within a transaction'.]
In addition, if the project determines that NO further workunits
can be made for a given data file, then the project can touch a
file with the same name as the input data file, in a directory
PROJECT_ROOT/locality_scheduling/no_work_available/
If the scheduler finds this file, it assumes that the
project cannot make additional WU for this data file and
skips the 'notify, sleep, query again' sequence above. Of course
it still does the initial query, so if the transitioner has made
some new results for an existing (old) WU, they will get picked
up.
This mechanism is robust in the sense that if the signals fail for
any reason, or (say) the WU are not converted into unsent results
quickly enough by the transitioner, or if they are snapped up by
some other host, then the scheduler simply proceeds as it did
before this change and nothing goes wrong. In other
words, the signals can be ignored at any time and for any length
of time without adverse consequences.
TODO: further testing, enable/disable this feature using XML tag
described above.
sched/
sched_locality.C
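
The project side of the handshake described in the checkin notes above is not part of this commit. The following is a minimal sketch, in C, of what one pass of a work generator could look like: it scans need_work/, asks a generator callback whether more WU can be made for each named data file, touches no_work_available/<file> when the answer is no, and clears the request. The names service_need_work, signal_path, make_work_fn and stub_make_work are hypothetical and do not exist in BOINC; only the two directory names come from the notes.

```c
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback: return nonzero if new WU were made for data_file,
 * zero if no further WU can ever be made for it. */
typedef int (*make_work_fn)(const char *data_file);

/* Placeholder generator for this sketch: reports every file as exhausted.
 * A real project would create workunits here instead. */
static int stub_make_work(const char *data_file) {
    (void)data_file;
    return 0;
}

/* Build "<root>/locality_scheduling/<subdir>/<name>" without overflowing. */
static int signal_path(char *out, size_t len, const char *root,
                       const char *subdir, const char *name) {
    int n = snprintf(out, len, "%s/locality_scheduling/%s/%s",
                     root, subdir, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* One pass over need_work/. Returns the number of requests handled,
 * or -1 if the directory cannot be opened. */
static int service_need_work(const char *root, make_work_fn make_work) {
    char dir[1024], path[1024];
    struct dirent *e;
    int handled = 0;
    DIR *d;

    /* An empty name yields the directory itself (with a trailing slash). */
    if (signal_path(dir, sizeof(dir), root, "need_work", "") != 0) return -1;
    d = opendir(dir);
    if (!d) return -1;

    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.') continue;   /* skip ".", ".." and dotfiles */

        /* No more WU possible: tell the scheduler not to wait for this file. */
        if (!make_work(e->d_name) &&
            signal_path(path, sizeof(path), root,
                        "no_work_available", e->d_name) == 0) {
            FILE *fp = fopen(path, "w");
            if (fp) fclose(fp);
        }

        /* Clear the request so it is not serviced again on the next pass. */
        if (signal_path(path, sizeof(path), root,
                        "need_work", e->d_name) == 0) {
            unlink(path);
        }
        handled++;
    }
    closedir(d);
    return handled;
}
```

To be useful, such a pass would have to run more often than every N seconds, since the scheduler queries again only once after its sleep. Because the scheduler falls back to its old behavior when a signal is missed, the sketch ignores failures to touch or unlink individual files.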


@@ -27,6 +27,7 @@
#include <stdio.h>
#include <unistd.h> // for sleep(3)
#include "boinc_db.h"
@@ -131,12 +132,85 @@ static int send_results_for_file(
while (1) {
if (!wreq.work_needed(reply)) break;
boinc_db.start_transaction();
// Look for results which match file 'filename'
// Comment 1: in order to work as designed, this query should
// do 'order by id'. But one has to check that this won't
// kill DB efficiency.
// Comment 2: if the user has configured one_result_per_user_per_wu then you can
// replace ID below by workunitid.
sprintf(buf,
"where name like '%s__%%' and server_state=%d and id>%d limit 1",
filename, RESULT_SERVER_STATE_UNSENT, lastid
);
retval = result.lookup(buf);
if (retval) {
// We did not find any matching results. In this case,
// check with the WU generator to see if we can make some
// more WU for this file.
char fullpath[512];
sprintf(fullpath, "../locality_scheduling/no_work_available/%s", filename);
FILE *fp=fopen(fullpath, "r");
if (fp) {
// since we found this file, it means that no work
// remains for this data file. So give up trying to
// interact with the WU generator.
fclose(fp);
log_messages.printf(
SCHED_MSG_LOG::DEBUG,
"found %s indicating no work remaining for file %s\n", fullpath, filename
);
}
else {
// We'll open and touch a file in the need_work/
// directory as a way of indicating that we need work
// for this file. If this operation fails, don't
// worry or tarry!
sprintf(fullpath, "../locality_scheduling/need_work/%s", filename);
FILE *fp2=fopen(fullpath, "w");
if (fp2) {
fclose(fp2);
log_messages.printf(
SCHED_MSG_LOG::DEBUG,
"touching %s: need work for file %s\n", fullpath, filename
);
// Finish the transaction, wait for the WU
// generator to make a new WU, and try again!
boinc_db.commit_transaction();
sleep(5);
// Now look AGAIN for results which match file
// 'filename'. Note: result.clear() may not be
// needed since previous query didn't find any
// results.
result.clear();
sprintf(buf,
"where name like '%s__%%' and server_state=%d and id>%d limit 1",
filename, RESULT_SERVER_STATE_UNSENT, lastid
);
boinc_db.start_transaction();
retval = result.lookup(buf);
if (!retval) {
log_messages.printf(
SCHED_MSG_LOG::DEBUG,
"success making/finding NEW work for file %s\n", filename
);
}
}
else {
log_messages.printf(
SCHED_MSG_LOG::CRITICAL,
"unable to touch %s to indicate need work for file %s\n", fullpath, filename
);
}
}
}
if (!retval) {
// We found a matching result. Probably we will send it,
// although if, for example, the host already has a
// result for the same workunit and the administrator has
// set one_result_per_user_per_wu, then we won't send it.
lastid = result.id;
if (possibly_send_result(
result,
@@ -145,6 +219,7 @@ static int send_results_for_file(
nsent++;
}
}
boinc_db.commit_transaction();
if (retval) break;
}
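
The hunk above hard-codes sleep(5), while the checkin notes leave reading N from config.xml as a TODO. As a minimal sketch of how the file-signal step could be factored out with N as a parameter, the C helper below performs the 'check no_work_available, touch need_work, sleep' sequence and reports what the caller should do next. The names signal_need_work, signal_path and the signal_outcome values are hypothetical, not BOINC identifiers, and the project root is passed in explicitly rather than assumed to be "..".

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not BOINC identifiers. */
enum signal_outcome {
    SIGNAL_DISABLED,  /* wait_secs == 0: behave as the unmodified scheduler */
    SIGNAL_NO_WORK,   /* project has flagged the file as exhausted */
    SIGNAL_FAILED,    /* could not touch need_work/<file>; do not wait */
    SIGNAL_RETRY      /* request left and interval elapsed; query again */
};

/* Build "<root>/locality_scheduling/<subdir>/<name>" without overflowing. */
static int signal_path(char *out, size_t len, const char *root,
                       const char *subdir, const char *name) {
    int n = snprintf(out, len, "%s/locality_scheduling/%s/%s",
                     root, subdir, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Call only after the first transaction has been committed, so that the
 * sleep never happens inside a database transaction. */
static enum signal_outcome signal_need_work(const char *root,
                                            const char *filename,
                                            unsigned int wait_secs) {
    char path[1024];
    FILE *fp;

    if (wait_secs == 0) return SIGNAL_DISABLED;

    /* The project says no more WU can be made for this file: skip waiting. */
    if (signal_path(path, sizeof(path), root,
                    "no_work_available", filename) == 0 &&
        (fp = fopen(path, "r")) != NULL) {
        fclose(fp);
        return SIGNAL_NO_WORK;
    }

    /* Touch need_work/<file>; if that fails, carry on without waiting. */
    if (signal_path(path, sizeof(path), root, "need_work", filename) != 0 ||
        (fp = fopen(path, "w")) == NULL) {
        return SIGNAL_FAILED;
    }
    fclose(fp);

    /* Give the project wait_secs seconds to make new WU for this file. */
    sleep(wait_secs);
    return SIGNAL_RETRY;
}
```

A caller would repeat the database lookup only on SIGNAL_RETRY, starting a new transaction first; every other outcome falls through to the scheduler's previous behavior. Using snprintf with a length check also keeps a long file name from overrunning the path buffer.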