mirror of https://github.com/BOINC/boinc.git
Add a basic signaling mechanism to 'scheduling locality': the scheduler
notifies the project when it has no results remaining for a given file, and
gives the project a brief interval to make more WU for that file. Likewise,
the project can tell the scheduler that there are no more WU it can add for
a given file.

svn path=/trunk/boinc/; revision=5034
parent b298ed30ab
commit 881c670742
@@ -22333,3 +22333,63 @@ David 7 Jan 2005
        team_repair.php (new)
    user/
        explain_state.php

Bruce 8 Jan 2005

    In order to coordinate better with David, I am doing a slightly
    premature checkin of modifications to scheduling locality. The
    basic idea is to provide a simple notification mechanism to the
    project, so that if no work is currently available for a given
    data file, the project has an opportunity to make such work. This
    is intended to be controlled by an additional tag in config.xml,
    of the form:

        <locality_scheduling_signal> N </locality_scheduling_signal>

    where N is some number of seconds. If this new tag is absent,
    then the locality scheduler is meant to behave as before.

    NOTE: the tag is not parsed yet. The feature is CURRENTLY ENABLED
    BY DEFAULT WITH N=5 sec (see the TODO below).

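To illustrate the intended semantics of the tag (absent means unchanged behavior), here is a minimal sketch in C++. The function name locality_signal_seconds is hypothetical, and the plain string search only stands in for whatever config parser the scheduler uses; this commit does not contain any parsing code. Treating a malformed or negative value as 0 is also an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of this commit: extract N from
//   <locality_scheduling_signal> N </locality_scheduling_signal>
// Returns 0 when the tag is absent, malformed, or negative, which is
// taken here to mean "locality scheduler behaves as before".
static int locality_signal_seconds(const std::string& config_xml) {
    const std::string open_tag = "<locality_scheduling_signal>";
    const std::string close_tag = "</locality_scheduling_signal>";

    std::size_t start = config_xml.find(open_tag);
    if (start == std::string::npos) return 0;      // tag absent
    start += open_tag.size();

    std::size_t end = config_xml.find(close_tag, start);
    if (end == std::string::npos) return 0;        // unterminated tag

    const std::string body = config_xml.substr(start, end - start);
    char* rest = nullptr;
    long n = std::strtol(body.c_str(), &rest, 10); // skips leading blanks
    if (rest == body.c_str() || n < 0) return 0;   // no digits, or negative
    return static_cast<int>(n);
}
```

A caller would pass the contents of config.xml and use the returned value as the sleep interval, skipping the signaling step entirely when it is 0.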
    The change in behavior is in send_results_for_file(). This is
    the function that queries the database to see if there are unsent
    results available for a given (large) data file on the host.
    Previously, if no such results were found, the scheduler gave up
    and tried sending other results. With this modification, if N is
    nonzero and no results are found, the scheduler touches a file
    with the same name as the host's data file, in the directory

        PROJECT_ROOT/locality_scheduling/need_work/

    The scheduler then sleeps for N seconds and makes one additional
    attempt to find suitable unsent results. The idea is that in
    this interval the project has an opportunity to make additional
    WU suitable for the host's existing data file, which the
    transitioner can convert to unsent results. [Note: the
    transaction for the first query is completed before the sleep(N),
    and a new transaction is started afterwards, so there is no
    'sleep within a transaction'.]

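The 'query, notify, sleep, query again' sequence described in these notes can be sketched in isolation as follows. This is an illustration under assumptions, not the scheduler code: find_unsent is a hypothetical callback standing in for the database lookup (including its own transaction handling), and touch_file, file_exists and find_with_signal are names invented for this sketch.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <unistd.h>   // sleep()

// Hypothetical helper: create an empty signal file. Returns true on success.
static bool touch_file(const std::string& path) {
    std::FILE* fp = std::fopen(path.c_str(), "w");
    if (!fp) return false;
    std::fclose(fp);
    return true;
}

// Hypothetical helper: true if the file can be opened for reading.
static bool file_exists(const std::string& path) {
    std::FILE* fp = std::fopen(path.c_str(), "r");
    if (!fp) return false;
    std::fclose(fp);
    return true;
}

// signal_dir plays the role of PROJECT_ROOT/locality_scheduling.
// find_unsent (assumed) returns true if an unsent result was found.
static bool find_with_signal(
    const std::string& signal_dir,
    const std::string& filename,
    unsigned wait_seconds,
    const std::function<bool()>& find_unsent
) {
    if (find_unsent()) return true;        // the initial query always runs
    if (wait_seconds == 0) return false;   // N == 0: behave as before

    // The project has declared that no more WU can be made for this file.
    if (file_exists(signal_dir + "/no_work_available/" + filename)) return false;

    // Ask for work. If the signal cannot be written, just give up quietly.
    if (!touch_file(signal_dir + "/need_work/" + filename)) return false;

    sleep(wait_seconds);                   // no transaction is open here
    return find_unsent();                  // exactly one additional attempt
}
```

Every failure path returns false, which corresponds to the scheduler falling back to its unmodified behavior of trying other results.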
    In addition, if the project determines that NO further workunits
    can be made for a given data file, it can touch a file with the
    same name as the input data file, in the directory

        PROJECT_ROOT/locality_scheduling/no_work_available/

    If the scheduler finds this marker file, it assumes that the
    project cannot manufacture additional WU for this data file and
    skips the 'notify, sleep, query again' sequence above. It still
    does the initial query, so if the transitioner has made new
    results for an existing (old) WU, they will get picked up.

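The project side of the protocol is not part of this commit. As a rough sketch of how a work generator might respond to the signals, assuming POSIX directory calls: make_work is a hypothetical callback for the project's own WU generator, and removing a request from need_work/ once it has been handled is an assumption of this sketch, not something the notes specify.

```cpp
#include <cerrno>
#include <cstdio>
#include <dirent.h>
#include <functional>
#include <string>
#include <sys/stat.h>
#include <vector>

// Hypothetical helper: make sure a directory exists.
static bool ensure_dir(const std::string& path) {
    return mkdir(path.c_str(), 0775) == 0 || errno == EEXIST;
}

// List the data-file names for which the scheduler has requested work.
static std::vector<std::string> pending_requests(const std::string& need_work_dir) {
    std::vector<std::string> names;
    DIR* dir = opendir(need_work_dir.c_str());
    if (!dir) return names;                 // no directory: nothing requested
    while (dirent* entry = readdir(dir)) {
        std::string name = entry->d_name;
        if (name != "." && name != "..") names.push_back(name);
    }
    closedir(dir);
    return names;
}

// make_work (assumed) returns false when no further WU can be made for the
// named data file. Returns the number of files newly marked as exhausted.
static int service_requests(
    const std::string& signal_dir,
    const std::function<bool(const std::string&)>& make_work
) {
    int exhausted = 0;
    for (const std::string& name : pending_requests(signal_dir + "/need_work")) {
        if (!make_work(name) && ensure_dir(signal_dir + "/no_work_available")) {
            // Tell the scheduler not to wait for this file again.
            std::string marker = signal_dir + "/no_work_available/" + name;
            if (std::FILE* fp = std::fopen(marker.c_str(), "w")) {
                std::fclose(fp);
                ++exhausted;
            }
        }
        // Assumption: a handled request is removed so it is not served twice.
        std::remove((signal_dir + "/need_work/" + name).c_str());
    }
    return exhausted;
}
```

Such a loop would have to run more often than every N seconds for the scheduler's second query to have a chance of finding the new results.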
    This mechanism is robust: if the signals fail for any reason, or
    (say) the WU are not converted into unsent results quickly enough
    by the transitioner, or the new results are snapped up by some
    other host, then the scheduler simply proceeds as in its
    unmodified behavior and nothing goes wrong. In other words, the
    signals can be ignored at any time, and for any length of time,
    without adverse consequences.

    TODO: further testing, enable/disable this feature using XML tag
    described above.

    sched/
        sched_locality.C

@@ -27,6 +27,7 @@

#include <stdio.h>
#include <unistd.h> // for sleep(2)

#include "boinc_db.h"

@@ -131,12 +132,85 @@ static int send_results_for_file(
    while (1) {
        if (!wreq.work_needed(reply)) break;
        boinc_db.start_transaction();
        // Look for results which match file 'filename'

        // Comment 1: in order to work as designed, this query should
        // do 'order by id'. But one has to check that this won't
        // kill DB efficiency.

        // Comment 2: if the user has configured one_result_per_user_per_wu then you can
        // replace ID below by workunitid.
        sprintf(buf,
            "where name like '%s__%%' and server_state=%d and id>%d limit 1",
            filename, RESULT_SERVER_STATE_UNSENT, lastid
        );
        retval = result.lookup(buf);
        if (retval) {
            // We did not find any matching results. In this case,
            // check with the WU generator to see if we can make some
            // more WU for this file.
            char fullpath[512];
            sprintf(fullpath, "../locality_scheduling/no_work_available/%s", filename);
            FILE *fp=fopen(fullpath, "r");
            if (fp) {
                // since we found this file, it means that no work
                // remains for this data file. So give up trying to
                // interact with the WU generator.
                fclose(fp);
                log_messages.printf(
                    SCHED_MSG_LOG::DEBUG,
                    "found %s indicating no work remaining for file %s\n", fullpath, filename
                );
            }
            else {
                // We'll open and touch a file in the need_work/
                // directory as a way of indicating that we need work
                // for this file. If this operation fails, don't
                // worry or tarry!
                sprintf(fullpath, "../locality_scheduling/need_work/%s", filename);
                FILE *fp2=fopen(fullpath, "w");
                if (fp2) {
                    fclose(fp2);
                    log_messages.printf(
                        SCHED_MSG_LOG::DEBUG,
                        "touching %s: need work for file %s\n", fullpath, filename
                    );
                    // Finish the transaction, wait for the WU
                    // generator to make a new WU, and try again!
                    boinc_db.commit_transaction();
                    sleep(5);
                    // Now look AGAIN for results which match file
                    // 'filename'. Note: result.clear() may not be
                    // needed since previous query didn't find any
                    // results.
                    result.clear();
                    sprintf(buf,
                        "where name like '%s__%%' and server_state=%d and id>%d limit 1",
                        filename, RESULT_SERVER_STATE_UNSENT, lastid
                    );
                    boinc_db.start_transaction();
                    retval = result.lookup(buf);
                    if (!retval) {
                        log_messages.printf(
                            SCHED_MSG_LOG::DEBUG,
                            "success making/finding NEW work for file %s\n", filename
                        );
                    }
                }
                else {
                    log_messages.printf(
                        SCHED_MSG_LOG::CRITICAL,
                        "unable to touch %s to indicate need work for file %s\n", fullpath, filename
                    );
                }
            }
        }

        if (!retval) {
            // We found a matching result. Probably we will get one
            // of these, although for example if we already have a
            // result for the same workunit and the administrator has
            // set one_result_per_user_per_wu then we won't get one of these.
            lastid = result.id;
            if (possibly_send_result(
                result,
@@ -145,6 +219,7 @@ static int send_results_for_file(
                nsent++;
            }
        }

        boinc_db.commit_transaction();
        if (retval) break;
    }