Locality scheduling is intended for projects in which:
- Each workunit has a large input file
(it may have other smaller input files as well).
- Each large input file is used by many workunits.
The goal of locality scheduling is to minimize
the amount of data transferred to hosts.
When sending work to a given host,
the scheduler tries to send results
that use input files already on the host.
To use locality scheduling, projects must do the following:
- Workunit names must be of the form FILENAME__*,
where FILENAME is the name of the large input file
used by that workunit.
These filenames cannot contain '__'.
- The <file_info> for each large input file must contain the tags
",html_text("
"),"
- The config.xml file must contain",html_text(""),"
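The naming rule above can be sketched in a few lines of Python. This is only an illustration, not BOINC code: the helper names make_workunit_name and large_file_of and the example file name are hypothetical. It assumes that the large input file name is everything before the first '__' in the workunit name.

```python
SEPARATOR = '__'


def make_workunit_name(filename, suffix):
    # Build a name of the form FILENAME__suffix.
    # The large input file name itself must not contain the separator.
    if SEPARATOR in filename:
        raise ValueError('large input file name must not contain ' + SEPARATOR)
    return filename + SEPARATOR + suffix


def large_file_of(workunit_name):
    # Everything before the first separator is taken as the large input file name.
    filename, sep, _rest = workunit_name.partition(SEPARATOR)
    if not sep or not filename:
        raise ValueError('workunit name is not of the form FILENAME__*')
    return filename
```

For example, make_workunit_name('chunk17.dat', '0042') gives 'chunk17.dat__0042', and large_file_of recovers 'chunk17.dat' from that name.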
Locality scheduling works as follows:
- Each scheduler RPC contains a list of the
large files already on the host, if any.
- The scheduler attempts to send results that use a file
already on the host.
- For each file that is on the host and for which
no results are available for sending,
the scheduler instructs the host to delete the file.
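The steps above can be sketched roughly as follows. This is a simplified Python model, not the actual scheduler implementation: the function plan_reply, its parameters, and the dictionary of unsent results are assumptions made for illustration. It does not cover what happens when no results match any file on the host.

```python
def plan_reply(files_on_host, unsent_by_file, max_results):
    # files_on_host: large file names reported in the scheduler RPC.
    # unsent_by_file: dict mapping a large file name to its unsent result names.
    # max_results: hypothetical cap on how many results go out in one reply.
    to_send = []
    to_delete = []
    for filename in files_on_host:
        available = unsent_by_file.get(filename, [])
        if not available:
            # No remaining results use this file, so tell the host to delete it.
            to_delete.append(filename)
            continue
        # Prefer results whose large input file is already on the host.
        room = max_results - len(to_send)
        to_send.extend(available[:room])
    return to_send, to_delete
```

A host that reports two files, only one of which still has unsent results, would receive results for that file and a delete instruction for the other.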
";
page_tail();
?>