Handling long, large-footprint computations

The sequential components of some applications are long (weeks or months) and have a large data "footprint", i.e. their state may be many MB or GB in size. Each component could be represented by a single workunit. However, this has two potential drawbacks.

Alternatively, each component could be subdivided into a number of workunits of bounded average duration (e.g. one day). The avoids the above problems, but it requires the state files to be uploaded frequently, which may impose prohibitive network and storage loads.

To solve these problems, BOINC provides a mechanism called work sequences. Each work sequence represents a long, large-footprint component. It consists of a sequence W1, ... Wn of workunits; each workunit has one or more results. BOINC attempts to execute a work sequence entirely on one host, since this minimizes network traffic. However, it will "relocate" a work sequence (i.e. shift it to another host) if

A work sequence is dynamic; the project back end may extend it depending on the results.

The output files of a result in a sequence are classified as

A result need not have ANY output files; such a result supplies a "heartbeat" telling the server that the host is still working.

Creating work sequences

A project creates a work sequence using the create_work utility. This creates an initial sequence, which may be extended later (see below). The sequence has one result per work unit.

The results in a sequence typically have the following structure:

The decision of whether to use work sequences, and the optimal values of parameters (N, M, and workunit duration) depend on many factors and are left up to the project.

Scheduling work sequences

A work sequence is represented in the BOINC database by its first workunit. At a given time each sequence is either unassigned or is assigned to a particular host. Each host keeps track of the sequences it believes are assigned to it.

This head workunit contains a "maximum result time"; the sequence should only be assigned to hosts that can complete a result in this amount of real time. It also contains a link to the first workunit in the sequence for which no validated result has been received, and to the first workunit that has not been dispatched yet to the assigned host.

Each RPC to a BOINC scheduling server includes a list of work sequences assigned to the host.

If no work sequences are assigned to the host, the scheduler checks for an unassigned work sequence that the host is fast enough to handle. If there is one, it sends it one or more results from this sequence, and assigns the sequence to the host.

Scheduling notes:

  • If the high-water mark allows multiple results to be active, the host can upload result N while processing result N+1.
  • Conceivably there could be cases where a state file upload could be aborted because a newer state file is available. This isn't worth worrying about.
  • This policy limits each host to one sequence; may be want to modify for multiprocessors.

    For each sequence reportedly assigned to the host, the scheduler checks whether the sequence is still assigned to the host, and if not returns an element telling the host it's no longer assigned. Otherwise, it sends the host additional results from the sequence, up to the limit specified by the request. (Note: if the project has a mix of sequence and non-sequence work, this will starve the non-sequence work for this host.)

    Ending or extending a work sequence

    The project's result-handling program, when it has processed a completed result from a work sequence, may generate additional work units and result in the sequence (perhaps a single additional result that uploads the final state files) or it may specify that the sequence has ended. In the latter case, the scheduler will notify the host that the sequence has ended, removing its assignment.

    Relocating work sequences

    A BOINC daemon process periodically checks for work sequences for which some result, dispatched to the currently assigned host, has missed its deadline. It then deassigns the work sequence, and prepares it for reassignment. This involves the following:

    This creates a "dead branch" which remains in the database.

    Redundancy checking for work sequences

    Redundancy checking for work sequences requires some additional logic because the corresponding groups of results belong to different workunits. The proposed mechanism is as follows:

    Note: this scheme could lead to situations in which a slow host holds up the granting of credit to faster hosts. May want to reassign to faster host in this case.

    Examples

    climateprediction.com (known computation length, large state files). Let's say a simulation takes 6 months. Suppose we want a small progress report from the user every 3 days, so we generate 6*30/3 = 60 results per sequence. The scheduling server ensures that each host has at least 2 elements of the sequence at a time, so that it doesn't have to wait for data upload in order to continue. If we want a full state save every 3 weeks, we make every 7th result restartable and set the XML file infos so that the large state files will be uploaded.

    Folding@home (unknown computation length, small state files). We don't know in advance how long a computation will take. The server generates a large group of trajectory sequences, but only creates 2 or 3 results in each sequence. The backend work generator periodically checks how much of each sequence has been completed, and extends any sequences that are nearing completion unless it has been decided to permanently terminate them (i.e. because a more promising trajectory has been found).