<?php
require_once("docutil.php");
page_head("Storage");
echo "
<h3>Files and data servers</h3>
<p>
The BOINC storage model is based on <b>files</b>.
Examples of files:
<ul>
<li> The inputs and outputs of applications;
<li> Application executables, libraries, etc.
</ul>
<p>
The BOINC core client transfers files to and from project-operated
<b>data servers</b> using HTTP.
<p>
A file is described by an XML element of the form
".html_text("
<file_info>
<name>foobar</name>
<url>http://a.b.c/foobar</url>
<url>http://x.y.z/foobar</url>
...
<md5_cksum>123123123123</md5_cksum>
<nbytes>134423</nbytes>
<max_nbytes>200000</max_nbytes>
<status>1</status>
[ <generated_locally/> ]
[ <executable/> ]
[ <upload_when_present/> ]
[ <sticky/> ]
[ <signature_required/> ]
[ <no_delete/> ]
[ <report_on_rpc/> ]
</file_info>
")."
The elements are as follows:
";
list_start();
list_item(
"name",
"The file's name, which must be unique within the project.
If the project supports participant hosts on which
filenames are case-insensitive (e.g. Windows),
names must also be unique when compared case-insensitively."
);
list_item("url",
"A URL where the file is (or will be) located on a data server."
);
list_item("md5_cksum", "The MD5 checksum of the file."
);
list_item("nbytes",
"The size of the file in bytes."
);
list_item("max_nbytes",
"The maximum allowable size of the file in bytes (may be greater than 2^32).
This is used to prevent flooding data servers with bogus data."
);
list_item("status",
"0 if the file is not present,
1 if the file is present, or a negative error code if there was a
problem in downloading or generating the file."
);
list_item("generated_locally",
"If present, indicates that the file will be generated by an application on
the client, as opposed to being downloaded."
);
list_item("executable",
"If present, indicates that the file protections should be set to allow
execution."
);
list_item("upload_when_present",
"If present, indicates that the file should be uploaded
when the application finishes.
The file is uploaded even if the application doesn't
finish successfully.
API functions are available for
<a href=int_upload.php>uploading files prior to
finishing computation</a>.
");
list_item("sticky",
"If present, indicates that the file should be retained
on the client after its initial use."
);
list_item("signature_required",
"If present, indicates that the file should be verified with an
RSA signature.
This generally only applies to executable files."
);
list_item("no_delete",
"If present for an input (workunit) file,
indicates that the file should NOT be removed from the data server's
download directory when the workunit is completed.
Use this if a particular input file or files are used by more than one
workunit, or will be used by future workunits."
);
list_item("no_delete",
"If present for an output (result) file,
indicates that the file should NOT be removed from the data server's upload
directory when the corresponding workunit is completed.
Use with caution - this may cause your upload directory to overflow."
);
list_item("report_on_rpc",
"Include a description of this file in scheduler RPC requests,
so that the scheduler may send appropriate work
using <a href=sched_locality.php>locality scheduling</a>."
);
list_end();
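echo "
<p>
As a rough illustration of how these elements combine,
an output file that the application generates on the client,
and that is uploaded when the application finishes,
might be described as follows.
The name, URL and size limit are hypothetical placeholders,
and 'nbytes' and 'md5_cksum' are left out on the assumption
that they are not known until the file exists:
".html_text("
<file_info>
<name>foobar_out</name>
<url>http://a.b.c/upload</url>
<max_nbytes>200000</max_nbytes>
<generated_locally/>
<upload_when_present/>
</file_info>
")."
";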
echo "
<p>
Once a file is created (on a data server or a participant host) it
is <b>immutable</b>.
This means that all replicas of that file are assumed to be identical.
<a name=file_ref></a>
<h3>File references</h3>
<p>
Files may be associated with <a href=work.php>workunits</a>,
<a href=result.php>results</a> and
<a href=app.php>application versions</a>.
Each such association is represented by an XML element of the form
".html_text("
<file_ref>
<file_name>foobar</file_name>
[ <open_name>input</open_name> ]
[ <main_program/> ]
</file_ref>
")."
The elements are as follows:
";
list_start();
list_item("file_name", "The name of the file being referenced, i.e. the 'name' element of its file_info.");
list_item("open_name",
"The name by which the application will refer to the file.
Applications access files using
<a href=api.php>the following functions</a>:
<pre>
char physical_name[256];
// Resolve the logical name 'input' to the file's physical path.
boinc_resolve_filename(\"input\", physical_name, 256);
FILE* f = fopen(physical_name, \"r\");
</pre>
In this example, open_name is 'input'.
It is mapped, at runtime, to a path that includes
the filename ('foobar' in the example above).
");
list_item("main_program",
"Relevant only for files associated with application versions.
It indicates that this file is the application's main program.
");
list_end();
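echo "
<p>
For example, a reference that marks a file as an application version's
main program might look like the sketch below.
The file name is a hypothetical placeholder:
".html_text("
<file_ref>
<file_name>foobar_app</file_name>
<main_program/>
</file_ref>
")."
";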
echo "
<h3>File management</h3>
<p>
BOINC's default behavior is to delete files
once they are no longer needed.
Specifically:
<ul>
<li> On the client, input files are deleted when no workunit refers to them,
and output files are deleted when no result refers to them.
Application-version files are deleted when they are referenced
only from superseded application versions.
<li> On the client, the 'sticky' flag overrides the above mechanisms
and suppresses the deletion of the file.
The file may be deleted by an explicit
<a href=delete_file.php>server request</a>.
The file may also be deleted at any time by the core client
in order to honor limits on disk-space usage.
<li> On the server, the <a href=file_deleter.php>file deleter daemon</a>
deletes input and output files that are no longer needed.
This can be suppressed using the 'no_delete' flag,
or using command-line options to the file deleter.
</ul>
<a name=compression></a>
<h3>File compression</h3>
<p>
Starting with version 5.4, the BOINC client
is able to handle HTTP Content-Encoding types 'deflate' (zlib algorithm)
and 'gzip' (gzip algorithm).
The client decompresses these files 'on the fly' and
stores them on disk in uncompressed form.
<p>
Projects can set this encoding in two ways:
<ul>
<li>
Use the Apache 2.0 mod_deflate module to automatically
compress files on the fly.
This method will work with all BOINC clients,
but it will do compression only for 5.4+ clients.
<li>
Compress workunit files when creating them and give them
a filename suffix such as '.gz'.
In httpd.conf make sure that the following line is present:
<pre>
AddEncoding x-gzip .gz
</pre>
This will add the content encoding to the header so that
the client will decompress the file automatically.
This method has the advantage of reducing server disk usage
and server CPU load,
but it will only work with 5.4+ clients.
Use the 'min_core_version' field of the app_version table to enforce this.
</ul>
The two methods can also be combined, because the mod_deflate module
allows you to exempt certain file types from on-the-fly compression.
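<p>
As a sketch of such a combination, the httpd.conf fragment below
turns on mod_deflate for a download directory and exempts files that are
already stored in gzip form.
It assumes that mod_deflate is loaded and that files are served
under a '/download' URL path; both are assumptions to adjust for your server:
".html_text("
<Location /download>
# Compress responses on the fly for clients that accept it
SetOutputFilter DEFLATE
# Skip files that are already stored in gzip form
SetEnvIfNoCase Request_URI \\.gz\$ no-gzip dont-vary
</Location>
# Label pre-compressed files so the client decompresses them
AddEncoding x-gzip .gz
")."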
<p>
Neither of these methods stores files in compressed form on the client.
For this, you must do compression at the application level.
The BOINC source distribution includes
<a href=boinc_zip.txt>a version of the zip library</a>
designed for use by BOINC applications on any platform.
";
page_tail();
?>