2002-04-30 22:22:54 +00:00
Abstractions

"Project": each is described by a URL.
    Each has its own database and control server.

"Application": a particular program.
    A project may have several applications.

"Account": each user has a separate account with each project.
    Each account has a unique email address and a server-assigned authenticator.
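
Sketch (illustration only): the three abstractions above as Python data classes. The class and field names are hypothetical, and treating the email address as unique within one project, enforced with a dictionary, is an assumption; the notes do not say how uniqueness is enforced.

```python
from dataclasses import dataclass, field

@dataclass
class Application:
    name: str  # a particular program

@dataclass
class Account:
    email: str          # unique within a project (assumed scope)
    authenticator: str  # assigned by the project's server

@dataclass
class Project:
    url: str                                          # a project is described by its URL
    applications: list = field(default_factory=list)  # a project may have several
    accounts: dict = field(default_factory=dict)      # email -> Account

    def add_account(self, email, authenticator):
        """Create an account, refusing a duplicate email address."""
        if email in self.accounts:
            raise ValueError("email already has an account with this project")
        self.accounts[email] = Account(email, authenticator)
        return self.accounts[email]

project = Project("http://example.org/project")
project.applications.append(Application("analysis"))
account = project.add_account("user@example.org", "auth-token-1")
```

A user attached to several projects would simply appear as one Account in each Project.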
--------------------
Client files
two files:

account.xml
    list of projects; for each:
        user ID
        password
        preferences

client_state.xml
    hostid
    rpc_seqno (per project)
    files, WUs, results etc.
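
Sketch (illustration only): reading an account.xml-style file in Python. The notes list the fields but not the tag names, so every tag below (accounts, project, url, user_id, password, preferences, logging) is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; only the field list comes from the notes.
ACCOUNT_XML = """
<accounts>
    <project>
        <url>http://example.org/project_a</url>
        <user_id>42</user_id>
        <password>secret</password>
        <preferences><logging>1</logging></preferences>
    </project>
</accounts>
"""

def load_accounts(text):
    """Return one dict per <project> entry."""
    root = ET.fromstring(text)
    accounts = []
    for proj in root.findall("project"):
        prefs = proj.find("preferences")
        accounts.append({
            "url": proj.findtext("url"),
            "user_id": proj.findtext("user_id"),
            "password": proj.findtext("password"),
            # preferences are kept as a flat tag -> text mapping
            "preferences": {p.tag: p.text for p in prefs} if prefs is not None else {},
        })
    return accounts

accounts = load_accounts(ACCOUNT_XML)
```
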
NOTE: to "clone" an installation on a new computer,
    just need to copy the core client (or run the installer)
    then copy the account.xml file.

NOTE: a scheduler request can specify that no client_state.xml
    was found, so a new host record should be created.
    If a scheduler gets a request with an unexpected seqno,
    it sends back a reply saying
--------------------
When does the client contact the scheduling server?
Each result has a max notification delay,
so when a client completes it there's a deadline for notification.

Contact a scheduling server if:
- you're below the low-water mark in work for that project,
  or you have a result past its deadline
- AND there's no delay in effect for that project.
  A delay may be explicitly returned by the scheduling server,
  or may result from exponential backoff after failed attempts.
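
Sketch (illustration only): the contact rule above in Python. The names, the 60-second base delay, the one-day cap and the doubling factor are all assumptions; the notes only say that backoff is exponential.

```python
from dataclasses import dataclass

@dataclass
class ProjectWork:
    work_hours: float                 # estimated work on hand for this project
    low_water_hours: float            # low-water mark
    has_overdue_result: bool = False  # a result is past its notification deadline
    dont_contact_until: float = 0.0   # a delay is in effect until this time (seconds)
    failed_attempts: int = 0

def should_contact(p, now):
    """True if the project needs contact AND no delay is in effect."""
    wants_contact = p.work_hours < p.low_water_hours or p.has_overdue_result
    return wants_contact and now >= p.dont_contact_until

def back_off(p, now, base_delay=60.0, max_delay=86400.0):
    """Record a failed attempt and double the delay each time (assumed policy)."""
    p.failed_attempts += 1
    delay = min(max_delay, base_delay * 2 ** (p.failed_attempts - 1))
    p.dont_contact_until = now + delay
    return delay

p = ProjectWork(work_hours=1.0, low_water_hours=4.0)
first = should_contact(p, now=0.0)                  # below low-water mark, no delay
delays = [back_off(p, now=0.0) for _ in range(3)]   # three failed attempts in a row
blocked = should_contact(p, now=100.0)              # delay still in effect
```

A delay returned explicitly by the scheduling server would be stored in the same dont_contact_until field.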
--------------------
Given that we can estimate the time it will take to get back
a result from a given host, it might be possible to assign
deadlines to results, and only send them to hosts that are fast enough.
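
Sketch (illustration only): filtering hosts by estimated turnaround, in Python. How the estimate is obtained is not specified in the notes; here it is simply given as a hypothetical mapping from host name to seconds.

```python
def hosts_fast_enough(est_turnaround_secs, time_to_deadline_secs):
    """Return the hosts whose estimated turnaround fits within the deadline."""
    return sorted(
        host
        for host, secs in est_turnaround_secs.items()
        if secs <= time_to_deadline_secs
    )

estimates = {"host_a": 3600, "host_b": 90000, "host_c": 86400}  # made-up numbers
eligible = hosts_fast_enough(estimates, time_to_deadline_secs=86400)
```
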
--------------------
Client logging
write events to log file:
    start/stop client
    start/finish file xfer
    start/finish application execution
    start/finish scheduling server call
    error messages

logging flag is part of preferences
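
Sketch (illustration only): event logging controlled by a preference flag, using Python's standard logging module. The logger name, the message format and the event wording are assumptions.

```python
import io
import logging

def make_event_logger(stream, enabled):
    """Return a logger that writes events to `stream` only when logging is enabled."""
    logger = logging.getLogger("client_events_sketch")  # hypothetical name
    logger.handlers.clear()
    logger.propagate = False          # keep events out of the root logger
    logger.setLevel(logging.INFO)
    # With the flag off, a NullHandler swallows every event.
    handler = logging.StreamHandler(stream) if enabled else logging.NullHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

buf = io.StringIO()
log = make_event_logger(buf, enabled=True)
log.info("start file xfer %s", "input_1")
log.error("file xfer failed: %s", "timeout")
```

A real client would pass an open log file instead of the in-memory buffer.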
--------------------
division between database and XML
proposal: move as much info as possible out of the DB into XML files.
examples:
- workunits and results
    WUs and results are described by XML files listing
    their inputs, outputs, etc.
    The DB entry for a WU contains only info relevant to scheduling:
        memory/disk/communication requirement
- user info
    A configuration is an XML file, opaque to the scheduling server
--------------------
WUs and results
- WUs and results are described by XML files that list their
  input and output files.
- each client computation is represented by a "result" DB record,
  which is created BEFORE the client requests it.
  The application server system must keep the DB supplied
  with result records, or clients will starve.

  NOTE: this is necessary to control where output files go.
  Could also have a scheme where each application has a
  "template" result file.
  This instructs the client to create its own output file names.
  When the client returns the result,
  the server creates the result record and plugs in file names.
--------------------
File info
input files
    "sticky": don't delete after result done
    URL (if not already on client)
output files
    "sticky": don't delete after result done
    URL (optional; send here after result done)
--------------------
file xfer commands
implemented as WU/result pairs whose app is "file_xfer".
Can have just one input file, one output.
Application servers can leave these in a "message" directory,
where the scheduling server can find them and give them to
the client the next time it contacts the server.
--------------------
result states in client
    don't have files yet
    have files, not started
    have files, started
    completed, sending output files
    output files sent
    output files sent, some sticky files deleted
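
Sketch (illustration only): the client-side result states as a Python enum. The member names are hypothetical, and the strictly linear progression in advance() is an assumption read off the order of the list above.

```python
from enum import IntEnum

class ClientResultState(IntEnum):
    NO_FILES = 0                    # don't have files yet
    FILES_NOT_STARTED = 1           # have files, not started
    FILES_STARTED = 2               # have files, started
    COMPLETED_SENDING = 3           # completed, sending output files
    OUTPUT_SENT = 4                 # output files sent
    OUTPUT_SENT_STICKY_DELETED = 5  # output files sent, some sticky files deleted

def advance(state):
    """Move to the next state; the final state stays where it is."""
    if state is ClientResultState.OUTPUT_SENT_STICKY_DELETED:
        return state
    return ClientResultState(state + 1)

state = advance(ClientResultState.NO_FILES)
```
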
--------------------
result attributes in DB, sched server
    state:
        unsent
        sent, in progress
        timed out
    file state
        all output files are openly available
        (i.e. have been uploaded)

WU attributes in DB, sched server
    input file state (set by app server)
        all input files are available
        not all input files available
--------------------
Workunit affinity

This mechanism allows a sequence of WUs to get executed on the same host,
but allows the sequence to migrate (or be duplicated) if needed.

result attributes:
    previous_resultid
        This result is a "successor" to the previous one.
        If all the sticky input and output files of the previous WU are present,
        this WU can be executed efficiently.
    has_successor
        This result has a successor.

How it works:
The project generates a sequence of WUs,
each with one or more results.
It chains the results together into sequences.

When a client completes a result with a successor,
it retains the result record.

NOTE: one goal of this design is to avoid the scheduler
having to know about individual files
--------------------
Scheduler request

The client sends all its results with successors

scheduler algorithm:
    if there are any results with predecessors
2002-05-29 23:25:21 +00:00
    for which the client has all sticky files,
    send them in preference to any other results
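
Sketch (illustration only): the preference rule above in Python. Results are plain dicts with hypothetical keys. The assumption is that a predecessor listed in the request means the client still holds all of its sticky files, so the scheduler never inspects individual files.

```python
def pick_results(unsent, retained_result_ids, max_results):
    """Order unsent results so successors of client-held results go first."""
    retained = set(retained_result_ids)
    preferred = [r for r in unsent if r.get("previous_resultid") in retained]
    others = [r for r in unsent if r.get("previous_resultid") not in retained]
    return (preferred + others)[:max_results]

unsent = [
    {"id": 10, "previous_resultid": None},  # starts a new sequence
    {"id": 11, "previous_resultid": 7},     # successor of result 7
    {"id": 12, "previous_resultid": 99},    # successor of a result held elsewhere
]
# The client reported that it retains result 7 (a result with a successor).
chosen = pick_results(unsent, retained_result_ids=[7], max_results=2)
```
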
--------------------
database tables

application
platform
app_version
core_version
account
file
workunit
    applicationid
    file1 name1
    file2 name2
    nresults
result
    workunitid
    accountid
    fileid
    boolean verified
host
--------------------
State maintained on client
Config file (XML)

<config>
    <update-time>123123</update-time>
        // last time user added project or changed CPU shares
    <max-disk-mb>1000</max-disk-mb>
    <min-work-hrs>10</min-work-hrs>
        // if estimated work falls below this, try to get more
    <max-work-hrs>10</max-work-hrs>
        // don't get more work if estimate is above this
    <max_ram_while_user_active>20</max_ram_while_user_active>
        // zero means don't work while user active
    <projects>
        <project>
            <url>http://wjwjwj</url>
            <dont-contact-until>123123</dont-contact-until>
            <control-server>blah.blah</control-server>
            <control-server>blah.blah</control-server>
            <cpu-share>1</cpu-share>
            <max-disk-mb>100</max-disk-mb>
            <cpu-total>5.44</cpu-total>
                // this is zeroed out each time shares updated
            <password>123123123</password>
                // stored on client only; not sent to server in general
            <email-address>foo@bar</email-address>

            <file>
                <md5>sfkjf</md5>
                <url>akdjsfd</url>
                <size>123123</size>
                <complete/>
            </file>
            <workunit>
                <name>skdjf</name>
                <file>
                    <name>foo</name>
                    <appname>blah</appname>
                        // name by which app refers to file
                </file>
            </workunit>
            <result>
                <workunit-name>12938</workunit-name>
            </result>
        </project>
        ...
    </projects>
</config>
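
Sketch (illustration only): reading per-project CPU shares from a config file of this shape, in Python. The sample document and its URLs are made up, and turning shares into fractions of the total is an assumption about how cpu-share is meant to be used.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample using the tag names from the config above.
SAMPLE_CONFIG = """
<config>
    <min-work-hrs>10</min-work-hrs>
    <max-work-hrs>20</max-work-hrs>
    <projects>
        <project><url>http://a.example</url><cpu-share>1</cpu-share></project>
        <project><url>http://b.example</url><cpu-share>3</cpu-share></project>
    </projects>
</config>
"""

def cpu_fractions(config_text):
    """Map each project URL to its fraction of the total CPU share."""
    root = ET.fromstring(config_text)
    shares = {
        p.findtext("url"): float(p.findtext("cpu-share"))
        for p in root.iter("project")
    }
    total = sum(shares.values())
    if total == 0:
        return {}  # no projects, or all shares are zero
    return {url: share / total for url, share in shares.items()}

fractions = cpu_fractions(SAMPLE_CONFIG)
```
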
--------------------
Security notes:
--------------------
Client directory structure
top-level dir
project dir (one per project)
CPU dir (one per CPU)
    contains symbolic links to application file,
    all input and output files
--------------------
Client logic
["network xfer" object encapsulates a set of file xfers in progress]
["processor" object: one for each CPU]

read config file
loop
    check user activity - turn off computations if needed
    start a computation if possible
        all necessary files present,
        and workunit not done or in progress.
    check processes (fail, done)
    start new network xfers if possible
    xfer 16KB if possible (use select)
    if xfer complete, update state
    if estimated work below low-water mark
        while estimated work below high-water mark
            pick project with work due, OK dont_contact_until
            contact a control server; request high-current work
            if can't get connection, update dont_contact_until
        end
    end
end
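
Sketch (illustration only): the work-fetch part of the loop above in Python. The names are hypothetical. Picking the first eligible project, the fixed 60-second retry delay, and treating an empty grant like a failed connection (so the loop always terminates) are assumptions.

```python
def fetch_work(estimated_hrs, low_water, high_water, projects, now, request_work):
    """Top up work to the high-water mark once it has dropped below the low-water mark.

    projects: list of dicts with "url" and "dont_contact_until".
    request_work(project, hours): returns hours granted, or None if no connection.
    """
    if estimated_hrs >= low_water:
        return estimated_hrs
    while estimated_hrs < high_water:
        eligible = [p for p in projects if p["dont_contact_until"] <= now]
        if not eligible:
            break  # every project has a delay in effect
        project = eligible[0]  # simplification of "pick project with work due"
        granted = request_work(project, high_water - estimated_hrs)
        if granted is None or granted <= 0:
            project["dont_contact_until"] = now + 60  # assumed retry delay
        else:
            estimated_hrs += granted
    return estimated_hrs

def fake_request(project, hours):
    """Stand-in for the control server call: project "a" is unreachable."""
    if project["url"] == "a":
        return None
    return min(hours, 4.0)

projects = [{"url": "a", "dont_contact_until": 0}, {"url": "b", "dont_contact_until": 0}]
total = fetch_work(1.0, 2.0, 10.0, projects, now=0, request_work=fake_request)
```
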
--------------------
Application logic
--------------------
Control RPC protocol
--------------------
Web site functions
--------------------
Startup scenarios

- How a user initially signs up:
    Visit the project's URL.
    Create an account:
        enter email address
        wait for password to arrive in email.
    download installer
    installer installs agent, initial config file
    run agent; type in password.

- How a user adds a project
    Same as above, but don't download agent.
    Go to "home" web site and add project.

- How a user removes a project
    Go to "home" web site and remove project
------------------------------
Versions

Core client:

When and how does a scheduler tell a core agent
that a newer version can/should be downloaded?

How is compatibility between application agents
and core agents represented?
--------------------------------------
Distributed storage

Projects can use clients for storage using "sticky" files
(which are either sent to clients, or generated by the client).

The core client is free to delete sticky files at any time.

Scheduler requests include a list of the sticky files held by the host.
This list is stored in a blob in the host record.

Scheduler replies can include <file_info> tags
instructing the client to download files.
These files need not be associated with applications or workunits.

Scheduler replies can include <file_info> tags
instructing the client to upload
--------------------------------
Preferences

CPU usage
    don't run or communicate if on batteries
    don't run or communicate if user is active
    confirm before making network connection
    minimum, maximum work buffer

Disk usage
    use at most X GB
    leave at least X GB free
    leave at least X% free
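
Sketch (illustration only): combining the three disk preferences into one limit, in Python. The parameter names are hypothetical. Taking the most restrictive of the three limits, and counting space the client already holds as reclaimable, are assumptions.

```python
def allowed_disk_gb(total_gb, free_gb, used_by_client_gb,
                    max_used_gb, min_free_gb, min_free_pct):
    """Total disk (GB) the client may occupy under all three preferences."""
    # Space not taken by anything else: what is free now plus what the client holds.
    available = free_gb + used_by_client_gb
    limit_abs = max_used_gb                                       # use at most X GB
    limit_free_gb = available - min_free_gb                       # leave at least X GB free
    limit_free_pct = available - total_gb * min_free_pct / 100.0  # leave at least X% free
    return max(0.0, min(limit_abs, limit_free_gb, limit_free_pct))

# 100 GB disk, 40 GB free, client already holds 10 GB.
tight = allowed_disk_gb(100, 40, 10, max_used_gb=20, min_free_gb=5, min_free_pct=10)
loose = allowed_disk_gb(100, 40, 10, max_used_gb=100, min_free_gb=5, min_free_pct=10)
```

In the first call the "use at most" limit wins (20 GB); in the second the percentage limit wins (40 GB).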

Projects
    For each project:
        master URL
        email address
        authenticator
        resource %
        show email address on web site?
        accept emails from project?
        project-specific prefs