edits

David Anderson 2024-01-11 17:25:36 -08:00
parent 48e7f33583
commit 3a51b1e583
6 changed files with 193 additions and 186 deletions

128
BOINC-overview.md Normal file

@ -0,0 +1,128 @@
BOINC is a platform for distributed computing.
It is designed to support 'high throughput computing':
with large numbers of independent compute-intensive jobs,
and the performance goal of high rate of job completion
rather than low turnaround time of individual jobs.
It also has features to support [distributed data storage](VolunteerStorage)
and [distributed parallel computing](Sporadic-applications).
BOINC has a client/server architecture:
the 'server' distributes jobs,
while the 'client' runs on worker nodes and executes jobs.
BOINC can use worker nodes that are:
* Heterogeneous: they have different processor and GPU types
and different operating systems (Windows, Mac OS, Linux, Android).
* Sporadically available.
* Untrusted: they may return incorrect computational results.
* Large scale: millions or more worker nodes.
Hence BOINC is well-suited to [volunteer computing](VolunteerComputing)
in which the computing resources are consumer devices
(desktop and laptop computers, tablets, phones,
game consoles, appliances) volunteered by their owners.
It can also be used with
[organizational desktop resources](DesktopGrid)
(the PCs in a company or university)
or with data-center resources (clusters or clouds),
or with any combination of resources.
BOINC can run most existing HTC applications with minor modifications,
including those that use GPUs and/or multiple CPU cores.
It can use virtual machines to run unmodified Linux applications
on Windows and Mac worker nodes.
It efficiently supports applications that use large data files,
or that require large amounts of memory.
BOINC can be used as a back end for existing
job-submission systems such as HTCondor; details are [here](GridIntegration).
BOINC is distributed under the LGPL v3 open-source license.
It can be used for any purpose (academic, commercial, or private)
and can be used with applications that are not open-source.
## Cost comparison
BOINC was created to provide scientists with large computing power
at a small cost.
One study found the following costs for a particular workload:
### **Use Amazon's Elastic Computing Cloud: $175 Million**
### **Build a cluster: $12.4 Million**
This includes power and air-conditioning infrastructure, network hardware, computing hardware, storage, electricity, and sysadmin personnel.
### **Use BOINC: $125,000**
Based on the average throughput and budget of several
volunteer computing projects.
It takes (very roughly) three man-months to create a volunteer computing
project using BOINC:
one month of an experienced sys admin, one month of a programmer,
and one month of a web developer.
Once the project is running,
budget a 50% FTE (mostly system admin) to maintain it.
In terms of hardware, you'll need a mid-range server computer and a fast connection to the commercial Internet.
## Organizational options
The volunteer computing projects using BOINC vary in terms of their
organizational structure and the set of scientists they serve.
Examples include:
* Research group.
The project is operated by a single research group,
and serves the members of that group.
Examples include SETI@home, Rosetta@home, and Einstein@home.
* Application-centered research community.
The project is operated by a single research group,
but serves a broader community in that science area.
Example: Climateprediction.net,
which is based at Oxford but provides computing to
researchers at other institutions.
* Science Gateway.
The project is operated by a **science gateway**,
i.e. a web site that serves a particular scientific community,
and that provides HTC as well as other functions.
An example is nanoHUB.
* Institutional umbrella project.
The project is operated by an organization (university or research lab),
and serves the researchers in that organization.
For example, LHC@home serves multiple groups at CERN.
An academic example (no longer operating)
is the University of Westminster in London.
This idea is elaborated on [here](VirtualCampusSupercomputerCenter).
* HPC provider.
The project is operated by an HPC provider such as a supercomputing center.
It processes the provider's HTC jobs
(i.e. the jobs that don't actually need a supercomputer),
and serves the provider's clients that have HTC workloads.
An example is Texas Advanced Computing Center (TACC).
There are advantages in having BOINC projects that are high
in the organizational hierarchy, and that serve many scientists:
* The cost of maintaining a BOINC project is roughly constant,
regardless of its size.
For large projects, the cost per scientist is lower.
* Publicity options: high-level organizational entities typically have
existing publicity mechanisms (e.g. alumni magazines, newsletters, etc.)
that can be leveraged to recruit volunteers.
* Longevity: the duration of one scientist's need for HTC is generally shorter
than that of a group of scientists.
There are benefits in having a project last a long time
(e.g. amortizing the startup cost).
* Continuity: similarly, one scientist's computing workload may
be sporadic, while that of a group of scientists is more continuous.
Some volunteers prefer projects with continuous workloads.
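
The first advantage above can be made concrete with a small calculation. The sketch below assumes a hypothetical fixed annual maintenance cost (the figure is illustrative, not taken from BOINC data) and shows how the per-scientist share falls as a project serves more scientists.

```python
# Illustrative only: ANNUAL_COST is a hypothetical figure, not BOINC data.
ANNUAL_COST = 60_000.0  # assumed fixed yearly cost of maintaining one project


def cost_per_scientist(num_scientists: int, annual_cost: float = ANNUAL_COST) -> float:
    """Per-scientist share of a maintenance cost that does not grow with project size."""
    if num_scientists < 1:
        raise ValueError("a project must serve at least one scientist")
    return annual_cost / num_scientists


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(f"{n:>3} scientists: ${cost_per_scientist(n):,.0f} each per year")
```

Because the numerator is treated as constant, the per-scientist cost is inversely proportional to the number of scientists served.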

@ -1,10 +1,9 @@
# BOINC projects
A **BOINC project** is a server that distributes jobs.
Each project has a [master URL](ServerComponents#ThemasterURL), which
* identifies a web site that describes the project and shows its status.
* identifies servers that distribute jobs and collect results.
A 'BOINC project' is essentially a server that distributes jobs.
Each project has a [master URL](ServerComponents#ThemasterURL),
which exports RPCs directing the BOINC client
to servers that distribute jobs and files.
The master URL can also provide
a public web site that describes the project and shows its status.
Volunteers can create "accounts" on projects.
The BOINC client (which runs on worker nodes)
@ -12,16 +11,10 @@ can be "attached" to accounts on any number of projects.
<img src=https://github.com/BOINC/boinc/blob/master/doc/attach.jpg width=600>
Projects are independent; each one has its own applications,
databases and servers,
and is not affected by other projects.
Projects are independent; each one has its own applications, accounts,
databases and servers.
The BOINC project itself operates a web site, https://boinc.berkeley.edu.
The BOINC client periodically contacts this server to obtain
* A list of approved projects
* News of updates to the client software.
Creating projects is relatively easy.
Creating a project is relatively easy.
An organization can create multiple projects,
e.g. for testing new applications.
A project can run entirely on a single computer
@ -29,9 +22,27 @@ A project can run entirely on a single computer
A project can also be spread across multiple computers,
so that it can handle large numbers of attached clients.
## The role of UC Berkeley
BOINC itself is based at UC Berkeley.
Projects can ask to be 'vetted' by BOINC.
It operates a server at https://boinc.berkeley.edu.
This has several functions:
* It provides a web site explaining what BOINC is,
and showing a list of vetted projects.
* It provides downloads of the BOINC client for all supported platforms.
These installers are 'signed' by UC Berkeley.
* It exports the list of vetted projects.
The BOINC client periodically fetches this list
and uses it in the 'add projects' GUI dialog.
* It exports a list of the current client versions.
This is used by the BOINC client to notify volunteers
when a new version is available.
## Account managers and Science United
The original thinking was that there would be many projects,
The original assumption was that there would be many projects,
competing for computing power (i.e. volunteers) by generating
mass-media publicity and creating compelling web sites.
In practice, the need to attract volunteers has been a major

@ -1,157 +0,0 @@
# BOINC Overview
BOINC is a platform for distributed **high throughput computing**,
i.e. large numbers of independent compute-intensive jobs,
where the performance goal is high rate of job completion
rather than low turnaround time of individual jobs.
It also offers mechanisms for distributed data storage.
BOINC has a client/server architecture:
the **server** distributes jobs,
while the **client** runs on worker nodes, which execute jobs.
BOINC can be used in two ways:
* In [volunteer computing](VolunteerComputing),
the worker nodes are consumer devices (desktop and laptop computers,
tablets, smartphones) volunteered by their owners.
BOINC [addresses the various challenges](BoincIntro) inherent in this environment
(heterogeneity, host churn and unreliability, scale, security, and so on).
There are a number of volunteer-computing **BOINC projects**
such as Einstein@home, LHC@home, World Community Grid, and so on.
The BOINC client can be "attached" to one or many of these;
it processes jobs for the projects to which it is attached.
* BOINC can also be used for [in-house computing](DesktopGrid) within an organization (e.g. a company).
In this case the worker nodes are
cluster nodes or other organizational computers,
and they are attached only to the organization's BOINC server.
BOINC can run all existing HTC applications,
including those that use GPUs and/or multiple CPU cores.
It can use virtual machines to run existing Linux applications on Windows and Mac worker nodes.
BOINC provides mechanisms for job submission and control, designed for performance at scale.
However, it can also be used as a back end for existing
job-submission systems such as HTCondor; details are [here](GridIntegration).
BOINC is distributed under the LGPL v3 open-source license.
It can be used for any purpose (academic, commercial, or private)
and can be used with applications that are not open-source.
## Cost comparison
BOINC was created to provide scientists with large computing power at a small cost.
Suppose you need, say, 100 TeraFLOPS for 1 year.
Here are some ways you can get it:
### **Use Amazon's Elastic Computing Cloud: $175 Million**
Based on $0.10 per node/hour.
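
The figure above can be unpacked with some back-of-the-envelope arithmetic. The sketch below uses only the numbers stated here ($175 million, $0.10 per node-hour, 100 TeraFLOPS, one year) plus the assumption of a 365-day year; the node count and per-node speed it derives are implied values, not figures from the original estimate.

```python
TOTAL_COST_USD = 175e6            # stated EC2 cost
PRICE_PER_NODE_HOUR_USD = 0.10    # stated price per node-hour
TARGET_FLOPS = 100e12             # stated target: 100 TeraFLOPS
HOURS_PER_YEAR = 365 * 24         # assumption: a 365-day year


def implied_node_count() -> float:
    """Number of nodes that must run all year to spend the stated total."""
    node_hours = TOTAL_COST_USD / PRICE_PER_NODE_HOUR_USD
    return node_hours / HOURS_PER_YEAR


def implied_gflops_per_node() -> float:
    """Per-node throughput needed for those nodes to reach the target."""
    return TARGET_FLOPS / implied_node_count() / 1e9


if __name__ == "__main__":
    print(f"implied nodes: {implied_node_count():,.0f}")
    print(f"implied GFLOPS per node: {implied_gflops_per_node():.2f}")
```

Under these assumptions the estimate corresponds to roughly 200,000 nodes delivering about 0.5 GFLOPS each.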
### **Build a cluster: $12.4 Million**
This includes power and air-conditioning infrastructure, network hardware, computing hardware, storage, electricity, and sysadmin personnel.
### **Use BOINC: $125,000**
Based on the average throughput and budget of the 6 largest volunteer computing projects.
It takes (very roughly) three man-months to create a BOINC project:
one month of an experienced sys admin, one month of a programmer, and one month of a web developer.
Once the project is running, budget a 50% FTE (mostly system admin) to maintain it.
In terms of hardware, you'll need a mid-range server computer and a fast connection to the commercial Internet.
## Getting started
To compute using BOINC, you'll need to set up a BOINC server
and configure your applications to run under BOINC.
Technical documentation is [here](Home).
If you're doing in-house computing,
install the BOINC client on your worker nodes, and you're done.
This is detailed [here](DesktopGrid).
In the volunteer computing case, you'll need to get clients to attach to your server.
There are several ways to do this:
* Create a public-facing web site for your project.
Announce it and publicize it using whatever channels are available to you:
mass media, social media, newsletters, paid advertising, etc.
* [Contact us](ProjectPeople) and ask to have your project listed by BOINC.
You'll be asked to demonstrate that a) your project is doing
what you claim it is, and b) you're following a set of security practices.
Your project will then a) be announced on the BOINC web site news column,
b) be listed on the BOINC web site, and
c) appear in the list of projects shown in the BOINC client GUI.
* [Contact us](ProjectPeople) and ask to have your project
included in [Science United](https://scienceunited.org),
a framework in which volunteers sign up for science areas instead of projects.
You'll need to tell us what types of research your project is doing,
and then you'll automatically get computing power from volunteers
who have registered an interest in those areas.
This has the advantage that you don't have to create a public-facing web site or do any publicity.
In addition, you can ask to be included in Science United even before you've created your project.
At that point we can tell you roughly how much computer power you'll get,
and you can decide whether this justifies the investment in creating a project.
These approaches are not mutually exclusive; you can do any or all of them.
## Organizational options
The volunteer computing projects using BOINC vary in terms of their
organizational structure and the set of scientists they serve.
Examples include:
* Research group.
The project is operated by a single research group,
and serves the members of that group.
Examples include SETI@home, Rosetta@home, and Einstein@home.
* Application-centered research community.
The project is operated by a single research group,
but serves a broader community in that science area.
Examples: Climateprediction.net,
which is based at Oxford but collaborates with
projects around the world.
Mindmodeling.org serves researchers from about 20 universities who use the same application (the ACT-R cognitive modeling system).
* Science Gateway.
The project is operated by a **science gateway**,
i.e. a web site that serves a particular scientific community,
and that provides HTC as well as other functions.
An example is nanoHUB.
* Institutional umbrella project.
The project is operated by an organization (university or research lab),
and serves the researchers in that organization.
For example, LHC@home serves multiple groups at CERN.
An academic example (no longer operating) is the University of Westminster in London.
This idea is elaborated on [here](VirtualCampusSupercomputerCenter).
* HPC provider.
The project is operated by an HPC provider such as a supercomputing center.
It processes the provider's HTC jobs
(i.e. the jobs that don't actually need a supercomputer),
and serves the provider's clients that have HTC workloads.
An example is Texas Advanced Computing Center (TACC).
There are several advantages in having BOINC projects that are high
in the organizational hierarchy, and that serve many scientists:
* The cost of maintaining a BOINC project is roughly constant,
regardless of its size.
For large projects, the cost per scientist is lower.
* Publicity options: high-level organizational entities typically have
existing publicity mechanisms (e.g. alumni magazines, newsletters, etc.)
that can be leveraged to recruit volunteers.
* Longevity: the duration of one scientist's need for HTC is generally shorter
than that of a group of scientists.
There are benefits in having a project last a long time
(e.g. amortizing the startup cost).
* Continuity: similarly, one scientist's computing workload may
be sporadic, while that of a group of scientists is more continuous.
Some volunteers prefer projects with continuous workloads.
So if you're thinking about using BOINC,
consider the possible scope of your project.

@ -8,9 +8,8 @@ For help with BOINC, post to the
## Introductory docs
* [BOINC overview](BoincOverview)
* [BOINC projects](ProjectsApps)
* [Features of BOINC](BoincIntro)
* [BOINC overview](BOINC-overview)
* [BOINC projects](BOINC-projects)
* [Create a BOINC server (cookbook)](Create-a-BOINC-server-(cookbook))
* [BOINC apps (introduction)](Boinc-apps-(introduction))
* [Deploy Linux apps using VirtualBox (cookbook)](Deploy-Linux-apps-using-VirtualBox-(cookbook))

@ -4,8 +4,41 @@
## Server software upgrades
## Log files
## Backups
## Get worker nodes
in-house:
If you're doing in-house computing,
install the BOINC client on your worker nodes, and you're done.
This is detailed [here](DesktopGrid).
volunteer:
Create a public-facing web site for your project.
Announce it and publicize it using whatever channels are available to you:
mass media, social media, newsletters, paid advertising, etc.
## Get vetted
* [Contact us](ProjectPeople) and ask to have your project listed by BOINC.
You'll be asked to demonstrate that a) your project is doing
what you claim it is, and b) you're following a set of security practices.
Your project will then a) be announced on the BOINC web site news column,
b) be listed on the BOINC web site, and
c) appear in the list of projects shown in the BOINC client GUI.
## Science United
* [Contact us](ProjectPeople) and ask to have your project
included in [Science United](https://scienceunited.org),
a framework in which volunteers sign up for science areas instead of projects.
You'll need to tell us what types of research your project is doing,
and then you'll automatically get computing power from volunteers
who have registered an interest in those areas.
This has the advantage that you don't have to create a public-facing web site or do any publicity.
In addition, you can ask to be included in Science United even before you've created your project.
At that point we can tell you roughly how much computer power you'll get,
and you can decide whether this justifies the investment in creating a project.
## Web site
content

11
notes

@ -47,12 +47,12 @@ doc big picture
Intro docs
(base-case examples w/ cookbooks; pictures; videos where feasible)
Introduction to BOINC
Overview
What BOINC is; why use it; volunteer computing
client/server: projects, worker nodes
role of UCB
features
The structure of BOINC
Projects
projects, attachments, account managers
Science United, BOINC Central
role of UCB
@ -114,10 +114,3 @@ Intro docs
content
forums
spam control
-----------------------------
detailed docs
Handling completed jobs (assimilation)
Standard assimilators
Assimilators in scripting languages (Python, PHP, etc.)
Assimilators in C++