From 3a51b1e58376ab92a6ab4b94405824f67da7eb13 Mon Sep 17 00:00:00 2001 From: David Anderson Date: Thu, 11 Jan 2024 17:25:36 -0800 Subject: [PATCH] edits --- BOINC-overview.md | 128 ++++++++++++++++++++++ ProjectsApps.md => BOINC-projects.md | 45 +++++--- BoincOverview.md | 157 --------------------------- Computing-with-BOINC.md | 5 +- Going-public.md | 33 ++++++ notes | 11 +- 6 files changed, 193 insertions(+), 186 deletions(-) create mode 100644 BOINC-overview.md rename ProjectsApps.md => BOINC-projects.md (62%) delete mode 100644 BoincOverview.md diff --git a/BOINC-overview.md b/BOINC-overview.md new file mode 100644 index 0000000..885983b --- /dev/null +++ b/BOINC-overview.md @@ -0,0 +1,128 @@ +BOINC is a platform for distributed computing. +It is designed to support 'high throughput computing': +with large numbers of independent compute-intensive jobs, +and the performance goal of a high rate of job completion +rather than low turnaround time of individual jobs. +It also has features to support [distributed data storage](VolunteerStorage) +and [distributed parallel computing](Sporadic-applications). + +BOINC has a client/server architecture: +the 'server' distributes jobs, +while the 'client' runs on worker nodes and executes jobs. +BOINC can use worker nodes that are: + +* Heterogeneous: they have different processor and GPU types +and different operating systems (Windows, Mac OS, Linux, Android). + +* Sporadically available. + +* Untrusted: they may return incorrect computational results. + +* Large scale: millions or more worker nodes. + +Hence BOINC is well-suited to [volunteer computing](VolunteerComputing) +in which the computing resources are consumer devices +(desktop and laptop computers, tablets, phones, +game consoles, appliances) volunteered by their owners.
+ +It can also be used with +[organizational desktop resources](DesktopGrid) +(the PCs in a company or university) +or with data-center resources (clusters or clouds), +or with any combination of resources. + +BOINC can run most existing HTC applications with minor modifications, +including those that use GPUs and/or multiple CPU cores. +It can use virtual machines to run unmodified Linux applications +on Windows and Mac worker nodes. +It efficiently supports applications that use large data files, +or that require large amounts of memory. + +BOINC can be used as a 'back end' for existing +job-submission systems such as HTCondor; details are [here](GridIntegration). + +BOINC is distributed under the LGPL v3 open-source license. +It can be used for any purpose (academic, commercial, or private) +and can be used with applications that are not open-source. + +## Cost comparison + +BOINC was created to provide scientists with large computing power +at a small cost. +One study found the following costs for a particular workload: + +### **Use Amazon's Elastic Compute Cloud: $175 Million** + +### **Build a cluster: $12.4 Million** +This includes power and air-conditioning infrastructure, network hardware, computing hardware, storage, electricity, and sysadmin personnel. + +### **Use BOINC: $125,000** +Based on the average throughput and budget of several +volunteer computing projects. + +It takes (very roughly) three man-months to create a volunteer computing +project using BOINC: +one month of an experienced sys admin, one month of a programmer, +and one month of a web developer. +Once the project is running, +budget a 50% FTE (mostly system admin) to maintain it. +In terms of hardware, you'll need a mid-range server computer and a fast connection to the commercial Internet. + +## Organizational options + +The volunteer computing projects using BOINC vary in terms of their +organizational structure and the set of scientists they serve.
+Examples include: + +* Research group. + The project is operated by a single research group, + and serves the members of that group. + Examples include SETI@home, Rosetta@home, and Einstein@home. + +* Application-centered research community. + The project is operated by a single research group, + but serves a broader community in that science area. + Example: Climateprediction.net, + which is based at Oxford but provides computing to + researchers at other institutions. + +* Science Gateway. + The project is operated by a **science gateway**, + i.e. a web site that serves a particular scientific community, + and that provides HTC as well as other functions. + An example is nanoHUB. + +* Institutional umbrella project. + The project is operated by an organization (university or research lab), + and serves the researchers in that organization. + For example, LHC@home serves multiple groups at CERN. + An academic example (no longer operating) + is the University of Westminster in London. + This idea is elaborated on [here](VirtualCampusSupercomputerCenter). + +* HPC provider. + The project is operated by an HPC provider such as a supercomputing center. + It processes the provider's HTC jobs + (i.e. the jobs that don't actually need a supercomputer), + and serves the provider's clients that have HTC workloads. + An example is Texas Advanced Computing Center (TACC). + +There are advantages in having BOINC projects that are high +in the organizational hierarchy, and that serve many scientists: + +* The cost of maintaining a BOINC project is roughly constant, + regardless of its size. + For large projects, the cost per scientist is lower. + +* Publicity options: high-level organizational entities typically have + existing publicity mechanisms (e.g. alumni magazines, newsletters, etc.) + that can be leveraged to recruit volunteers. + +* Longevity: the duration of one scientist's need for HTC is generally shorter + than that of a group of scientists.
+ There are benefits in having a project last a long time + (e.g. amortizing the startup cost). + +* Continuity: similarly, one scientist's computing workload may + be sporadic, while that of a group of scientists is more continuous. + Some volunteers prefer projects with continuous workloads. diff --git a/ProjectsApps.md b/BOINC-projects.md similarity index 62% rename from ProjectsApps.md rename to BOINC-projects.md index 5fffaa6..928e330 100644 --- a/ProjectsApps.md +++ b/BOINC-projects.md @@ -1,10 +1,9 @@ -# BOINC projects - -A **BOINC project** is a server that distributes jobs. -Each project has a [master URL](ServerComponents#ThemasterURL), which - -* identifies a web site that describes the project and shows its status. -* identifies servers that distribute jobs and collect results. +A 'BOINC project' is essentially a server that distributes jobs. +Each project has a [master URL](ServerComponents#ThemasterURL), +which exports RPCs directing the BOINC client +to servers that distribute jobs and files. +The master URL can also provide +a public web site that describes the project and shows its status. Volunteers can create "accounts" on projects. The BOINC client (which runs on worker nodes) @@ -12,16 +11,10 @@ can be "attached" to accounts on any number of projects. -Projects are independent; each one has its own applications, -databases and servers, -and is not affected by other projects. +Projects are independent; each one has its own applications, accounts, +databases and servers. -The BOINC project itself operates a web site, https://boinc.berkeley.edu. -The BOINC client periodically contacts this server to obtain -* A list of approved projects -* News of updates to the client software. - -Creating projects is relatively easy. +Creating a project is relatively easy. An organization can create multiple projects, e.g. for testing new applications. 
A project can run entirely on a single computer @@ -29,9 +22,27 @@ A project can run entirely on a single computer A project can also be spread across multiple computers, so that it can handle large numbers of attached clients. +## The role of UC Berkeley + +BOINC itself is based at UC Berkeley. +Projects can ask to be 'vetted' by BOINC. +It operates a server at https://boinc.berkeley.edu. +This has several functions: + +* It provides a web site explaining what BOINC is, +and showing a list of vetted projects. +* It provides downloads of the BOINC client for all supported platforms. +These installers are 'signed' by UC Berkeley. +* It exports the list of vetted projects. +The BOINC client periodically fetches this list +and uses it in the 'add projects' GUI dialog. +* It exports a list of the current client versions. +This is used by the BOINC client to notify volunteers +when a new version is available. + ## Account managers and Science United -The original thinking was that there would be many projects, +The original assumption was that there would be many projects, competing for computing power (i.e. volunteers) by generating mass-media publicity and creating compelling web sites. In practice, the need to attract volunteers has been a major diff --git a/BoincOverview.md b/BoincOverview.md deleted file mode 100644 index e7ea7f1..0000000 --- a/BoincOverview.md +++ /dev/null @@ -1,157 +0,0 @@ -# BOINC Overview - -BOINC is a platform for distributed **high throughput computing**, -i.e. large numbers of independent compute-intensive jobs, -where there performance goal is high rate of job completion -rather than low turnaround time of individual jobs. -It also offers mechanisms for distributed data storage. - -BOINC has a client/server architecture: -the **server** distributes jobs, -while the **client** runs on worker nodes, which execute jobs. 
- -BOINC can be used in two ways: - -* In [volunteer computing](VolunteerComputing), - the worker nodes are consumer devices (desktop and laptop computers, - tablets, smartphones) volunteered by their owners. - BOINC [addresses the various challenges](BoincIntro) inherent in this environment - (heterogeneity, host churn and unreliability, scale, security, and so on). - There are a number of volunteer-computing **BOINC projects** - such as Einstein@home, LHC@home, World Community Grid, and so on. - The BOINC client can be "attached" to one or many of these; - it processes jobs for the projects to which it is attached. - -* BOINC can also be used for [in-house computing](DesktopGrid) within an organization (e.g. a company). - In this case case the worker nodes are - cluster nodes or other organizational computers, - and they are attached only to the organization's BOINC server. - -BOINC can run all existing HTC applications, -including those that use GPUs and/or multiple CPU cores. -It can use virtual machines to run existing Linux applications on Windows and Mac worker nodes. - -BOINC provides mechanisms for job submission and control, designed for performance at scale. -However, it can also be used as a back end for existing -job-submission systems such as HTCondor; details are [here](GridIntegration). - -BOINC is distributed under the LGPL v3 open-source license. -It can be used for any purpose (academic, commercial, or private) -and can be used with applications that are not open-source. - -## Cost comparison - -BOINC was created to provide scientists with large computing power at a small cost. -Suppose you need, say, 100 TeraFLOPS for 1 year. -Here are some ways you can get it: - -### **Use Amazon's Elastic Computing Cloud: $175 Million** -Based on $0.10 per node/hour. -### **Build a cluster: $12.4 Million** -This includes power and air-conditioning infrastructure, network hardware, computing hardware, storage, electricity, and sysadmin personnel. 
-### **Use BOINC: $125,000** -Based on the average throughput and budget of the 6 largest volunteer computing projects. - -It takes (very roughly) three man-months to create a BOINC project: -one month of an experienced sys admin, one month of a programmer, and one month of a web developer. -Once the project is running, budget a 50% FTE (mostly system admin) to maintain it. -In terms of hardware, you'll need a mid-range server computer and a fast connection to the commercial Internet. - -## Getting started - -To compute using BOINC, you'll need to set up a BOINC server -and configure your applications to run under BOINC. -Technical documentation is [here](Home). - -If you're doing in-house computing, -install the BOINC client on your worker nodes, and you're done. -This is detailed [here](DesktopGrid). - -In the volunteer computing case, you'll need to get clients to attach to your server. -There are several ways to do this: - -* Create a public-facing web site for your project. - Announce it and publicize it using whatever channels are available to you: - mass media, social media, newletters, paid advertising, etc. - -* [Contact us](ProjectPeople) and ask to have your project listed by BOINC. - You'll be asked to demonstrate that a) your project is doing - what you claim it is, and b) you're following a set of security practices. - Your project will then a) be announced on the BOINC web site news column, - b) be listed on the BOINC web site, and - c) appear in the list of projects shown in the BOINC client GUI. - -* [Contact us](ProjectPeople) and ask to have your project - included in [Science United](https://scienceunited.org), - a framework in which volunteers sign up for science areas instead of projects. - You'll need to tell us what types of research your project is doing, - and then you'll automatically get computing power from volunteers - who have registered an interest in those areas. 
- This has the advantage that you don't have to create a public-facing web site or do any publicity. - In addition, you can ask to be included in Science United even before you've created your project. - At that point we can tell you roughly how much computer power you'll get, - and you can decide whether this justifies the investment in creating a project. - -These approaches are not mutually exclusive; you can do any or all of them. - -## Organizational options - -The volunteer computing projects using BOINC vary in terms of their -organizational structure and the set of scientists they serve. -Examples include: - -* Research group. - The project is operated by a single research group, - and serves the members of that group. - Examples include SETI@home, Rosetta@home, and Einstein@home. - -* Application-centered research community. - The project is operated by a single research group, - but serves a broader community in that science area. - Examples: Climateprediction.net, - which is based at Oxford but collaborates with - projects around the world. - Mindmodeling.org serves researchers from about 20 universities who use the same application (the ACT-R cognitive modeling system). - -* Science Gateway. - The project is operated by a **science gateway**, - i.e. a web site that serves a particular scientific community, - and that provides HTC as well as other functions. - An example is nanoHUB. - -* Institutional umbrella project. - The project is operated by an organization (university or research lab), - and serves the researchers in that organization. - For example, LHC@home servers multiple groups at CERN. - An academic example (no longer operating) is the University of Westminster in London. - This idea is elaborated on [here](VirtualCampusSupercomputerCenter). - -* HPC provider. - The project is operated by an HPC provider such as a supercomputing center. - It processes the provider's HTC jobs - (i.e. 
the jobs that don't actually need a supercomputer), - and serves the provider's clients that have HTC workloads. - An example is Texas Advanced Computing Center (TACC). - -There are several advantages in having BOINC projects that are high -in the organizational hierarchy, and that serve many scientists: - -* The cost of maintaining a BOINC project is roughly constant, - regardless of its size. - For large projects, the cost per scientist is lower. - -* Publicity options: high-level organizational entities typically have - existing publicity mechanisms (e.g. alumni magazines, newsletters, etc.) - that can be leveraged to recruit volunteers. - -* Longevity: the duration of one scientist's need for HTC is generally shorter - than that of a group of scientists. - There are benefits in having a project last a long time - (e.g. amortizing the startup cost). - -* Continuity: similarly, one scientist's computing workload may - be sporadic, while that of a group of scientists is more continuous. - Some volunteers prefer projects with continuous workloads. - -So if you're thinking about using BOINC, -consider the possible scope of your project. 
diff --git a/Computing-with-BOINC.md b/Computing-with-BOINC.md index 5988010..03a9d13 100644 --- a/Computing-with-BOINC.md +++ b/Computing-with-BOINC.md @@ -8,9 +8,8 @@ For help with BOINC, post to the ## Introductory docs -* [BOINC overview](BoincOverview) -* [BOINC projects](ProjectsApps) -* [Features of BOINC](BoincIntro) +* [BOINC overview](BOINC-overview) +* [BOINC projects](BOINC-projects) * [Create a BOINC server (cookbook)](Create-a-BOINC-server-(cookbook)) * [BOINC apps (introduction)](Boinc-apps-(introduction)) * [Deploy Linux apps using VirtualBox (cookbook)](Deploy-Linux-apps-using-VirtualBox-(cookbook)) diff --git a/Going-public.md b/Going-public.md index d2cd7ad..3787b89 100644 --- a/Going-public.md +++ b/Going-public.md @@ -4,8 +4,41 @@ ## Server software upgrades ## Log files ## Backups +## Get worker nodes + +in-house: + +If you're doing in-house computing, +install the BOINC client on your worker nodes, and you're done. +This is detailed [here](DesktopGrid). + + +volunteer: +Create a public-facing web site for your project. +Announce it and publicize it using whatever channels are available to you: +mass media, social media, newsletters, paid advertising, etc. + + ## Get vetted +* [Contact us](ProjectPeople) and ask to have your project listed by BOINC. +You'll be asked to demonstrate that a) your project is doing +what you claim it is, and b) you're following a set of security practices. +Your project will then a) be announced on the BOINC web site news column, +b) be listed on the BOINC web site, and +c) appear in the list of projects shown in the BOINC client GUI. + ## Science United +* [Contact us](ProjectPeople) and ask to have your project + included in [Science United](https://scienceunited.org), + a framework in which volunteers sign up for science areas instead of projects.
+ You'll need to tell us what types of research your project is doing, + and then you'll automatically get computing power from volunteers + who have registered an interest in those areas. + This has the advantage that you don't have to create a public-facing web site or do any publicity. + In addition, you can ask to be included in Science United even before you've created your project. + At that point we can tell you roughly how much computer power you'll get, + and you can decide whether this justifies the investment in creating a project. + ## Web site content diff --git a/notes b/notes index e2478cf..b3c8af8 100644 --- a/notes +++ b/notes @@ -47,12 +47,12 @@ doc big picture Intro docs (base-case examples w/ cookbooks; pictures; videos where feasible) - Introduction to BOINC + Overview What BOINC is; why use it; volunteer computing client/server: projects, worker nodes role of UCB features - The structure of BOINC + Projects projects, attachments, account managers Science United, BOINC Central role of UCB @@ -114,10 +114,3 @@ Intro docs content forums spam control ------------------------------ -detailed docs -Handling completed jobs (assimilation) - Standard assimilators - Assimilators in scripting languages (Python, PHP, etc.) - Assimilators in C++ -