Created The BOINC test drive (markdown)

2023-04-19 20:28:39 -07:00 · 2023-04-19 20:28:39 -07:00 · 5c8d7d29db
parent 7495ca615e
commit 5c8d7d29db
1 changed files with 218 additions and 0 deletions
--- a/The-BOINC-test-drive.md
+++ b/The-BOINC-test-drive.md
@@ -0,0 +1,218 @@
+Suppose we've solved the supply side of the problem:
+BOINC has 10 million users, supplying many ExaFLOPS.
+How do we get more scientists to use it?
+
+The major conference and trade show for scientific computing is Supercomputing (SC).
+Scientists who do high-throughput computing (HTC) go there.
+Suppose BOINC had a booth at SC 2022.
+Scientists walk up, and we give them a flyer.
+What should it say?
+What "test drive" experience do we want them to have?
+
+Ideally, in 10 or 15 minutes they'd be running jobs on ~100 CPUs,
+and there'd be a clear path to scaling up to millions.
+
+The test drive can't include:
+
+- reading any existing BOINC doc
+- writing any XML
+- doing sysadmin
+- creating a web site
+- recruiting volunteers
+- building apps on Windows, Mac, or Android
+- developing validators or assimilators
+
+---
+First, we create a "BOINC app library".
+It includes a number of widely used apps (like Autodock, Charm, Rosetta, etc.),
+compiled to run on BOINC (with the BOINC library).
+For each app, the library includes app versions for various platforms,
+CPU features, and GPUs.
+Each app version has an associated plan class specification.
+One of the apps is the VBox wrapper.
+
+These apps are viewed as "secure":
+running them on a computer doesn't pose a security risk,
+regardless of the input files and cmdline parameters,
+even if the job was created by a malevolent hacker.
+That means we have to be careful about what we put in the library;
+we need to build it ourselves or vet the people who build it.
+
+The app library exports a list of the app versions and their hashes.
+The BOINC client imports this list,
+so it can know if an app version is from the BOINC library.
+
+In the BOINC client,
+an attachment to a project can be marked "restricted",
+in which case the client will only run apps for that project that are
+from the app library.
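
A minimal sketch of the hash check and restricted mode described above. The manifest layout, the choice of SHA-256, and the names `build_manifest`, `is_library_version`, and `may_run` are assumptions made for illustration; none of them are existing BOINC interfaces.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of an app version's file contents."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(app_versions: dict[str, bytes]) -> set[str]:
    """Library side: export the set of hashes of all vetted app versions.

    `app_versions` maps a hypothetical label such as "example_app/linux-x86_64"
    to the bytes of that app version's executable.
    """
    return {sha256_hex(data) for data in app_versions.values()}


def is_library_version(data: bytes, manifest: set[str]) -> bool:
    """Client side: an app version counts as a library version if its hash is listed."""
    return sha256_hex(data) in manifest


def may_run(data: bytes, manifest: set[str], restricted: bool) -> bool:
    """A restricted attachment runs only library app versions;
    an unrestricted attachment runs whatever the project sends."""
    return (not restricted) or is_library_version(data, manifest)


# One vetted binary and one unknown binary (the contents are placeholders).
manifest = build_manifest({"example_app/linux-x86_64": b"vetted binary"})
print(may_run(b"vetted binary", manifest, restricted=True))    # True
print(may_run(b"unknown binary", manifest, restricted=True))   # False
print(may_run(b"unknown binary", manifest, restricted=False))  # True
```

Comparing hashes, rather than names or version numbers, means a project cannot pass off a modified binary as a library app version.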
+
+Notes:
+1. maintaining this library could be a lot of work!
+1. the library could be useful for other purposes;
+    e.g. we could bundle Android app versions with the BOINC Android client
+
+Second, we create a "Demo grid":
+a set of computers willing to run jobs for anyone, in restricted mode.
+Could be volunteers, or cluster nodes somewhere, or Amazon spot instances.
+The BOINC client running on these nodes is attached to
+an account manager which lets us dynamically attach them to projects.
+This may as well be an enhanced version of Science United.
+
+Third, we create a BOINC project that I'll call BOINC Central
+(the name doesn't matter, no one sees it).
+Its job is to dispatch jobs for users who don't or can't run their own BOINC server.
+It has all the apps in the library, and all their versions, with the plan classes set up
+(these are the only app versions it has).
+
+Finally, we use Science United as a "switchboard" for dynamically
+attaching hosts to projects.
+It knows which hosts are part of the Demo grid.
+For each project, it knows whether it is
+
+- unvetted
+- vetted (partially or fully; see below)
+
+This info is used in deciding what projects to attach each host to.
+
+## Test-drive scenarios
+
+### unvetted/central
+```
+    goal: quickly run batches of jobs on computers you don't own
+    User experience:
+    - create an account on BOINC Central (reCAPTCHA, verify email address)
+    2 variants:
+    1) Command line interface (Condor-like)
+        install a package
+        make a "submit file" that specifies a batch of jobs
+            - app
+            - input files
+            - cmdline params
+            - possible resource usage estimates
+        run "boinc_submit"
+        other cmdline commands to
+            - wait for completion of batch
+                (or email notification)
+            - show pending jobs (condor_q)
+            - abort jobs
+            - get resource usage of completed jobs
+                (for use in later submissions)
+            - get output files of completed jobs
+    2) Web interface: go to BOINC Central
+        pick an application
+        specify (through a web interface) a set of cmdline args
+        and/or a range of input files
+        click submit
+        email notification option
+        web interfaces for showing status, aborting
+        download output files as zip
+
+    How to implement
+    - Use BOINC Central for dispatching jobs
+        use existing job-submission and file-management RPCs
+    - Use the Demo grid;
+        SU attaches all Demo nodes to BOINC Central
+        (in restricted mode, though apps coming from there are secure).
+
+    There are limits on
+        - how much computing you get per week
+        - size of input/output files
+
+    possible variant:
+        - you can pay to get more computing
+
+    This is similar to Open Science Grid but
+        - no vetting of job submitters.
+        - has the BOINC "polymorphic app" concept
+```
+This is the "test drive" experience.
+It gives anyone - scientist or not - sporadic access to a few hundred computers.
+This may be all that some scientists need.
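
The submit file outlined above could map onto a data structure like the one in this sketch. The `Batch` and `Job` classes, their field names, and the rule that a batch expands to one job per (input file, cmdline) pair are all assumptions; the format that `boinc_submit` would actually read is not defined here.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Optional


@dataclass
class Job:
    app: str
    input_file: Optional[str]
    cmdline: str


@dataclass
class Batch:
    app: str                                              # name of an app in the library
    input_files: list[str] = field(default_factory=list)
    cmdlines: list[str] = field(default_factory=list)
    est_cpu_seconds: Optional[float] = None               # optional resource usage estimate

    def jobs(self) -> list[Job]:
        """Expand the batch into one job per (input file, cmdline) pair.

        An empty list is treated as a single "no input file" or
        "no arguments" entry, so the product is never empty.
        """
        files = self.input_files or [None]
        cmds = self.cmdlines or [""]
        return [Job(self.app, f, c) for f, c in product(files, cmds)]


batch = Batch(
    app="example_app",                           # placeholder app name
    input_files=["input_1.dat", "input_2.dat"],  # placeholder file names
    cmdlines=["--seed 1", "--seed 2", "--seed 3"],  # placeholder arguments
)
print(len(batch.jobs()))  # 6
```

The same structure could back both the command-line and the web variant, since each only needs to fill in the app, the files, and the argument list.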
+
+One of the apps in the library is the VBox wrapper,
+so you can bring your own apps but they have to run in VMs.
+Use boinc2docker (and TACC's extensions) to automate converting
+any Linux/Intel app to a Docker image.
+Could also develop tools for managing a set of these images.
+(my earlier "tire-kicking" Google doc describes this)
+
+Notes:
+- no result validation is done; Demo grid nodes are assumed to be reliable.
+- you don't have to specify job sizes (CPU, RAM, disk).
+    We could have a system that estimates these for you, based on past jobs
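
One way such an estimator might work, as a sketch only: take the resource usage of an app's completed jobs and use a high percentile, padded by a safety factor, as the estimate for new jobs. The percentile, the safety factor, and the function name are assumptions, not part of BOINC.

```python
import math


def estimate_from_history(past_usage: list[float],
                          percentile: int = 90,
                          safety_factor: float = 1.5) -> float:
    """Estimate a resource bound (CPU seconds, bytes of RAM, ...) for new jobs.

    Uses the nearest-rank percentile of past usage, multiplied by a safety
    factor, so that most new jobs fit within the estimate.
    """
    if not past_usage:
        raise ValueError("no completed jobs to estimate from")
    ordered = sorted(past_usage)
    # Nearest-rank method: the smallest value with at least
    # `percentile` percent of the samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] * safety_factor


# CPU seconds used by ten completed jobs of one app (made-up numbers).
cpu_seconds = [100, 110, 95, 120, 105, 300, 98, 102, 115, 108]
print(estimate_from_history(cpu_seconds))  # 180.0
```

Using a percentile rather than the maximum keeps a single outlier (the 300-second job here) from inflating every later estimate.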
+
+### unvetted/distributed
+```
+    Similar, but user has their own BOINC server;
+        avoids storage and BW bottleneck of central server
+        Also lets you attach your own computers directly.
+    - get a Linux machine visible on Internet
+        could be Cloud node
+    - install BOINC server on that machine and create a project
+        could be from a package
+        could be BOINC server Docker
+        could be from a VM image
+    - BOINC server is a black box to user
+    - run commands to install apps from library
+    - submit jobs through same cmdline or web interface
+    - register your BOINC server with BOINC Central
+        no vetting
+        server is registered with SU as "unvetted project"
+
+    Implementation
+        Uses Demo grid hosts
+        Science United attaches Demo grid hosts to unvetted projects in restricted mode
+```
+---------------
+```
+Vetting: 
+    partially vetted: we believe that
+        - your identity and affiliation are true
+        - you're doing the kind of computing you claim
+            (science area, location)
+        This gives you access to more computing but you still need to use trusted apps
+    fully vetted: partial vetting plus
+        - we believe that your apps are not malware
+        - we believe that you do code signing
+        This lets you use your own non-VM apps
+
+Partially vetted
+    You can use either the central or distributed model.
+    Your apps run on all Science United hosts (currently about 5,000).
+
+Fully vetted
+    Use with distributed model (your own server)
+    You can add your own apps and app versions.
+        May as well use the current BOINC tools for this;
+        requires logging in to your project server,
+        code-signing, maybe writing XML plan class specs
+    Your project is registered on Science United,
+        and it's attached to hosts based on science area
+        and computing resources (that's how SU currently works)
+    Your apps run on all Science United hosts in trusted mode
+    Your project is listed on the BOINC web site,
+        and in the project list in the client GUI,
+        so volunteers can attach to it explicitly.
+
+Notes:
+- result validation becomes an issue,
+    mostly because of possible credit cheating.
+    Need to figure out how to do this in a way that doesn't require
+    users to write validators.
+
+    Or get rid of credit
+```
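
The attachment rules spread across the scenarios above can be collected into one decision function. This is a sketch under stated assumptions: the enum and function names are hypothetical, partially vetted projects are assumed to run in restricted mode (since they must still use trusted apps), Demo grid hosts are assumed to stay restricted for every project, ordinary Science United hosts are assumed not to be offered unvetted projects, and matching on science area and computing resources is left out.

```python
from enum import Enum
from typing import Optional


class Vetting(Enum):
    UNVETTED = "unvetted"
    PARTIAL = "partially vetted"
    FULL = "fully vetted"


class Mode(Enum):
    RESTRICTED = "restricted"  # only app versions from the app library
    TRUSTED = "trusted"        # the project's own app versions too


def attach_mode(host_in_demo_grid: bool, vetting: Vetting) -> Optional[Mode]:
    """Return how a host should attach to a project, or None for no attachment."""
    if host_in_demo_grid:
        # Demo grid hosts run jobs for anyone, but only in restricted mode
        # (assumed here to hold for every project, whatever its vetting level).
        return Mode.RESTRICTED
    if vetting is Vetting.UNVETTED:
        # Assumption: ordinary Science United hosts are not offered unvetted projects.
        return None
    if vetting is Vetting.PARTIAL:
        return Mode.RESTRICTED
    return Mode.TRUSTED


for vetting in Vetting:
    for demo in (True, False):
        print(demo, vetting.value, attach_mode(demo, vetting))
```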
+--------------
+How hard is this to implement?
+```
+Things I can do:
+    BOINC library framework
+    BOINC Central
+    Changes to SU
+    Changes to BOINC client
+
+Things I'd need help with:
+    Job submission interfaces
+
+Things others would have to do:
+    build app versions for BOINC library
+```