buda docs

David Anderson 2024-11-24 01:47:27 -08:00
parent 107f046740
commit d3fc38090c
5 changed files with 185 additions and 58 deletions

96
BUDA-implementation.md Normal file

@@ -0,0 +1,96 @@
On the server, there is a single BOINC app; let's call it 'buda'.
This has app versions for the various platforms (Win, Mac, Linux).
Each app version contains the Docker wrapper built for that platform.
Each science app variant is a collection of files:
* A Dockerfile
* A config file, `job.toml`
* Input and output templates
* A main program or script
* Other files
* A file 'file_list' listing the other files in template order.
The set of science apps and variants is represented
by a directory hierarchy of the form
```
project/buda_apps/
    <sci_app_name>/
        cpu/
            ... files
        <plan_class>/
            ... files
        ...
    ...
```
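For example, a science app with a CPU variant and a CUDA variant might look like this (the app, script, and template file names are illustrative):
```
project/buda_apps/
    autodock/
        cpu/
            Dockerfile
            job.toml
            template_in
            template_out
            main.py
            file_list
        cuda/
            Dockerfile
            job.toml
            template_in
            template_out
            main.py
            file_list
```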
Note: you can build this hierarchy manually but
typically it's maintained using a web interface; see below.
This is similar to BOINC's hierarchy of apps and app versions, except:
* It's represented in a directory hierarchy, not in database tables
* Science app variants are not associated with platforms
(since we're using Docker).
* It stores only the current version, not a sequence of versions
(that's why we call them 'variants', not 'versions').
## BUDA is not polymorphic
Conventional BOINC apps are 'polymorphic':
if an app has both CPU and GPU variants,
you submit jobs without specifying which one to use;
the BOINC scheduler makes the decision.
It would be possible to make BUDA polymorphic,
but this would be complex, requiring significant changes to the scheduler.
So, at least for now, BUDA is not polymorphic:
when you submit jobs, you must specify which plan class to use.
This can be a slight nuisance:
if a plan class has only a little computing power attached to it,
you might avoid submitting to it, but then you don't get that power.
## Validators and assimilators
In the current BOINC architecture,
each BOINC app has its own validator and assimilator.
If multiple science apps "share" the same BOINC app,
we'll need a way to let them have different validators and assimilators.
This could be built on the script-based framework;
each science app could specify the names
of validator and assimilator scripts,
which would be stored in workunits.
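For instance, a science app's validator script might look like the sketch below. This is only illustrative: the invocation convention (output-file paths as arguments, exit status as the verdict) and the JSON check are assumptions, not an actual BOINC interface.
```python
#!/usr/bin/env python3
# Hypothetical per-app validator script: a sketch, not an actual BOINC
# interface. Assume it's invoked with the result's output-file paths as
# arguments and reports validity via its exit status.
import json
import sys

def main():
    for path in sys.argv[1:]:
        try:
            with open(path) as f:
                result = json.load(f)    # assume this app writes JSON output
        except (OSError, ValueError):
            sys.exit(1)                  # unreadable or malformed: invalid
        if "energy" not in result:       # app-specific sanity check (hypothetical)
            sys.exit(1)
    sys.exit(0)                          # all output files look valid

main()
```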
## Interfaces
BOINC provides a web interface for managing BUDA apps
and submitting batches of jobs to them.
Other interfaces are possible;
e.g. we could make a Python-based remote API
that could be used to integrate BUDA into other batch systems.
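Such an API doesn't exist yet; here is a minimal sketch of what it might look like. The use of `requests` is real, but the endpoint, form fields, and response format are all hypothetical.
```python
# A minimal sketch, not an existing API: the endpoint, form fields,
# and response format below are hypothetical.
import requests

def submit_batch(project_url, auth_key, app, variant, batch_zip_path):
    """Submit a zip of job descriptions to a BUDA science app variant."""
    with open(batch_zip_path, "rb") as f:
        r = requests.post(
            project_url + "/buda_submit.php",        # hypothetical endpoint
            data={"authenticator": auth_key, "app": app, "variant": variant},
            files={"batch_zip": f},
        )
    r.raise_for_status()
    return r.json()["batch_id"]                      # hypothetical field
```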
## Implementation notes
BUDA will require changes to the scheduler.
Currently, given a job, the scheduler scans app versions,
looking for one the host can accept, based on plan class.
That won't work here, since the plan class is already fixed.
Instead:
* Add a plan_class field to workunit (or it could go in xml_doc).
* When considering sending a WU to a host, if the WU has a plan class:
  * skip it if there is no app version with that platform / plan class (e.g. we can't send a metal job to a Win host)
  * skip it if the host can't handle the plan class
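In illustrative pseudocode (the real scheduler is C++; these structures and `host_handles_plan_class()` are hypothetical):
```python
# Illustrative pseudocode only (the real scheduler is C++; these
# structures and host_handles_plan_class() are hypothetical).
def can_send(wu, host, app_versions):
    if not wu.plan_class:
        return True                  # no plan class: existing logic applies
    for av in app_versions:
        if av.platform not in host.platforms:
            continue                 # e.g. can't send a metal job to a Win host
        if av.plan_class == wu.plan_class and host_handles_plan_class(host, av.plan_class):
            return True              # usable Docker-wrapper version found
    return False                     # skip this WU for this host
```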
## If we wanted to make BUDA polymorphic
* The scheduler would have to scan the `buda_apps` dir structure (or we could add this info to the DB).
* Jobs would be tagged with the BUDA science app name.
* The scheduler would scan the variants of that science app.
* If it finds a plan class the host can accept, it would build wu.xml_doc based on the BUDA app version info.
The above is possible but would be a lot of work.
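A rough sketch of that scan, again as illustrative pseudocode (not implemented; `host_handles_plan_class()` is hypothetical):
```python
# Illustrative pseudocode only: the real scheduler is C++, and
# host_handles_plan_class() is a hypothetical helper.
import os

def pick_variant(buda_apps_dir, sci_app_name, host):
    """Scan a BUDA science app's variants; return one this host can accept."""
    app_dir = os.path.join(buda_apps_dir, sci_app_name)
    for variant in sorted(os.listdir(app_dir)):     # 'cpu' or a plan class name
        if variant == "cpu" or host_handles_plan_class(host, variant):
            # use this variant's files to build wu.xml_doc
            return os.path.join(app_dir, variant)
    return None                                     # host can't run this app
```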

70
BUDA-job-submission.md Normal file

@@ -0,0 +1,70 @@
## BUDA science apps and variants
We call BUDA applications 'science apps'.
Each science app has a name, like 'worker' or 'autodock'.
A science app can have multiple 'variants' that can
use different types of computer hardware.
The name of a variant is 'cpu' if it uses a single CPU.
Otherwise it's the name of a [plan class](AppPlan).
There might be variants for 1 CPU, for N CPUs, and for various GPU types.
## User file sandbox
Job submission uses BOINC's per-user file sandbox:
you upload app files and job files to your sandbox,
then select them in the web forms described below.
## Managing science apps and variants
In the menu bar of the BOINC project's web site,
select `Computing / Job Submission`.
Then click on `BUDA`.
This shows a list of existing science apps and their variants.
You can
* add or delete a variant
* add or delete a science app
* submit jobs to a variant
## Adding a variant
The form for adding a variant includes:
* A plan class name (leave blank for a CPU app)
* A set of 'app files', selected from your file sandbox. This includes:
  * a Dockerfile
  * a main program to run in the container
  * other files if needed
* A list of input file names
* A list of output file names
## Submitting jobs
The form for submitting a batch of jobs asks you to
select (from the file sandbox) a zip file of job descriptions.
This file has one dir per job:
```
jobname1/
    [cmdline]
    file1
    file2
    ...
jobname2/
    ...
```
The file names in each job directory must match
the variant's list of input file names.
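For example, a batch zip with this layout could be built with a few lines of Python (job directory names and the output path are illustrative):
```python
# Builds a batch zip in the layout shown above; job directory names
# and the output path are illustrative.
import os
import zipfile

def make_batch_zip(job_dirs, out_path="batch.zip"):
    """Package one directory per job, preserving the jobname/filename layout."""
    with zipfile.ZipFile(out_path, "w") as z:
        for job_dir in job_dirs:
            jobname = os.path.basename(os.path.normpath(job_dir))
            for fname in sorted(os.listdir(job_dir)):
                # these names must match the variant's input file names
                z.write(os.path.join(job_dir, fname), f"{jobname}/{fname}")
    return out_path

# e.g. make_batch_zip(["jobname1", "jobname2"])
```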
## Monitoring a batch
When you submit a batch of jobs,
you end up at a web page showing you the status of the batch.
This shows you, among other things,
how many of the jobs have completed.
Reload it to update this information.
You can click on a job to see its status
(and if it failed, the stderr output).
You can view or download its input files.
On the batch page, you can click to download a zip file
of the output files of all completed jobs.
When you're done with the batch, you can 'retire' it.
This removes its input and output files from the server.

@@ -3,18 +3,24 @@ is a framework for running Docker-based science apps on BOINC.
It's 'universal' in the sense that one BOINC app
handles arbitrary science apps.
-The science app executables (and Dockerfile)
+The science app's Dockerfile and executables
are in workunits rather than app versions.
On the server, there is a single BOINC app; let's call it 'buda'.
-This has app versions for
-the various platforms (Win, Mac, Linux)
+This has app versions for the various platforms (Win, Mac, Linux).
Each app version contains the Docker wrapper built for that platform.
There are various possible interfaces for job submission to BUDA.
We could make a Python-based remote API.
We (or others) could use this API to integrate it into batch systems.
But for starters, we implemented a generic (multi-application)
web-based job submission system,
using the per-user file sandbox system.
## BUDA science apps and versions
-BOINC provides server-side tools
-(CLI and web interfaces) for managing BUDA science apps
+BOINC provides a web interface for managing BUDA science apps
and submitting jobs to them.
These tools assume the following structure:
@@ -46,7 +52,7 @@ project/buda_apps/
...
```
Note: you can build this hierarchy manually but
-typically it's maintained by a job-submission system; see below.
+typically it's maintained using a web interface; see below.
This is similar to BOINC's hierarchy of apps and app versions, except:
@@ -58,7 +64,7 @@ This is similar to BOINC's hierarchy of apps and app versions, except:
## BUDA is not polymorphic
-The existing BOINC design is 'polymorphic':
+Conventional BOINC apps are 'polymorphic':
if an app has both CPU and GPU variants,
you submit jobs without specifying which one to use;
the BOINC scheduler makes the decision.
@@ -99,7 +105,7 @@ Instead:
* skip if no app version with that platform / plan class (e.g. can't send metal job to Win host)
* skip if host can't handle the plan class
-If we wanted to make BUDA polymorphic,
+## If we wanted to make BUDA polymorphic
* The scheduler would have to scan the `buda_apps` dir structure (or we could add this info to the DB).

5
BUDA-setup.md Normal file

@@ -0,0 +1,5 @@
BUDA is 'universal' in the sense that one BOINC app
handles arbitrary science apps.
The science app's Dockerfile and executables
are in workunits rather than app versions.

@@ -1,50 +0,0 @@
There are various possible interfaces for job submission to BUDA.
We could make a Python-based remote API.
We (or others) could use this API to integrate it into batch systems.
But for starters, I propose implementing a generic (multi-application)
web-based job submission system,
using the per-user file sandbox system.
## Managing science apps and variants.
First, there's a web interface for managing BUDA science apps
This shows you the existing apps,
and lets you delete them or create new ones.
For a given science app it shows you the variants
(i.e. for different GPU types).
It lets you delete them or create new ones.
The form for this includes:
* Select (from your file sandbox) a set of 'app files'. This includes:
* a Dockerfile
* a main prog to run in the container
* other files if needed
* info per file: logical name, copy flag
* a plan class name (CPU/GPU)
* list of input files (logical name, copy_file)
* list of output files (logical name)
* cmdline (passed to main prog for all jobs)
## Submitting jobs
The form for submitting a batch of jobs:
* batch name (optional)
* select a BUDA science app and variant
* select (from the file sandbox) a zip file of job descriptions.
This file has one dir per job:
```
jobname/
[cmdline]
file1 (logical name)
...
...
```
This system should manage file immutability.
The above filenames are logical and don't need to be unique;
e.g. input files for different jobs can have the same name.
The system will create unique physical names.