buda docs

David Anderson 2024-11-24 01:47:27 -08:00
parent 107f046740
commit d3fc38090c
5 changed files with 185 additions and 58 deletions

96
BUDA-implementation.md Normal file

@@ -0,0 +1,96 @@
On the server, there is a single BOINC app; let's call it 'buda'.
This has app versions for the various platforms (Win, Mac, Linux).
Each app version contains the Docker wrapper built for that platform.
Each science app variant is a collection of files:
* A Dockerfile
* A config file, `job.toml`
* Input and output templates
* A main program or script
* Other files
* A file 'file_list' listing the other files in template order.
The set of science apps and variants is represented
by a directory hierarchy of the form
```
project/buda_apps/
    <sci_app_name>/
        cpu/
            ... files
        <plan_class>/
            ... files
        ...
    ...
```
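For example, a science app with a CPU variant and a CUDA variant might look like this (the app, script, and template file names are illustrative):
```
project/buda_apps/
    autodock/
        cpu/
            Dockerfile
            job.toml
            template_in
            template_out
            main.py
            file_list
        cuda/
            Dockerfile
            job.toml
            template_in
            template_out
            main.py
            file_list
```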
Note: you can build this hierarchy manually but
typically it's maintained using a web interface; see below.
This is similar to BOINC's hierarchy of apps and app versions, except:
* It's represented in a directory hierarchy, not in database tables
* Science app variants are not associated with platforms
(since we're using Docker).
* It stores only the current version, not a sequence of versions
(that's why we call them 'variants', not 'versions').
## BUDA is not polymorphic
Conventional BOINC apps are 'polymorphic':
if an app has both CPU and GPU variants,
you submit jobs without specifying which one to use;
the BOINC scheduler makes the decision.
It would be possible to make BUDA polymorphic,
but this would be complex, requiring significant changes to the scheduler.
So, at least for now, BUDA is not polymorphic:
when you submit jobs, you must specify which plan class to use.
This can be a slight nuisance:
if a plan class has only a little computing power attached to it,
you might avoid submitting to it, but then you don't get that power.
## Validators and assimilators
In the current BOINC architecture,
each BOINC app has its own validator and assimilator.
If multiple science apps "share" the same BOINC app,
we'll need a way to let them have different validators and assimilators.
This could be built on the script-based framework;
each science app could specify the names
of validator and assimilator scripts,
which would be stored in workunits.
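For instance, a science app's validator script might look like the sketch below. This is only illustrative: the invocation convention (output-file paths as arguments, exit status as the verdict) and the JSON check are assumptions, not an actual BOINC interface.
```python
#!/usr/bin/env python3
# Hypothetical per-app validator script: a sketch, not an actual BOINC
# interface. Assume it's invoked with the result's output-file paths as
# arguments and reports validity via its exit status.
import json
import sys

def main():
    for path in sys.argv[1:]:
        try:
            with open(path) as f:
                result = json.load(f)    # assume this app writes JSON output
        except (OSError, ValueError):
            sys.exit(1)                  # unreadable or malformed: invalid
        if "energy" not in result:       # app-specific sanity check (hypothetical)
            sys.exit(1)
    sys.exit(0)                          # all output files look valid

main()
```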
## Interfaces
BOINC provides a web interface for managing BUDA apps
and submitting batches of jobs to them.
Other interfaces are possible;
e.g. we could make a Python-based remote API
that could be used to integrate BUDA into other batch systems.
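Such an API doesn't exist yet; here is a minimal sketch of what it might look like. The use of `requests` is real, but the endpoint, form fields, and response format are all hypothetical.
```python
# A minimal sketch, not an existing API: the endpoint, form fields,
# and response format below are hypothetical.
import requests

def submit_batch(project_url, auth_key, app, variant, batch_zip_path):
    """Submit a zip of job descriptions to a BUDA science app variant."""
    with open(batch_zip_path, "rb") as f:
        r = requests.post(
            project_url + "/buda_submit.php",        # hypothetical endpoint
            data={"authenticator": auth_key, "app": app, "variant": variant},
            files={"batch_zip": f},
        )
    r.raise_for_status()
    return r.json()["batch_id"]                      # hypothetical field
```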
## Implementation notes
BUDA will require changes to the scheduler.
Currently, given a job, the scheduler scans app versions,
looking for one the host can accept, based on plan class.
That won't work here, since the plan class is already fixed.
Instead:
* Add a plan_class field to workunit (or it could go in xml_doc).
* When considering sending a WU to a host, if the WU has a plan class:
  * skip it if there is no app version with that platform / plan class (e.g. we can't send a metal job to a Win host)
  * skip it if the host can't handle the plan class
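In illustrative pseudocode (the real scheduler is C++; these structures and `host_handles_plan_class()` are hypothetical):
```python
# Illustrative pseudocode only (the real scheduler is C++; these
# structures and host_handles_plan_class() are hypothetical).
def can_send(wu, host, app_versions):
    if not wu.plan_class:
        return True                  # no plan class: existing logic applies
    for av in app_versions:
        if av.platform not in host.platforms:
            continue                 # e.g. can't send a metal job to a Win host
        if av.plan_class == wu.plan_class and host_handles_plan_class(host, av.plan_class):
            return True              # usable Docker-wrapper version found
    return False                     # skip this WU for this host
```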
## If we wanted to make BUDA polymorphic
* The scheduler would have to scan the `buda_apps` dir structure (or we could add this info to the DB).
* Jobs would be tagged with the BUDA science app name.
* The scheduler would scan the variants of that science app.
* If it finds a plan class the host can accept, it would build wu.xml_doc based on the BUDA app version info.
The above is possible but would be a lot of work.
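A rough sketch of that scan, again as illustrative pseudocode (not implemented; `host_handles_plan_class()` is hypothetical):
```python
# Illustrative pseudocode only: the real scheduler is C++, and
# host_handles_plan_class() is a hypothetical helper.
import os

def pick_variant(buda_apps_dir, sci_app_name, host):
    """Scan a BUDA science app's variants; return one this host can accept."""
    app_dir = os.path.join(buda_apps_dir, sci_app_name)
    for variant in sorted(os.listdir(app_dir)):     # 'cpu' or a plan class name
        if variant == "cpu" or host_handles_plan_class(host, variant):
            # use this variant's files to build wu.xml_doc
            return os.path.join(app_dir, variant)
    return None                                     # host can't run this app
```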

70
BUDA-job-submission.md Normal file

@@ -0,0 +1,70 @@
## BUDA science apps and variants
We call BUDA applications 'science apps'.
Each science app has a name, like 'worker' or 'autodock'.
A science app can have multiple 'variants' that can
use different types of computer hardware.
The name of a variant is 'cpu' if it uses a single CPU.
Otherwise it's the name of a [plan class](AppPlan).
There might be variants for 1 CPU, for N CPUs, and for various GPU types.
## User file sandbox
Job submission uses BOINC's per-user file sandbox:
you upload app files and job files to your sandbox,
then select them in the web forms described below.
## Managing science apps and variants
In the menu bar of the BOINC project's web site,
select `Computing / Job Submission`.
Then click on `BUDA`.
This shows a list of existing science apps and their variants.
You can
* add or delete a variant
* add or delete a science app
* submit jobs to a variant
## Adding a variant
The form for adding a variant includes:
* A plan class name (leave blank for a CPU app)
* A set of 'app files', selected from your file sandbox. This includes:
  * a Dockerfile
  * a main program to run in the container
  * other files if needed
* A list of input file names
* A list of output file names
## Submitting jobs
The form for submitting a batch of jobs asks you to
select (from the file sandbox) a zip file of job descriptions.
This file has one dir per job:
```
jobname1/
    [cmdline]
    file1
    file2
    ...
jobname2/
    ...
```
The file names in each job directory must match
the variant's list of input file names.
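For example, a batch zip with this layout could be built with a few lines of Python (job directory names and the output path are illustrative):
```python
# Builds a batch zip in the layout shown above; job directory names
# and the output path are illustrative.
import os
import zipfile

def make_batch_zip(job_dirs, out_path="batch.zip"):
    """Package one directory per job, preserving the jobname/filename layout."""
    with zipfile.ZipFile(out_path, "w") as z:
        for job_dir in job_dirs:
            jobname = os.path.basename(os.path.normpath(job_dir))
            for fname in sorted(os.listdir(job_dir)):
                # these names must match the variant's input file names
                z.write(os.path.join(job_dir, fname), f"{jobname}/{fname}")
    return out_path

# e.g. make_batch_zip(["jobname1", "jobname2"])
```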
## Monitoring a batch
When you submit a batch of jobs,
you end up at a web page showing you the status of the batch.
This shows you, among other things,
how many of the jobs have completed.
Reload it to update this information.
You can click on a job to see its status
(and if it failed, the stderr output).
You can view or download its input files.
On the batch page, you can click to download a zip file
of the output files of all completed jobs.
When you're done with the batch, you can 'retire' it.
This removes its input and output files from the server.

@@ -3,18 +3,24 @@ is a framework for running Docker-based science apps on BOINC.
It's 'universal' in the sense that one BOINC app
handles arbitrary science apps.
-The science app executables (and Dockerfile)
+The science app's Dockerfile and executables
are in workunits rather than app versions.
On the server, there is a single BOINC app; let's call it 'buda'.
-This has app versions for
-the various platforms (Win, Mac, Linux)
+This has app versions for the various platforms (Win, Mac, Linux).
Each app version contains the Docker wrapper built for that platform.
There are various possible interfaces for job submission to BUDA.
We could make a Python-based remote API.
We (or others) could use this API to integrate it into batch systems.
But for starters, we implemented a generic (multi-application)
web-based job submission system,
using the per-user file sandbox system.
## BUDA science apps and versions
-BOINC provides server-side tools
-(CLI and web interfaces) for managing BUDA science apps
+BOINC provides a web interface for managing BUDA science apps
and submitting jobs to them.
These tools assume the following structure:
@@ -46,7 +52,7 @@ project/buda_apps/
...
```
Note: you can build this hierarchy manually but
-typically it's maintained by a job-submission system; see below.
+typically it's maintained using a web interface; see below.
This is similar to BOINC's hierarchy of apps and app versions, except:
@@ -58,7 +64,7 @@ This is similar to BOINC's hierarchy of apps and app versions, except:
## BUDA is not polymorphic
-The existing BOINC design is 'polymorphic':
+Conventional BOINC apps are 'polymorphic':
if an app has both CPU and GPU variants,
you submit jobs without specifying which one to use;
the BOINC scheduler makes the decision.
@@ -99,7 +105,7 @@ Instead:
* skip if no app version with that platform / plan class (e.g. can't send metal job to Win host)
* skip if host can't handle the plan class
-If we wanted to make BUDA polymorphic,
+## If we wanted to make BUDA polymorphic
* The scheduler would have to scan the `buda_apps` dir structure (or we could add this info to the DB).

5
BUDA-setup.md Normal file

@@ -0,0 +1,5 @@
BUDA is 'universal' in the sense that one BOINC app
handles arbitrary science apps.
The science app's Dockerfile and executables
are in workunits rather than app versions.

@@ -1,50 +0,0 @@
There are various possible interfaces for job submission to BUDA.
We could make a Python-based remote API.
We (or others) could use this API to integrate it into batch systems.
But for starters, I propose implementing a generic (multi-application)
web-based job submission system,
using the per-user file sandbox system.
## Managing science apps and variants.
First, there's a web interface for managing BUDA science apps
This shows you the existing apps,
and lets you delete them or create new ones.
For a given science app it shows you the variants
(i.e. for different GPU types).
It lets you delete them or create new ones.
The form for this includes:
* Select (from your file sandbox) a set of 'app files'. This includes:
* a Dockerfile
* a main prog to run in the container
* other files if needed
* info per file: logical name, copy flag
* a plan class name (CPU/GPU)
* list of input files (logical name, copy_file)
* list of output files (logical name)
* cmdline (passed to main prog for all jobs)
## Submitting jobs
The form for submitting a batch of jobs:
* batch name (optional)
* select a BUDA science app and variant
* select (from the file sandbox) a zip file of job descriptions.
This file has one dir per job:
```
jobname/
[cmdline]
file1 (logical name)
...
...
```
This system should manage file immutability.
The above filenames are logical and don't need to be unique;
e.g. input files for different jobs can have the same name.
The system will create unique physical names.