mirror of https://github.com/BOINC/boinc.git
buda docs
parent
107f046740
commit
d3fc38090c
|
@ -0,0 +1,96 @@
|
|||
On the server, there is a single BOINC app; let's call it 'buda'.
|
||||
This has app versions for the various platforms (Win, Mac, Linux)
|
||||
Each app version contains the Docker wrapper built for that platform.
|
||||
|
||||
Each science app variant is a collection of files:
|
||||
|
||||
* A Dockerfile
|
||||
* a config file, `job.toml`
|
||||
* input and output templates
|
||||
* A main program or script
|
||||
* Other files
|
||||
* A file 'file_list' listing the other files in template order.
|
||||
|
||||
The set of science apps and variants is represented
|
||||
by a directory hierarchy of the form
|
||||
```
|
||||
project/buda_apps/
|
||||
<sci_app_name>/
|
||||
cpu/
|
||||
... files
|
||||
<plan_class>/
|
||||
... files
|
||||
...
|
||||
...
|
||||
```
|
||||
Note: you can build this hierarchy manually but
|
||||
typically it's maintained using a web interface; see below.
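
As a rough illustration of the layout above, the hierarchy can be enumerated with a few lines of Python. This is only a sketch: the function name `list_variants` is hypothetical and not part of BOINC, and it assumes every subdirectory of a science app directory is a variant.

```python
from pathlib import Path


def list_variants(buda_apps_dir):
    """Map each science app name to a sorted list of its variant names.

    Assumes the layout shown above: one directory per science app,
    each containing one subdirectory per variant ('cpu' or a plan class).
    """
    apps = {}
    root = Path(buda_apps_dir)
    for app_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Plain files inside a science app directory are ignored.
        apps[app_dir.name] = sorted(
            v.name for v in app_dir.iterdir() if v.is_dir()
        )
    return apps
```

Calling `list_variants("project/buda_apps")` would return something like `{"worker": ["cpu", ...]}`, depending on which variants exist.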
|
||||
|
||||
This is similar to BOINC's hierarchy of apps and app versions, except:
|
||||
|
||||
* It's represented in a directory hierarchy, not in database tables
|
||||
* Science app variants are not associated with platforms
|
||||
(since we're using Docker).
|
||||
* It stores only the current version, not a sequence of versions
|
||||
(that's why we call them 'variants', not 'versions').
|
||||
|
||||
## BUDA is not polymorphic
|
||||
|
||||
Conventional BOINC apps are 'polymorphic':
|
||||
if an app has both CPU and GPU variants,
|
||||
you submit jobs without specifying which one to use;
|
||||
the BOINC scheduler makes the decision.
|
||||
|
||||
It would be possible to make BUDA polymorphic,
|
||||
but this would be complex, requiring significant changes to the scheduler.
|
||||
So - at least for now - BUDA is not polymorphic.
|
||||
|
||||
When you submit jobs you have to specify which plan class to use.
|
||||
This could be a slight nuisance:
|
||||
a plan class could have little computing power,
|
||||
and you might avoid using it, but then you'd forgo whatever computing power it does offer.
|
||||
|
||||
## Validators and assimilators
|
||||
|
||||
In the current BOINC architecture,
|
||||
each BOINC app has its own validator and assimilator.
|
||||
If multiple science apps "share" the same BOINC app,
|
||||
we'll need a way to let them have different validators and assimilators.
|
||||
|
||||
This could be built on the script-based framework;
|
||||
each science app could specify the names
|
||||
of validator and assimilator scripts,
|
||||
which would be stored in workunits.
|
||||
|
||||
## Interfaces
|
||||
|
||||
BOINC provides a web interface for managing BUDA apps
|
||||
and submitting batches of jobs to them.
|
||||
Other interfaces are possible;
|
||||
e.g. we could make a Python-based remote API
|
||||
that could be used to integrate BUDA into other batch systems.
|
||||
|
||||
## Implementation notes
|
||||
|
||||
BUDA will require changes to the scheduler.
|
||||
|
||||
Currently: given a job, it scans app versions,
|
||||
looking for one the host can accept based on its plan class.
|
||||
That won't work here.
|
||||
The plan class is already fixed.
|
||||
|
||||
Instead:
|
||||
* add a `plan_class` field to the workunit (or put it in `xml_doc`)
|
||||
* when considering sending a WU to a host, if the WU has a plan class:
|
||||
* skip if no app version with that platform / plan class (e.g. can't send metal job to Win host)
|
||||
* skip if host can't handle the plan class
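
The two skip rules above could look roughly like this. It is a Python sketch for illustration only (the real scheduler is not written this way); `Host`, `can_send`, and the platform and plan class strings are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    platform: str  # illustrative platform name
    plan_classes: set = field(default_factory=set)  # plan classes the host can run


def can_send(wu_plan_class, host, app_versions):
    """Return True if a WU with a fixed plan class may be sent to this host.

    app_versions is a set of (platform, plan_class) pairs that exist
    for the 'buda' app.
    """
    if not wu_plan_class:
        # Assumption: a WU without a plan class is not subject to these checks.
        return True
    if (host.platform, wu_plan_class) not in app_versions:
        return False  # no app version with that platform / plan class
    if wu_plan_class not in host.plan_classes:
        return False  # host can't handle the plan class
    return True
```

For example, a 'metal' job would be rejected for a Windows host at the first check, because no Windows app version with that plan class exists.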
|
||||
|
||||
## If we wanted to make BUDA polymorphic
|
||||
|
||||
* The scheduler would have to scan the `buda_apps` dir structure (or we could add this info to the DB).
|
||||
|
||||
* Jobs are tagged with BUDA science app name.
|
||||
* The scheduler scans versions of that science app.
|
||||
* If it finds a plan class the host can accept, it builds `wu.xml_doc` based on BUDA app version info.
|
||||
|
||||
The above is possible but would be a lot of work.
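
To make the steps above concrete, here is a sketch of the variant selection in Python. Everything in it is an assumption: `choose_variant` is a hypothetical name, the preference for non-CPU variants over 'cpu' is one possible policy, and building `wu.xml_doc` is left out entirely.

```python
from pathlib import Path


def choose_variant(buda_apps_dir, sci_app_name, host_plan_classes):
    """Return the name of a variant the host can run, or None.

    Assumes a variant directory named 'cpu' is runnable on any host.
    """
    app_dir = Path(buda_apps_dir) / sci_app_name
    if not app_dir.is_dir():
        return None
    variants = sorted(p.name for p in app_dir.iterdir() if p.is_dir())
    # Assumed policy: prefer a plan class the host supports over 'cpu'.
    for name in variants:
        if name != "cpu" and name in host_plan_classes:
            return name
    return "cpu" if "cpu" in variants else None
```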
|
|
@ -0,0 +1,70 @@
|
|||
## BUDA science apps and variants
|
||||
|
||||
We call BUDA applications 'science apps'.
|
||||
Each science app has a name, like 'worker' or 'autodock'.
|
||||
A science app can have multiple 'variants' that can
|
||||
use different types of computer hardware.
|
||||
The name of a variant is 'cpu' if it uses a single CPU.
|
||||
Otherwise it's the name of a [plan class](AppPlan).
|
||||
There might be variants for 1 CPU, for N CPUs, and for various GPU types.
|
||||
|
||||
## User file sandbox
|
||||
|
||||
## Managing science apps and variants.
|
||||
|
||||
In the menu bar of the BOINC project's web site,
|
||||
select `Computing / Job Submission`.
|
||||
Then click on `BUDA`.
|
||||
This shows a list of existing science apps and their variants.
|
||||
You can
|
||||
|
||||
* add or delete a variant
|
||||
* add or delete a science app
|
||||
* submit jobs to a variant
|
||||
|
||||
## Adding a variant
|
||||
|
||||
The form for adding a variant includes
|
||||
|
||||
* A plan class name (leave blank if CPU app)
|
||||
* Select (from your file sandbox) a set of 'app files'. This includes:
|
||||
* a Dockerfile
|
||||
* a main prog to run in the container
|
||||
* other files if needed
|
||||
* list of input file names
|
||||
* list of output file names
|
||||
|
||||
## Submitting jobs
|
||||
|
||||
The form for submitting a batch of jobs asks you to
|
||||
select (from the file sandbox) a zip file of job descriptions.
|
||||
This file has one dir per job:
|
||||
```
|
||||
jobname1/
|
||||
[cmdline]
|
||||
file1
|
||||
file2
|
||||
...
|
||||
jobname2/
|
||||
...
|
||||
```
|
||||
The file names in each job directory must match
|
||||
the variant's list of input file names.
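
Before uploading, you could check a zip file against the variant's input file names with something like the following sketch. `check_batch_zip` is a hypothetical helper, not part of BOINC, and it assumes the optional `[cmdline]` entry is a file literally named `cmdline`.

```python
import zipfile


def check_batch_zip(zip_file, input_file_names, cmdline_name="cmdline"):
    """Return {job_dir: {"missing": [...], "extra": [...]}} for bad job dirs.

    zip_file is a path or a file-like object. Files in nested
    subdirectories of a job directory are reported as extra.
    """
    expected = set(input_file_names)
    jobs = {}
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            job, _, rest = name.partition("/")
            if not rest and not name.endswith("/"):
                continue  # top-level file, not part of any job
            files = jobs.setdefault(job, set())
            if rest:
                files.add(rest)
    problems = {}
    for job, files in jobs.items():
        files = files - {cmdline_name}  # the cmdline file is optional
        if files != expected:
            problems[job] = {
                "missing": sorted(expected - files),
                "extra": sorted(files - expected),
            }
    return problems
```

An empty result means every job directory contains exactly the expected input files (plus, optionally, the cmdline file).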
|
||||
|
||||
## Monitoring a batch
|
||||
|
||||
When you submit a batch of jobs,
|
||||
you end up at a web page showing you the status of the batch.
|
||||
This shows you, among other things,
|
||||
how many of the jobs have completed.
|
||||
Reload it to update this information.
|
||||
|
||||
You can click on a job to see its status
|
||||
(and if it failed, the stderr output).
|
||||
You can view or download its input files.
|
||||
|
||||
On the batch page, you can click to download a zip file
|
||||
of the output files of all completed jobs.
|
||||
|
||||
When you're done with the batch, you can 'retire' it.
|
||||
This removes its input and output files from the server.
|
|
@ -3,18 +3,24 @@ is a framework for running Docker-based science apps on BOINC.
|
|||
|
||||
It's 'universal' in the sense that one BOINC app
|
||||
handles arbitrary science apps.
|
||||
The science app executables (and Dockerfile)
|
||||
The science app's Dockerfile and executables
|
||||
are in workunits rather than app versions.
|
||||
|
||||
On the server, there is a single BOINC app; let's call it 'buda'.
|
||||
This has app versions for
|
||||
the various platforms (Win, Mac, Linux)
|
||||
This has app versions for the various platforms (Win, Mac, Linux)
|
||||
Each app version contains the Docker wrapper built for that platform.
|
||||
|
||||
There are various possible interfaces for job submission to BUDA.
|
||||
We could make a Python-based remote API.
|
||||
We (or others) could use this API to integrate it into batch systems.
|
||||
|
||||
But for starters, we implemented a generic (multi-application)
|
||||
web-based job submission system,
|
||||
using the per-user file sandbox system.
|
||||
|
||||
## BUDA science apps and versions
|
||||
|
||||
BOINC provides server-side tools
|
||||
(CLI and web interfaces) for managing BUDA science apps
|
||||
BOINC provides a web interface for managing BUDA science apps
|
||||
and submitting jobs to them.
|
||||
These tools assume the following structure:
|
||||
|
||||
|
@ -46,7 +52,7 @@ project/buda_apps/
|
|||
...
|
||||
```
|
||||
Note: you can build this hierarchy manually but
|
||||
typically it's maintained by a job-submission system; see below.
|
||||
typically it's maintained using a web interface; see below.
|
||||
|
||||
This is similar to BOINC's hierarchy of apps and app versions, except:
|
||||
|
||||
|
@ -58,7 +64,7 @@ This is similar to BOINC's hierarchy of apps and app versions, except:
|
|||
|
||||
## BUDA is not polymorphic
|
||||
|
||||
The existing BOINC design is 'polymorphic':
|
||||
Conventional BOINC apps are 'polymorphic':
|
||||
if an app has both CPU and GPU variants,
|
||||
you submit jobs without specifying which one to use;
|
||||
the BOINC scheduler makes the decision.
|
||||
|
@ -99,7 +105,7 @@ Instead:
|
|||
* skip if no app version with that platform / plan class (e.g. can't send metal job to Win host)
|
||||
* skip if host can't handle the plan class
|
||||
|
||||
If we wanted to make BUDA polymorphic,
|
||||
## If we wanted to make BUDA polymorphic
|
||||
|
||||
* The scheduler would have to scan the `buda_apps` dir structure (or we could add this info to the DB).
|
||||
|
|
@ -0,0 +1,5 @@
|
|||
BUDA is 'universal' in the sense that one BOINC app
|
||||
handles arbitrary science apps.
|
||||
The science app's Dockerfile and executables
|
||||
are in workunits rather than app versions.
|
||||
|
|
@ -1,50 +0,0 @@
|
|||
There are various possible interfaces for job submission to BUDA.
|
||||
We could make a Python-based remote API.
|
||||
We (or others) could use this API to integrate it into batch systems.
|
||||
|
||||
But for starters, I propose implementing a generic (multi-application)
|
||||
web-based job submission system,
|
||||
using the per-user file sandbox system.
|
||||
|
||||
## Managing science apps and variants.
|
||||
|
||||
First, there's a web interface for managing BUDA science apps
|
||||
This shows you the existing apps,
|
||||
and lets you delete them or create new ones.
|
||||
|
||||
For a given science app it shows you the variants
|
||||
(i.e. for different GPU types).
|
||||
It lets you delete them or create new ones.
|
||||
The form for this includes:
|
||||
|
||||
* Select (from your file sandbox) a set of 'app files'. This includes:
|
||||
* a Dockerfile
|
||||
* a main prog to run in the container
|
||||
* other files if needed
|
||||
* info per file: logical name, copy flag
|
||||
* a plan class name (CPU/GPU)
|
||||
* list of input files (logical name, copy_file)
|
||||
* list of output files (logical name)
|
||||
* cmdline (passed to main prog for all jobs)
|
||||
|
||||
## Submitting jobs
|
||||
|
||||
The form for submitting a batch of jobs:
|
||||
|
||||
* batch name (optional)
|
||||
* select a BUDA science app and variant
|
||||
* select (from the file sandbox) a zip file of job descriptions.
|
||||
This file has one dir per job:
|
||||
```
|
||||
jobname/
|
||||
[cmdline]
|
||||
file1 (logical name)
|
||||
...
|
||||
...
|
||||
```
|
||||
|
||||
This system should manage file immutability.
|
||||
The above filenames are logical and don't need to be unique;
|
||||
e.g. input files for different jobs can have the same name.
|
||||
The system will create unique physical names.
|
||||
|