Job and Project Keywords
This document describes a framework for assigning keywords, such as science area and location, to jobs and projects. This can be used for several purposes:
- Client GUIs can show volunteers what kinds of jobs they're running, and where they're from.
- Account managers can let volunteers sign up for science areas rather than specific projects (I'm currently working on one of these).
- To show project attributes in the project list on the BOINC web site (we currently show attributes in an ad-hoc way).
- To let volunteers choose job types at a finer granularity than app.
There are many other potential uses.
To make this work, the BOINC community needs to agree on:
- A structure for the set of keywords.
- An authoritative set of keywords. I propose that the BOINC PMC be in charge of this, possibly creating a committee for this purpose.
Goals
- Keep things as simple as possible. We don't need to create the ultimate taxonomy of science.
- Make it possible to have a very simple UI for volunteer keyword preferences, e.g. a few high-level keywords with yes/no/maybe buttons.
- Make it possible to have a higher-resolution UI, e.g. research for a particular type of cancer.
- Allow the set of keywords to change over time in a coherent way.
Structure
I propose structuring keywords as follows:
Category: what property the keyword refers to; I suggest
- Science Area: what kind of research is being done.
- Location: where (continent, country, institution) the researcher is located.
- Another orthogonal attribute is ownership and accessibility of results. Some volunteers don't want to support for-profit research. But this is tricky; there are gray areas such as academic research for which a corporation has right of first refusal for licensing the results.
Level: 0, 1, 2. Level 0 is most general (e.g. 'Physics' or 'Europe').
Hierarchy: the relationship between level n and n+1 keywords. I propose a strict hierarchy: each level n+1 keyword is the child of a single level n keyword.
- Advantage: this simplifies the conceptual model and the user interface.
- Disadvantage: it can't represent, for example, that a level 1 keyword like "Gravitational waves" is associated with both "Physics" and "Astronomy". But I don't think this matters. If a volunteer wants to support GW research and doesn't find it in one place, they'll look in the other.
Each keyword has
- an integer ID, which never changes, and is used to identify the keyword in job, project, and preferences lists.
- short and long textual descriptions; these can change over time. We'll figure out a way to make them translatable.
- create time, mod time, and delete time.
The list of keywords and all their properties will be exported by the BOINC web site as an XML file.
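The structure above can be sketched as a small data model. This is only an illustration in Python, not a proposed implementation: the field names, the category labels "science" and "location", the use of Unix timestamps, and the example IDs are all assumptions. The check enforces the strict hierarchy: every keyword above level 0 has exactly one parent, one level up, in the same category.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Keyword:
    id: int                      # permanent integer ID; never changes
    category: str                # assumed labels: "science" or "location"
    level: int                   # 0 (most general), 1, or 2
    parent: Optional[int]        # ID of the single parent; None at level 0
    short_desc: str              # may change over time
    long_desc: str = ""          # may change over time
    create_time: float = 0.0     # assumed to be Unix timestamps
    mod_time: float = 0.0
    delete_time: Optional[float] = None  # None while the keyword is live


def hierarchy_errors(keywords):
    """Return a list of problems; an empty list means the hierarchy is strict."""
    by_id = {k.id: k for k in keywords}
    errors = []
    for k in keywords:
        if k.level == 0:
            if k.parent is not None:
                errors.append(f"{k.id}: level 0 keyword has a parent")
            continue
        p = by_id.get(k.parent)
        if p is None:
            errors.append(f"{k.id}: parent {k.parent} not found")
        elif p.level != k.level - 1 or p.category != k.category:
            errors.append(f"{k.id}: parent {p.id} has wrong level or category")
    return errors


# Illustrative IDs only; the authoritative list would assign the real ones.
KEYWORDS = [
    Keyword(1, "science", 0, None, "Astronomy"),
    Keyword(2, "science", 1, 1, "Gravitational waves"),
    Keyword(3, "science", 0, None, "Physics"),
    Keyword(4, "location", 0, None, "Europe"),
    Keyword(5, "location", 1, 4, "Germany"),
]
```

Because each keyword stores a single parent ID, "Gravitational waves" can sit under "Astronomy" or "Physics" but not both, which is the trade-off described above.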
Keyword example
(not complete: just to show the idea; indentation shows level)
Science Area
    Astronomy
        SETI
        Pulsars
        Gravitational waves
        Cosmology
    Physics
        Particle physics
        Nanoscience
    Biology and medicine
        Drug discovery
        Protein research
        Genetics and phylogeny
        Disease research
            Diabetes
            Cancer
                Prostate cancer
                Breast cancer
    Mathematics and Computer Science
    Artificial Intelligence and Cognitive Science
Location
    Europe
        Germany
            Albert Einstein Institute for Gravitational Physics
    Asia
    Australia
    The Americas
        United States
            UC Berkeley
            Purdue
Project and job attributes
Each project can have a set of keywords. For each keyword there is an associated "work fraction": an estimate of the fraction of the project's work that has that keyword.
Each job can have an associated set of keywords. Note: keywords need to be at the job level, not the app level, because VM-based projects can use a single BOINC app for all their jobs.
If a project has a keyword with work fraction 1, that keyword is implicitly associated with all the project's jobs.
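As a rough sketch of the implicit-keyword rule, assuming project keywords are held as a map from keyword ID to work fraction and job keywords as a set of IDs (both representations, and the function name, are hypothetical):

```python
def effective_job_keywords(job_keywords, project_fractions):
    """Combine a job's explicit keywords with its project's implicit ones.

    job_keywords: iterable of keyword IDs attached to the job.
    project_fractions: dict mapping keyword ID -> work fraction in [0, 1].
    """
    # A project keyword with work fraction 1 applies to every job.
    implicit = {kw for kw, frac in project_fractions.items() if frac >= 1.0}
    return set(job_keywords) | implicit


# Illustrative IDs: 10 covers all of the project's work, 11 only 30% of it.
project = {10: 1.0, 11: 0.3}
print(effective_job_keywords({11}, project))
print(effective_job_keywords(set(), project))
```

A job with no keywords of its own still inherits keyword 10 here, while keyword 11 applies only to jobs that list it explicitly.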
Volunteer preferences
A volunteer can specify (e.g. via an account manager) a set of "preferences", which is a map from keywords to [yes, no, maybe].
"no" means don't send jobs with that keyword.
"yes" means preferentially send jobs with that keyword.
"maybe" means no preference: jobs with that keyword may be sent but are not preferred.
A "no" for a level n keyword overrides a "yes" for any descendant keyword.
If a project has a keyword with work fraction 1, and the volunteer has "no" for that keyword, the volunteer should not be attached to that project.
Note: instead of ternary yes/no/maybe, we could have some sort of "research share" per keyword. This would greatly complicate things; I don't think it's worth it.
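The preference rules above might be resolved along these lines. This is a sketch under assumptions: preferences are a dict from keyword ID to "yes", "no", or "maybe"; the hierarchy is a dict from keyword ID to parent ID (None at level 0); and missing entries are treated as "maybe". Applying an inherited "no" to the attach decision, and not inheriting "yes" from ancestors, are both choices made for this example rather than something the proposal specifies.

```python
YES, NO, MAYBE = "yes", "no", "maybe"


def effective_pref(kw_id, prefs, parent_of):
    """Preference for kw_id after applying 'no' from any ancestor."""
    node = kw_id
    while node is not None:
        if prefs.get(node, MAYBE) == NO:
            return NO          # a "no" on the keyword or an ancestor wins
        node = parent_of.get(node)
    return prefs.get(kw_id, MAYBE)


def may_attach(project_fractions, prefs, parent_of):
    """False if a keyword covering all of the project's work is a 'no'."""
    return not any(
        frac >= 1.0 and effective_pref(kw, prefs, parent_of) == NO
        for kw, frac in project_fractions.items()
    )


# Illustrative IDs: 1 = Astronomy (level 0), 2 = Gravitational waves (level 1).
parent_of = {1: None, 2: 1}
prefs = {1: NO, 2: YES}

print(effective_pref(2, prefs, parent_of))    # ancestor "no" wins
print(may_attach({2: 1.0}, prefs, parent_of))
print(may_attach({2: 0.5}, prefs, parent_of))
```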
Information flow
- An account manager reply can return a set of volunteer preferences, and sets of project keywords, both of which are stored by the client. They are deleted if the user detaches from the AM.
- The client includes volunteer preferences in scheduler requests.
- The job submission interfaces will be expanded to include job keywords; these will be stored in the DB result table.
- Projects can export their keywords in get_project_config.php.
- Project and job keywords will be included in GUI RPC replies, so that GUIs can show them.
Keywords and scheduling
The BOINC scheduler's score-based algorithm will be augmented with a keyword component:
- If a job has a keyword for which the volunteer has a "no" preference, the score is -1 (don't send).
- For each job keyword for which the volunteer has a "yes" preference, increment the score.
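A minimal sketch of the keyword component, assuming an increment of 1 per "yes" keyword and preferences stored as a dict from keyword ID to "yes", "no", or "maybe". How this component is weighted against the rest of the scheduler's score is not specified here, and the function name is hypothetical.

```python
def keyword_score(job_keywords, prefs):
    """Keyword component of a job's score for one volunteer.

    Returns -1 (don't send) if any job keyword has a "no" preference;
    otherwise the number of job keywords with a "yes" preference.
    """
    score = 0
    for kw in job_keywords:
        pref = prefs.get(kw, "maybe")   # missing entries count as "maybe"
        if pref == "no":
            return -1
        if pref == "yes":
            score += 1
    return score


prefs = {1: "yes", 2: "yes", 3: "no"}
print(keyword_score({1, 2}, prefs))   # two "yes" keywords
print(keyword_score({1, 3}, prefs))   # a "no" keyword vetoes the job
print(keyword_score({4}, prefs))      # only "maybe" keywords
```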
Changes over time
Keywords may be added, removed, or changed over time. In terms of volunteer preferences, what should the semantics be? E.g., suppose a new science area is added. Should prefs default to "maybe" or "no"? I propose:
- Prefs default to "maybe";
- Volunteers are informed ASAP that keywords have changed, and given a link to update their prefs accordingly.
For example: AMs that support keyword prefs can keep a timestamp of when each user updated their prefs. If the mod time of the keyword set is later than this:
- When the user visits the AM web site, they're shown a message of the form "keywords have changed - please update your prefs".
- A similar message is sent to the client as a notice.
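The timestamp check and the "maybe" default could look roughly like this on the account manager side. The function names and the use of numeric timestamps are assumptions, and dropping preferences for keywords that are no longer live is one possible policy, not part of the proposal.

```python
def prefs_need_review(keywords_mod_time, prefs_update_time):
    """True if the keyword set changed after the user last updated prefs."""
    return keywords_mod_time > prefs_update_time


def with_defaults(prefs, live_keyword_ids):
    """Give every live keyword a preference, defaulting to "maybe".

    Preferences for keywords not in live_keyword_ids are dropped
    (an assumed policy for deleted keywords).
    """
    return {kw: prefs.get(kw, "maybe") for kw in live_keyword_ids}


# Keyword 2 was added after the user set prefs; keyword 9 has been deleted.
old_prefs = {1: "yes", 9: "no"}
print(prefs_need_review(keywords_mod_time=200, prefs_update_time=100))
print(with_defaults(old_prefs, [1, 2]))
```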