Update CreditOptions.md file

Signed-off-by: Vitalii Koshura <lestat.de.lionkur@gmail.com>
# Credit options
"Credit" is a number associated with completed jobs,
reflecting how much floating-point computation was (or could have been) done.
For CPU applications the basic formula is:
1 unit of credit = 1/200 day runtime on a CPU whose Whetstone benchmark is 1 GFLOPS.
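For example, one day of runtime on a CPU whose Whetstone benchmark is 10 GFLOPS earns 10 * 200 = 2000 units of credit.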
Whetstone measures peak performance; applications that do a lot of memory or disk access achieve lower actual FLOPS.
So credit measures peak, not actual, FLOPs.
Credit is used for two purposes:
1. For users, to see their rate of progress,
to compete with other users or teams,
and to compare the performance of hosts.
2. To get an estimate of the peak performance available to a particular project,
or of the volunteer host pool as a whole.
For 2) we care only about averages.
For 1) we also care about parity between similar jobs;
users get upset if someone else gets a lot more credit for a similar job.
BOINC provides 4 ways of determining credit.
The choice (per app) depends on the properties of the app:
* If you can estimate a job's FLOPs in advance, use **pre-assigned** credit.
* Else if you can estimate a job's FLOPs after it completes, use **post-assigned** credit.
* Else if the app has only CPU versions, use **runtime credit**.
* Else use **adaptive credit**.
# Pre-assigned credit
You can use this if the amount of computation done by each job is known in advance,
e.g. if all jobs do the same computation.
Measure the runtime on a machine with known Whetstone benchmarks.
Pick a machine with enough RAM that you're not paging.
The credit is then
(runtime in days) * (Whetstone benchmark in GFLOPS) * ncpus * 200,
where ncpus is the number of CPUs used by the app version; use a sequential version if possible.
You can also use it if the runtime is a linear function of
some job attribute (e.g. input file size) that's known in advance.
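To make the formula concrete, here is a sketch (not project-specific code) that computes a pre-assigned credit value;
the runtime and benchmark numbers are illustrative.
The resulting value is what you supply to the server, as described next.
```
// Sketch: compute a pre-assigned credit value from the formula above.
// The runtime and benchmark figures are illustrative, not measured.
#include <cstdio>

double preassigned_credit(double runtime_days, double whetstone_gflops, int ncpus) {
    // (runtime in days) * (Whetstone benchmark in GFLOPS) * ncpus * 200
    return runtime_days * whetstone_gflops * ncpus * 200.0;
}

int main() {
    // e.g. a single-CPU job that takes 0.1 day on a 5 GFLOPS (Whetstone) CPU
    printf("credit = %.1f\n", preassigned_credit(0.1, 5.0, 1));   // prints 100.0
    return 0;
}
```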
To specify the credit:
* Use the --credit argument to the create_work command-line program.
* If using the C++ API, set wu.canonical_credit in the workunit passed as the first argument to create_work().
You must run the app's validator with the **--credit_from_wu** option.
TODO: add this to the remote job submission RPCs if anyone wants it.
# Post-assigned credit
Use this if you can estimate the FLOPs done by a completed job,
based on the contents of its output files or stderr.
For example, if your app has an outer loop,
and you can measure (as above) the credit C due for each iteration,
the job credit is C times the number of iterations performed.
To use this:
* In your validator, have the init_result() function set result.claimed_credit (a sketch follows below).
* Run the validator with **--post_assigned_credit**.
A job's granted credit is the claimed credit of its canonical instance.
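A minimal sketch of this is below.
It assumes, for illustration only, that the app writes a line like "iterations: N" to its stderr
and that the per-iteration credit has been measured as described above;
the parsing, the constant, and the error handling are placeholders,
and a real init_result() would also load whatever data compare_results() needs.
```
// Sketch of post-assigned credit in a validator's init_result().
// Assumes (illustrative) that the app reports "iterations: N" in stderr,
// and that CREDIT_PER_ITERATION was measured as for pre-assigned credit.
#include <cstdio>
#include <cstring>
#include "boinc_db.h"
#include "error_numbers.h"
#include "validate_util2.h"

const double CREDIT_PER_ITERATION = 2.5;    // placeholder value

int init_result(RESULT& result, void*& data) {
    data = NULL;
    int iterations = 0;
    const char* p = strstr(result.stderr_out, "iterations:");
    if (!p || sscanf(p, "iterations: %d", &iterations) != 1) {
        return ERR_XML_PARSE;   // couldn't determine credit; report an error
    }
    result.claimed_credit = iterations * CREDIT_PER_ITERATION;
    return 0;
}
```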
# Runtime-based credit
Use this if the app has only CPU app versions.
The "claimed credit" for a job instance is runtime*ncpus*peak_flops,
where peak_flops is the host's Whetstone benchmark.
The job's granted credit is the average of the instances' claimed credits.
To use this: pass the **--credit_from_runtime** option to the app's validator.
You must also supply **--max_granted_credit**.
Runtime-based credit can't be used if the app has GPU versions
because efficiency can vary by orders of magnitude between CPU and GPU versions.
Runtime-based credit is limited by max_granted_credit, but is otherwise not cheat-proof.
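To make the arithmetic concrete, the sketch below applies the formula as stated,
with runtime in days, peak_flops as the Whetstone benchmark in GFLOPS,
and the same 200x scaling as the basic formula at the top of this page
(treat the exact scaling as an assumption; the instance values are made up).
```
// Sketch of runtime-based credit: each instance claims runtime*ncpus*peak_flops
// (scaled as in the basic formula above), and the job is granted the average.
#include <cstdio>
#include <vector>

struct Instance {
    double runtime_days;    // reported runtime
    int ncpus;              // CPUs used by the app version
    double peak_gflops;     // host's Whetstone benchmark
};

double claimed_credit(const Instance& i) {
    return i.runtime_days * i.ncpus * i.peak_gflops * 200.0;
}

double granted_credit(const std::vector<Instance>& instances) {
    double sum = 0;
    for (const auto& i : instances) sum += claimed_credit(i);
    return sum / instances.size();   // the average of the claims, not the minimum
}

int main() {
    std::vector<Instance> replicas = {{0.2, 1, 4.0}, {0.25, 1, 3.0}};
    printf("granted credit = %.1f\n", granted_credit(replicas));   // (160 + 150) / 2 = 155.0
    return 0;
}
```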
# Runtime-based credit via trickle messages
If you have very long-running jobs (a week or more) you may want to
grant credit incrementally.
To do so:
* Have your application periodically send [trickle-up messages](TrickleApi)
with variety **runtime** and content
```
<runtime>X</runtime>
```
where X is the runtime since the last trickle message
(a sketch of the application side appears at the end of this section).
* Run the **trickle_credit** daemon as follows:
```
trickle_credit --variety runtime --max_runtime Y
```
where Y is an upper limit on the runtime claimed in a single message
(typically the period of the trickle messages).
* Run your validator for the app with the **--no_credit** option.
The **trickle_credit** daemon grants credit in proportion to (runtime * the host's CPU FLOPS),
so this approach should be used only for applications that have only single-CPU versions.
It is not device-neutral: hosts with the same peak FLOPS
may achieve different actual FLOPS for the app version.
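A minimal sketch of the application side is below,
using the BOINC client API's boinc_send_trickle_up() and boinc_elapsed_time() calls;
the once-per-day interval and the use of elapsed (rather than CPU) time are illustrative choices.
```
// Sketch: periodically report the runtime accumulated since the last
// trickle-up message, under the "runtime" variety used by trickle_credit.
#include <cstdio>
#include "boinc_api.h"

static double last_reported = 0;   // a real app would persist this in its checkpoint

void maybe_send_runtime_trickle() {
    double now = boinc_elapsed_time();          // runtime of this task so far
    if (now - last_reported < 86400) return;    // report roughly once per day
    char buf[256];
    snprintf(buf, sizeof(buf), "<runtime>%f</runtime>", now - last_reported);
    char variety[] = "runtime";
    boinc_send_trickle_up(variety, buf);
    last_reported = now;
}
```
Call this from the app's main loop, e.g. wherever it checkpoints.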
# Adaptive credit
Use this if you have GPU apps, and are unable to estimate FLOPs even after job completion.
This method maintains performance statistics on a (host, app version) level,
and uses these to normalize credit between CPU and GPU versions.
See [CreditNew].
To use this: nothing is needed, since it's the default.
If you use this, the adaptation will happen faster if you provide
values for workunit fp_ops_est that are correlated with the actual FLOPs.
Use a constant value if you're not sure.
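As a sketch of that advice (the workunit field appears as rsc_fpops_est in the server's C++ structs;
the 1000-ops-per-input-byte factor below is a made-up example):
```
// Sketch: give each workunit a FLOPs estimate that scales with the job.
// Any value roughly proportional to the job's actual FLOPs helps the adaptation.
#include "boinc_db.h"   // DB_WORKUNIT

void set_fpops_estimate(DB_WORKUNIT& wu, double input_bytes) {
    wu.rsc_fpops_est = 1000.0 * input_bytes;    // assumed ~1000 FLOPs per input byte
}
```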
# Credit averaging
If you use replication, runtime-based and adaptive credit can produce
different "claimed credit" for each job instance.
The validator code averages these in a tricky way I don't quite understand
(Kevin invented it).
It does not take the minimum.
We should probably offer taking the minimum as an option,
to make runtime-based credit more cheat-proof.
However, this won't work if you use adaptive replication,
where many jobs have only one instance.