From b853a13aa9bf2f9e6b4135e41b163a328b676de8 Mon Sep 17 00:00:00 2001
From: Vitalii Koshura
Date: Sun, 2 Apr 2023 02:40:25 +0200
Subject: [PATCH] Update CreditOptions.md file

Signed-off-by: Vitalii Koshura
---
 CreditOptions.md | 139 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 CreditOptions.md

diff --git a/CreditOptions.md b/CreditOptions.md
new file mode 100644
index 0000000..34200d5
--- /dev/null
+++ b/CreditOptions.md
@@ -0,0 +1,139 @@
+# Credit options
+
+"Credit" is a number associated with completed jobs,
+reflecting how much floating-point computation was (or could have been) done.
+For CPU applications the basic formula is:
+
+1 unit of credit = 1/200 day of runtime on a CPU whose Whetstone benchmark is 1 GFLOPS.
+
+Whetstone measures peak CPU performance; applications that do a lot of memory
+or disk access achieve a lower FLOPS rate than the benchmark.
+So credit measures peak, not actual, FLOPs.
+
+Credit is used for two purposes:
+
+ 1. For users, to see their rate of progress,
+ to compete with other users or teams,
+ and to compare the performance of hosts.
+
+ 2. To get an estimate of the peak performance available to a particular project,
+ or that of the volunteer host pool as a whole.
+
+For purpose 2 we care only about averages.
+For purpose 1 we also care about parity between similar jobs;
+users get upset if someone else gets a lot more credit for a similar job.
+
+BOINC provides four ways of determining credit.
+The choice (per app) depends on the properties of the app:
+
+* If you can estimate a job's FLOPs in advance, use **pre-assigned** credit.
+
+* Else if you can estimate a job's FLOPs after it completes, use **post-assigned** credit.
+
+* Else if the app has only CPU versions, use **runtime credit**.
+
+* Else use **adaptive credit**.
+
+# Pre-assigned credit
+
+You can use this if the amount of computation done by each job is known in advance,
+e.g. if all jobs do the same computation.
+Measure the runtime on a machine with a known Whetstone benchmark.
+Pick a machine with enough RAM that you're not paging.
+The credit is then
+
+(runtime in days) * (Whetstone benchmark in GFLOPS) * ncpus * 200
+
+where ncpus is the number of CPUs used by the app version; use a sequential version if possible.
+For example, a job that runs for half a day on one CPU of a 3 GFLOPS machine
+gets 0.5 * 3 * 1 * 200 = 300 units of credit.
+
+You can also use pre-assigned credit if the runtime is a linear function of
+some job attribute (e.g. input file size) that's known in advance.
+
+To specify the credit:
+* use the --credit argument to the create_work command-line program, or
+* if you're using the C++ API, set wu.canonical_credit in the first argument (the workunit).
+
+You must run the app's validator with the **--credit_from_wu** option.
+
+TODO: add this to the remote job submission RPCs if anyone wants it.
+
+# Post-assigned credit
+
+Use this if you can estimate the FLOPs done by a completed job,
+based on the contents of its output files or stderr.
+For example, if your app has an outer loop,
+and you can measure (as above) the credit C due for each iteration,
+the job credit is C times the number of iterations performed.
+
+To use this:
+* In your validator, have the init_result() function set result.claimed_credit.
+* Run the validator with **--post_assigned_credit**.
+
+A job's granted credit is the claimed credit of its canonical instance.
+
+# Runtime-based credit
+
+Use this if the app has only CPU app versions.
+The "claimed credit" for a job instance is runtime * ncpus * peak_flops,
+where peak_flops is the host's Whetstone benchmark.
+The job's granted credit is the average of its instances' claimed credits.
+
+To use this, pass the **--credit_from_runtime** option to the app's validator;
+you must also supply **--max_granted_credit**.
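+
+For example, assuming a hypothetical app named **myapp** that uses the sample
+bitwise validator (the app name and the credit cap of 1000 are placeholders),
+the validator's daemon command in config.xml might look like this:
+```
+sample_bitwise_validator --app myapp --credit_from_runtime --max_granted_credit 1000
+```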
+
+Runtime-based credit can't be used if the app has GPU versions,
+because efficiency can vary by orders of magnitude between CPU and GPU versions.
+
+Runtime-based credit is limited by max_granted_credit, but it is otherwise not cheat-proof.
+
+# Runtime-based credit via trickle messages
+
+If you have very long-running jobs (a week or more) you may want to
+grant credit incrementally.
+To do so:
+
+* Have your application periodically send [trickle-up messages](TrickleApi)
+  with variety **runtime** and content
+```
+<runtime>X</runtime>
+```
+  where X is the runtime since the last trickle message.
+
+* Run the **trickle_credit** daemon as follows:
+```
+trickle_credit --variety runtime --max_runtime Y
+```
+where Y is the limit on runtime
+(typically the period of the trickle messages).
+
+* Run your validator for the app with the **--no_credit** option.
+
+The **trickle_credit** daemon grants credit in proportion to (runtime * CPU FLOPS),
+so this approach should be used only for applications that have only single-CPU versions.
+
+This approach is not device-neutral, because hosts with the same peak FLOPS
+may have different actual FLOPS for the app version.
+
+# Adaptive credit
+
+Use this if you have GPU apps and are unable to estimate FLOPs even after job completion.
+This method maintains performance statistics at the (host, app version) level,
+and uses them to normalize credit between CPU and GPU versions.
+See [CreditNew].
+
+To use this: it's the default.
+
+If you use this, the adaptation will happen faster if you provide
+values for workunit fp_ops_est that are correlated with the actual FLOPs.
+Use a constant value if you're not sure.
+
+# Credit averaging
+
+If you use replication, runtime-based and adaptive credit can produce
+a different "claimed credit" for each job instance.
+The validator code averages these in a tricky way that I don't quite understand
+(Kevin invented it).
+It does not take the minimum.
+We should probably provide the minimum as an option,
+to make runtime-based credit more cheat-proof.
+However, that won't work if you use adaptive replication,
+where many jobs have only one instance.
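+
+To illustrate the difference between the two policies, here is a minimal
+sketch (hypothetical code, not the BOINC validator's actual logic; the
+function names are made up, and the real averaging scheme is more involved
+than a plain mean):
+```
+// Hypothetical sketch: two ways of turning the claimed credits of a job's
+// instances into granted credit. Assumes "claimed" is non-empty.
+#include <algorithm>
+#include <numeric>
+#include <vector>
+
+// Roughly the current behavior: an average of the instances' claimed credits.
+double averaged_credit(const std::vector<double>& claimed) {
+    return std::accumulate(claimed.begin(), claimed.end(), 0.0) / claimed.size();
+}
+
+// The suggested option: the minimum claim. Harder to inflate by one cheating
+// host, but it degenerates to a single claim under adaptive replication.
+double min_credit(const std::vector<double>& claimed) {
+    return *std::min_element(claimed.begin(), claimed.end());
+}
+```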