From aae2ef17bd98b99dfa55c1a796948d4631fda128 Mon Sep 17 00:00:00 2001 From: David Anderson Date: Mon, 2 Oct 2023 15:35:13 -0700 Subject: [PATCH] Updated Sporadic Applications (markdown) --- Sporadic-Applications.md | 40 ++++++++++++++++++++++------------------ 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/Sporadic-Applications.md b/Sporadic-Applications.md index a2a7806..9499b48 100644 --- a/Sporadic-Applications.md +++ b/Sporadic-Applications.md @@ -1,16 +1,15 @@ BOINC was originally designed as a batch processing system: -you submit jobs, they run (independently of one another) -and eventually they finish. -Some potential uses of volunteer computing don't fit this model. -They may require that their apps run simultaneously, +you submit jobs, they run (independently of each other) +and eventually finish. +But some potential uses of volunteer computing don't fit this model. +They may require that their apps run simultaneously on different computers, and perhaps that they communicate directly with each other. -Examples include MPI-type parallel apps -and distributed machine learning. +Examples include MPI-type parallel computing and distributed machine learning. BOINC's 'sporadic application' mechanism is designed to support these types of systems, and to allow them to coexist with batch processing. The jobs of a sporadic app run (i.e. are present in memory) -all the time (like non-CPU-intensive jobs) +all the time, like non-CPU-intensive jobs, but compute only some of the time. Like regular apps, a sporadic app can have multiple app versions. @@ -28,7 +27,7 @@ A sporadic app is typically part of another distributed system - a 'guest system' - that exists outside of BOINC. The guest system typically has its own server that handles requests and dispatches them to 'worker nodes' (running BOINC). 
-Its worker nodes may communicate directly with each other - peer-to-peer - +Its worker nodes may communicate directly with each other - peer-to-peer or via a relay - as well as with the server. A sporadic job engages in conversations with both the BOINC client @@ -40,14 +39,17 @@ The client/app protocol uses the following messages: Client to app: -```DONT_COMPUTE```: you can't compute now (e.g. because resources are not available) -```COULD_COMPUTE```: you could compute if you want -```COMPUTING```: you're computing as far as I'm concerned +```DONT_COMPUTE```: the app can't compute now (e.g. because resources are not available) + +```COULD_COMPUTE```: the app could potentially compute + +```COMPUTING```: the app is computing as far as the client is concerned App to client: -```DONT_WANT_COMPUTE```: I don't want to compute now -```WANT_COMPUTE```: I want to compute +```DONT_WANT_COMPUTE```: the app doesn't want to compute now + +```WANT_COMPUTE```: the app wants to compute The protocol between the app and the guest server isn't specified. It could be based on polling from the app, @@ -78,23 +80,25 @@ The steps are: perhaps because the user has suspended computation. * The app relays this to the server; this tells the server not to send any requests. +The server can keep track of which worker nodes +are available for computing at a given point. * Eventually the user enables computing; the client relays this as a ```COULD_COMPUTE``` message to the app, and the app relays it to the server, indicating that it can now accept requests. * The server sends a request to the app, asking it to do some computing (and possibly some network communication with other workers). -* The app sends WANT_COMPUTE to the client. +* The app sends ```WANT_COMPUTE``` to the client. * The client reserves that needed computing resources -and sends COMPUTING to the app -* The app computes. When it's done, it sends DONT_WANT_COMPUTE to the client. 
-* The client (assuming computing is not suspended) sents COULD_COMPUTE +and sends ```COMPUTING``` to the app +* The app computes. When it's done, it sends ```DONT_WANT_COMPUTE``` to the client. +* The client (assuming computing is not suspended) sends ```COULD_COMPUTE``` It's also possible that the app must stop computing before the request is finished - for example, because the user suspends computing. In this case: -* The client sends DONT_COMPUTE to the app +* The client sends ```DONT_COMPUTE``` to the app * The app notifies the server that it can't finish the request (or it might wait before doing this, in case computing is re-enabled quickly).
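
The message flow described above can be sketched as a small model of the app side. This is only an illustration, not BOINC's API: the class `SporadicAppModel`, its method names, and the strings sent to the guest server (`"available"`, `"unavailable"`, `"cannot_finish"`) are all hypothetical, since the protocol between the app and the guest server isn't specified. Only the five message names come from the text.

```python
from enum import Enum, auto


class ClientMsg(Enum):
    """Messages sent from the BOINC client to the app."""
    DONT_COMPUTE = auto()
    COULD_COMPUTE = auto()
    COMPUTING = auto()


class AppMsg(Enum):
    """Messages sent from the app to the BOINC client."""
    DONT_WANT_COMPUTE = auto()
    WANT_COMPUTE = auto()


class SporadicAppModel:
    """Toy model of the app side of the protocol (hypothetical, not BOINC code)."""

    def __init__(self):
        self.client_state = ClientMsg.DONT_COMPUTE
        self.pending_request = False
        self.to_client = []  # AppMsg values the app would send to the client
        self.to_server = []  # made-up notifications to the guest server

    def on_client_msg(self, msg):
        """Handle a message from the client and relay availability to the server."""
        self.client_state = msg
        if msg is ClientMsg.DONT_COMPUTE:
            if self.pending_request:
                # The request was interrupted, e.g. the user suspended computing.
                self.pending_request = False
                self.to_server.append("cannot_finish")
            self.to_server.append("unavailable")
        elif msg is ClientMsg.COULD_COMPUTE:
            self.to_server.append("available")
        # ClientMsg.COMPUTING: resources are reserved, the app may compute.

    def on_server_request(self):
        """The guest server asks for work; accept only if the client allows it."""
        if self.client_state is not ClientMsg.COULD_COMPUTE:
            return False
        self.pending_request = True
        self.to_client.append(AppMsg.WANT_COMPUTE)
        return True

    def on_request_done(self):
        """The app finished the request and releases its resources."""
        self.pending_request = False
        self.to_client.append(AppMsg.DONT_WANT_COMPUTE)
```

In this sketch a request is refused unless the last client message was ```COULD_COMPUTE```, and an interrupted request is reported to the server immediately; a real app might instead wait briefly in case computing is re-enabled.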