boinc

Page: Error handling (introduction)

Table of Contents

Errors
Single-result checking
Replication

Errors

A job instance can appear to complete successfully yet produce incorrect output files. Possible causes include:

The host's CPU or GPU malfunctions. This is rare, but it can happen on hosts that are overclocked and/or overheated.
The user 'cheats' by running a program that masquerades as a BOINC client and returns job results without doing any computing. This can happen in a system (like Gridcoin) that gives monetary rewards for computing.
A particular app version (say, a GPU version) may have a bug that other versions don't.

Single-result checking

In some cases it may be possible to detect incorrect results by examining the outputs of a single job instance, perhaps by

checking the syntax of the output files
checking that numerical values lie in a plausible range
checking that the results plausibly correspond to the inputs; e.g. in a physical simulation, checking that the system energy stays about the same.

BOINC lets you create application-specific validators that check the output files of a job. If the check fails, the job is retried (up to a limit).
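The kinds of single-result checks listed above can be illustrated with a small standalone sketch. It is not BOINC's validator API: the function name `check_single_result`, the one-number-per-line output format, and the plausible range are all assumptions made for illustration.

```python
import math

def check_single_result(text, lo=-1e6, hi=1e6):
    """Hypothetical single-result check on one output file's contents.

    Assumes (for illustration only) that a valid output file holds
    one floating-point number per non-blank line.
    """
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            value = float(line)      # syntax check: must parse as a number
        except ValueError:
            return False
        # Range check: reject NaN/inf and values outside the plausible range.
        if not math.isfinite(value) or not (lo <= value <= hi):
            return False
        values.append(value)
    # An empty output file is treated as invalid.
    return len(values) > 0
```

A real validator would replace the parsing and range logic with checks that reflect what the application actually writes.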

Replication

Cheaters can potentially evade single-result checks. So BOINC provides another (optional) mechanism: replication. When this is used, each job is run on two different worker nodes. If the results agree, they are deemed to be correct, and one of the instances is marked as 'canonical'. If they don't agree, a third instance is created and sent to a different worker node. This continues until either a pair of agreeing instances is found, or a limit on the number of instances is reached, in which case the job is marked as failing.
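The replication loop described above can be sketched as follows. This is a simplified simulation, not BOINC's server logic: `find_canonical`, `run_instance`, and the default limit of five instances are hypothetical names and values chosen for illustration.

```python
def find_canonical(run_instance, initial=2, max_instances=5,
                   agree=lambda a, b: a == b):
    """Run instances until two agree or the instance limit is reached.

    run_instance(i) is a hypothetical callable that returns the result
    of the i-th instance (each assumed to run on a different node).
    Returns (index, result) of the canonical instance, or None if the
    job fails because no agreeing pair was found within the limit.
    """
    results = [run_instance(i) for i in range(initial)]
    while True:
        # Look for any pair of agreeing instances.
        for i in range(len(results)):
            for j in range(i + 1, len(results)):
                if agree(results[i], results[j]):
                    return i, results[i]
        if len(results) >= max_instances:
            return None              # limit reached: job marked as failing
        # No agreement yet: create one more instance.
        results.append(run_instance(len(results)))
```

For example, if the first two instances return 1 and 2 and a third returns 1, the first instance is chosen as canonical.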

Different types of CPUs and GPUs, and different math libraries, can produce slightly different floating-point results. These differences can compound, as in the 'butterfly effect', so two equally correct results can contain different numbers. The comparison of replicated jobs for such applications must be 'fuzzy'. Typically this means that corresponding numbers are allowed to differ by up to some application-specific tolerance.
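A fuzzy comparison of this kind might look like the sketch below. The function name `results_agree` and the tolerance values are assumptions; suitable tolerances depend entirely on the application.

```python
import math

def results_agree(a, b, rel_tol=1e-5, abs_tol=1e-9):
    """Hypothetical fuzzy comparison of two lists of output numbers.

    Corresponding values must match within a relative tolerance;
    abs_tol handles values that are close to zero.
    """
    if len(a) != len(b):
        return False                 # different shapes never agree
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )
```

Using both a relative and an absolute tolerance is a design choice: a purely relative test would reject two results that differ only by tiny noise around zero.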

If you use replication, your validator must also (in addition to checking single results) do a (possibly fuzzy) comparison of two instances of the same job.
