This section describes what the output of a crash dump looks like and what it contains.
This area provides some basic information about the version of the BOINC debugger being used, when the crash occurred, and which internal version of the Windows debugger technology is being used.
Symbol search paths are used to inform the debugger where it might be able to find the symbol files related to the modules loaded in memory. Entries prefixed with 'srv*' are used to denote a web based symbol store. DbgHelp will use them if symsrv can be loaded at the time of the crash.
If you see a load library failure for either dbghelp.dll or symsrv.dll then there is a pretty good chance that most of the data in the dump will be useless.
Information about which modules were loaded into the process's memory space can be found here. The first hexadecimal value is the memory address at which the module was loaded; the second hexadecimal value is the size of the module.
If a version record was found inside the module, it'll be dumped out as part of the module list dump.
If the correct symbol file can be found, the module list shows which symbol file type is in use. The three most common symbol types in modern software are 'PDB', 'exports', and '-no symbols-'. Symbol files are fairly large, so most projects do not include them in the APP VERSION package. Luckily, Microsoft has created a technology called Symbol Stores, which enables an application to fetch its symbol file in a compressed format from a web server at the time of a crash. Setting up a symbol store is described later in this document.
PDB files are generated at compilation time and usually have to be turned on for release builds. This file will contain all the needed information to generate a pretty good callstack which you can use to diagnose problems.
Export symbols usually only appear on DLLs since DLLs can export function pointers via the export table. When you see this in the module list you'll only see functions which are listed in the export table in the callstack.
No symbols means that the runtime debugger could not determine a way to give you any symbolic information. You'll only receive function pointers in the callstack.
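When a frame has no symbols, the module list can still tell you which module a raw address belongs to, since each entry gives a base address and a size. The following is a minimal sketch of that lookup, not part of the BOINC debugger itself; the names `ModuleRange`, `ResolvedAddress`, and `resolve_address` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One entry from the module list (hypothetical type for illustration).
struct ModuleRange {
    std::string name;    // module name as shown in the dump
    std::uint64_t base;  // first hexadecimal value: load address
    std::uint64_t size;  // second hexadecimal value: module size
};

// Result of resolving a raw address against the module list.
struct ResolvedAddress {
    bool found;            // true if some module contains the address
    std::string module;    // name of the containing module
    std::uint64_t offset;  // address minus the module's base address
};

// Find the module whose [base, base + size) range contains the address.
ResolvedAddress resolve_address(const std::vector<ModuleRange>& modules,
                                std::uint64_t address) {
    for (const ModuleRange& m : modules) {
        // The subtraction cannot wrap because address >= m.base is checked first.
        if (address >= m.base && address - m.base < m.size) {
            return {true, m.name, address - m.base};
        }
    }
    return {false, std::string(), 0};
}
```

The resulting offset is the value you would look up in the un-stripped build of that module to find the function name.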
This is some overall useful information about the process. Most of the time the 'Virtual Memory', 'Pagefile', and 'Working Set' are the most useful indications of whether or not the process was under low available memory pressure from the OS.
This identifies the thread for which additional information is going to be displayed. Both the thread name and thread ID are displayed. To name a thread you have created in your program, call diagnostics_set_thread_name(), as defined in diagnostics.h, from that thread; it sets the name of the currently executing thread.
Status shows what state the thread was in when the snapshot was taken. If the thread is waiting, wait reason will describe why the thread is waiting. If the thread is running, both the base thread priority and current thread priority will be displayed.
Kernel time, user time, and wait time describe how much time, in nanoseconds, the thread has spent in each of those states.
This section, if included for the thread, describes the event that occurred and caused the runtime debugger to engage. Structured Exceptions in Windows are not the same thing as C++ exceptions. Unless you are using a compiler that knows about both types, it is unlikely that a C++ catch is going to actually catch this type of exception.
Further information about Structured Exception Handling can be found here.
It is important to note that both hardware and software exceptions can bubble up from the operating system through this mechanism.
The example above shows that EXCEPTION_BREAKPOINT (PlatformSDK\Include\winbase.h) was raised at 0x7C822583. EXCEPTION_BREAKPOINT is defined as STATUS_BREAKPOINT (PlatformSDK\Include\ntstatus.h), which is defined as ((NTSTATUS)0x80000003L).
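An NTSTATUS value such as 0x80000003 packs several fields into 32 bits: a severity in the top two bits, a customer flag, a facility, and a code in the low 16 bits. The sketch below splits a value into those fields. It is an illustration only; `StatusFields` and `decode_status` are hypothetical names, not part of the Windows headers or of BOINC.

```cpp
#include <cstdint>

// Fields of a 32-bit NTSTATUS value (hypothetical type for illustration).
struct StatusFields {
    unsigned severity;  // bits 31-30: 0 success, 1 informational, 2 warning, 3 error
    bool customer;      // bit 29: set for codes not defined by Microsoft
    unsigned facility;  // bits 27-16: which component raised the status
    unsigned code;      // bits 15-0: the status code within that facility
};

// Split a raw status value into its fields using shifts and masks.
StatusFields decode_status(std::uint32_t status) {
    StatusFields f;
    f.severity = (status >> 30) & 0x3u;
    f.customer = ((status >> 29) & 0x1u) != 0;
    f.facility = (status >> 16) & 0x0FFFu;
    f.code = status & 0xFFFFu;
    return f;
}
```

Decoded this way, STATUS_BREAKPOINT (0x80000003) has warning severity and code 3, which is one quick way to tell a breakpoint apart from error-severity statuses whose values start with 0xC.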
This is a basic dump of the processor registers at the time the exception was raised and will look different for each processor type.
In this example these are the registers and flags for the Intel based x86 processor.
This describes the state the thread was in at the time of the exception. ChildEBP and RetAddr are not really useful unless you can reproduce the issue using the same OS version.
Args to Child are the first four parameters passed to the function.
The next piece of information has the following format:
<Module Name>!<Function Name>@<Function Ordinal>+<Symbol Offset> <File/Line Information>
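If you post-process dumps with a script or tool, the format above can be split on its delimiters. The following is a rough sketch under the assumption that the symbol portion contains no spaces and that '@' and '+' appear as shown in the format; `Frame` and `parse_frame` are hypothetical names, and the fields are kept as plain strings because the exact shape of the ordinal, offset, and file/line text is not specified here.

```cpp
#include <string>

// Pieces of one callstack entry (hypothetical type for illustration).
struct Frame {
    std::string module;    // text before '!'
    std::string function;  // text between '!' and the last '@'
    std::string ordinal;   // text between '@' and the last '+'
    std::string offset;    // text after '+' up to the first space
    std::string source;    // remaining file/line information, if any
};

// Returns false if the text does not match <Module>!<Function>@<Ordinal>+<Offset>.
bool parse_frame(const std::string& text, Frame& out) {
    const std::string::size_type npos = std::string::npos;
    const std::string::size_type bang = text.find('!');
    if (bang == npos) return false;

    // The symbol portion ends at the first space after '!'.
    const std::string::size_type space = text.find(' ', bang);
    const std::string symbol =
        text.substr(bang + 1, space == npos ? npos : space - bang - 1);

    // Search from the right so that '@' or '+' inside a function name is tolerated.
    const std::string::size_type plus = symbol.rfind('+');
    if (plus == npos || plus == 0) return false;
    const std::string::size_type at = symbol.rfind('@', plus - 1);
    if (at == npos) return false;

    out.module = text.substr(0, bang);
    out.function = symbol.substr(0, at);
    out.ordinal = symbol.substr(at + 1, plus - at - 1);
    out.offset = symbol.substr(plus + 1);
    out.source = space == npos ? std::string() : text.substr(space + 1);
    return true;
}
```

Function names that contain spaces (for example, some C++ operator names) would need a more careful split than this sketch performs.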
This feature is disabled by default.
It allows the debugger viewport data to be captured at runtime, just as though you were running the application within a debugger. Since all applications share the same block of memory, it can slow down any application that wants to write to the debugger viewport, even in release builds, which is why it is disabled by default. Video capture, edit, and playback software tends to dump data to the viewport even when running a release build.
The following regedit script demonstrates how to enable the debug message dump:

    Windows Registry Editor Version 5.00

    [HKEY_CURRENT_USER\Software\Space Sciences Laboratory, U.C. Berkeley\BOINC Diagnostics]
    "CaptureMessages"=dword:00000001

To disable the message capture, use this regedit script:

    Windows Registry Editor Version 5.00

    [HKEY_CURRENT_USER\Software\Space Sciences Laboratory, U.C. Berkeley\BOINC Diagnostics]
    "CaptureMessages"=dword:00000000
This shows which window has the user input focus. The feature was originally meant to detect potential problems with third-party applications injecting code into BOINC applications and displaying UI to the user.
This feature turns out to be problematic since the foreground window might be hung which would mean that trying to get the window name and class would cause the runtime debugger to hang as well. This feature will probably be removed in the future.
In order to obtain useful diagnostic information in the event of an application crash, it is necessary to dump a callstack and any other relevant information about what was going on at the time of the crash. Symbols are only needed during a crash event, therefore they are stripped from most applications to cut down on the binary size and bandwidth requirements to deploy a new release.
Without symbols, callstacks tend to be nothing more than a list of function pointers in memory. A developer has to load the un-stripped executable in memory using the same operating system and similar processor to jump to that memory address in order to determine the function name and parameters. This is very labor intensive and generally not a very fun job.
Microsoft created a technology called a 'Symbol Store' to use with their debugger technology which allows Windows debuggers to locate and download compressed symbol files to diagnose problems and convert function pointers into human readable text. This greatly speeds up the process of diagnosing and fixing bugs.
With the BOINC Runtime Debugger for Windows framework a project can publish their symbol files and only have to distribute the application to each of the BOINC clients. When a crash event occurs the runtime framework will download the symbol file from the symbol store and then proceed to dump as much diagnostic information as possible to help projects diagnose the failure.
You'll need the latest stable release of the Debugging Tools for Windows.
Verify that your executable is set up to generate PDB debugging symbols for a release build.
Verify that the advanced linker option to generate a checksum is enabled for a release build.
You'll need to explicitly name both your EXE and PDB before compilation, since the debugger bases the name of the PDB file on information that is stored in the executable header.
Specifying a project wide symbol store is as easy as adding the symstore element to your config.xml file for the project.
Below is an XML shred with an example symstore element.
Symstore is a utility to manage symbol stores. You'll want to create a local symbol store on your Windows build machine in which you'll initially add new symbol files with each revision of your application.
Symstore will compress the symbol file and then copy it into your local symbol store.
Below is an example command which you can run from the Windows command line or cygwin command line.

    symstore.exe add /l /f c:\SampleSrc\*.pdb /s c:\symstore /compress /t "Sample" /v "5.02" /o /c "Application Release"
Most projects tend to use scp to copy files between Windows machines and their project server.
The example below copies the entire symstore to the target location. After the copy operation you can delete all the subdirectories except '000Admin' to save upload time for future application symbols.

    pscp.exe -r -C -batch c:\symstore sample@project.example.com:projects/sample/html/user/symstore
In this section we'll list a few things to look for when reading the dumps. Please keep in mind that every application is different, but there should be enough similarity that you should be able to figure something out.
This kind of error is intentional. Somewhere in the code base, execution hit a breakpoint.
The callstack will point to the function that started the call to the breakpoint function.
To add manual breakpoints to your code for diagnostics purposes you can call the Windows API:

    void DebugBreak( void );
Starting with Visual Studio 2005, Microsoft re-vamped the whole C Runtime Library. Part of the re-vamp process was to do parameter checking on each function. Places that would normally return a NULL value now cause a structured exception to be thrown.
This structured exception is different from most: it was specifically coded so that it does not engage the BOINC Runtime Debugger, and it displays a dialog box asking the user if they wish to debug the error. If the user cancels, the error code 0xc000000d is returned without any more information.
To get more information with this error you'll need to create a function like this:

    #ifdef _WIN32
    void AppInvalidParameterHandler(
        const wchar_t* expression, const wchar_t* function,
        const wchar_t* file, unsigned int line, uintptr_t pReserved
    ) {
        // The CRT passes wide strings, so they are printed with %ls.
        // Release builds of the CRT pass NULL for the strings, so guard them.
        fprintf(
            stderr,
            "Invalid parameter detected in function %ls. File: %ls Line: %u\n",
            function ? function : L"(unknown)", file ? file : L"(unknown)", line
        );
        fprintf(stderr, "Expression: %ls\n", expression ? expression : L"(unknown)");

        // Cause a Debug Breakpoint.
        DebugBreak();
    }
    #endif
The following code block should be added after the call to boinc_diagnostics_init():

    #ifdef _WIN32
    // Every once in a while something looks at a std::vector or some other
    // CRT/STL construct that throws an exception when an invalid parameter
    // is detected. In this case we should dump whatever information we
    // can and then bail. When we bail we should dump as much data as
    // possible.
    _set_invalid_parameter_handler(AppInvalidParameterHandler);
    #endif
When this issue happens in the future, the output will describe which CRT function call was passed an invalid parameter, and it should dump out the callstack for all threads.
The function blocks above override the default behavior of the CRT when an invalid parameter is detected.
In this example it appears the processor took exception to the fact that a user mode process attempted to push a kernel mode address onto the stack without first switching to kernel mode.
Look at the EBP register: 'ffffffff' is equal to '-1' when converted to a signed int, and to 4,294,967,295 (one byte short of 4GB) when converted to an unsigned int. On Windows anything above 2GB is considered a kernel mode address. If the Windows machine supports PAE and the /3GB boot option is specified in BOOT.INI, then kernel addresses will start at 3GB instead.
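The arithmetic behind that reading of EBP can be checked with a few lines of code. This is only a sketch of the reasoning for 32-bit Windows; the names `as_signed`, `is_kernel_address`, and the two limit constants are hypothetical and chosen for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// User mode ends at 2GB by default, or at 3GB when the /3GB boot option is used.
const std::uint32_t kUserLimit2GB = 0x80000000u;
const std::uint32_t kUserLimit3GB = 0xC0000000u;

// Reinterpret the 32 register bits as a signed value without changing them.
std::int32_t as_signed(std::uint32_t reg) {
    std::int32_t value;
    std::memcpy(&value, &reg, sizeof(value));
    return value;
}

// True if the address lies at or above the user/kernel boundary.
bool is_kernel_address(std::uint32_t address, bool three_gb_option) {
    return address >= (three_gb_option ? kUserLimit3GB : kUserLimit2GB);
}
```

With these helpers, 0xffffffff reads as -1 when signed and falls in kernel space under either boundary, while an address such as 0x7C822583 stays in user space.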
What has probably happened here is that a function was about to be called and a 'push EBP' instruction was executed to push a new address onto the stack. The CPU threw the exception because the address was outside user mode land. EBP should have followed a progression similar to the ChildEBP values of all the other stack frames.
If EBP had held some random kernel mode address, it would be pretty easy to dismiss this as CPU overheating. 'ffffffff' raises the question: is the stack being overwritten by an error result from another function?
Investigation of this issue is still ongoing.
An application will throw this exception when one of its threads exceeds the 1MB stack size allotment.
In the above example the new operator was being requested to allocate 4GB of memory. An out of memory exception was thrown.