http://mikemstech.blogspot.com/2011/11/windows-crash-dump-analysis.html
I recently helped a user with a stop error involving the processor cache, and I realized that there are not many posts that detail the information included in this kind of small memory dump. Professionals who work closely with kernel-level structures, the physical processor, and the chipset know that there are typically two or three caches (referred to as the L1, L2, and L3 caches) as well as the TLB (translation lookaside buffer).
When there is a failure in one of these caches, there will likely be a stop error (sometimes called a bugcheck error, after the function that generates the dump and safely brings down the system, KeBugCheckEx) and a resulting memory dump in C:\Windows\Minidump. These files end in .dmp and can be read with a couple of utilities. I use the Debugging Tools for Windows to view these files. Note that after installing the Debugging Tools for Windows, it may be necessary to configure symbols for the debuggers. In WinDbg this is done from the File -> Symbol File Path menu item. Using the linked article, it is possible to use the Microsoft symbol server to get all of the necessary symbols for the OS and to use the generated .pdb files for custom projects to load the necessary symbols for debugging custom applications.
After all of the initial setup tasks, starting WinDbg from the start menu is a simple task. Loading a dump file can be accomplished by pressing Ctrl+D or from the file menu using the "Open Crash Dump" command.
Since cache failures are usually detected as hardware errors, the error code 0x00000124 (WHEA_UNCORRECTABLE_ERROR) is the stop code that is displayed when the system crashes and the small memory dump is created. This error only appears on Windows Vista and later (Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2). Older Windows versions (Windows XP, Windows Server 2003) crash with 0x9C MACHINE_CHECK_EXCEPTION.
The universal way to start debugging a crash dump is with the !analyze -v command. This displays key information about the process that likely caused the fault, the stack trace leading up to the crash, and key information about the error. When I look at these types of errors, I also use the !cpuinfo command to get information about the processor(s) involved with the crash.
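The same two commands can also be run without opening the WinDbg window. The following is a minimal sketch, assuming the console debugger kd.exe from the Debugging Tools for Windows is on the PATH; the helper names are hypothetical, and the symbol path is the one shown in the dump output in this post.

```python
import subprocess

# Symbol path as it appears in the debugger output in this post.
SYMBOL_PATH = r"SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols"


def build_kd_command(dump_path, symbol_path=SYMBOL_PATH, debugger="kd.exe"):
    """Build the argument list that opens a dump and runs the usual commands.

    -z names the dump file, -y sets the symbol path, and -c supplies the
    initial debugger commands. The trailing "q" quits so the call returns.
    """
    commands = "!analyze -v; !cpuinfo; q"
    return [debugger, "-z", dump_path, "-y", symbol_path, "-c", commands]


def analyze_dump(dump_path):
    """Run the debugger and return its text output (assumes kd.exe is installed)."""
    completed = subprocess.run(
        build_kd_command(dump_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# Example (only works on a machine with the Debugging Tools installed):
#     print(analyze_dump(r"C:\Windows\Minidump\example.dmp"))
```

The file name in the usage comment is a placeholder; substitute the .dmp file found in C:\Windows\Minidump.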
The !cpuinfo extension command can help identify the failing processor on a multicore/multi-CPU system, but successful interpretation of the output depends on vendor documentation and how the kernel interacts with the hardware. The main value in the command is that someone interpreting the dump can use the information to help identify the processor and propose updated drivers to try before replacing the CPU. The F/M/S is the Family/Model/Stepping information for the processor. This can usually be used to identify the processor in use. In this case, this is a Family 15 Model 107 Stepping 2 64-bit processor manufactured by AMD (likely the AMD Athlon Dual Core Processor 5050e).
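The F/M/S values can be cross-checked against the "CPU Version" field that !errrec reports (0x0000000000060fb2 in this dump), since that field holds the CPUID signature. The sketch below decodes it using the standard CPUID field layout; the function name is hypothetical, and the rule for when the extended fields apply is simplified to cover both AMD and Intel parts.

```python
from typing import Tuple


def decode_cpuid_signature(signature: int) -> Tuple[int, int, int]:
    """Return (family, model, stepping) from a CPUID function 1 EAX value."""
    stepping = signature & 0xF
    base_model = (signature >> 4) & 0xF
    base_family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family
    if base_family == 0xF:
        family += ext_family

    # The extended model supplies the upper four bits of the model number.
    # Assumption: apply it for base family 0xF (AMD and Intel) and 0x6 (Intel).
    model = base_model
    if base_family in (0x6, 0xF):
        model |= ext_model << 4

    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x00060FB2)
print(f"{family},{model},{stepping}")  # 15,107,2
```

The result agrees with the 15,107,2 that !cpuinfo prints for processor 0.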
Once the crash has been identified as a WHEA_UNCORRECTABLE_ERROR, Arg2 is a pointer to the WHEA_ERROR_RECORD structure describing the nature of the error. This can be analyzed further with the !errrec address command, where address is the value of Arg2. Section 0 of the output shows that this was a failure during a data read operation on the L1 processor cache.
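The cache-read conclusion can also be checked by hand from Arg3 and Arg4, which hold the high and low halves of the MCi_STATUS value. The following is a rough sketch, assuming the architectural MCi_STATUS layout (flag bits 63 to 57, MCA error code in bits 15 to 0) and the memory-hierarchy error code format 000F 0001 RRRR TTLL from the vendor manuals; the function and table names are hypothetical.

```python
REQUEST_TYPES = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
                 5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION_TYPES = {0: "Instruction", 1: "Data", 2: "Generic"}
CACHE_LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "Generic"}
FLAG_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
             "MISCV": 59, "ADDRV": 58, "PCC": 57}


def combine_status(high32: int, low32: int) -> int:
    """Join bugcheck Arg3 (high half) and Arg4 (low half) into MCi_STATUS."""
    return (high32 << 32) | low32


def decode_mci_status(status: int) -> dict:
    """Decode the flag bits and, for cache errors, the MCA error code."""
    code = status & 0xFFFF
    decoded = {
        "flags": {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()},
        "mca_error_code": code,
    }
    # Memory-hierarchy errors match 000F 0001 RRRR TTLL (bit 12 is ignored).
    if code & 0xEF00 == 0x0100:
        decoded["request"] = REQUEST_TYPES.get((code >> 4) & 0xF, "unknown")
        decoded["transaction"] = TRANSACTION_TYPES.get((code >> 2) & 0x3, "unknown")
        decoded["level"] = CACHE_LEVELS[code & 0x3]
    return decoded


status = combine_status(0xB6204000, 0x135)
info = decode_mci_status(status)
print(hex(status))                                        # 0xb620400000000135
print(info["request"], info["transaction"], info["level"])  # DRD Data L1
```

For this dump the combined value matches the Status field in Section 2 of the !errrec output, and the decoded fields (data read, data cache, level 1) line up with the DCACHEL1_DRD_ERR label.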
This error does not always indicate a failure in the processor, but can also be caused by problems in the BIOS, so before sending the CPU back to the manufacturer or purchasing a replacement, always ensure that all of the system drivers and the BIOS are up to date. You should perform a stress test on the CPU to help determine whether a hardware issue exists. For more information, see this post.
Loading Dump File [C:\Users\Administrator\Documents\Dumps\072910-21078-01\072910-21078-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 7 Kernel Version 7600 MP (2 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS Personal
Built by: 7600.16539.amd64fre.win7_gdr.100226-1909
Machine Name:
Kernel base = 0xfffff800`02a13000 PsLoadedModuleList = 0xfffff800`02c50e50
Debug session time: Thu Jul 29 17:38:35.915 2010 (UTC - 6:00)
System Uptime: 0 days 20:28:58.649
Loading Kernel Symbols
...............................................................
................................................................
..................
Loading User Symbols
Loading unloaded module list
.....
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 124, {0, fffffa8004b0f038, b6204000, 135}

Probably caused by : hardware

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: fffffa8004b0f038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000b6204000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000000135, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
------------------

BUGCHECK_STR: 0x124_AuthenticAMD

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT

PROCESS_NAME: Wow.exe

CURRENT_IRQL: f

STACK_TEXT:
... : nt!KeBugCheckEx
... : hal!HalBugCheckSystem+0x1e3
... : nt!WheaReportHwError+0x263
... : hal!HalpMcaReportError+0x4c
... : hal!HalpMceHandler+0x9e
... : hal!HalHandleMcheck+0x47
... : nt!KxMcheckAbort+0x6c
... : nt!KiMcheckAbort+0x153
... : 0x698d668e

STACK_COMMAND: kb

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: hardware

IMAGE_NAME: hardware

DEBUG_FLR_IMAGE_TIMESTAMP: 0

FAILURE_BUCKET_ID: X64_0x124_AuthenticAMD_PROCESSOR_CACHE

BUCKET_ID: X64_0x124_AuthenticAMD_PROCESSOR_CACHE

Followup: MachineOwner
---------

0: kd> !cpuinfo
CP  F/M/S     Manufacturer  MHz   PRCB Signature    MSR 8B Signature  Features
 0  15,107,2  AuthenticAMD  3114  0000000000000000                    203b7dfe

0: kd> !errrec fffffa8004b0f038
===============================================================================
Common Platform Error Record @ fffffa8004b0f038
-------------------------------------------------------------------------------
Record Id     : 01cb2ecb7afdac79
Severity      : Fatal (1)
Length        : 928
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 7/29/2010 23:38:35
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ fffffa8004b0f0b8
Section       @ fffffa8004b0f190
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : Cache error
Operation     : Data Read
Flags         : 0x00
Level         : 1
CPU Version   : 0x0000000000060fb2
Processor ID  : 0x0000000000000000

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ fffffa8004b0f100
Section       @ fffffa8004b0f250
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x0000000000000000
CPU Id        : b2 0f 06 00 00 08 02 00 - 01 20 00 00 ff fb 8b 17
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ fffffa8004b0f250

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ fffffa8004b0f148
Section       @ fffffa8004b0f2d0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : DCACHEL1_DRD_ERR (Proc 0 Bank 0)
Status        : 0xb620400000000135
Address       : 0x0000000063c20ef0
Misc.         : 0x0000000000000000
I also came across this last June, and you are right that there were not many posts detailing the information that was needed. I appreciate you sharing it, as it might help others who need it.