Monday, November 21, 2011

Stress Testing a CPU to Detect Hardware Failure

From time to time, a system problem is traced to a possibly failing CPU. In the Windows world, this often presents as one of two blue screen stop codes: 0x124 WHEA_UNCORRECTABLE_ERROR or 0x101 CLOCK_WATCHDOG_TIMEOUT. These errors often indicate a problem with the CPU (and occasionally the BIOS). Once the error has been identified, the next step is to validate that it is due to a failing processor. This is done by stress testing the processor to see whether the error can be reproduced.

One of the better tools for stress testing a processor is the Mersenne prime number search tool, known as prime95. Download the appropriate build for your platform (you do not actually need to create an account). Extract the prime95.exe executable and launch it. When prompted, select "Just Stress Testing."



The "blend" torture test is sufficient for most purposes.



Once the test is running, prime95 reports any errors it encounters; a blue screen during stress testing may also indicate a failure. If the blue screen is tied to a single core, it may be possible to disable the problem core through the BIOS and re-run the test successfully, although the system will then run with fewer processor cores. Otherwise, it is a good idea to replace the processor that is encountering problems.
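To illustrate the general idea behind this kind of test, here is a minimal Python sketch that repeats a deterministic calculation with a known answer and flags any mismatch. It uses the Lucas-Lehmer test for Mersenne primes as the workload. This is only an illustration under stated assumptions, not how prime95 is implemented: the function names (lucas_lehmer, torture_pass) and the small table of exponents are chosen here for the example, and a workload this light will not stress a CPU the way prime95 does.

```python
# Known answers: exponent p -> whether 2**p - 1 is prime.
# (A small illustrative table; the names below are hypothetical.)
KNOWN = {
    3: True, 5: True, 7: True, 11: False, 13: True, 17: True,
    19: True, 23: False, 29: False, 31: True, 61: True, 89: True,
    107: True, 127: True, 521: True, 607: True, 1279: True,
}


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. p must be an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def torture_pass(known=KNOWN):
    """Run every test once and return the exponents that gave a wrong answer."""
    return [p for p, expected in known.items() if lucas_lehmer(p) != expected]


def run(passes=3):
    """Repeat the pass several times and collect every mismatch."""
    errors = []
    for _ in range(passes):
        errors.extend(torture_pass())
    return errors


if __name__ == "__main__":
    bad = run()
    if bad:
        print("Mismatched results for exponents:", bad)
    else:
        print("All results matched the known answers.")
```

On healthy hardware the list of mismatches should be empty. A mismatch in a calculation whose answer is already known points to the hardware rather than the math, which is why a reported error during stress testing is treated as a sign of a failing component.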

See Also:

0x124 WHEA_UNCORRECTABLE_ERROR - WinDbg/KD - Debugging a Processor Cache Issue
Identifying Cooling Issues
Troubleshooting Memory Errors
General Windows Crash Dump Analysis
Stress Testing a Video Card

Have an idea for something that you'd like to see explored? Leave a comment or send an e-mail to razorbackx_at_gmail<dot>com




