Wednesday, November 16, 2011

Troubleshooting Memory Errors

Memory issues occur from time to time in many systems and present themselves in a variety of ways, ranging from the corruption of a single file or database to the corruption of an entire system, a blue screen of death (BSOD) on Windows, or a kernel panic on other operating systems. Memory errors can be difficult to track down because they may occur only infrequently, and there may be no reliable way to reproduce the issue in a support situation.

When a memory issue is suspected, it is best to use a tool such as memtest86+ or the Windows Memory Diagnostic tool to see if the error can be identified. Memtest86+ can be downloaded from http://www.memtest.org. This tool can be installed onto a USB drive or burned onto a blank CD. After the system boots from the USB drive or CD, the tests begin automatically.



The tests continue to run and report any errors they find.
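
To give a rough idea of what a memory tester does, here is a minimal Python sketch of a march-style pattern test: write a known value to every cell, then walk the cells reading each one back and comparing it against what was written. This is only an illustration. It is not the algorithm that memtest86+ or Windows Memory Diagnostic actually implement, and a script like this cannot reliably test physical RAM, since the operating system and CPU caches sit in between. The names march_pass and StuckBitMemory are made up for this example.

```python
def march_pass(mem):
    """Return the offsets where a read did not match the value last written."""
    errors = set()
    n = len(mem)
    for i in range(n):               # ascending: write 0x00 everywhere
        mem[i] = 0x00
    for i in range(n):               # ascending: expect 0x00, then write 0xFF
        if mem[i] != 0x00:
            errors.add(i)
        mem[i] = 0xFF
    for i in reversed(range(n)):     # descending: expect 0xFF, then write 0x00
        if mem[i] != 0xFF:
            errors.add(i)
        mem[i] = 0x00
    return sorted(errors)


class StuckBitMemory:
    """Hypothetical faulty memory: bit 0 of one cell always reads back as 0."""

    def __init__(self, size, bad_offset):
        self.cells = bytearray(size)
        self.bad_offset = bad_offset

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, i):
        value = self.cells[i]
        return value & 0xFE if i == self.bad_offset else value

    def __setitem__(self, i, value):
        self.cells[i] = value


healthy = march_pass(bytearray(64))          # simulated good memory
faulty = march_pass(StuckBitMemory(64, 10))  # simulated stuck bit at offset 10
print(healthy, faulty)
```

The simulated stuck bit is only caught when 0xFF is written and read back, which is one reason real tools run many different patterns rather than a single one.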

Windows Memory Diagnostic works similarly to memtest86+ and can be accessed from the Administrative Tools folder (in the Control Panel) on Windows Server 2008 R2 and Windows 7. For other versions of Windows, it can be downloaded from the windiag site. It is important to use an extended test. When the system boots into windiag and starts testing the RAM, press F1 (Options), select "Extended", and press F10 to apply.



For those who are in a time crunch, either the basic or standard tests can be used instead. The following table describes each mode (taken directly from the Windiag tool).

Test Suite  Description
Basic       The Basic Tests are MATS+, INVC, and SCHCKR (cache enabled)
Standard    The Standard tests include all of the Basic tests, plus LRAND, Stride6 (cache enabled), CHCKR3, WMATS+, and WINVC.
Extended    The Extended tests include all the Standard tests plus MATS+ (cache disabled), Stride38, WSCHCKR, WStride-6, CHCKR4, WCHCKR3, ERAND, Stride6 (cache disabled) and CHCKR8.

If one of these tools identifies a memory issue, the DIMM(s) reported may not actually be bad. Follow these steps to see if the problem moves or goes away:
  1. Disable any overclocking that has been performed on the system.
  2. Re-seat all of the DIMMs (disconnect and reconnect them to the motherboard).
  3. Run the memory tests with each DIMM installed on its own (this will likely identify the bad DIMM if there is one).
  4. Using the documentation for the motherboard, move the DIMMs to different DIMM slots.
  5. Take a DIMM that is known to be good and use it to test each DIMM slot on the motherboard separately.
These steps can be time-consuming, but they will likely identify the root cause of the problem.
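
When there are several DIMMs and slots to juggle, it helps to write down each test result and then look for a pattern. The Python sketch below is one simple way to do that, assuming you record a pass/fail result for every DIMM and slot combination you try. The function name classify, the DIMM labels, and the slot labels are all made up for this example, and the rules it applies are a rough heuristic rather than a definitive diagnosis.

```python
from collections import defaultdict


def classify(results):
    """results maps (dimm, slot) -> True if the memory test passed.

    Returns (suspect_dimms, suspect_slots) as sorted lists.
    """
    by_dimm = defaultdict(list)
    by_slot = defaultdict(list)
    for (dimm, slot), passed in results.items():
        by_dimm[dimm].append(passed)
        by_slot[slot].append((dimm, passed))

    # A DIMM that failed in every slot it was tested in is suspect.
    suspect_dimms = sorted(d for d, r in by_dimm.items() if not any(r))

    # A slot is suspect if nothing passed in it and at least one of the
    # DIMMs that failed there is not already suspect on its own.
    suspect_slots = sorted(
        slot
        for slot, tests in by_slot.items()
        if not any(passed for _, passed in tests)
        and any(dimm not in suspect_dimms for dimm, _ in tests)
    )
    return suspect_dimms, suspect_slots


# Hypothetical results: three DIMMs (A, B, C) each tested alone in two slots.
results = {
    ("A", "slot1"): True,  ("A", "slot2"): False,
    ("B", "slot1"): True,  ("B", "slot2"): False,
    ("C", "slot1"): False, ("C", "slot2"): False,
}
print(classify(results))
```

In this made-up data set, DIMM C fails everywhere, while A and B fail only in slot2, which points at one bad DIMM and one bad slot.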

See Also:
Windows Crash Dump Analysis
Identifying Cooling Issues 
Stress Testing a CPU to Detect Hardware Failure
How to Detect a Failing Hard Drive
Stress Testing a Video Card

Have an idea for something that you'd like to see explored? Leave a comment or send an e-mail to razorbackx_at_gmail com
