The present invention relates generally to computer systems and more specifically to the analysis of memory errors.
Particularly in high-reliability computer systems, a significant proportion of system downtime results from memory errors and from the interruptions in operation needed to replace memory hardware. A frustration for manufacturers and operators is that up to 80% of the memory modules (e.g., dual in-line memory modules (DIMMs)) that are returned to memory vendors are diagnosed as “no trouble found” (NTF), indicating that good components have been replaced.
One reason for this is that, in the absence of an accurate assessment of which of several potentially faulty DIMMs is actually faulty, a service engineer will typically replace all of the potentially faulty modules.
Accordingly, there is a need to improve the diagnosis of memory faults, both to further increase system reliability and to reduce the number of good units that are replaced.