1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a system that provides a combined error correction code and cyclic redundancy check code for a memory channel.
2. Description of Related Art
Contemporary high performance computing main memory systems are generally composed of one or more dynamic random access memory (DRAM) devices, which are connected to one or more processors via one or more memory control elements. Overall computer system performance is affected by each of the key elements of the computer structure, including the performance/structure of the processor(s), any memory cache(s), the input/output (I/O) subsystem(s), the efficiency of the memory control function(s), the main memory device(s), and the type and structure of the memory interconnect interface(s).
Extensive research and development efforts are invested by the industry, on an ongoing basis, to create improved and/or innovative solutions for maximizing overall system performance and density by improving the memory system/subsystem design and/or structure. High-availability systems, i.e., systems that must be available to users without failure for long periods of time, present further challenges related to overall system reliability, because customers expect new computer systems to markedly surpass existing systems with regard to mean-time-between-failure (MTBF), in addition to offering additional functions, increased performance, increased storage, lower operating costs, etc. Other frequent customer requirements further exacerbate the memory system design challenges, and include such items as ease of upgrade and reduced system environmental impact, such as space, power, and cooling.
Furthermore, with the movement to multi-core and multi-threaded processor designs, new demands are being placed on the memory subsystem to supply very large data bandwidths and memory capacity into a single processor memory module socket. At a system level, the bandwidth available from the memory subsystem is directly proportional to the number of memory channels that can be supported by the processor pin count. However, in known memory subsystem designs, the memory channels have an inherent failure rate due to a number of failure mechanisms including, but not limited to, contact failures between the pins of the memory module and the pins of the memory module socket, electrical noise on interface lines, driver/receiver failures, etc. The standard solution for resolving a memory access request failure caused by one of these failure mechanisms is to protect the data with a cyclic redundancy check (CRC) code and have the memory controller retry the memory access request. CRC code protection with retry works for transient failures but does not correct hard contact failures, because a CRC code only detects an error and the retried transfer encounters the same fault. CRC code protection also has the drawback that a significant amount of time is required to recover from an error on the memory channel and to reissue all of the affected memory access operations.
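The CRC-and-retry behavior described above may be illustrated with a minimal sketch. It is a simplified software model and not the circuitry of any particular memory controller. The frame layout (payload followed by a 4-byte CRC-32), the function names, and the two simulated channels are all assumptions introduced for illustration only. The sketch shows that a retry recovers from a transient error, while a stuck interface line fails on every attempt.

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload (hypothetical frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the received CRC."""
    payload, received_crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received_crc

def transient_channel(fail_count: int):
    """Model a channel that flips one bit on the first `fail_count` transfers only."""
    remaining = [fail_count]
    def send(frame: bytes) -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([frame[0] ^ 0x01]) + frame[1:]  # single-bit flip
        return frame
    return send

def hard_fault_channel(frame: bytes) -> bytes:
    """Model a stuck-at-1 line: bit 0 of byte 0 is forced high on every transfer.

    The fault is only visible when the transmitted bit is 0.
    """
    return bytes([frame[0] | 0x01]) + frame[1:]

def read_with_retry(channel, payload: bytes, max_retries: int = 3):
    """Transfer a frame, retrying on CRC mismatch.

    Returns (data, attempts) on success or (None, attempts) if all tries fail.
    """
    frame = make_frame(payload)
    attempts = 0
    for _ in range(max_retries + 1):
        attempts += 1
        received = channel(frame)
        if check_frame(received):
            return received[:-4], attempts
    return None, attempts

payload = b"\x10\x20\x30\x40"  # bit 0 of byte 0 is 0, so the stuck line corrupts it
transient_result = read_with_retry(transient_channel(1), payload)
hard_result = read_with_retry(hard_fault_channel, payload)
```

In this model the transient case succeeds on the second attempt, whereas the hard-fault case exhausts every retry without returning data. Each failed attempt also costs a full retransmission, which reflects the recovery-time drawback noted above.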