1. Technical Field
The present application relates generally to an improved data processing system. More specifically, the present application is directed to using cache that is embedded in a memory hub to replace failed memory cells in a memory subsystem.
2. Description of Related Art
Contemporary high-performance computing main memory systems are generally composed of one or more dynamic random access memory (DRAM) devices, which are connected to one or more processors via one or more memory control elements. Overall computer system performance is affected by each of the key elements of the computer structure, including the performance/structure of the processor(s), any memory cache(s), the input/output (I/O) subsystem(s), the efficiency of the memory control function(s), the main memory device(s), and the type and structure of the memory interconnect interface(s).
The industry invests extensive research and development effort, on an ongoing basis, in improved and/or innovative solutions for maximizing overall system performance and density by improving the memory system/subsystem design and/or structure. High-availability computer systems present further challenges related to overall system reliability, because customers expect new computer systems to markedly surpass existing systems in mean-time-between-failure (MTBF), in addition to offering additional functions, increased performance, increased storage, lower operating costs, etc. Other frequent customer requirements further exacerbate the memory system design challenges, and include such items as ease of upgrade and reduced system environmental impact, such as space, power, and cooling.
Thus, computer systems are designed to run for extremely long periods of time without failing or needing to be powered down to replace faulty components. However, over time, memory cells in DRAM chips or other memory subsystems can fail and potentially cause errors when accessed. These individual bad memory cells can result in large blocks of memory being taken out of the memory maps for the memory system. Further, the loss of memory can lead to performance issues in the computer system and result in a computer system repair action to replace faulty components.
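By way of illustration only, the disproportion between a handful of failed cells and the amount of memory removed from the memory map may be sketched as follows. The function name `retired_bytes`, the 4096-byte block size, and the sample addresses are hypothetical values chosen for this example; they are not taken from any particular memory system, and real systems may retire memory at a different granularity.

```python
def retired_bytes(bad_cell_addresses, block_size):
    """Return the number of bytes lost when every block that contains
    at least one failed cell is removed from the memory map.

    Assumption for illustration: memory is retired in fixed-size,
    aligned blocks, and one bad cell retires its whole block.
    """
    # Map each failing address to the index of the block that holds it;
    # a set collapses multiple bad cells that share a block.
    bad_blocks = {addr // block_size for addr in bad_cell_addresses}
    return len(bad_blocks) * block_size


# Hypothetical values: three failed cells, two of them in the same block.
BLOCK_SIZE = 4096
bad_cells = [0x1234, 0x1238, 0x9000]

lost = retired_bytes(bad_cells, BLOCK_SIZE)
print(f"{len(bad_cells)} bad cells retire {lost} bytes")
```

Under these assumed values, three failed cells fall into two distinct blocks, so 8192 bytes are removed from the memory map even though only three locations are actually faulty.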