1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a method for enhancing the memory bandwidth available through a memory module of a memory system.
2. Description of Related Art
Contemporary high-performance computing main memory systems are generally composed of one or more dynamic random access memory (DRAM) devices, which are connected to one or more processors via one or more memory control elements. Overall computer system performance is affected by each of the key elements of the computer structure, including the performance/structure of the processor(s), any memory cache(s), the input/output (I/O) subsystem(s), the efficiency of the memory control function(s), the main memory device(s), and the type and structure of the memory interconnect interface(s).
The industry invests extensive, ongoing research and development effort in creating improved and/or innovative solutions that maximize overall system performance and density by improving the design and/or structure of the memory system/subsystem. High-availability systems present further challenges related to overall system reliability, because customers expect new computer systems to markedly surpass existing systems in mean time between failures (MTBF), in addition to offering additional functions, increased performance, increased storage, lower operating costs, etc. Other frequent customer requirements further exacerbate the memory system design challenges, including ease of upgrade and reduced system environmental impact in terms of space, power, and cooling.
Furthermore, with the movement to multi-core and multi-threaded processor designs, new requirements are being placed on the memory subsystem to supply very large data bandwidths and memory capacity to a single processor socket. At a system level, the bandwidth and memory capacity available from the memory subsystem are directly proportional to the number of dual in-line memory modules (DIMMs) installed in the system and the number of independent memory channels connected to those DIMMs. Due to the large increases in the number of cores and threads in a processor socket, a system that once required only four or eight DIMMs on each processor socket may now require two to four times as many independent DIMMs. This, in turn, would drive systems toward ever-larger packages. In a dense computing environment, where there may be hundreds of processor racks, increasing the package size of a system may not be a viable option.
A conventional fully buffered DIMM includes a memory hub device that interfaces between a memory controller of a processor and the DRAM devices on the DIMM. This memory hub device includes a high-frequency, high-bandwidth bus structure, or memory channel, between the memory hub device and the processor. The memory hub device also includes a second high-frequency, high-bandwidth point-to-point interface to the next DIMM in a daisy-chain configuration, and a lower-bandwidth, multi-drop, eight-byte interface to the DRAMs on the DIMM. The bandwidth capability of the memory channel feeding the DIMM is significantly larger than the bandwidth capability of the interface to the DRAMs on the DIMM, creating a bandwidth mismatch.
A bandwidth mismatch normally results in a loss of system performance. That is, even though the processor is able to send access requests to the memory hub device over the high-bandwidth memory channel, the memory hub device's access to the DRAMs is limited by the lower-bandwidth memory interface. The industry-standard solution is to install another DIMM on the daisy-chain interface. With this configuration, the bandwidth of two memory hub devices may be combined to use the bandwidth of the channel to the memory controller more efficiently. However, the link between the memory hub devices adds latency to read operations, which lowers system performance. Additionally, many system configurations do not have the physical space for a second DIMM socket, and without that second socket there is no way to use the bandwidth of the memory channel efficiently. Moreover, systems that target very dense computing environments may not have enough DIMM connectors for all of the memory channels on the processor interface, let alone for multiple DIMMs per memory channel.
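As a rough illustration of the bandwidth mismatch and the daisy-chain trade-off described above, the following Python sketch models how much of a single memory channel's bandwidth can actually be used when the channel feeds one or more memory hub devices. The names HubConfig, usable_bandwidth, channel_utilization, and worst_case_read_latency, and all numeric values, are hypothetical and chosen only for illustration. The model assumes that delivered bandwidth is capped by the smaller of the channel bandwidth and the combined DRAM-interface bandwidth of the hubs, and that each additional daisy-chained hub adds a fixed read-latency increment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubConfig:
    """Hypothetical parameters for one buffered DIMM (illustrative values only)."""
    dram_interface_gbps: float   # bandwidth of the hub's interface to its DRAMs
    base_read_latency_ns: float  # read latency to the first hub on the channel
    hop_latency_ns: float        # assumed extra read latency per daisy-chain hop


def usable_bandwidth(channel_gbps: float, hub: HubConfig, num_hubs: int) -> float:
    """Delivered bandwidth is limited by whichever is smaller: the channel's
    capacity or the combined DRAM-interface bandwidth of all hubs on it."""
    return min(channel_gbps, hub.dram_interface_gbps * num_hubs)


def channel_utilization(channel_gbps: float, hub: HubConfig, num_hubs: int) -> float:
    """Fraction of the memory channel's bandwidth that is actually used."""
    return usable_bandwidth(channel_gbps, hub, num_hubs) / channel_gbps


def worst_case_read_latency(hub: HubConfig, num_hubs: int) -> float:
    """Reads to the farthest hub pass through one extra hop per added DIMM."""
    return hub.base_read_latency_ns + hub.hop_latency_ns * (num_hubs - 1)


if __name__ == "__main__":
    # Hypothetical numbers: the channel carries twice what one DRAM interface can supply.
    CHANNEL_GBPS = 12.8
    hub = HubConfig(dram_interface_gbps=6.4, base_read_latency_ns=50.0, hop_latency_ns=5.0)
    for n in (1, 2):
        util = channel_utilization(CHANNEL_GBPS, hub, n)
        lat = worst_case_read_latency(hub, n)
        print(f"{n} DIMM(s): utilization {util:.0%}, worst-case read latency {lat:.1f} ns")
```

Under these assumed numbers, a single DIMM uses only half of the channel bandwidth, while a second daisy-chained DIMM fills the channel at the cost of an additional hop of read latency, which mirrors the trade-off described in the preceding paragraph.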