This invention relates generally to computer memory, and more particularly to systems and methods for providing remote pre-fetch buffers.
Contemporary high performance computing main memory systems are generally composed of one or more dynamic random access memory (DRAM) devices, which are connected to one or more processors via one or more memory control elements. Overall computer system performance is affected by each of the key elements of the computer structure, including the performance/structure of the processor(s), any memory cache(s), the input/output (I/O) subsystem(s), the efficiency of the memory control function(s), the main memory device(s), and the type and structure of the memory interconnect interface(s).
Extensive research and development efforts are invested by the industry, on an ongoing basis, to create improved and/or innovative solutions for maximizing overall system performance and density by improving the memory system/subsystem design and/or structure. High-availability systems present further challenges related to overall system reliability due to customer expectations that new computer systems will markedly surpass existing systems in regard to mean-time-between-failure (MTBF), in addition to offering additional functions, increased performance, increased storage, lower operating costs, etc. Other frequent customer requirements further exacerbate the memory system design challenges, and include such items as ease of upgrade and reduced system environmental impact (such as space, power and cooling).
FIG. 1 relates to U.S. Pat. No. 5,513,135 to Dell et al., of common assignment herewith, and depicts an early synchronous memory module. The memory module depicted in FIG. 1 is a dual in-line memory module (DIMM). This module is composed of synchronous DRAMs 8, buffer devices 12, an optimized pinout, and an interconnect and capacitive decoupling method to facilitate high performance operation. The patent also describes the use of clock re-drive on the module, using such devices as phase-locked loops (PLLs).
FIG. 2 relates to U.S. Pat. No. 6,173,382 to Dell et al., of common assignment herewith, and depicts a computer system 10 which includes a synchronous memory module 20 that is directly (i.e. point-to-point) connected to a memory controller 14 via a bus 40, and which further includes logic circuitry 24 (such as an application specific integrated circuit, or “ASIC”) that buffers, registers or otherwise acts on the address, data and control information that is received from the memory controller 14. The memory module 20 can be programmed to operate in a plurality of selectable or programmable modes by way of an independent bus, such as an inter-integrated circuit (I2C) control bus 34, either as part of the memory initialization process or during normal operation. When utilized in applications requiring more than a single memory module connected directly to a memory controller, the patent notes that the resulting stubs can be minimized through the use of field-effect transistor (FET) switches to electrically disconnect modules from the bus.
Relative to U.S. Pat. No. 5,513,135, U.S. Pat. No. 6,173,382 further demonstrates the capability of integrating all of the defined functions (address, command, data, presence detect, etc.) into a single device. The integration of functions is a common industry practice that is enabled by technology improvements and, in this case, enables additional module density and/or functionality.
FIG. 3, from U.S. Pat. No. 6,510,100 to Grundon et al., of common assignment herewith, depicts a simplified diagram and description of a memory system 10 that includes up to four registered DIMMs 40 on a traditional multi-drop stub bus. The subsystem includes a memory controller 20, an external clock buffer 30, registered DIMMs 40, an address bus 50, a control bus 60 and a data bus 70 with terminators 95 on the address bus 50 and the data bus 70. Although only a single memory channel is shown in FIG. 3, systems produced with these modules often included more than one discrete memory channel from the memory controller, with each of the memory channels operated singly (when a single channel was populated with modules) or in parallel (when two or more channels were populated with modules) to achieve the desired system functionality and/or performance.
FIG. 4, from U.S. Pat. No. 6,587,912 to Bonella et al., depicts a synchronous memory module 210 and system structure in which the repeater hubs 320 include local re-drive of the address, command and data to the local memory devices 301 and 302 via buses 321 and 322; generation of a local clock (as described in other figures and the patent text); and the re-driving of the appropriate memory interface signals to the next module or component in the system via bus 300.
Contemporary computing systems may employ hub chip based memory systems, where memory devices (typically DRAM) are connected to memory hub devices. The memory hub devices are interconnected to the system memory controller(s) via a network of communication channels. To facilitate high frequency signaling, the channels generally comprise one or more unidirectional output (downstream) point-to-point links and a unidirectional input (upstream) point-to-point link. Processing and/or input/output (I/O) requests (including address, read/write indication and/or any other attributes) to access the system memory (referred to as access requests) are serviced by the system memory controller(s). The memory controller translates and regulates the access requests, and schedules and prioritizes them to the available memory banks for optimal system performance. The requests are encoded according to the channel protocol (or associated interface protocol) and transmitted to the selected memory hub devices via the downstream link(s). Write requests have associated write data, which is not necessarily sent in the same packet as the request. Read requests imply an expected data reply that will be transferred back to the memory controller via one or more subsequent upstream packets. The targeted hub device(s) translate received requests and responsively control the attached memory devices to store write data from the hub, or provide read data to the hub.
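By way of illustration only, the request flow described above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any actual channel protocol: the names Request, Hub and Controller, the hub_id field, and the use of a dictionary to stand in for the attached memory devices are all hypothetical, and the function return value merely models the upstream reply.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Request:
    """Hypothetical access request; field names are illustrative only."""
    hub_id: int                    # which hub device is targeted (assumption)
    is_write: bool                 # read/write indication
    address: int
    data: Optional[bytes] = None   # write data; None for read requests


@dataclass
class Hub:
    """Hypothetical hub device with a dict standing in for attached DRAM."""
    hub_id: int
    dram: Dict[int, bytes] = field(default_factory=dict)

    def handle(self, req: Request) -> Optional[bytes]:
        if req.is_write:
            # Store write data received from the controller.
            self.dram[req.address] = req.data
            return None
        # Read: the returned value models the upstream data reply.
        return self.dram.get(req.address)


class Controller:
    """Hypothetical memory controller that routes requests to hubs."""

    def __init__(self, hubs: List[Hub]):
        self.hubs = hubs

    def issue(self, req: Request) -> Optional[bytes]:
        # Downstream: find the targeted hub and let it service the request.
        for hub in self.hubs:
            if hub.hub_id == req.hub_id:
                return hub.handle(req)
        raise ValueError("no hub with id %d" % req.hub_id)
```

In this sketch a write followed by a read of the same address on the same hub returns the written data, while a read directed at a different hub returns nothing, since each hub controls only its own attached devices.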
The memory controller data read access latency can be mitigated by having a pre-fetch buffer associated with the memory controller. A pre-fetch buffer generally consists of a small associative cache, with one or more logical entries for storing data, associated address information, and other attributes. The memory controller can read data from the memory devices speculatively on its own initiative, in response to requests, or both, and store the data in the pre-fetch buffer so that anticipated future requests are served with the lowest latency.
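By way of illustration only, a controller-side pre-fetch buffer of this kind can be sketched in Python. The sketch makes several assumptions that are not taken from the text above: a fully associative buffer with least-recently-used replacement, a simple next-line speculative policy, a 64-byte line size, and a dictionary standing in for the memory devices. The names PrefetchBuffer, MemoryController and demo are hypothetical.

```python
from collections import OrderedDict


class PrefetchBuffer:
    """Small fully associative buffer; LRU replacement is an assumption."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.entries = OrderedDict()   # line address -> data

    def lookup(self, addr):
        if addr in self.entries:
            self.entries.move_to_end(addr)   # mark as most recently used
            return self.entries[addr]
        return None

    def fill(self, addr, data):
        self.entries[addr] = data
        self.entries.move_to_end(addr)
        if len(self.entries) > self.num_entries:
            self.entries.popitem(last=False)  # evict least recently used


class MemoryController:
    """Hypothetical controller with a next-line speculative policy."""

    def __init__(self, memory, line_size=64, num_entries=4):
        self.memory = memory           # dict standing in for the DRAM devices
        self.line_size = line_size
        self.buffer = PrefetchBuffer(num_entries)
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        data = self.buffer.lookup(addr)
        if data is None:
            self.misses += 1
            data = self.memory[addr]   # full-latency access to memory
        else:
            self.hits += 1             # served from the pre-fetch buffer
        # Speculatively fetch the next line for an anticipated request.
        nxt = addr + self.line_size
        if nxt in self.memory and nxt not in self.buffer.entries:
            self.buffer.fill(nxt, self.memory[nxt])
        return data


def demo():
    memory = {a: "line%d" % (a // 64) for a in range(0, 512, 64)}
    mc = MemoryController(memory)
    for a in (0, 64, 128, 192):        # sequential access pattern
        mc.read(a)
    return mc.hits, mc.misses
```

For the sequential access pattern in demo(), only the first read goes to memory; each later read finds its line already placed in the buffer by the preceding speculative fetch. A non-sequential pattern would see little benefit from this particular policy.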
Hub based memory systems that employ pre-fetch buffers within a hub device to mitigate the latency associated with directly accessing the memory devices have been described (see, for example, U.S. Publication Number US2004/0260909 to Lee, et al.). However, this scheme incurs significant latency associated with the memory controller communicating with the hub device(s) to reference pre-fetch data stored within the hub. Having a pre-fetch data cache in the memory controller mitigates the access latency associated with implementing such buffers in the remote hub devices. However, both of these techniques require the use of critical output and input channel resources to transport data from the memory devices to the pre-fetch buffers, often during periods of high contention for these resources. Therefore, a need exists for a more efficient hub-based memory system to improve computing performance through lower memory latency and higher memory throughput.