The present invention relates generally to computer memory, and more specifically, to out-of-synchronization detection and out-of-order detection in a memory system.
Contemporary high performance computing main memory systems are generally composed of one or more memory devices, which are connected to one or more memory controllers and/or processors via one or more memory interface elements such as buffers, hubs, bus-to-bus converters, etc. The memory devices are generally located on a memory subsystem such as a memory card or memory module and are often connected via a pluggable interconnection system (e.g., one or more connectors) to a system board (e.g., a PC motherboard).
Overall computer system performance is affected by each of the key elements of the computer structure, including the performance/structure of the processor(s), any memory cache(s), the input/output (I/O) subsystem(s), the efficiency of the memory control function(s), the performance of the main memory device(s) and any associated memory interface elements, and the type and structure of the memory interconnect interface(s).
The industry invests extensive research and development effort, on an ongoing basis, to create improved and/or innovative solutions for maximizing overall system performance and density by improving the memory system/subsystem design and/or structure. High-availability systems present further challenges related to overall system reliability, because customers expect new computer systems to markedly surpass existing systems in mean-time-between-failure (MTBF), in addition to offering additional functions, increased performance, increased storage, lower operating costs, etc. Other frequent customer requirements, such as ease of upgrade and reduced system environmental impact (e.g., space, power, and cooling), further exacerbate the memory system design challenges. In addition, customers require access to an increasing number of higher density memory devices (e.g., DDR3 and DDR4 SDRAMs) at ever-faster access speeds.
In a high-availability memory subsystem, a memory controller typically controls multiple memory channels, where each memory channel has one or more dual in-line memory modules (DIMMs) that include dynamic random access memory (DRAM) devices and, in some instances, a memory buffer chip. The memory buffer chip typically acts as a slave device to the memory controller, reacting to commands provided by the memory controller. The memory subsystem can be configured as a redundant array of independent memory (RAIM) system to support recovery from failures of either individual DRAM devices or an entire channel. In RAIM, data blocks are striped across the channels along with check bit symbols and redundancy information. Examples of RAIM systems may be found in U.S. Patent Publication Number 2011/0320918 titled “RAIM System Using Decoding of Virtual ECC”, filed on Jun. 24, 2010, which is hereby incorporated by reference in its entirety, and in U.S. Patent Publication Number 2011/0320914 titled “Error Correction and Detection in a Redundant Memory System”, filed on Jun. 24, 2010, which is hereby incorporated by reference in its entirety.
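To illustrate how striping with redundancy allows recovery from the loss of an entire channel, the following minimal sketch splits a data block across several data channels and adds one redundancy channel. It deliberately substitutes simple XOR parity for the check bit symbols and error-correcting code of an actual RAIM system, so it models only the channel-failure case in which the failed channel is already known; the function names `stripe`, `rebuild`, and `xor_bytes` and the channel count are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length stripes."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(block: bytes, data_channels: int) -> list:
    """Split a block into equal stripes and append one XOR parity stripe.

    Hypothetical simplification: real RAIM uses check bit symbols and ECC,
    not plain XOR parity.
    """
    if len(block) % data_channels:
        raise ValueError("block length must be a multiple of data_channels")
    width = len(block) // data_channels
    stripes = [block[i * width:(i + 1) * width] for i in range(data_channels)]
    parity = reduce(xor_bytes, stripes)
    return stripes + [parity]  # last entry models the redundancy channel


def rebuild(channels: list, failed: int) -> bytes:
    """Recover the original block when channel `failed` is lost (marked None)."""
    survivors = [c for i, c in enumerate(channels) if i != failed]
    # XOR of all surviving stripes (data + parity) equals the missing stripe.
    restored = list(channels)
    restored[failed] = reduce(xor_bytes, survivors)
    return b"".join(restored[:-1])  # drop the parity stripe


if __name__ == "__main__":
    block = b"ABCDEFGHIJKLMNOP"
    channels = stripe(block, data_channels=4)
    channels[2] = None  # simulate a complete channel failure
    print(rebuild(channels, failed=2))
```

Because any one stripe equals the XOR of the others, the block survives the loss of any single channel, including the redundancy channel itself; detecting and correcting individual DRAM errors, as RAIM also does, requires the stronger codes described in the cited publications.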
In one example of a RAIM system, a memory controller interfaces with multiple channels, where each channel includes at least one buffered DIMM with an option of cascading one or more additional buffered DIMMs per channel. In this RAIM system, data from all channels must be perfectly aligned to the same upstream cycle at the memory controller. Because wire latencies to each of the channels and cascades can differ drastically, all round-trip controls and data must be tightly aligned. The memory buffer chips perform a learning process to establish ‘deskewing’ of all the data and controls in order to obtain lock-step, synchronous behavior. However, deskewing logic for a pipelined memory operation can be expensive in both area and power. In another example of a RAIM system, data tagging outside the memory buffer also solves the deskewing problem while minimizing staging latches (and thus area and power). In these synchronous memory subsystems, which have a greater tolerance for synchronization misalignment between channels, it is possible to support out-of-order processing and a degree of synchronization variation; however, performance can be reduced if the memory subsystem is allowed to deviate too far out of order or out of synchronization over a period of time.
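As a rough illustration of what it means for a tagged memory subsystem to drift too far out of order or out of synchronization, the following sketch tracks tagged responses from each channel and counts two kinds of events: a channel returning a tag far from its issue position (out of order), and the channels returning the same tag with too large a spread in arrival cycles (out of sync). This is a hypothetical model, not the detection mechanism of the present invention; the class `SyncMonitor`, its thresholds `max_skew_cycles` and `max_reorder`, and the assumption that every tag is unique and sent to every channel are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class SyncMonitor:
    """Hypothetical tracker of tagged responses returned by RAIM channels."""
    num_channels: int
    max_skew_cycles: int  # assumed tolerance for cross-channel arrival skew
    max_reorder: int      # assumed tolerance for out-of-order return distance
    issue_pos: dict = field(default_factory=dict)  # tag -> issue position
    arrivals: dict = field(default_factory=dict)   # tag -> {channel: cycle}
    returned: list = field(default_factory=list)   # responses seen per channel
    out_of_sync_events: int = 0
    out_of_order_events: int = 0

    def __post_init__(self):
        self.returned = [0] * self.num_channels

    def issue(self, tag: int) -> None:
        """Record that a request with this (unique) tag went to all channels."""
        self.issue_pos[tag] = len(self.issue_pos)
        self.arrivals[tag] = {}

    def receive(self, channel: int, tag: int, cycle: int) -> None:
        """Record one channel's tagged response and update the event counters."""
        # Out of order: distance between this tag's issue position and the
        # slot it would occupy if the channel returned strictly in order.
        expected_pos = self.returned[channel]
        self.returned[channel] += 1
        if abs(self.issue_pos[tag] - expected_pos) > self.max_reorder:
            self.out_of_order_events += 1

        # Out of sync: once every channel has returned this tag, compare the
        # earliest and latest arrival cycles.
        seen = self.arrivals[tag]
        seen[channel] = cycle
        if len(seen) == self.num_channels:
            skew = max(seen.values()) - min(seen.values())
            if skew > self.max_skew_cycles:
                self.out_of_sync_events += 1
            del self.arrivals[tag]


if __name__ == "__main__":
    m = SyncMonitor(num_channels=2, max_skew_cycles=4, max_reorder=1)
    for t in range(4):
        m.issue(t)
    for tag, cyc in [(0, 10), (1, 11), (2, 12), (3, 13)]:
        m.receive(0, tag, cyc)  # channel 0 returns in order
    for tag, cyc in [(3, 12), (0, 14), (1, 20), (2, 21)]:
        m.receive(1, tag, cyc)  # channel 1 returns tag 3 early, then lags
    print(m.out_of_order_events, m.out_of_sync_events)
```

In this example, channel 1 returning tag 3 first counts as one out-of-order event, and its later lag on tags 1 and 2 (9 cycles each) exceeds the assumed 4-cycle tolerance, giving two out-of-sync events. A real controller would more likely evaluate such counts over a time window, consistent with the concern above about deviation "over a period of time."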