The present invention relates generally to computer memory, and more specifically, to replay suspension in a memory system.
Contemporary high performance computing main memory systems are generally composed of one or more memory devices, which are connected to one or more memory controllers and/or processors via one or more memory interface elements such as buffers, hubs, bus-to-bus converters, etc. The memory devices are generally located on a memory subsystem such as a memory card or memory module and are often connected via a pluggable interconnection system (e.g., one or more connectors) to a system board (e.g., a PC motherboard).
Overall computer system performance is affected by each of the key elements of the computer structure, including the performance/structure of the processor(s), any memory cache(s), the input/output (I/O) subsystem(s), the efficiency of the memory control function(s), the performance of the main memory device(s) and any associated memory interface elements, and the type and structure of the memory interconnect interface(s).
Extensive research and development efforts are invested by the industry, on an ongoing basis, to create improved and/or innovative solutions for maximizing overall system performance and density by improving the memory system/subsystem design and/or structure. High-availability systems present further challenges with regard to overall system reliability, because customers expect new computer systems to markedly surpass existing systems in mean-time-between-failure (MTBF), in addition to offering additional functions, increased performance, increased storage, lower operating costs, etc. Other frequent customer requirements further exacerbate the memory system design challenges, and include such items as ease of upgrade and reduced system environmental impact (such as space, power and cooling). In addition, customers are requiring the ability to access an increasing number of higher density memory devices (e.g., DDR3 and DDR4 SDRAMs) at faster and faster access speeds.
Memory controllers in systems with high reliability requirements generally interface to data buffers with automatic replay mechanisms, which resend the buffered data in the event of a data integrity error. The replay mechanism is important when a memory data channel has a predicted or known failure rate which could impact the system reliability. Such a system may have multiple channels and stripe data block accesses across these channels to further improve system reliability. Memory stores to a block of data are coordinated across all channels such that a store command with striped data is effectively delivered to all of the channels simultaneously. Redundant array of independent memory (RAIM) systems are examples of such striped data systems. RAIM distributes data across several independent memory modules, where each memory module contains one or more memory devices. Examples of RAIM systems may be found, for instance, in U.S. Patent Publication Number 2011/0320918 titled “RAIM System Using Decoding of Virtual ECC”, filed on Jun. 24, 2010, the contents of which are hereby incorporated by reference in their entirety, and in U.S. Patent Publication Number 2011/0320914 titled “Error Correction and Detection in a Redundant Memory System”, filed on Jun. 24, 2010, the contents of which are hereby incorporated by reference in their entirety.
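By way of illustration only, the striping of a data block across channels may be sketched as follows. The sketch assumes a simple byte-wise XOR parity channel as a stand-in for the redundancy; the referenced RAIM systems use more elaborate error correction codes. The function names stripe_block and rebuild_slice are hypothetical and do not appear in the referenced publications.

```python
from functools import reduce


def _xor_columns(slices: list[bytes]) -> bytes:
    """XOR the slices together byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*slices))


def stripe_block(block: bytes, data_channels: int) -> list[bytes]:
    """Split a block into one equal slice per data channel plus one parity slice.

    Assumption: a single XOR parity channel stands in for the redundancy.
    """
    if data_channels < 1 or len(block) % data_channels != 0:
        raise ValueError("block length must be a multiple of the channel count")
    width = len(block) // data_channels
    slices = [block[i * width:(i + 1) * width] for i in range(data_channels)]
    return slices + [_xor_columns(slices)]


def rebuild_slice(stripes: list[bytes], missing: int) -> bytes:
    """Recover the slice of one failed channel from the surviving channels."""
    survivors = [s for i, s in enumerate(stripes) if i != missing]
    return _xor_columns(survivors)


if __name__ == "__main__":
    stripes = stripe_block(bytes(range(16)), data_channels=4)
    # The slice held by channel 2 can be rebuilt without reading channel 2.
    print(rebuild_slice(stripes, missing=2) == stripes[2])
```

In this simplified model, the loss of any single channel, whether a data channel or the parity channel, leaves the block recoverable from the remaining channels.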
An automatic replay mechanism does not allow a data source to influence the replay; therefore, the data source timing is completely dependent upon the automatic replay. This can create problems in the event of a replay on a single channel of a multi-channel system. More specifically, the replay mechanism timing can interrupt a block data transfer. The healthy channels must continue the block data transfer, but the failing channel cannot. Therefore, the store completes on the healthy channels but not on the failing channel, which would require the memory controller to remember and reissue the interrupted store on the failing channel once the replay completes.
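The problem described above may be illustrated with a minimal model, offered as a sketch under stated assumptions rather than as a description of any particular hardware. The model assumes that a block is transferred as a fixed number of beats per channel, that the data source cannot pause, and that a replay occupies the failing channel for the remainder of the block. The names striped_store and channels_needing_reissue are hypothetical.

```python
def striped_store(beats_per_block: int, channels: int,
                  replay_channel: int, replay_at_beat: int) -> dict[int, int]:
    """Count the beats each channel receives when one channel replays mid-block.

    Assumption: once the replay starts, the failing channel accepts no further
    beats of this block, while the data source keeps sending to all channels.
    """
    delivered = {ch: 0 for ch in range(channels)}
    for beat in range(beats_per_block):
        for ch in range(channels):
            if ch == replay_channel and beat >= replay_at_beat:
                continue  # channel is busy replaying; this beat is lost to it
            delivered[ch] += 1
    return delivered


def channels_needing_reissue(delivered: dict[int, int],
                             beats_per_block: int) -> list[int]:
    """List the channels on which the store did not complete."""
    return [ch for ch, count in delivered.items() if count < beats_per_block]


if __name__ == "__main__":
    result = striped_store(beats_per_block=8, channels=5,
                           replay_channel=3, replay_at_beat=2)
    # Only the replaying channel is left with an incomplete store.
    print(channels_needing_reissue(result, beats_per_block=8))
```

In this model the healthy channels each receive the full block while the replaying channel receives only the beats sent before the replay began, so the memory controller would have to track and reissue the store on that channel alone.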