1. Field of the Invention
The invention relates to high-performance memory subsystems including, for example, DDR SDRAM memory components. More specifically, the present invention relates to methods and associated structure for synchronizing the process of reading data between high-performance memory components and the associated memory controller device.
2. Discussion of Related Art
A number of present-day computing systems and other present-day applications utilize high-performance memory subsystems to store and retrieve data. For example, a high-performance computing system stores its programmed instructions and associated data in a high-performance memory subsystem for rapid fetching and execution of the associated program. Numerous memory architectures are known to provide the requisite high levels of performance. Generally, a system stores data in a memory subsystem by issuing write commands from the memory controller to the memory components and retrieves the stored data by issuing read commands from the memory controller to the memory components. Most such high-performance memory subsystems include features to read (or write) sequential locations in the memory components in response to a single read (or write) command. In other words, the memory components themselves return sequential locations after being directed to the first location associated with the read command. In high-performance memory subsystems the memory components may receive a clock signal from the memory controller and the memory components themselves provide a strobe signal used to indicate when valid data is available on the associated data bus as the various sequential locations of a burst read command are made available from the memory component.
In high-performance memory subsystem architectures, the data, clock and strobe signals between the memory components and the memory controller may be affected in a significant manner by propagation delays induced by design, layout, fabrication and environmental aspects of the overall system design. For example, lengthy conductive signal paths within a system design may impose significant propagation delays, ambient operating temperatures associated with the operational memory subsystem may affect timing of clock and strobe signals, and other well-known factors may impact timing relationships among these various signals critical to operation of the high-performance memory subsystem. Propagation delays generated by such environmental factors and design factors may be so severe as to dramatically change the phase relationship between the data, clock and strobe signals generated by the memory components and memory controller. Such delays may be so severe as to cause many of the signals to become meta-stable with respect to the interface timing specifications of the memory controller and memory components. In addition to problems of meta-stability, such timing problems may result in data loss (i.e., loss of data when an improper phase relationship causes more than one data value to occur in a single sample interval). These timing problems are exacerbated by burst memory operations where the cycling of the signals is faster than in shorter single read or write command operations or other command processing. These timing issues are still further exacerbated by the still faster timing of double data rate (DDR) memory components (such as DDR SDRAMs) wherein data is returned on both the leading edge and the trailing edge of each strobe signal pulse.
One common solution to this design problem as presently known in the art is to provide an asynchronous FIFO such that the memory components control the write logic of the asynchronous FIFO (to fill the FIFO with data on read operations) while the memory controller manages operation of the read portion of the asynchronous FIFO (to retrieve read data returned in response to a read command). The asynchronous nature of such a FIFO isolates and separates the two clocking functions, namely: the clocking relationships generated by the memory components that operate the write logic of the FIFO and the clocking relationships generated by the memory controller to read data from the FIFO.
Problems arise from use of such a FIFO in that performance of the memory subsystem may be degraded due to additional complexities and associated latencies entailed in moving read data through the asynchronous FIFO. For example, the read portion of the asynchronous FIFO managed by the memory controller must await information signals from the FIFO indicating that the FIFO is empty or not empty before attempting to read data transferred from the memory devices through the asynchronous FIFO. Generation of these signals within the FIFO control logic as well as the logic required to store data in and retrieve data from the FIFO all add delay to the return of requested read data. These additional latencies involved in reading data from a memory subsystem can have significant impact on overall system performance.
Further, use of such an asynchronous FIFO to obviate complexities of clock, data and strobe synchronization adds significant complexity to the overall circuit design. Such an asynchronous FIFO and related glue logic requires a significant number of gates.
It is evident from the above discussion that a need exists for an improved method and structure for synchronization of clocks and strobes in the return of read data from a high-performance memory subsystem.
The present invention solves the above and other problems, thereby advancing the state of useful arts, by providing methods and associated structure for using predetermined phase calibration information associated with the memory component data, clock and strobe signals to adjust and re-align the return of read data from the memory components. More specifically, returned read data is captured (registered) using a delayed version of the memory controller's clock signal that is delayed to re-align with the strobe signal generated by the memory component. The delay is programmed in accordance with a predetermined delay determined from the circuit design. The predetermined delay period may be determined by hand calculation or by empirical static or dynamic measurements of the operating system. The steps to acquire the predetermined delay period are beyond the scope of the present invention. Rather, the present invention relates to use of such a predetermined delay value to adapt and re-align the registering of the returned read data.
The synchronization and realignment feature of the present invention obviates the need for a FIFO component to achieve desired phase matching between the data as clocked out by the memory component and the corresponding data as clocked in by the memory controller device. Eliminating the need for such an asynchronous FIFO reduces the added latencies generated by use of such a FIFO and reduces the gate count in the memory controller circuits because the logic associated with the realignment feature of the present invention requires fewer gates and flip-flops than does an asynchronous FIFO as is commonly practiced in the art.
A first aspect of the invention provides a circuit for realigning read data returned to a memory controller from an associated memory component, the circuit including: a clock signal path on which a clock signal generated by the memory controller is applied for sampling the read data returned from the memory component wherein the clock signal has a predetermined desired phase relationship with a strobe signal generated by the memory component; and a delay line coupled to the clock signal path to generate a delayed clock signal wherein the delayed clock signal is delayed to compensate for a predetermined phase offset from the desired phase relationship between the clock signal and the strobe signal.
In another aspect of the invention the delay line is a programmable delay line.
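By way of illustration only, programming such a delay line from a predetermined phase offset might be modeled as the following minimal sketch. The function name `delay_taps`, the picosecond units, the fixed per-tap delay `tap_ps`, and the largest selectable setting `max_taps` are all hypothetical and are not taken from the described circuit; the sketch assumes a tapped delay line with uniform steps and simply rounds to the nearest available setting.

```python
def delay_taps(phase_offset_ps: int, tap_ps: int, max_taps: int) -> int:
    """Return the tap setting whose total delay is nearest the given offset.

    Hypothetical model: a delay line with uniform steps of `tap_ps`
    picoseconds and selectable settings 0..max_taps.
    """
    if tap_ps <= 0 or max_taps < 0:
        raise ValueError("tap_ps must be positive and max_taps non-negative")
    if phase_offset_ps < 0:
        raise ValueError("phase_offset_ps must be non-negative")
    # Round to the nearest whole tap using integer arithmetic.
    taps = (phase_offset_ps + tap_ps // 2) // tap_ps
    # Clamp to the largest setting the (assumed) delay line supports.
    return min(taps, max_taps)
```

For example, under these assumptions an offset of 1250 ps with 100 ps taps selects setting 13, and an offset larger than the line can provide is clamped to `max_taps`.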
In still another aspect of the invention, the invention further provides for a first register clocked by the delayed clock signal and having an input adapted to receive the sampled data for registering the sampled read data in a first clock domain; and a second register clocked by the clock signal and having an input coupled to an output of the first register for reregistering the sampled data in a second clock domain.
Another aspect of the invention further provides for an inverter coupled to the clock signal path for generating an inverted clock signal; a third register clocked by the inverted clock signal and having an input coupled to the output of the first register for reregistering the sampled data in a third clock domain, wherein the second register is adapted to selectively receive on its input the output of the third register or the output of the first register.
Still another aspect of the invention provides for a comparator for determining if the delayed clock signal is sufficiently delayed from the clock signal to permit application of the output of the first register to the input of the second register without violating timing requirements of the second register; and a multiplexor having a selection input coupled to the output of the comparator and having the output of the first register coupled to a first input and having the output of the third register coupled to a second input to selectively apply the output of the third register to the input of the second register or the output of the first register to the input of the second register.
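The selection described in this aspect may be pictured with the following behavioral sketch, offered only as an illustration. It assumes that the comparator compares the programmed delay against a threshold (`min_direct_delay_ps`, a hypothetical parameter) and that a true comparator output selects the path through the third register; neither the threshold nor the output polarity is specified above.

```python
def comparator(delay_ps: int, min_direct_delay_ps: int) -> bool:
    """Return True when the delayed clock is NOT delayed enough for the
    first register to feed the second register directly.

    Assumption: the decision is a simple threshold test on the programmed
    delay, and True means "route through the third register".
    """
    return delay_ps < min_direct_delay_ps


def mux(select_third: bool, first_out: int, third_out: int) -> int:
    """Two-input multiplexor feeding the second register's input."""
    return third_out if select_third else first_out


# A short delay routes the inverted-clock (third register) copy;
# a sufficiently long delay passes the first register's output directly.
via_third = mux(comparator(200, 500), first_out=0xAA, third_out=0x55)
direct = mux(comparator(800, 500), first_out=0xAA, third_out=0x55)
```

In this model the third register acts as an intermediate stage clocked half a period away from the second register, so the second register never samples data that changes close to its own clock edge.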
Yet another aspect of the invention further provides for an AND gate having its output coupled to the input of the third register and having the output of the first register coupled to a first input and having the output of the comparator coupled to a second input, wherein the AND gate prevents metastability of the third register by gating the input to the third register.
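The gating behavior may be sketched as follows, purely as an illustration. The function name `gated_third_input` and the bus width `width` are hypothetical, the AND is applied bitwise across an assumed multi-bit data path, and a true comparator output is assumed to mean that the third register is in use.

```python
def gated_third_input(first_out: int, comparator_out: bool, width: int = 8) -> int:
    """Model an AND gate between the first register and the third register.

    When comparator_out is False, every bit is forced to 0, so the third
    register's input does not toggle and cannot change near its clock edge.
    `width` is an assumed data-path width in bits.
    """
    # All-ones mask passes the data; all-zeros mask holds the input low.
    mask = (1 << width) - 1 if comparator_out else 0
    return first_out & mask
```

With the gate closed, the third register sees a constant input regardless of what the first register captures, which is one way to read the statement that gating the input prevents metastability of that register.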