1. Field of the Invention
The present invention generally relates to clock phase estimation and more specifically to a graphics double data rate (GDDR) interface with a hardware write clock to clock (WCK2CK) training engine using meta-error detection code (EDC) sweeping and an adjustably accurate voting algorithm for clock phase detection.
2. Description of the Related Art
Computational systems typically include one or more processing units, and one or more discrete memory devices coupled to the one or more processing units via a memory interface bus. Each of the one or more processing units may be a conventional central processing unit (CPU), or another type of processing unit, such as a graphics processing unit (GPU). Each discrete memory device is commonly a dynamic random access memory (DRAM) component that is configured to operate according to technical requirements of the memory interface bus. The technical requirements for a given memory interface bus are conventionally established as an industry-wide standard. Each of the one or more processing units reads data from and writes data to the DRAM component via the memory interface bus. As processing speed requirements increase for various types of processing units, operating speeds for associated memory interface buses also increase. Current operating speeds pose significant challenges for system designers because manufacturing variation in associated circuitry commonly results in signal-to-signal skews that are significant compared to data bit durations on current memory interface buses. To compensate for signal-to-signal skew, current memory interface buses typically execute one or more training procedures, where signal skews are measured and compensated, thereby phase-aligning the signals for near-optimal operation.
In addition to increasing speed requirements, power management is also an increasingly important requirement. To minimize system power, the memory interface bus should be able to operate slowly when processing requirements decrease, and faster when processing requirements increase. Each time the memory interface bus changes speed, certain training operations need to be repeated. In a graphics application, where real-time performance is required, any re-training must not impact whether the memory interface bus can meet all real-time data access requirements, such as screen refresh. Failure to meet all real-time requirements may cause flicker and significantly degrade video quality and user experience.
One standard memory interface bus that includes clock training for signal phase-alignment and variable interface clock speed for power management features is referred to in the art as graphics double data rate version five (GDDR5). GDDR5 is defined by the well-known industry standards group JEDEC. DRAM components that adhere to GDDR5 are commonly incorporated into modern graphics systems requiring high performance as well as power management. GDDR5 defines a thirty-two-bit wide memory interface bus, and includes one error detection code (EDC) bit per byte. In a clock training mode, the EDC bits are used as feedback from the DRAM component to a DRAM controller that is coupled to an associated processing unit that needs to access the DRAM component. The DRAM controller “sweeps” a write clock (used for transmitting data) versus a reference clock (used for transmitting commands and addresses) to find an optimal phase relationship between the two. The EDC feedback indicates whether the DRAM component is receiving the write clock early or late with respect to the reference clock for the current phase step in the sweep. The DRAM controller then establishes write clock timing for the DRAM component at the transition between early and late.
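The sweep described above can be illustrated with a minimal sketch, which is not the GDDR5-specified procedure itself. It assumes a hypothetical callback, `sample_is_early(step)`, that returns the early/late indication fed back on the EDC bits for a given phase step, and a hypothetical `simulated_dram` helper standing in for real hardware. The sweep steps the write clock phase and reports the first step at which the indication changes.

```python
from typing import Callable, Optional


def find_transition(sample_is_early: Callable[[int], bool],
                    num_steps: int) -> Optional[int]:
    """Sweep phase steps 0..num_steps-1 and return the first step whose
    early/late reading differs from the previous step, or None if the
    reading never changes within the sweep range.

    sample_is_early is a hypothetical stand-in for reading EDC feedback.
    """
    previous = sample_is_early(0)
    for step in range(1, num_steps):
        current = sample_is_early(step)
        if current != previous:
            return step  # early/late boundary found at this step
        previous = current
    return None


def simulated_dram(transition_step: int) -> Callable[[int], bool]:
    """Hypothetical DRAM model: reports 'early' below transition_step."""
    return lambda step: step < transition_step


if __name__ == "__main__":
    print(find_transition(simulated_dram(11), 32))  # prints 11
```

A real controller would also need to tolerate noisy readings near the boundary; this sketch assumes a single clean transition.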
An additional feature of GDDR5 is the ability to split the thirty-two-bit data bus into two sixteen-bit data buses, each attached to one of two discrete DRAM components, for operation in “x16 mode.” In x16 mode, total storage attached to the thirty-two-bit bus may be advantageously doubled with minimal loss of performance. However, each of the two different DRAM components attached to the memory interface bus may require a different phase relationship between their respective write clock and the reference clock. With only one set of clock pins defined in the GDDR5 specification, a two-sweep training procedure is conventionally performed to find a phase relationship between the write clock and the reference clock that is acceptable to both DRAM components. The first sweep finds a first phase relationship for one of the DRAM components, and a second sweep finds a second phase relationship for the second DRAM component. A midpoint (average) between the two phase relationships is then used by both DRAM components as an acceptable compromise. While this two-sweep technique allows two GDDR5 devices to operate in x16 mode, training time may become a significant burden in certain applications. Lengthy training times associated with GDDR5 x16 mode may impede the real-time performance requirements of a graphics system. For example, screen flickers occur if GDDR5 training times prevent refreshing the screen contents as scheduled.
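The conventional x16-mode procedure described above can be sketched as follows. This is an illustrative model only: the function names and the representation of each sweep as a list of early/late readings are assumptions, and phase wrap-around is ignored for simplicity. Each DRAM component is swept separately, and the midpoint of the two resulting phase steps is taken as the shared setting.

```python
from typing import Sequence


def sweep(edc_early: Sequence[bool]) -> int:
    """Return the index of the first phase step whose early/late reading
    differs from the step before it (hypothetical model of one sweep)."""
    for step in range(1, len(edc_early)):
        if edc_early[step] != edc_early[step - 1]:
            return step
    raise ValueError("no early/late transition found in sweep")


def x16_compromise(edc_dram0: Sequence[bool],
                   edc_dram1: Sequence[bool]) -> int:
    """Run one sweep per DRAM component and return the midpoint
    (integer average) of the two transition steps."""
    phase0 = sweep(edc_dram0)  # first sweep, first DRAM component
    phase1 = sweep(edc_dram1)  # second sweep, second DRAM component
    return (phase0 + phase1) // 2


if __name__ == "__main__":
    dram0 = [True] * 10 + [False] * 22  # transition at step 10
    dram1 = [True] * 14 + [False] * 18  # transition at step 14
    print(x16_compromise(dram0, dram1))  # prints 12
```

Because the two sweeps run one after the other, training time in this model is roughly twice that of a single sweep, which is the burden the paragraph above identifies.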
Accordingly, what is needed in the art is a technique that enables faster clock training in GDDR5 DRAM components than is currently specified in the art.