As computers and computer processors increase in performance, memory access performance becomes a significant factor affecting overall system performance. If an interface that communicates data between a memory device and a memory controller or other application device operates more slowly than a processor can use data, the interface can reduce the data processing capacity of the entire computer. For dynamic random access memory (DRAM) devices, which are commonly used as the main working memory for a computer, various interconnect technologies have been developed over the years. One such interconnect technology is used for synchronous DRAMs, or SDRAMs, which utilize a source synchronous interface, where the source of data during a data transfer is relied upon to provide a data strobe signal (DQS) that is used by the target of the data transfer to capture such data as it is being transferred over a data line to the target. In particular, data on a data line is typically captured on the rising or falling edge of the DQS signal, so that the value present on the data line when the data strobe signal transitions from low to high, or vice versa, is latched into a data latch in the target.
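The strobe-based capture described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that DQS and one data line are available as equally spaced lists of sampled logic levels and that data is latched on both strobe edges; the function name and the waveforms are hypothetical and do not model any particular device.

```python
def capture_on_strobe_edges(dqs, dq):
    """Latch the data-line value at every transition of the strobe.

    dqs and dq are equal-length lists of 0/1 samples taken at the same
    instants (an assumption of this sketch, not a hardware model).
    """
    captured = []
    for i in range(1, len(dqs)):
        # A change in the strobe level marks a rising or falling edge.
        if dqs[i] != dqs[i - 1]:
            captured.append(dq[i])
    return captured


# Hypothetical waveforms: each data bit spans two samples, and the strobe
# toggles on the second sample of each bit.
example_dqs = [0, 0, 1, 1, 0, 0, 1, 1, 0]
example_dq = [0, 1, 1, 0, 0, 1, 1, 1, 1]
example_bits = capture_on_strobe_edges(example_dqs, example_dq)
```

In this toy arrangement the strobe edges fall inside each bit period, so every edge latches a stable value.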
DRAM memory elements, such as double data rate (DDR) memory elements, contain multiple buses. A command and address bus is formed by a number of signals, such as, for example, column-address strobe (CAS), row-address strobe (RAS), write enable (WE), clock enable (CKE), chip-select (CS), address (ADDR), and bank address (BA) signals, and differential clock signals (CK and CKN). The data bus contains the data signals (DQ) and the source synchronous strobes (DQS and DQSN). DDR3 memory elements operate with differential data strobes DQS and DQSN, which enable source-synchronous data capture at twice the clock frequency. Data is registered on the rising edges of the DQS and DQSN signals. The data bus of some types of DDR memory elements also includes datamask (DM) signals that are used for masking selected bits during a write operation.
DDR3 data is transferred in bursts for both read and write operations, sending or receiving a series of four (referred to as burst chop 4 or BC4) or eight (referred to as burst length 8 or BL8) data words with each memory access. For read operations, data bursts of various lengths are transmitted by the DRAM device edge-aligned with a strobe. For write operations, data bursts of various lengths are received by the DRAM element with a 90-degree phase-delayed data strobe signal. The strobe signal is a bidirectional signal used to capture data. After the data is captured in the source-synchronous strobe domain, the data must be transferred into a local clock domain.
For dual in-line memory modules or DIMMs, the DDR3 memory specification includes what is commonly known as a “flyby” topology for clock, address and control connections that are shared among all the DRAM devices on the DIMM. As opposed to the balanced tree arrangement used in DDR2 memory, which provides clock, address and control connections of approximately the same length to each memory element in a memory module at the expense of signal integrity, the flyby topology is arranged to promote signal integrity and results in clock, address and control connections of different lengths for each device within the module. Consequently, the timing relationships between the data, data strobe and clock signals can vary undesirably from one memory element in the DIMM to another. Since the DDR3 SDRAM devices require a specific timing relationship between the data, data strobe and clock signals at the respective DRAM pins, the DDR3 specification supports an independent timing calibration known as “write leveling” for each source synchronous group.
Write leveling allows the host logic circuitry that initiates the data transfers (e.g., a core logic portion of an ASIC) to configure the interface that communicates data between the memory controller and the target DRAM to delay the data strobe signal and data signals by a configurable or controllable amount of time. A free-running clock signal, CK, is propagated from the host logic circuitry to an input of a dedicated internal calibration register in a target DRAM. In the write leveling procedure, a single data strobe (DQS) signal pulse is propagated from the host logic circuitry to the dedicated internal calibration register in the DRAM. In response to this single DQS signal pulse, the dedicated internal calibration register outputs a signal that indicates the phase alignment between the clock signal and the data strobe signal rising edge at the DQ pin of the DRAM. This output is propagated back to the memory controller on one or more of the data signal (DQ) lines. The calibration is repeated with different delay values until an optimum write-leveling delay is determined. The host logic circuitry can then configure the interface to use the optimized write-leveling delay for subsequent write operations from the memory controller to the target DRAM.
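The repeated calibration loop described above can be sketched as a simple delay sweep. This is an illustrative model only: it assumes that the DRAM feedback is the CK level sampled at the DQS rising edge, and that the delay at which this feedback changes from 0 to 1 is taken as the write-leveling delay. The function names, the picosecond units, and the numeric values are hypothetical.

```python
def ck_level_at_dram(t_ps, period_ps, ck_skew_ps):
    """Toy model of the CK level (0 or 1) seen at the DRAM at time t_ps.

    CK is assumed to rise at ck_skew_ps + k * period_ps with a 50% duty
    cycle; ck_skew_ps stands in for the flyby routing delay.
    """
    phase = (t_ps - ck_skew_ps) % period_ps
    return 1 if 2 * phase < period_ps else 0


def write_level(period_ps, ck_skew_ps, step_ps, max_steps):
    """Sweep the DQS delay and return the first delay (in ps) at which the
    sampled CK level goes from 0 to 1, or None if no transition is found."""
    previous = None
    for n in range(max_steps):
        delay_ps = n * step_ps
        # One DQS pulse per iteration; the DRAM returns the sampled CK level.
        sample = ck_level_at_dram(delay_ps, period_ps, ck_skew_ps)
        if previous == 0 and sample == 1:
            return delay_ps
        previous = sample
    return None


# Hypothetical numbers: 1250 ps clock period, 300 ps of CK routing delay,
# 10 ps delay steps.
leveled_delay_ps = write_level(1250, 300, 10, 125)
```

In this model the sweep stops at the delay that places the DQS rising edge on the CK rising edge at the DRAM; a real controller would typically also have to handle noisy or metastable feedback near the transition.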
Although the above-described write leveling calibration or training procedure can be used to phase-align the data strobe signal with respect to the clock signal at the DRAM pins, write leveling does not affect phase alignment between the data strobe signal and the data signals at the DRAM pins. Another calibration system and procedure, sometimes referred to as “per-bit de-skew” or PBDS, has been developed to phase-align each data (bit) signal with the data strobe signal at the DRAM pins. That is, PBDS allows the host logic circuitry to configure the interface to delay a data signal on a selected data line within a source synchronous group by a configurable or controllable amount of time with respect to the data strobe signal. In the PBDS calibration or training procedure, the host logic circuitry performs a number of write operations to a target DRAM, setting the skew for the selected data line to a different value on each write operation and then reading back the data that was written. A match between the data that was written and the data that was read back indicates that the skew value aligned the selected data bit with the strobe to an extent sufficient to meet DRAM setup and hold requirements. PBDS is generally used to meet the more stringent timing requirements of higher-speed (e.g., greater than 1.3 GT/s) interfaces, where timing differences among the data signals and the data strobe signal can be a significant percentage of the clock period.
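The write-then-read-back sweep described above can be sketched as follows. The sketch assumes a toy model of a single data line in which a write succeeds only when the residual data-to-strobe offset is within a fixed margin, and it assumes that the middle of the passing skew codes is chosen as the final setting; the text only states that a match indicates a sufficient skew value. All names and numbers are hypothetical.

```python
def make_toy_line(intrinsic_offset_ps, step_ps, margin_ps):
    """Return a write-and-read-back function for one hypothetical data line.

    The write is modeled as correct when the offset left after applying the
    skew code is within +/- margin_ps; otherwise the pattern is corrupted.
    """
    def write_and_read_back(skew_code, pattern):
        offset_ps = intrinsic_offset_ps + skew_code * step_ps
        if abs(offset_ps) <= margin_ps:
            return pattern
        return pattern ^ 0xFF  # model a failed capture as inverted bits
    return write_and_read_back


def pbds_sweep(write_and_read_back, skew_codes, pattern):
    """Try each skew code and return the codes whose read-back matches."""
    return [code for code in skew_codes
            if write_and_read_back(code, pattern) == pattern]


def pick_center(passing_codes):
    """Pick the middle passing code (assumes one contiguous passing window)."""
    if not passing_codes:
        return None
    return passing_codes[len(passing_codes) // 2]


# Hypothetical line: data arrives 60 ps early, 10 ps per skew step,
# and a +/- 25 ps window in which setup and hold are met.
line = make_toy_line(-60, 10, 25)
passing = pbds_sweep(line, range(16), 0xA5)
chosen_code = pick_center(passing)
```

Choosing the center of the passing window is one common way to leave margin on both the setup and hold sides; other selection policies are possible.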
Datamask is a feature of DDR3 DRAM that the host logic circuitry can use during a write operation to mask selected memory locations. To use the datamask feature, the host logic circuitry outputs a mask signal along with the data word to be written. A logic-0 value on the mask signal indicates that the DRAM is to write a data word to the corresponding word position in the DRAM memory. A logic-1 value on the mask signal indicates that the DRAM is not to write a data word to the corresponding word position in the DRAM memory, thereby preserving the value of that word in the DRAM. Like data signals, datamask signals are configurable using PBDS. That is, the host logic circuitry can configure the interface to delay a selected line of the datamask signal within a source synchronous group by a configurable or controllable amount of time with respect to the data strobe signal.
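The mask polarity described above can be shown with a short sketch of a masked burst write. It models the DRAM as a Python list of words and assumes one mask bit per data word in the burst; the function name and the example values are hypothetical.

```python
def masked_write(memory, base, data_words, mask_bits):
    """Write a burst into memory, honoring the datamask.

    A mask bit of 0 writes the corresponding word; a mask bit of 1 leaves
    the existing word in memory unchanged.
    """
    for i, (word, dm) in enumerate(zip(data_words, mask_bits)):
        if dm == 0:
            memory[base + i] = word
    return memory


# Hypothetical four-word burst in which the second and fourth words are masked.
dram = [0x11, 0x22, 0x33, 0x44]
masked_write(dram, 0, [0xAA, 0xBB, 0xCC, 0xDD], [0, 1, 0, 1])
```

After the call, the first and third positions hold the new words, while the masked second and fourth positions keep their previous values.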