The present invention relates in general to input/output (I/O) interface architectures in integrated circuits, and in particular to methods and circuitry for distributing data strobe signals in a programmable logic device (PLD) that employs a multiple data rate memory interface.
Typical I/O architectures transmit a single data word on each positive clock edge, so their throughput is limited by the speed of the clock signal. To address the data bandwidth bottleneck between integrated circuits, high-speed interface mechanisms have been developed to increase data transfer speed and throughput. In a multiple data rate (MDR) interface scheme, two or more data words are transferred during each clock period. For example, in a double data rate (DDR) interface scheme, data is captured on both the rising edge and the falling edge of the clock to achieve twice the data throughput. Multiple data rate technologies have thus accelerated the I/O performance of integrated circuits for a wide array of applications, from computers to communication systems. For example, MDR technologies are employed in today's memory interfaces, including interfaces for double data rate synchronous dynamic random access memory (DDR SDRAM), fast cycle random access memory (FCRAM), reduced latency dynamic random access memory (RLDRAM I or RLDRAM II), and quadruple data rate static random access memory (QDR SRAM), as well as other high-speed interface standards.
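The difference between single-edge and double-edge capture can be illustrated with a toy model. The following Python sketch is illustrative only and is not part of the described circuitry; the function name `capture`, the sampled waveform lists, and the word labels are assumptions introduced for this example. The clock and data are each sampled once per half clock period.

```python
def capture(clock, data, both_edges):
    """Return the data words sampled at clock edges.

    clock and data are equal-length lists with one entry per half
    clock period (a hypothetical, idealized waveform model).
    """
    captured = []
    for i in range(1, len(clock)):
        rising = clock[i - 1] == 0 and clock[i] == 1
        falling = clock[i - 1] == 1 and clock[i] == 0
        # A single data rate register samples on rising edges only;
        # a double data rate register also samples on falling edges.
        if rising or (both_edges and falling):
            captured.append(data[i])
    return captured


# Three full clock periods, one candidate word per half period.
clock = [0, 1, 0, 1, 0, 1, 0]
data = ["-", "A", "B", "C", "D", "E", "F"]

sdr_words = capture(clock, data, both_edges=False)
ddr_words = capture(clock, data, both_edges=True)
```

Over the same three clock periods, the double-edge capture in this model collects twice as many words as the rising-edge-only capture, which is the doubling of throughput described above.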
Programmable logic devices (PLDs) have been used to implement memory interface controllers for memory interfaces such as the DDR, QDR, or RLDRAM interfaces. The ability to modify the design on-the-fly to meet difficult memory interface timing requirements and the flexibility of programmable logic in customizing features of the memory interface controller are two of the primary advantages of using programmable logic in these applications.
An important feature of PLDs is package migration, which allows different members of a given PLD family to be interchanged on a given circuit board. This feature is useful for circuit designers because, as their design matures, they can choose any of a family of PLDs with different densities for a given socket on the circuit board. Another desirable feature of PLDs, when used as memory controllers, is the ability to support memory devices having different speeds, different data bus widths, different data group sizes, and different timing requirements. On the other hand, high performance off-chip memories have such stringent timing requirements that it has been a challenge to design a PLD that interfaces with them while preserving the package migration feature and the ability to support memory devices having different speeds, data bus widths, and data group sizes. For example, the most recent high-performance off-chip memory standards, such as the RLDRAM II, have such stringent timing requirements that existing PLDs cannot interface with them.
Thus, as support for faster memory interfaces is adopted in PLDs, timing margins for meeting the memory interface requirements are becoming tighter, and it is becoming more important to reduce skew components. For example, in a basic DDR implementation, a clock signal (DQS) functions as a data strobe for controlling the timing of the transfer of I/O data (also referred to as DQ signals). During a read operation, each DQS signal comes to a PLD with a group of DQ signals. The DQS signal arrives at a DQS pin of the PLD and is phase-corrected before it is routed to a plurality of I/O registers for capturing the group of DQ signals. One of the skew components in the PLD is the difference in arrival time between DQS and DQ at the I/O registers. The arrival time of the DQS signal relative to each DQ signal in the group may also differ. To illustrate the problem, FIG. 1 shows a DQS bus 110 driven by a DQS bus driver 105, which outputs a phase-corrected version of the DQS signal. The DQS bus 110 routes the phase-corrected DQS to the plurality of I/O registers 120 for capturing the group of DQ signals. Since the DQS bus 110 is implemented as a single metal track that stretches across all of the plurality of I/O registers 120, it introduces skew along the bus: the I/O registers closer to the DQS bus driver 105 receive the phase-corrected DQS signal sooner than the I/O registers farther away from the DQS bus driver 105. The skew becomes worse as the group of DQ signals (i.e., the number of I/O registers 120) gets larger, for example in a 32-bit data group. Therefore, for improved timing accuracy, it is desirable to minimize the skew between DQS and DQ signals as much as possible.
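The growth of skew with data group size on a single-track DQS bus can be illustrated with a simple model. The following Python sketch is illustrative only: it assumes that each successive I/O register adds a fixed propagation delay along the track, and the function names and the 10 ps per-register delay are hypothetical values chosen for the example, not figures from any device. A real metal track would not be exactly linear in delay.

```python
def dqs_arrival_times_ps(num_registers, delay_per_register_ps):
    """Arrival time of DQS at each I/O register, in picoseconds.

    Assumes (hypothetically) that register k sits k taps away from the
    bus driver and that every tap adds the same delay.
    """
    return [k * delay_per_register_ps for k in range(1, num_registers + 1)]


def bus_skew_ps(num_registers, delay_per_register_ps):
    """Skew along the bus: latest arrival minus earliest arrival."""
    arrivals = dqs_arrival_times_ps(num_registers, delay_per_register_ps)
    return max(arrivals) - min(arrivals)


DELAY_PER_REGISTER_PS = 10  # assumed value for illustration only

# Skew for data groups of 8, 16, and 32 bits under this model.
skew_by_group = {n: bus_skew_ps(n, DELAY_PER_REGISTER_PS) for n in (8, 16, 32)}
```

Under these assumptions, the skew between the nearest and farthest registers grows in proportion to the number of registers on the track, which is why larger data groups are more affected.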