Circuit designers of multi-Gigabit systems face a number of challenges as advances in technology mandate increased performance in high-speed components and systems. At a basic level, data transmission between components within a single semiconductor device, or between two devices on a printed circuit board, may be represented by the system 10 shown in FIG. 1A. (Accordingly, as used herein, a “device” can refer to a discrete device, such as a microprocessor or memory controller, or a component of a device, such as an integrated circuit or a functional block, for example.) In FIG. 1A, data is transferred (e.g., forwarded, returned, transmitted, sent, and/or received) between a first device 12 and a second device 14 along (e.g., across, carried by, over, on, through and/or via) channels 16 (e.g., copper traces on a printed circuit board or “on-chip” in a semiconductor device). A standard interconnect approach is shown, in which each channel 16 carries a particular bit (D0, D1, etc.) in the parallel stream of data being transmitted. (This is sometimes known in the industry as a “single-ended” approach.) Because either device 12 or 14 may act as the transmitter or receiver of data at any point in time, each channel 16 is provided in each device with both a transmitter (tx) and a receiver (rx), each operating in accordance with a clock signal, Clk. This clock signal, Clk, can comprise a forwarded clock which, as its name suggests, is forwarded on its own channel 16 from the transmitting device to the receiving device so as to synchronize with the transmitted data. Alternatively, the clock, if not transmitted on its own channel, may be derived at the receiving device via clock data recovery (CDR) techniques, which are well known in the art. A differential clock could also be used, in which a true clock and a complement clock are sent over two channels, which can be useful to minimize clock jitter, as is well known.
A typical receiver circuit used in conjunction with the standard interconnect approach of FIG. 1A is shown in FIG. 1B. The receiver circuit comprises an amplifier stage 20 whose output is coupled to a latch 22, which as illustrated comprises cross-coupled NAND gates. The input to the amplifier stage 20 comprises the data as received (DataIn), which is compared to a reference voltage (Vref), which is typically a midpoint voltage of one-half of the receiving device's power supply (i.e., ½ Vdd). When enabled by the clock, Clk, the amplifier stage resolves and amplifies the difference between the received data, DataIn, and the reference voltage, Vref, as is well known.
Another approach used to transmit data via a parallel bus is a differential interconnect approach, which is illustrated in FIG. 2A. In this approach, a given bit (D0, D1, etc.) is always transmitted along with its complement (D0#, D1#, etc.). As a result, a pair of channels 16 must be dedicated to each bit, one channel carrying true data, and the other, its complement. To accommodate this architecture, a transmitter circuit and receiver circuit are shared between each pair of channels, as shown. The receiver circuit used in the differential interconnect approach is shown in FIG. 2B, and is essentially the same as that illustrated in FIG. 1B, except that the complementary data state (DataIn#) is used in lieu of the reference voltage, Vref.
The differential interconnect approach of FIG. 2 has the effect of making data resolution more reliable when compared to the standard interconnect approach of FIG. 1. Such increased reliability results from at least three effects. First, because the receiver circuitry (FIG. 2B) uses complementary inputs, the voltage margin of the amplifier stage 20 is increased, which leads to faster, more reliable resolution of the data state by the receiver circuitry. Second, because true data is always transmitted along with its complementary data, cross talk, by which one channel perturbs data on an adjacent channel 16 in the bus, is minimized. Third, a non-differential signal is more susceptible to simultaneous switching output (SSO) noise generated at both the transmitters and receivers. Furthermore, in addition to the increased SSO rejection capability of differential interconnects, the very nature of the typical differential driver minimizes the generation of SSO.
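By way of illustration only, the first effect noted above may be sketched with an idealized model of the amplifier stage 20. The supply voltage, the reduced-swing signal levels, and the common shift applied to both channels (standing in for noise that couples equally onto a true/complement pair) are assumptions chosen for the example and are not taken from the figures; the function names are likewise hypothetical.

```python
# Idealized comparison of the FIG. 1B and FIG. 2B receivers.
# All voltages below are assumed values for illustration only.
VDD = 1.0        # assumed supply voltage, in volts
VREF = VDD / 2   # midpoint reference voltage, as in FIG. 1B


def resolve_single_ended(data_in, vref=VREF):
    """Resolve a bit by comparing DataIn against Vref."""
    return 1 if data_in > vref else 0


def resolve_differential(data_in, data_in_bar):
    """Resolve a bit by comparing DataIn against its complement DataIn#."""
    return 1 if data_in > data_in_bar else 0


def margin_single_ended(data_in, vref=VREF):
    """Voltage difference presented to the amplifier (single-ended)."""
    return abs(data_in - vref)


def margin_differential(data_in, data_in_bar):
    """Voltage difference presented to the amplifier (differential)."""
    return abs(data_in - data_in_bar)


# A full-swing logic 1: the differential input sees twice the margin.
print(margin_single_ended(VDD))       # 0.5
print(margin_differential(VDD, 0.0))  # 1.0

# Assumed reduced-swing levels with an assumed shift common to both lines.
LOW, HIGH, SHIFT = 0.3, 0.7, 0.25
# A transmitted logic 0 is misread against the fixed reference...
print(resolve_single_ended(LOW + SHIFT))
# ...but is read correctly when compared against its shifted complement.
print(resolve_differential(LOW + SHIFT, HIGH + SHIFT))
```

In this simplified model the shift cancels in the differential comparison because it appears on both inputs, whereas the single-ended comparison has only the fixed reference, Vref, to compare against.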
However, increased sensing reliability in the differential interconnect approach comes at an obvious price, namely the doubling of the number of channels 16 needed to complete the parallel bus. To offset this, and keep the number of channels 16 constant, the clock, Clk, used in the differential interconnect approach is generally faster than would be used in the standard interconnect approach. Indeed, if the clock used is twice as fast, it will be appreciated that the number of bits transmitted per channel 16, i.e., the data capacity, is equivalent between the two approaches. Fortunately, increased sensing capability in the differential interconnect approach allows for higher clock speed to be used effectively, and clock speed even higher than double speed could be used.
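The equivalence in data capacity noted above can be checked with simple arithmetic. The following is a minimal sketch, assuming one bit is launched per clock cycle and using an arbitrary base clock frequency of 1 GHz; the function name and the frequency are hypothetical and serve only to illustrate the point.

```python
def bits_per_channel_per_second(clock_hz, channels_per_bit, bits_per_clock=1):
    """Data capacity normalized to the number of channels 16 consumed.

    clock_hz         -- frequency of the clock, Clk
    channels_per_bit -- 1 for the standard approach, 2 for differential
    bits_per_clock   -- bits launched per clock cycle (assumed 1 here)
    """
    return clock_hz * bits_per_clock / channels_per_bit


BASE_CLOCK_HZ = 1e9  # assumed value, for illustration only

standard = bits_per_channel_per_second(BASE_CLOCK_HZ, channels_per_bit=1)
differential = bits_per_channel_per_second(2 * BASE_CLOCK_HZ, channels_per_bit=2)

# Doubling the clock exactly offsets the doubling of channels.
print(standard == differential)  # True
```

Any clock faster than twice the base rate would, under the same assumptions, give the differential approach the higher capacity per channel.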
As well as providing for both standard and differential interconnect approaches, the prior art also provides for data to be received with “multiphase, fractional-rate receivers,” such as is shown in FIGS. 3A, 3B, and 4. FIG. 3A, for example, shows multiphase, fractional-rate receivers used in the standard interconnect approach. Suppose four sequential bits of data (e.g., Da, Db, Dc, Dd) are transmitted across a given channel (e.g., 16₃) on both the rising and falling edges of a clock, Clk, in what would be known as a Double Data Rate (DDR) application. Each of these four bits is captured at its own receiver (rx) by one of a plurality of phase-shifted, fractional-rate clocks. Because four bits are to be sensed in this example, four clocks of four distinct phases, Clk(a), Clk(b), Clk(c), and Clk(d), are used to sense data Da, Db, Dc, and Dd at each of the receivers.
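The distribution of sequential bits among the phase-shifted receivers may be pictured as a round-robin demultiplexing of the serial stream on a channel. The sketch below is illustrative only; the function name and the representation of bits as list elements are assumptions, and the timing of the clocks themselves is not modeled.

```python
def demultiplex(bits, num_phases=4):
    """Assign sequential bits to receivers in round-robin order.

    Receiver 0 (clocked by Clk(a)) captures bits 0, 4, 8, ...;
    receiver 1 (clocked by Clk(b)) captures bits 1, 5, 9, ...; and so on.
    """
    return [bits[phase::num_phases] for phase in range(num_phases)]


# Two successive groups of four bits arriving on one channel.
stream = ["Da", "Db", "Dc", "Dd", "Da'", "Db'", "Dc'", "Dd'"]
for clock_name, captured in zip("abcd", demultiplex(stream)):
    print(f"Clk({clock_name}) receiver captures {captured}")
```

Each receiver thus sees only every fourth bit, which is why its clock can run at a fraction of the rate at which bits arrive.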
As shown in FIG. 3B, the phase-shifted, fractional-rate clocks Clk(x) are typically generated from the master clock, Clk, using known techniques. Each generated phase-shifted, fractional-rate clock, Clk(x), is a fraction of the frequency of the master clock, e.g., a quarter-rate or half-rate clock. Data capture at the receivers can occur on both the rising and falling edges of each clock, or on either the rising edges or the falling edges of each clock. For example, and assuming that four receivers are used to sample the four bits, either a quarter-rate clock which samples on rising and falling edges (18a) or a half-rate clock which samples on the rising edges only (18b) can be used. However, the number of fractional-rate receivers can be varied to the same effect. Thus, eight quarter-rate clocks combined with eight fractional-rate receivers could be used to sample the data only on rising edges (18c), or two half-rate clocks used with two fractional-rate receivers could be used to sample the data on rising and falling edges (18d).
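That the four configurations described above achieve the same effect can be confirmed by counting samples. The following sketch assumes that a “quarter-rate” or “half-rate” clock runs at one quarter or one half of the master clock frequency, and that a DDR channel delivers two bits per master clock cycle; the function and variable names are hypothetical.

```python
from fractions import Fraction

DDR_BITS_PER_MASTER_CYCLE = 2  # one bit on each edge of the master clock


def samples_per_master_cycle(num_receivers, rate_fraction, edges_used):
    """Total samples taken by all receivers during one master clock cycle.

    num_receivers -- number of fractional-rate receivers (one clock phase each)
    rate_fraction -- clock frequency as a fraction of the master clock
    edges_used    -- 2 if sampling on rising and falling edges, 1 if one edge
    """
    return num_receivers * Fraction(rate_fraction) * edges_used


# (receivers, clock rate, edges used), keyed by the labels used in the text.
configurations = {
    "18a": (4, Fraction(1, 4), 2),
    "18b": (4, Fraction(1, 2), 1),
    "18c": (8, Fraction(1, 4), 1),
    "18d": (2, Fraction(1, 2), 2),
}

for label, config in configurations.items():
    total = samples_per_master_cycle(*config)
    print(label, total, total == DDR_BITS_PER_MASTER_CYCLE)
```

Under these assumptions each configuration takes exactly two samples per master clock cycle, matching the rate at which bits arrive on the channel.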
Multiphase, fractional-rate clocks at the receiver are useful in situations where data can be transmitted at a rate faster than the receiver can resolve the data state. For example, when a quarter-rate clock is used, the receiver essentially has four times as long to properly resolve the data state, which is beneficial because it can take significant time for the amplifier stage 20 in the receiver (FIGS. 1B, 2B) to amplify and resolve the data state. Fractional-rate clocks at the receiver can also be used in differential interconnect approaches, such as is illustrated in FIG. 4. As the operation of FIG. 4 should be apparent by extension from the foregoing explanations, it is not further discussed.
While any of the above approaches can be used in the transmission of data through a parallel bus, the use of any one approach alone may not be optimal, a point discussed further below. This disclosure presents an improved solution.