In digital data systems in general, and in computer systems in particular, there is an ever-increasing drive for larger bandwidth and higher performance. These systems are composed of discrete integrated circuit chips that are interconnected by a bus. Data moves through a chip and between chips in response to clock pulses, which are intended to maintain synchronization of the data in parallel paths. At the extremely high data rates in today's systems, variations in the propagation of data over one path of a bus as compared to another path on the bus (i.e., skew) can exceed one clock cycle. U.S. Pat. No. 6,334,163, which is assigned to the assignee of this application and is incorporated herein by reference, discloses a so-called Elastic Interface (EI) that can compensate for bus skew greater than one clock cycle without a performance penalty. However, packaging technology has not been able to scale up to match the performance and bandwidth of the chip and interface technologies. In order to reduce the number of I/O terminals on a chip and the number of conductive paths in a bus between chips, the prior art transfers data at a so-called Double Data Rate (DDR), in which data is launched onto the bus at both the rising and falling edges of the clock. This allows the same amount of data to be transferred (i.e., the same bandwidth) with only half the number of bus conductors and half the number of I/O ports, as compared with a system where data is transferred only on a rising or a falling edge.
In certain control paths where the control data word is wider than the physical double data rate bus, the ability to transmit only a portion of the control data on one edge of the clock may introduce a latency of a half cycle while waiting for the remainder of the control data, which is transferred on the next clock edge. For example, in a control/address path from a CPU to an L2 cache, if only the first shot of address information can be sent in the first half of the bus cycle, the full address takes another half cycle to reach the destination. This extra half cycle, inherent in the prior art organization and use of data in systems using double data rate interfaces, could degrade overall performance.
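The half-cycle wait described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuitry of any actual interface: the 40-bit word width, the 20-bit bus width, the high-half-first ordering, and the function names `split_into_shots` and `receive` are all assumptions chosen for the example.

```python
WORD_BITS = 40  # assumed width of the control/address word
BUS_BITS = 20   # assumed width of the double data rate bus


def split_into_shots(word):
    """Split a word into bus-wide shots, one per clock edge.

    Sending the high-order half first is an assumption of this sketch.
    """
    mask = (1 << BUS_BITS) - 1
    return [(word >> BUS_BITS) & mask, word & mask]


def receive(shots):
    """Reassemble the word and count the half cycles spent waiting.

    The first shot must be held until every later shot has arrived,
    so the wait is one half cycle per additional shot.
    """
    word = 0
    for shot in shots:
        word = (word << BUS_BITS) | shot
    half_cycles_waited = len(shots) - 1
    return word, half_cycles_waited


address = 0xABCDE12345               # arbitrary 40-bit example value
shots = split_into_shots(address)    # two 20-bit shots
word, waited = receive(shots)        # full address plus one half cycle of wait
```

In this model the receiver cannot act on the address until the second shot arrives, which is the extra half cycle the paragraph above refers to.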
FIG. 1 illustrates a typical prior art interface between a central processor chip CP and a system controller chip SC for a set associative cache. In this illustrative example of the prior art, the bus is 40 bits wide and has a data rate of 1×, with data transferred onto the bus on one edge of the CP driver clock signal. FIG. 2 illustrates a prior art interface with the same overall data transfer rate as the interface of FIG. 1, but operating at a double data rate, that is, with data transferred on both edges of the chip clock. Although the overall bandwidth is the same as in FIG. 1, here the bus is only 20 bits wide and the data rate on each conductor is 2×.
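The equivalence of the two interfaces can be checked with simple arithmetic. The following Python fragment is a minimal sketch; the helper name `bits_per_clock_cycle` is hypothetical, and the widths are those given for FIGS. 1 and 2.

```python
def bits_per_clock_cycle(bus_width_bits, transfers_per_cycle):
    """Bits moved across the bus in one full clock cycle."""
    return bus_width_bits * transfers_per_cycle


# FIG. 1: 40-bit bus, one transfer per cycle (single edge).
single_rate = bits_per_clock_cycle(40, 1)

# FIG. 2: 20-bit bus, two transfers per cycle (both edges).
double_rate = bits_per_clock_cycle(20, 2)
```

Both values come to 40 bits per clock cycle, so the double data rate interface matches the bandwidth of the single data rate interface with half the conductors.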
FIGS. 3 and 6 illustrate the number of local clock cycles required for a set associative cache access using the single data rate bus of FIG. 1. In this comparative illustration, 5 local clock cycles are used. The first clock cycle is used to latch the entire address data in the interface register CO. The second clock cycle determines on-chip priority arbitration (assuming more than one potential requester for directory access). The third clock cycle stores the address data in the address register (C1) and accesses the cache directory with the congruent segment of the cache address. The fourth local clock cycle stores the directory (Dir) output in register Dir C2 and the cache data address in register Pipe (C2), and compares the address in the Compare Hit step. The fifth local clock cycle stores the directory hit data in register Pipe C3.
FIGS. 4 and 7 illustrate the prior art steps using the double data rate bus of FIG. 2. Here 5½ cycles are required because the first 20 bits sent over the interface are stored and wait one half cycle until the second 20 bits are transmitted, adding one half cycle of latency. The steps here are essentially the same as those explained in connection with FIGS. 3 and 6, except that the first 20 bits of the address are stored in a staging register Stg for one half cycle while awaiting receipt of the second 20 bits of the address. At the end of the half cycle, the first 20 bits are transferred to the register Interface CO, where the second 20 bits are also stored. From this point on, the steps are as described in connection with FIGS. 3 and 6.
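The 5-cycle and 5½-cycle figures can be reproduced with a small latency model. The Python sketch below is an assumption-laden illustration rather than a description of the actual hardware: the stage labels paraphrase the five steps described for FIGS. 3 and 6, and the function name `access_latency` and its parameters are hypothetical.

```python
from fractions import Fraction

# One entry per local clock cycle, paraphrasing the steps of FIGS. 3 and 6.
PIPELINE_STAGES = [
    "latch address in interface register CO",
    "on-chip priority arbitration",
    "address register C1 and directory access",
    "Dir C2 / Pipe C2 and compare hit",
    "directory hit data in Pipe C3",
]


def access_latency(word_bits, bus_bits, transfers_per_cycle):
    """Return the access latency in local clock cycles.

    Every shot after the first holds the earlier bits in a staging
    register for one transfer interval before the pipeline can start.
    """
    shots = -(-word_bits // bus_bits)  # ceiling division
    transfer_interval = Fraction(1, transfers_per_cycle)
    staging_wait = (shots - 1) * transfer_interval
    return staging_wait + len(PIPELINE_STAGES)


single_rate_latency = access_latency(40, 40, 1)  # FIG. 1 bus: 5 cycles
double_rate_latency = access_latency(40, 20, 2)  # FIG. 2 bus: 11/2 cycles
```

Under these assumptions the only difference between the two cases is the staging wait, which is zero for the 40-bit single data rate bus and one half cycle for the 20-bit double data rate bus.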