The present invention relates in general to data processing systems, and in particular, to the launch of data on a bus for writing to double data rate (DDR) synchronous random access memory.
A processor chip (which may also be referred to as a central processing unit, or “CPU”) interfaces to off-chip memory for storage of data and instructions. Modern data processing systems typically employ, as off-chip random access memory, synchronous random access memory in which memory transactions are synchronized to edges of a memory clock signal. The memory clock signal is provided by the CPU. A particular data value to be stored in memory is asserted on the memory interface by the CPU for only a brief instant, timed by signals derived from the CPU's clock, which signals are also provided to the synchronous memory as the memory clock.
Conventionally, the values to be stored are transferred through latches to which the timing signals are applied. FIG. 1A illustrates a portion 100 of a CPU including prior art bus interface mechanisms. Processor clock (“p-clock”) signal 102 is generated via a p-clock generator 104 and distributed throughout the processor chip through one or more p-clock distribution networks 106. (The p-clock is sometimes referred to as the “GCLK”.) P-clock 102 also provides a reference signal to a phase-locked loop (PLL) 108 that controls a voltage controlled oscillator (VCO) 110 that together generate memory clock 112 which is thereby derived from, and phase-locked to, p-clock 102. Memory clock 112 is distributed through one or more memory clock distribution nets 113.
Data transfers to memory are launched from, or transfers from memory are received at, diverse locations on the CPU chip, not merely in one central location. If data transfers were centralized, numerous problems would arise. These include noise from a concentration of near-simultaneous switching events, wiring congestion, and path length disparities for both the data and clock paths, because some locations in the chip would be relatively more remote than others from the central data transfer location. Consequently, data transfers are decentralized, and data is distributed from its source via one or more data distribution nets 114, and stored in latch pairs, or registers, 116. Data is generated, and stored, in the processor clock domain.
Data to be stored in memory is distributed to the CPU chip boundary via data distribution nets 114, and launched onto memory bus 118. Data is launched in response to memory clock 112 via a plurality of latches 120, each of which incorporates a master-slave latch pair, denoted latch L1 and latch L2, the L2 latch having an input internally coupled to an output of the L1 latch.
Although distributing data transfer locations on the CPU chip does mitigate the aforementioned problems, data signals are typically substantially skewed relative to the timing signals, for example memory clock 112, at the data transfer locations on the chip boundary. Furthermore, the amount of skew may vary due to the variation in path lengths for the data and timing signals, which variation may be substantial. This is illustrated in the timing diagram in FIG. 1B. In the embodiment illustrated in FIG. 1B, data 122 input to latch 120 is latched on a rising edge of memory clock 112. Portions “A”, “B”, and “C” are launched at edges t1, t2, and t3, respectively. Due to the skew, Ts, in the arrival times of data 122 and the corresponding edge of memory clock 112, a center of the data valid interval for data 122 is shifted relative to the edges of memory clock 112. As a consequence, data 122 has excessive setup time, Tsu, and short hold time, Th. If the hold time becomes shorter than the hold time specified by the manufacturer of the synchronous memory, the memory write may result in erroneous data being stored in memory.
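The relationship among the skew, the setup time, and the hold time described above can be sketched numerically. The following is a minimal illustration only, under stated assumptions: the capture edge is assumed to fall at the center of the data valid interval when the skew is zero, a positive skew is assumed to mean that the data valid interval arrives early relative to the clock edge, and the function names and the numeric values are hypothetical and do not appear in FIG. 1B.

```python
def timing_margins(valid_interval_ns, skew_ns):
    """Return (setup, hold) in ns for one data value.

    Assumption: with zero skew the clock edge sits at the center of the
    data valid interval, so setup and hold are each half the interval.
    Assumption: positive skew_ns means the data valid interval is shifted
    earlier relative to the clock edge (more setup, less hold).
    """
    setup = valid_interval_ns / 2 + skew_ns
    hold = valid_interval_ns / 2 - skew_ns
    return setup, hold


def hold_violation(valid_interval_ns, skew_ns, required_hold_ns):
    """Return True if the available hold time is below the required hold time."""
    _, hold = timing_margins(valid_interval_ns, skew_ns)
    return hold < required_hold_ns


# Hypothetical values: a 10 ns data valid interval, 3 ns of skew,
# and a memory device that requires 2.5 ns of hold time.
setup_ns, hold_ns = timing_margins(10.0, 3.0)
violated = hold_violation(10.0, 3.0, 2.5)
```

Under these assumed values the setup time grows to 8 ns while the hold time shrinks to 2 ns, which is below the assumed 2.5 ns requirement, corresponding to the condition in which a memory write may store erroneous data.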
Conventionally, the skew problem has been addressed by tuning the electrical characteristics of the conduction paths to adjust effective path lengths. In this way, the skew of the data and timing signals at the data transfer points on the CPU boundary is controlled. However, advances in CPU technology make this conventional approach increasingly problematic. Higher frequency operation, smaller conductor cross-sections, smaller separation between conductors, and longer conduction paths all make it more difficult to limit signal skew using conventional approaches. Moreover, as CPU speeds increase, bus clock speeds become more important in determining the overall performance of the data processing system. Thus, bus clock speeds must increase in order to keep pace with the increase in CPU performance. This trend in bus clock speeds further tightens the constraints on data and timing signal skew. Thus, there is a need in the art for apparatus and methods that mitigate the skew in the data and timing signals in data transfers to memory in data processing systems, and that mitigate sensitivities to sources of skew arising from manufacturing processes and CPU operation.
The aforementioned needs are addressed by the present invention. Accordingly, there is provided, in a first form, a bus interface apparatus. The apparatus includes circuitry operable for receiving a first data stream for outputting on a bus and generating second and third data streams in response to the first data stream. Also included is selection circuitry operable for alternately selecting from the second and third data streams a sequence of data values for outputting on the bus, wherein the selection circuitry selects for outputting in response to a select signal, wherein the select signal is generated in response to a first bus clock, and circuitry for outputting a second bus clock to the bus in response to the first bus clock, a data valid interval of each value of the sequence of data values having a skew with respect to the second bus clock determined by the circuitry operable for generating the second and third data streams.
There is also provided, in a second form, a method of launching data on a bus. The method includes the steps of receiving a first data stream operable for launching on the bus, and generating second and third data streams in response to the first data stream. The method further includes receiving a first bus clock and generating a select signal in response thereto, and alternately selecting from the second and third data streams a sequence of data values for launching on the bus in response to the select signal.
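The steps of the method can be illustrated with a short behavioral sketch. This is not the circuitry of the invention, and it makes assumptions that this summary does not state: the second and third data streams are assumed to be formed by taking alternate values of the first data stream, the select signal is assumed simply to follow the level of the first bus clock, and the function names are hypothetical.

```python
def split_streams(first_stream):
    """Generate second and third data streams from the first data stream.

    Assumption: even-indexed values form the second stream and
    odd-indexed values form the third stream.
    """
    second = list(first_stream[0::2])
    third = list(first_stream[1::2])
    return second, third


def select_signal(bus_clock_levels):
    """Generate the select signal in response to the first bus clock.

    Assumption: the select signal follows the bus clock level (1 or 0).
    """
    return list(bus_clock_levels)


def launch(first_stream, bus_clock_levels):
    """Alternately select values from the second and third streams."""
    second, third = split_streams(first_stream)
    # Select level 1 picks from the second stream, level 0 from the third.
    sources = {1: iter(second), 0: iter(third)}
    launched = []
    for select in select_signal(bus_clock_levels):
        try:
            launched.append(next(sources[select]))
        except StopIteration:
            break  # the selected stream has no more values to launch
    return launched


# Two values are launched per bus clock cycle (one per clock level).
bus_values = launch(["A", "B", "C", "D"], [1, 0, 1, 0])
```

In this model one value is selected on each level of the bus clock, so two values are launched per clock cycle, which is the behavior a double data rate interface requires. The sketch models only the ordering of values, not the data valid interval or its skew relative to the second bus clock.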
Additionally, there is provided, in a third form, a data processing system. The system includes a memory coupled to a memory bus, and a central processing unit (CPU), the CPU including a bus interface coupled to the memory bus. The bus interface includes circuitry operable for receiving a first data stream for outputting on the memory bus and generating second and third data streams in response to the first data stream. Also included in the bus interface is selection circuitry operable for alternately selecting from the second and third data streams a sequence of data values for outputting on the memory bus, wherein the selection circuitry selects for outputting in response to a select signal, wherein the select signal is generated in response to a first bus clock. Circuitry within the bus interface is included for outputting a second bus clock to the memory bus in response to the first bus clock, a data valid interval of each value of the sequence of data values having a skew with respect to the second bus clock determined by the circuitry operable for generating the second and third data streams.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.