This invention relates generally to digital systems. More particularly, this invention relates to improving clock signal phase skew performance within a master-slave digital system that has multiple latent clock cycles.
FIG. 1 is a simplified illustration of a prior art synchronous bus system that reduces clock skew. The system is described in U.S. Pat. No. 5,432,823, which is assigned to the assignee of the present invention, and is incorporated by reference herein.
The system 20 of FIG. 1 includes a clock generator 22, a master device 24, and a set of slave devices 26_A through 26_N. A transmission channel is comprised of three components: a clock-to-master path 28, a turn-around path 29, and a clock-from-master path 30. The transmission channel ends at a termination block 31, which may be implemented with a resistor. Each clock pulse from the clock generator 22 traverses from the clock-to-master path 28, through the turn-around path 29, through the clock-from-master path 30, and into the termination block 31. The turn-around path 29 may be implemented as a wire linking a package pin connected to the clock-to-master path 28 and a package pin connected to the clock-from-master path 30. The phantom lines 33 illustrate that the turn-around can be implemented as a single package pin to which the clock-to-master path 28 and the clock-from-master path 30 are connected, provided that the created stub is relatively short.
The clock generator 22 is any standard clock source. The master device 24 is a device that can communicate with other master devices and with slave devices, and is located near the turn-around path 29. By way of example, the master device 24 may be a microprocessor, a memory controller, or a peripheral controller.
The slave devices 26 can only communicate with master devices and may be located anywhere along the transmission channel. The slave devices 26 may be implemented with high speed memories, bus transceivers, peripheral devices, or input/output ports.
In the system of FIG. 1, a data/control bus 36 (sometimes referred to simply as a data bus 36) is used to transport data and control signals between the master device 24 and the slave devices 26_A through 26_N. This operation is timed by the clock signals on the transmission channel (28, 29, 30). More particularly, the master device 24 initiates an exchange of data by broadcasting an access request packet on the data bus 36. Each slave device 26 decodes the access request packet and determines whether it is the selected slave device and the type of access requested. The selected device then responds appropriately, either reading or writing a packet of data in a pipelined fashion.
In the system of FIG. 1, the master device 24 transmits data on the bus 36 contemporaneously with clock signals on the clock-from-master path 30. In other words, the transmission of data from the master device 24 to the slave devices 26 is timed by the clock signals on the clock-from-master path 30. Conversely, each slave device transmits data contemporaneously with the clock signal on the clock-to-master path 28. That is, the transmission of data from the slave devices 26 to the master device 24 is timed by the clock signals on the clock-to-master path 28. The scheme of having clock and data signals travel in the same direction is used to reduce clock-to-data skew.
In many master-slave systems, the time of flight of a clock signal along the transmission channel is a multiple of a single cycle of the clock signal. In other words, it takes more than one clock cycle for a clock pulse to traverse the length of the transmission channel (28, 29, 30). As clock speeds increase and the number of slave devices increases, the number of clock cycles required for a clock pulse to traverse the length of the transmission channel increases.
Each clock cycle that transpires during the time of flight of a clock pulse on the transmission channel is referred to as a latent clock cycle. Thus, for example, if it takes five clock cycles for a clock pulse to traverse the length of a transmission channel, then the system has five latent clock cycles. Since data is transmitted with each clock cycle, data is in a sense “stored” on the data bus 36 with each latent clock cycle. In other words, a packet of data may be launched with each latent clock cycle, resulting in several packets of data on the data bus 36. Since the system must keep track of this information on the data bus, considerable complexity is introduced into master-slave systems that have latent clock cycles. It would be highly desirable to reduce the complexity in such systems.
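The notion of data being "stored" on the bus can be illustrated with a small behavioral model. The following Python sketch is not part of the described system; the function name simulate_bus and the packet labels are hypothetical. It assumes that one packet is launched per clock cycle and that every packet takes exactly the stated number of latent clock cycles to arrive.

```python
from collections import deque


def simulate_bus(latent_cycles: int, packets: list[str]) -> tuple[int, list[str]]:
    """Launch one packet per clock cycle onto a bus with a fixed flight time.

    Returns the peak number of packets simultaneously in flight and the
    packets in the order they arrive. The one-packet-per-cycle launch rate
    and the uniform flight time are simplifying assumptions.
    """
    if latent_cycles < 1:
        raise ValueError("latent_cycles must be at least 1")

    pending = deque(packets)   # packets not yet launched
    in_flight = deque()        # (arrival_cycle, packet), oldest first
    delivered = []
    peak = 0
    cycle = 0

    while pending or in_flight:
        # Deliver every packet whose flight time has elapsed.
        while in_flight and in_flight[0][0] <= cycle:
            delivered.append(in_flight.popleft()[1])
        # Launch the next packet, if any, on this clock cycle.
        if pending:
            in_flight.append((cycle + latent_cycles, pending.popleft()))
        peak = max(peak, len(in_flight))
        cycle += 1

    return peak, delivered


if __name__ == "__main__":
    peak, order = simulate_bus(5, [f"p{i}" for i in range(8)])
    print(peak, order)
```

Under these assumptions, a channel with five latent clock cycles holds up to five packets at once, which is the bookkeeping burden the passage above refers to.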
Latent clock cycles are more fully appreciated with reference to FIGS. 2A and 2B. FIG. 2A illustrates increasing cycle degrees on the vertical or y-axis and increasing distance from the clock source on the horizontal or x-axis. As the distance from the clock source increases, degrees of phase skew increase until a full 360 degree phase skew or latent cycle exists, as shown with waveform 38A, and subsequent waveforms 38B-38E.
FIG. 2B maps the components of FIG. 1 onto the plot of FIG. 2A. Thus, FIG. 2B illustrates the position of the master device 24 at the intersection of the axes. The first slave device 26A and second slave device 26B correspond to the second domain, represented by waveform 38B. The slave device 26N corresponds to the fifth domain, represented by waveform 38E. In sum, the slave devices fall into four different domains (38B, 38C, 38D, and 38E).
The problem of latent clock cycles is more fully appreciated in connection with a specific example. FIG. 1 illustrates a slave-to-master path 32 positioned between the master device 24 and the first slave device 26_A. The figure also illustrates a master-to-slave path 34 positioned between the master device 24 and the first slave device 26_A. System designers frequently place a set of slave devices 26 relatively far away from the master device 24; this distance can introduce considerable clock skew. Even if the slave devices 26 are positioned relatively close to the master device 24, the distance will introduce some clock skew. This phenomenon can be appreciated with reference to FIG. 3. Waveform 40 represents a clock pulse received at the master device 24 from the clock-to-master path 28. Waveform 42 represents the same clock pulse as applied by the master device 24 to the clock-from-master path 30. Observe that the two signals are in phase because only the length of the turn-around path 29 separates the two signals.
Waveforms 44 and 46 illustrate the same clock pulses when they are received at slave device 26_A (Slave_A). In other words, waveform 44 illustrates the clock pulse when it was received from the clock generator 22 at slave device 26_A, while waveform 46 illustrates the same clock pulse when it was received from the master device 24 at slave device 26_A (after traversing the length of the slave-to-master path 32, the turn-around path 29, and the master-to-slave path 34). As shown with arrow 48, the two clock pulses are considerably out of phase. This phase skew is attributable to a relatively long slave-to-master path 32 and master-to-slave path 34.
Arrow 54 illustrates the phase skew between clock signals 50 and 52 at slave device 26_B (Slave_B). At slave device 26_B there is a phase skew plus an additional latent cycle. Observe that at Slave_A the phase skew is larger than at Slave_B. The reason for this is that at Slave_B a latent cycle (a complete 360° phase shift) has been introduced along with additional skew.
Waveforms 56 and 58 illustrate the clock pulses when they reach the last slave device 26_N (Slave_N). Arrow 60 illustrates the phase skew between the two signals. At this point, there may be several latent cycles between the two clock pulses plus a phase skew.
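The relationship between total delay, latent cycles, and residual phase skew at a slave device can be sketched numerically. The Python fragment below is illustrative only. The function name skew_at_slave and the picosecond figures are hypothetical, and the model assumes the delay between the two clock pulses at a slave equals the round-trip time from that slave to the master and back.

```python
def skew_at_slave(round_trip_ps: int, clock_period_ps: int) -> tuple[int, float]:
    """Split a round-trip delay into latent cycles and residual phase skew.

    round_trip_ps is the assumed delay between the clock-to-master pulse
    and the same pulse returning on the clock-from-master path at one slave.
    Returns (whole latent cycles, residual skew in degrees).
    """
    if clock_period_ps <= 0:
        raise ValueError("clock_period_ps must be positive")
    if round_trip_ps < 0:
        raise ValueError("round_trip_ps must be non-negative")

    # Each full period is one latent cycle; the remainder is the phase skew.
    cycles, remainder_ps = divmod(round_trip_ps, clock_period_ps)
    return cycles, 360.0 * remainder_ps / clock_period_ps


if __name__ == "__main__":
    period_ps = 2500  # hypothetical 400 MHz clock
    print("Slave_A:", skew_at_slave(1000, period_ps))
    print("Slave_B:", skew_at_slave(2750, period_ps))
```

With these hypothetical numbers, the nearer slave shows a residual skew of 144 degrees and no latent cycle, while the farther slave shows only 36 degrees of residual skew but carries one full latent cycle. This is one way the visible phase skew can be larger at Slave_A than at Slave_B.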
As previously indicated, latent cycles introduce complexity into a master-slave system and result in lower performance. Phase skew is problematic because communication between a slave device and a master device relies upon synchronized clock signals (i.e., synchronization between the clock signal on the clock-to-master path 28 and the clock signal on the clock-from-master path 30).
The phase skew and latent cycles attributable to the length of the transmission channel through the slave devices are uncontrollable. The phase skew introduced by the distance of the slave-to-master path 32 and the master-to-slave path 34 can be substantially reduced by shortening the distance of the slave-to-master path 32 and the master-to-slave path 34. Unfortunately, this option is usually unavailable because system designers want the flexibility to have an arbitrary distance between the master and slave devices. When the distance between the master and slave devices is large, substantial phase skew or one or more latent clock cycles are introduced into the system before processing at the first slave device commences.
As previously indicated, it is always desirable to reduce master-slave system complexity by reducing the number of latent cycles. This goal is especially important as clock speeds increase and slave device numbers increase. It is also important to allow flexibility in the length of the clock-to-master path and the clock-from-master path. It would be highly desirable to provide a system that affords additional flexibility in the length of the clock-to-master path and the clock-from-master path, while eliminating the latent cycles and phase skew introduced by those paths.
The apparatus of the invention is a digital system with a master device, a set of slave devices, and a clock generator to generate a clock signal. A transmission channel includes a clock-to-master path extending from the clock generator, through the set of slave devices, to the master device. The transmission channel also includes a clock-from-master path extending from the master device and through the set of slave devices. The transmission channel also includes a slave-to-master path positioned between a first slave device of the set of slave devices and the master device. A master-to-slave path is positioned between the master device and the first slave device. The cumulative length of the slave-to-master path and the master-to-slave path creates a master routing phase shift between a clock signal on the clock-to-master path and a clock signal on the clock-from-master path. A first lead samples the clock signal on the slave-to-master path. A second lead samples the clock signal on the master-to-slave path. A clock latency adjustment circuit is connected to the first lead and the second lead to produce an adjusted clock signal on the master-to-slave path to compensate for the master routing phase shift.
The method of the invention includes the step of receiving a clock-to-master clock signal at a slave device. A clock-from-master clock signal is collected at the slave device; the clock-from-master signal is out of phase with the clock-to-master clock signal. The clock-to-master clock signal and the clock-from-master clock signal are processed to produce an adjusted clock-from-master clock signal at the slave device that is substantially in phase with the clock-to-master clock signal at the slave device.
The invention is distinguishable over the prior art in view of the number of clock drivers per channel. The invention uses two clock drivers for each channel: one driver supplies the Clock-To-Master (CTM) signal, while the second driver supplies the Clock-From-Master (CFM) signal. The location of the clock drivers varies according to the embodiment. For example, the clock drivers may both be located in the clock generator chip, or both be located on the master chip, or one could be in each of the above locations. Regardless of the position, the Clock-From-Master signal amplitude is improved compared to the single-driver scheme.
The second clock driver has an associated phase interpolator and phase comparator. The phase interpolator is controlled by the phase comparator, which adjusts the phase until there is zero phase angle between its inputs. The inputs to the phase comparator vary with different embodiments. For example, they may be connected so as to measure the phase angle (CTM vs. CFM) at a point inside the master package, or at some desired location along the channel. In this second case, the resulting system is quite different from the prior art in several regards. First, by defining the zero phase locus at an arbitrary point along the channel, it becomes unlikely that there is zero phase between CTM and CFM at the master. This will require additional complexity in the master. The benefit is that the number of different clock domains spanned by the slaves can be minimized, resulting in fewer delay stages in each slave.
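The interaction between the phase comparator and the phase interpolator can be pictured as a feedback loop. The Python sketch below is a purely behavioral illustration, not the circuit of the invention. The names wrap_degrees, phase_comparator, and lock_interpolator are hypothetical, phases are modeled as whole degrees, and the fixed-step ("bang-bang") update rule is an assumption, since the text does not specify a control law.

```python
def wrap_degrees(angle: int) -> int:
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180) % 360 - 180


def phase_comparator(ctm_deg: int, cfm_deg: int) -> int:
    """Return +1 if the CFM phase exceeds the CTM phase, -1 if it is less, 0 if equal."""
    error = wrap_degrees(cfm_deg - ctm_deg)
    return (error > 0) - (error < 0)


def lock_interpolator(routing_shift_deg: int, step_deg: int = 1,
                      max_steps: int = 720) -> int:
    """Step the interpolator until the comparator sees zero phase angle.

    routing_shift_deg models the master routing phase shift seen at the
    sample point, with the CTM phase taken as the 0 degree reference.
    Returns the interpolator offset (0..359 degrees) that cancels it.
    """
    offset = 0
    for _ in range(max_steps):
        cfm_deg = routing_shift_deg + offset
        decision = phase_comparator(0, cfm_deg)
        if decision == 0:
            return offset % 360
        # Move the interpolator one step against the measured error.
        offset -= decision * step_deg
    raise RuntimeError("loop did not lock within max_steps")


if __name__ == "__main__":
    shift = 144  # hypothetical routing phase shift in degrees
    correction = lock_interpolator(shift)
    print(correction, (shift + correction) % 360)
```

In this model the loop settles on the offset that brings the sampled CFM phase back into alignment with CTM at the chosen sample point. Moving the sample point along the channel corresponds to changing routing_shift_deg.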
One prior art technique for dealing with multiple clock domains is to “levelize” the channel by adding pipeline delay stages, such that data reaches the master at the same time from different slaves. The invention obviates the need for these prior art delay stages.
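For context, the prior art levelizing technique can be sketched as a simple calculation. The Python fragment below is illustrative only. The function name levelize and the per-slave latency figures are hypothetical, and it assumes each slave's latency is known as a whole number of clock cycles.

```python
def levelize(latency_cycles: dict[str, int]) -> dict[str, int]:
    """Compute pipeline delay stages per slave so all data arrives together.

    latency_cycles maps a slave name to its assumed latency, in clock
    cycles, for data to reach the master. Each slave is padded up to the
    latency of the slowest slave.
    """
    if not latency_cycles:
        return {}
    worst = max(latency_cycles.values())
    return {name: worst - cycles for name, cycles in latency_cycles.items()}


if __name__ == "__main__":
    stages = levelize({"slave_a": 1, "slave_b": 2, "slave_n": 5})
    print(stages)
```

The number of delay stages grows with the spread of clock domains across the slaves, which is the overhead that the approach described above aims to avoid.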