1. Field of the Invention
This invention is related generally to the field of microprocessors and more particularly to the use of clock skipping techniques to transfer data between different clock domains in a microprocessor.
2. Description of Related Art
In simple computer systems, a single clock signal may be used to run all of the devices which are integrated into the microprocessor chip. As shown in FIG. 1, a system PLL (phase locked loop) 11 may provide a clock signal to a microprocessor 12, a memory 13 and a peripheral device 14 via clock line 16. The signal is used to clock data transfers between the devices on bus 15.
While implementation of the system illustrated in FIG. 1 is simple and relatively straightforward, its simplicity results in some performance limitations. One of these limitations relates to the variations in the clock signal which is seen by the various devices on the chip. The use of a network of conductive traces to deliver the clock signal to each of the devices causes reflections, noise and other uncertainties in the signal. These factors cause differences in the signals delivered to different devices, which may in turn limit the devices' ability to communicate data. For example, if there is a skew between the clock signals arriving at two devices, a value may have to be asserted by the transmitting device for a longer time than would otherwise be necessary in order to ensure that the value can be sampled by the receiving device.
In the simple system illustrated in FIG. 1, a data transfer involves two devices in the same clock domain. ("Clock domain" refers to a portion of a system in which the operation of associated devices is based on a particular clock signal.) Thus, the operations of the respective devices are based upon clock signals having the same rate. In the absence of any clock skew, data being transferred from one of these devices to the other must be asserted for a period of time before the data is sampled (the setup time) and a period of time after the data is sampled (the hold time). If there is any skew between the clock signals at each of the devices, the assertion of the data must be maintained for an additional amount of time which is long enough to account for this difference. While this additional time may not be significant in relation to slower clock speeds, high-performance, high-speed microprocessors have shorter clock periods, so it may not be possible to perform data transfers quickly enough to keep up with the speed of the processor.
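The timing budget described above can be illustrated with a short Python sketch. It assumes a simplified additive model in which the required assertion time is the sum of the setup time, the hold time and the skew. The function names and all numeric values are hypothetical and are chosen only to show the arithmetic.

```python
def min_assert_time_ps(setup_ps, hold_ps, skew_ps):
    """Time (in picoseconds) a value must stay asserted.

    Simplified model (an assumption): setup + hold + clock skew.
    """
    return setup_ps + hold_ps + skew_ps


def transfer_fits(period_ps, setup_ps, hold_ps, skew_ps):
    """True if the required assertion time fits within one clock period."""
    return min_assert_time_ps(setup_ps, hold_ps, skew_ps) <= period_ps


# Hypothetical timing values, for illustration only.
SETUP_PS, HOLD_PS, SKEW_PS = 300, 200, 400

SLOW_PERIOD_PS = 5000  # period of a 200 MHz clock
FAST_PERIOD_PS = 800   # period of a 1.25 GHz clock

slow_ok = transfer_fits(SLOW_PERIOD_PS, SETUP_PS, HOLD_PS, SKEW_PS)
fast_ok = transfer_fits(FAST_PERIOD_PS, SETUP_PS, HOLD_PS, SKEW_PS)
```

With these illustrative values the same skew is easily absorbed by the longer clock period but exceeds the budget of the shorter one.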
Clock forwarding is one technique which can be used to minimize the impact of clock skew and allow improved performance in data transfers. In a clock forwarding scheme, the data bus and system clock described above are replaced by point-to-point data and clock signals. When data is to be transferred from one device to another, the data is transferred along with a corresponding clock signal. Referring to FIG. 2, data is transferred on one or more data lines 18 while a clock signal is forwarded on clock line 19. The data is clocked into a series of storage locations (i.e. flip-flops) according to the forwarded clock signal. The data is then clocked out of the storage locations according to a local clock signal of the receiving device. Both of the clock signals must have the same rate, but a substantial skew in the signals will not prevent reliable transfer of the data.
While clock forwarding provides a means to transfer data between devices operating at the same clock rate, it is often desirable in modern computer systems to use different clock frequencies for different devices. For example, it may be useful to operate the core logic (i.e., the microprocessor logic) and the system logic at different frequencies. The difference in frequencies allows for advances in the performance of one type of logic without requiring equal advances in the other type of logic. Thus, for example, the processor speed can be increased without having to also speed up the system logic.
In these systems, system logic is closely tied to the system bus. As a result, the system logic usually operates at a frequency which is an integer (or half-integer) multiple of the system bus frequency. Because the system logic operates at a frequency which is a multiple of the system bus frequency, clock signals for the system logic can be generated from the same clock as the clock signals for the system bus. If the core logic also runs at a frequency which is an integer or half-integer multiple of the system bus frequency, it can also be easily generated from the system bus clock signal. For example, if the system bus is running at 66 MHz, the system logic and core logic can be operated at 200 MHz (three times the system bus frequency). Then, if desired, the frequency of the core logic can be scaled up to 266 MHz (four times the system bus frequency), while the system logic remains at 200 MHz.
As the operating frequency of the system bus increases, however, it becomes more and more difficult to scale up the speed of the core logic because this would require a larger increase in the frequency. For example, if the system bus is running at 400 MHz and both the core logic and the system logic are running at 800 MHz, the core logic cannot be easily scaled up to 900 MHz. That is, 900 MHz is not an integer or half-integer multiple of the system bus frequency. It may therefore be useful to operate the different sets of logic using multiple clocks instead of a single one.
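The integer or half-integer constraint can be checked with a few lines of Python. The following is a minimal sketch; the function name is hypothetical, and the frequencies are assumed to be given as whole numbers of MHz.

```python
from fractions import Fraction


def is_integer_or_half_integer_multiple(freq_mhz, bus_mhz):
    """True if freq_mhz is an integer or half-integer multiple of bus_mhz.

    Inputs are assumed to be positive whole numbers of MHz.
    """
    ratio = Fraction(freq_mhz, bus_mhz)
    # Doubling an integer or half-integer ratio always yields a whole number.
    return (2 * ratio).denominator == 1


BUS_MHZ = 400
candidates = [800, 900, 1000]
allowed = [f for f in candidates
           if is_integer_or_half_integer_multiple(f, BUS_MHZ)]
```

For a 400 MHz system bus, 800 MHz (ratio 2) and 1000 MHz (ratio 2.5) satisfy the constraint, while 900 MHz (ratio 2.25) does not.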
The use of multiple clock domains in a computer system may create a number of problems which must be addressed in the system. One problem is that, because the clock signals in different domains are derived from different sources, the signals may not be synchronized. The signals may also experience independent, dynamic variations for which the computer system must compensate. If the computer system cannot synchronize the clock signals in the different domains, the logic in one domain will not be able to communicate with the logic in another domain. Another problem is that it is difficult to communicate between two clock domains in which the clock rates are not integer or half-integer multiples of each other.
One or more of the problems described above may be solved by the various embodiments of the invention. Broadly speaking, the present system and method are used for transferring data from a first clock domain to a second clock domain, wherein the clock rate of one of the domains is not constrained to be an integer or half-integer multiple of the clock rate of the other domain.
One embodiment comprises a method in which a plurality of serial data values (e.g., bits) are received from a device in a first clock domain and are stored in a plurality of storage locations. The data values are clocked into the storage locations at a first clock rate corresponding to the first clock domain. The data values are then retrieved from the storage locations at a second clock rate corresponding to a second clock domain and are transferred to a device in the second clock domain. If the clock rate in the first clock domain is greater than the clock rate of the second clock domain, one or more of the clock pulses in the first clock domain is periodically skipped, according to a predetermined pattern. Thus, the number of data values stored in the storage locations is less than the number of clock pulses in the first clock domain during the period in which the data values are stored. When the data values are retrieved, one data value is retrieved from a storage location for each clock pulse in the second clock domain. If, on the other hand, the clock rate in the first clock domain is less than the clock rate in the second clock domain, one data value is stored for each clock pulse in the first clock domain, while retrieval of the data values periodically skips one of the pulses in the second clock domain.
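One way to picture the predetermined skip pattern is sketched below in Python. The text above does not specify how the pattern is chosen, so this sketch assumes that the skipped pulses are spread as evenly as possible over one repeating period, using a simple accumulator. The function name and the example rates are hypothetical.

```python
from math import gcd


def skip_pattern(fast_rate, slow_rate):
    """Return one repeating period of the faster clock as a list of booleans.

    True means the pulse is used to store (or retrieve) a data value;
    False means the pulse is skipped. The even spacing of skipped pulses
    is an assumption of this sketch.
    """
    if slow_rate <= 0 or fast_rate < slow_rate:
        raise ValueError("expected 0 < slow_rate <= fast_rate")
    g = gcd(fast_rate, slow_rate)
    n_fast, n_slow = fast_rate // g, slow_rate // g
    pattern = []
    acc = 0
    for _ in range(n_fast):
        acc += n_slow
        if acc >= n_fast:
            acc -= n_fast
            pattern.append(True)   # pulse is used
        else:
            pattern.append(False)  # pulse is skipped
    return pattern


# Example: a 900 MHz domain exchanging data with an 800 MHz domain.
pattern = skip_pattern(900, 800)
used_pulses = sum(pattern)
```

In this example the pattern repeats every nine pulses of the faster clock, and one pulse in each group of nine is skipped, so the number of values stored in a period equals the number of pulses of the slower clock in that period.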
One embodiment comprises an apparatus having a plurality of flip-flops, wherein the data inputs of the flip-flops are coupled to a serial data line from a device in a first clock domain. The clock inputs of the flip-flops are coupled to a clock signal from the first clock domain. The enable inputs of the flip-flops are coupled to a load counter which cyclically enables each of the flip-flops to load successive data values from the serial data line into successive ones of the flip-flops. The load counter is clocked by the clock signal from the first clock domain. The outputs of the flip-flops are coupled to a multiplexer. The select input of the multiplexer is coupled to an unload counter which is clocked by a clock signal from a second clock domain. The unload counter thereby controls the multiplexer and causes it to cyclically select successive ones of the flip-flops. The output of the multiplexer is coupled to a flip-flop which is clocked by a signal from the second clock domain. The output of this flip-flop is then transferred to the device in the second clock domain. If the clock rate in the first clock domain is greater than the clock rate in the second clock domain, a data value is loaded into one of the flip-flops for each clock pulse from the first clock domain, except for periodically skipped pulses. A data value is then retrieved from one of the flip-flops for each clock pulse from the second clock domain. If, on the other hand, the clock rate in the first clock domain is less than the clock rate in the second clock domain, a data value is loaded into one of the flip-flops for each clock pulse from the first clock domain, while a data value is retrieved from one of the flip-flops for each clock pulse from the second clock domain except for periodically skipped pulses.
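The behavior of such an apparatus can be approximated with a small software model. The Python sketch below is illustrative only and is not a hardware description. The class name, the buffer depth of four, the 9:8 clock ratio, the choice of which pulse is skipped and the phase offset between the two clocks are all assumptions made for the example. Time is measured in arbitrary units in which the faster clock has a period of 8 and the slower clock a period of 9.

```python
class SkipBuffer:
    """Software model of the flip-flops, load counter, multiplexer and unload counter."""

    def __init__(self, depth):
        self.flops = [None] * depth  # storage flip-flops
        self.load_count = 0          # load counter (first clock domain)
        self.unload_count = 0        # unload counter (second clock domain)
        self.output_flop = None      # flip-flop at the multiplexer output

    def load_pulse(self, value, skip=False):
        """One pulse of the first-domain clock; nothing is stored on a skipped pulse."""
        if skip:
            return
        self.flops[self.load_count] = value
        self.load_count = (self.load_count + 1) % len(self.flops)

    def unload_pulse(self, skip=False):
        """One pulse of the second-domain clock; returns the output flip-flop value."""
        if not skip:
            # The unload counter acts as the multiplexer select.
            self.output_flop = self.flops[self.unload_count]
            self.unload_count = (self.unload_count + 1) % len(self.flops)
        return self.output_flop


def simulate(periods=3, depth=4):
    """Faster first domain (9 pulses per period) feeding a slower second domain (8 pulses)."""
    buf = SkipBuffer(depth)
    events = []
    for p in range(periods):
        base = 72 * p
        for k in range(9):
            # Assumed pattern: the first of every nine fast pulses is skipped.
            events.append((base + 8 * k, 0, "load", k == 0))
        for j in range(8):
            # Assumed phase: each unload pulse trails the matching load pulse.
            events.append((base + 9 * (j + 1), 1, "unload", False))
    events.sort()

    sent, received = [], []
    next_value = 0
    for _, _, kind, skip in events:
        if kind == "load":
            if skip:
                buf.load_pulse(None, skip=True)
            else:
                sent.append(next_value)
                buf.load_pulse(next_value)
                next_value += 1
        else:
            received.append(buf.unload_pulse())
    return sent, received


sent, received = simulate()
```

Under these assumptions every value loaded in the first domain is retrieved exactly once and in order in the second domain, even though the two clocks run at different rates. The opposite case, in which the second domain is faster, could be modeled by passing `skip=True` to `unload_pulse` on selected pulses instead.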