1. Field of the Invention
The present invention relates to digital clocking circuits and, more specifically, to a clocking circuit used in serial-to-parallel communications.
2. Description of the Prior Art
In many computer systems, high speed serializer-deserializer (HSS) cores are used in application-specific integrated circuits (ASICs) and custom integrated circuits for communication from processor-to-processor and processor-to-input/output devices. The receiving portion of an HSS core takes one (or more) high speed serial data lanes and converts each data lane into parallel data at a much slower frequency. In one example, shown in FIG. 1A, a representative existing HSS internal receive (Rx) interface consists of a clock (RxDCLK) and a parallel data bus (RxD(7:0)). (It should be noted that an 8-bit wide bus is used as an example only.) A deserializer 10 receives data from a serial data stream and places units of the data onto a parallel bus. Each time a new unit is placed on the parallel bus, the clock 12 asserts an RxDCLK signal, indicating that the data on the parallel bus is valid. Because the RxDCLK signal lacks sufficient power to drive all of the devices that typically access the data, the clock has to be repeated by a clock tree 16. The clock tree 16 includes an increasing series of repeaters 18, each of which generates a duplicate of the RxDCLK signal from the clock 12, delayed by a predicted amount of time. When the delays of the successive repeaters 18 are added together, a substantial tree delay is propagated through the system.
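By way of illustration only, the deserialization and the accumulation of repeater delays described above may be sketched in Python. The sketch is not taken from any HSS core: the function names, the most-significant-bit-first ordering, and the picosecond figures in the note that follows are assumptions chosen for the example.

```python
from typing import Iterable, List


def deserialize(bits: Iterable[int], width: int = 8) -> List[int]:
    """Group a serial bit stream into parallel units of `width` bits.

    Assumption: the first bit received is the most significant bit of
    each unit. Trailing bits that do not fill a whole unit are dropped.
    """
    units: List[int] = []
    unit = 0
    count = 0
    for bit in bits:
        unit = (unit << 1) | (bit & 1)  # shift in the next serial bit
        count += 1
        if count == width:              # a full unit goes onto the bus
            units.append(unit)
            unit = 0
            count = 0
    return units


def parallel_clock_period_ps(serial_bit_period_ps: int, width: int = 8) -> int:
    """Period of the parallel clock: one unit per `width` serial bits."""
    return serial_bit_period_ps * width


def tree_delay_ps(repeater_delays_ps: Iterable[int]) -> int:
    """Total delay along one path of the clock tree.

    Assumption: the path delay is simply the sum of the delays of the
    repeaters on that path (wire delay is ignored).
    """
    return sum(repeater_delays_ps)
```

For instance, `deserialize([1, 0, 1, 0, 0, 1, 0, 1])` yields a single unit of value 0xA5, a hypothetical serial bit period of 100 ps gives a parallel clock period of 800 ps, and four hypothetical repeaters of 150 ps each give a tree delay of 600 ps.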
Sampling the data with a device 14 can be problematic because the repeated clock signal at an end point of the clock tree may have a substantial delay from the original RxDCLK signal generated by the clock 12. In a timing diagram 20, as shown in FIG. 1B, the leading edge of the RxDCLK signal plus the tree delay could be half of a clock cycle, or more, after the leading edge of the RxDCLK signal by itself. If the device 14 reads the data on the leading edge, then the data on the parallel bus may no longer be valid when the leading edge of the RxDCLK signal plus the tree delay occurs.
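The sampling problem can be expressed as a simple timing check. The following Python sketch assumes a simplified model that is not stated in the figures: each unit of data is taken to be valid for a fixed window (the hypothetical parameter `valid_window_ps`) beginning at the undelayed leading edge of RxDCLK, and the device samples at that edge plus the tree delay.

```python
def sampling_margin_ps(tree_delay_ps: int, valid_window_ps: int) -> int:
    """Time left in the valid window when the delayed edge arrives.

    A negative result means the delayed leading edge arrives after the
    data has stopped being valid.
    """
    return valid_window_ps - tree_delay_ps


def sample_is_valid(tree_delay_ps: int, valid_window_ps: int) -> bool:
    """True if the delayed leading edge falls inside the valid window.

    Assumption: the window is [0, valid_window_ps) measured from the
    undelayed leading edge of RxDCLK.
    """
    return 0 <= tree_delay_ps < valid_window_ps
```

With an assumed 800 ps clock period, a 400 ps valid window, and a 600 ps tree delay (more than half a cycle), `sample_is_valid(600, 400)` is false and the margin is -200 ps, whereas a 100 ps tree delay leaves a margin of 300 ps.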
Returning to FIG. 1A, one existing solution to this problem is to add a delay 20 equal to the tree delay to the parallel data, thereby matching the delay of the clock tree. As can be seen in FIG. 1B, this causes the data on the parallel bus (RxD(7:0)+DATA DELAY) to be aligned with the RxDCLK signal plus the tree delay.
This solution has several disadvantages, including: (a) extra cells are needed for delaying each of the data signals; (b) manual intervention is required in physical design of the chip to ensure that the delays added to the parallel data paths end up being the correct amount to match the clock tree delay, and the variation of two relatively long paths needs to be managed; and (c) the delay added to the parallel data adds to the overall latency of the interface.
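The delay-matching approach and its costs can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not a description of any actual design flow: the data is assumed to be valid for a fixed window starting when the delayed data reaches the device, one delay cell per data bit is assumed by default, and all names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DelayMatchResult:
    skew_ps: int           # clock arrival minus data arrival at the device
    sample_ok: bool        # delayed edge falls inside the delayed data window
    added_latency_ps: int  # latency added to the interface by the data delay
    delay_cells: int       # extra cells spent on delaying the data bus


def match_data_delay(
    tree_delay_ps: int,
    data_delay_ps: int,
    valid_window_ps: int,
    bus_width: int = 8,
    cells_per_bit: int = 1,  # assumption: one delay cell per data signal
) -> DelayMatchResult:
    """Evaluate a data delay inserted to track the clock tree delay."""
    skew = tree_delay_ps - data_delay_ps
    # The delayed data is assumed valid for valid_window_ps after it arrives,
    # so the clock must arrive no earlier than the data and before it expires.
    ok = 0 <= skew < valid_window_ps
    return DelayMatchResult(
        skew_ps=skew,
        sample_ok=ok,
        added_latency_ps=data_delay_ps,
        delay_cells=bus_width * cells_per_bit,
    )
```

Under these assumptions, `match_data_delay(600, 600, 400)` reports zero skew and a valid sample, at the cost of 8 delay cells and 600 ps of added latency. If the two long paths drift apart, for example `match_data_delay(600, 100, 400)` or `match_data_delay(600, 700, 400)`, the sample is no longer valid, which is why the variation between the paths has to be managed.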
Therefore, there is a need for a system in which parallel data may be read by a plurality of devices with a minimum latency.