1. Field of the Invention
The present invention relates to electronic circuits and the field of distributed clock circuits. More particularly, the present invention relates to a method and circuit for synchronizing clock signals from separate clock domains with minimized latency.
2. Description of the Related Art
The demands created by today's high-speed electronic equipment have generated a number of problems for circuit designers and manufacturers. For example, many applications require that two subsystems running at different frequencies communicate with each other. Generally, logic running at a given clock frequency is said to be operating in a clock domain, and data passed between subsystems in different clock domains must be synchronized to the receiving domain.
This synchronization problem has previously been addressed either by eliminating one of the clock domains or by adding synchronization logic. Unfortunately, synchronization logic adds unwanted latency because of the additional circuitry. Moreover, the clock domains may differ in frequency, phase, or both, which further complicates the synchronization circuit design and adds to the latency. Eliminating one of the clock domains, on the other hand, is not always feasible because there are practical limits on how many components a single clock source can support. A single clock domain also prevents each subsystem from being optimized independently.
An example of a system with two clock domains is a memory subsystem that contains a memory clock domain and a controller clock domain. As stated above, the simplest solution to the clock domain problem is to ensure that a system only has one clock domain.
FIG. 1 shows a prior art system that contains only one clock domain. A clock source CLKSOURCE 102 uses a crystal 104 to generate a high-frequency clock, BUSCLK 106. In this example, BUSCLK 106 is shown traveling past a controller CTRL_A 108 to a termination resistor 110. The use of terminated transmission lines is commonplace in high-speed clock distribution but is not required for this discussion.
In FIG. 1, BUSCLK 106 is buffered by buffers 112, inside controller 108. The use of buffers is common practice, but not required. Finally, the buffered version of BUSCLK 106 drives a clock divider C 114 which divides BUSCLK 106 to generate a clock called SYNCLK 116. The divider could have any value, including one (i.e., SYNCLK=BUSCLK).
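The divider C can be pictured as a counter that passes every C-th BUSCLK rising edge through as a SYNCLK rising edge. The Python sketch below is only a behavioral model under that assumption (integer C, rising edges indexed from zero); `divide_clock` is a hypothetical helper, not part of the described circuit.

```python
def divide_clock(busclk_edges: int, c: int) -> list[int]:
    """Return the BUSCLK rising-edge indices on which SYNCLK rises.

    Behavioral model of a divide-by-c counter (hypothetical helper):
    SYNCLK rises on BUSCLK edge 0 and then on every c-th edge after it.
    """
    if c < 1:
        raise ValueError("divider value must be at least 1")
    synclk_edges = []
    count = 0
    for edge in range(busclk_edges):
        if count == 0:
            synclk_edges.append(edge)  # counter wrapped: SYNCLK rises here
        count = (count + 1) % c
    return synclk_edges


# With c == 1 every BUSCLK edge is also a SYNCLK edge (SYNCLK = BUSCLK).
print(divide_clock(8, 1))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(divide_clock(8, 4))  # [0, 4]
```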
A key aspect of FIG. 1 is that all of the logic in controller 108 runs off the same SYNCLK 116. SYNCLK 116 is buffered by buffers 117 and output from controller 108 to drive the rest of the system as the system clock, SCLK_A 118. Since all of the control logic and the entire system run off a clock derived from SYNCLK 116, there are no clock domains to cross and no asynchronous data transfers are required. However, requiring an entire system to run off one clock domain is very restrictive and impractical for most systems. For example, running the system from a single clock signal prevents each subsystem from being optimized to its full potential; instead, each subsystem is restricted by the limitations of the other subsystems.
FIG. 2 illustrates a more common approach. Elements of FIG. 2 that were introduced in FIG. 1 retain their original reference numerals. In FIG. 2, CLKSOURCE 102 generates BUSCLK 106, which is divided to generate SYNCLK 116. However, a separate clock source, MAIN CLK SRC 208, generates a second clock, SCLK_B 210, which is used by the rest of the system. Inside CTRL_B 214, SCLK_B 210 is buffered by buffers 211 to generate PCLK_B 212. Alternatively, SCLK_B 210 could be divided or multiplied to generate PCLK_B 212. The result is two clock domains, that of PCLK_B 212 and that of SYNCLK 116, between which data must be exchanged.
Because PCLK_B 212 and SYNCLK 116 are asynchronous, data cannot be passed directly from logic running in one clock domain to logic running in the other without risking data loss. Instead, data must be synchronized as it passes between the two clock domains. For example, FIG. 2 shows FIFOs 216, driven by both PCLK_B 212 and SYNCLK 116, which synchronize data transferred between the domain of PCLK_B 212 and the domain of SYNCLK 116. While this synchronization solves some of the clock domain crossing problems, it adds latency to the data transfer.
For example, when two clock domains are asynchronous (no frequency or phase relationship), blocks of information are typically transferred through dual-port memories. Data is written into a memory from one clock domain and read from the memory by the other clock domain; a second memory is needed for communication in the reverse direction. Control signals coordinate these empty-fill operations, and they are often double-sampled with registers in each clock domain to avoid metastability problems. This solution is robust but typically carries a significant latency cost because of the synchronization delay. It can also carry a bandwidth cost if the empty-fill operations cannot be overlapped because of synchronization overhead.
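To make the latency cost of double sampling concrete, the sketch below models two back-to-back registers clocked by the receiving domain. It is a minimal behavioral model, assuming the incoming control level has already settled at each receiving edge; metastability itself is not simulated, and `double_sample` is a hypothetical name.

```python
def double_sample(levels: list[int]) -> list[int]:
    """Model two back-to-back registers clocked by the receiving domain.

    levels[i] is the (already settled) value of the incoming control signal at
    receiving-clock edge i. Metastability itself is not modeled; the second
    register only represents the extra cycle allowed for it to resolve.
    """
    stage1 = stage2 = 0
    out = []
    for level in levels:
        # Both registers update on the same edge: stage2 takes stage1's old value.
        stage1, stage2 = level, stage1
        out.append(stage2)
    return out


# A flag first captured at edge 1 is usable downstream only after edge 2.
print(double_sample([0, 1, 1, 1, 1]))  # [0, 0, 1, 1, 1]
```

Every control handshake crossing in each direction pays this extra receiving-clock cycle, which is one source of the latency described above.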
In view of the foregoing, it would be highly desirable to synchronize clocks from different clock domains, for example in a memory system, while minimizing any latency caused by the additional synchronization circuitry.
The present invention provides a method and apparatus for synchronizing signal transfers between two clock domains, where the clock domains have a gear ratio relationship. A gear ratio means that the clocks are related by a ratio, such that each clock has a different integer number of clock cycles in a common period. Also, in addition to a gear ratio relationship, the clocks may have a synchronized edge at the end of the common period. For each clock, the cycles in the common period are "colored", i.e., identified by a number (1st, 2nd, etc.). By using the coloring technique, the appropriate clock edge to perform a data or control signal transfer can be identified. The edges are preferably chosen to minimize the latency of the transfer.
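The coloring can be illustrated numerically. The sketch below assumes the two clock frequencies are given as integers (for example in MHz), measures time in "ticks" chosen so that both clock periods are whole numbers of ticks, and starts cycle 1 of both clocks at the synchronized edge. `color_cycles` is a hypothetical helper for illustration only.

```python
from math import gcd


def color_cycles(f_fast: int, f_slow: int) -> tuple[dict[int, int], dict[int, int], int]:
    """Color the rising edges of two gear-ratio clocks over one common period.

    Returns (fast, slow, period): each dict maps a color (1st, 2nd, ... cycle)
    to the tick at which that cycle's rising edge occurs.
    """
    g = gcd(f_fast, f_slow)
    m, n = f_fast // g, f_slow // g   # cycles of each clock per common period
    period = m * n                    # common period in ticks
    fast = {k + 1: k * n for k in range(m)}  # fast period is n ticks
    slow = {j + 1: j * m for j in range(n)}  # slow period is m ticks
    return fast, slow, period


fast, slow, period = color_cycles(400, 300)  # a 4:3 gear ratio
print(period)  # 12
print(fast)    # {1: 0, 2: 3, 3: 6, 4: 9}
print(slow)    # {1: 0, 2: 4, 3: 8}
```

Color 1 of both clocks falls on tick 0, which is the synchronized edge that ends one common period and begins the next.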
In one embodiment, for data transfers from the fast clock domain to the slow clock domain, after a rising edge of the faster clock strobes the data into a buffer, the data is strobed out on the next rising edge of the slower clock in the common period. This relationship results in only some of the fast clock edges being used for strobing data in, but all of the slow clock edges being used for strobing data out.
Conversely, for data transfers from the slow clock domain to the fast clock domain, the invention preferably uses the latest fast clock rising edge after a slow clock rising edge strobing in the data from the slow clock domain, but before the next slow clock rising edge strobing in the next data. Although the next fast clock edge could be used, there are more fast clock edges than are needed for maximum slow clock bandwidth, so the latest qualifying edge is chosen to maximize the data setup time.
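The fast-to-slow rule can be sketched as a lookup over colored edges. This is a minimal model, assuming edge times in ticks within one common period (as in a 4:3 example with a 12-tick period), and assuming that for each slow edge the latest fast edge strictly before it is the one used, since that keeps latency minimal. No setup margin beyond one tick is modeled, and `fast_to_slow` is a hypothetical name.

```python
def fast_to_slow(fast: dict[int, int], slow: dict[int, int], period: int) -> dict[int, tuple[int, int]]:
    """Pick, for every slow edge, the fast edge that strobes data in for it.

    Each slow rising edge strobes out the data captured by the latest fast
    rising edge strictly before it (wrapping around the common period).
    Returns {slow_color: (fast_color, latency_ticks)}.
    """
    plan = {}
    for s_color, s_time in slow.items():
        # (s - f - 1) % period is smallest for the closest fast edge strictly earlier.
        f_color = min(fast, key=lambda f: (s_time - fast[f] - 1) % period)
        plan[s_color] = (f_color, (s_time - fast[f_color]) % period)
    return plan


# 4:3 gear ratio in a 12-tick common period.
fast = {1: 0, 2: 3, 3: 6, 4: 9}
slow = {1: 0, 2: 4, 3: 8}
print(fast_to_slow(fast, slow, 12))  # {1: (4, 3), 2: (2, 1), 3: (3, 2)}
```

In this example fast color 1 is never used to strobe data in, while every slow color strobes data out, matching the relationship described above.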
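The slow-to-fast rule can be sketched the same way. The model below assumes edge times in ticks within one common period, evenly spaced slow edges, and a fast clock that is genuinely faster, so that at least one fast edge always falls strictly between consecutive slow edges. `slow_to_fast` is a hypothetical name.

```python
def slow_to_fast(fast: dict[int, int], slow: dict[int, int], period: int) -> dict[int, tuple[int, int]]:
    """Pick, for every slow edge that strobes data in, the fast edge that strobes it out.

    Among fast edges strictly after the slow edge and strictly before the next
    slow edge, choose the latest one to maximize setup time.
    Returns {slow_color: (fast_color, setup_ticks)}.
    """
    slow_period = period // len(slow)  # assumes evenly spaced slow edges
    plan = {}
    for s_color, s_time in slow.items():
        candidates = {
            f_color: (f_time - s_time) % period
            for f_color, f_time in fast.items()
            if 0 < (f_time - s_time) % period < slow_period
        }
        f_color = max(candidates, key=candidates.get)  # latest qualifying edge
        plan[s_color] = (f_color, candidates[f_color])
    return plan


fast = {1: 0, 2: 3, 3: 6, 4: 9}
slow = {1: 0, 2: 4, 3: 8}
print(slow_to_fast(fast, slow, 12))  # {1: (2, 3), 2: (3, 2), 3: (4, 1)}
```

Choosing `max` rather than `min` over the candidates is what trades the earliest possible strobe for the largest setup window.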
The invention can be applied to different clock ratios by appropriately varying the color code (number of cycles in the common period) and by varying which color value is used for the strobing. Thus, by simply programming registers, for example, with new color values and new selected color values for transfers, the same physical hardware can accommodate many different gear ratio clocks.
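One way to picture the programmable approach is a per-domain modulo counter whose modulus (the color code) and selected strobe colors come from registers. The sketch below is a hypothetical behavioral model, not the described hardware: `ColorRegisters`, `ColorCounter`, and `strobe_pattern` are illustrative names. The selected fast colors follow the fast-to-slow rule described above for a 4:3 ratio and for a 3:2 ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColorRegisters:
    """Hypothetical programmable registers for one clock domain."""
    cycles_per_period: int    # color code: cycles of this clock in the common period
    strobe_colors: frozenset  # selected colors on which this domain strobes data


class ColorCounter:
    """Hypothetical per-domain hardware: a modulo counter plus a comparator."""

    def __init__(self, regs: ColorRegisters) -> None:
        self.regs = regs
        self.color = 1  # cycle 1 starts at the synchronized edge

    def tick(self) -> bool:
        """Advance one rising edge; return True if this edge strobes data."""
        strobe = self.color in self.regs.strobe_colors
        self.color = self.color % self.regs.cycles_per_period + 1
        return strobe


def strobe_pattern(regs: ColorRegisters, edges: int) -> list[bool]:
    counter = ColorCounter(regs)
    return [counter.tick() for _ in range(edges)]


# Same counter logic, two programmings of the fast-domain strobe-in registers.
print(strobe_pattern(ColorRegisters(4, frozenset({2, 3, 4})), 4))  # [False, True, True, True]
print(strobe_pattern(ColorRegisters(3, frozenset({2, 3})), 6))     # [False, True, True, False, True, True]
```

Only the register contents change between the two gear ratios; the counter and comparator logic stay the same.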
In yet another embodiment, the present invention provides a method and apparatus for a distributed clock generation loop which generates clock signals that allow asynchronous data transfers between different clock domains with minimized latency. This aspect is helpful, in part, because even if two clocks are related by a gear ratio, there is no inherent relationship between their phases. The distributed loop comprises at least one clock divider, a phase detector, and a variable delay element (phase aligner). For example, clock dividers divide down the clocks that define the clock domains to a common frequency. The divided clocks drive a phase detector, which drives a phase aligner. The distributed loop shifts the phase of one of the divided clocks to align it with the other divided clock. When the divided clocks are phase aligned by the distributed loop, the original clocks will have edges which are also phase aligned. Data can then be transferred at the aligned clock edges without incurring additional latency for synchronization.
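The loop's behavior can be approximated with a simple bang-bang model. The sketch assumes each divided clock is described only by the tick at which it rises within the common period, that the phase detector reports a wrapped phase error, and that the phase aligner is a delay adjustable in one-tick steps over one period. These are illustrative assumptions, not the invention's circuit, and `align_phases` is a hypothetical name.

```python
def align_phases(phase_a: int, phase_b: int, period: int, max_steps: int = 10_000) -> int:
    """Bang-bang model of the distributed loop; returns the settled aligner delay.

    phase_a, phase_b: rising-edge times (ticks) of the two divided clocks within
    the common period. The phase aligner delays divided clock B by `delay`
    ticks, with 0 <= delay < period.
    """
    delay = 0
    for _ in range(max_steps):
        # Phase detector: error of B relative to A, wrapped into (-period/2, period/2].
        error = (phase_a - (phase_b + delay)) % period
        if error > period // 2:
            error -= period
        if error == 0:
            return delay  # divided clocks aligned: loop is locked
        # Phase aligner: step the delay one tick toward alignment.
        delay = (delay + (1 if error > 0 else -1)) % period
    raise RuntimeError("loop failed to lock")


print(align_phases(phase_a=0, phase_b=5, period=12))  # 7: delaying B by 7 ticks lines it up with A
```

Because each divided clock rises on an edge of its source clock, and the source clocks complete whole numbers of cycles per common period, aligning the divided clocks also aligns one source-clock edge pair every common period.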
In one embodiment, in order to reduce power consumption in a low power mode, the output of a clock generator is disabled without disabling the clock generator in its entirety. This eliminates the power required to drive the load on the clock line, while avoiding frequency and phase drift, thus eliminating the latency normally required to re-acquire frequency and phase lock when coming out of a low power mode. This is accomplished by separating the phase alignment feedback and frequency lock feedback in one embodiment.
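The benefit of gating only the output can be shown with a toy model that keeps the lock loops separate from the clock-line driver. `ClockGenerator` and `relock_cycles` are hypothetical; the relock cost is an assumed illustrative number, and the model ignores any real settling behavior of the driver.

```python
from dataclasses import dataclass


@dataclass
class ClockGenerator:
    """Toy model separating the lock loops from the clock-line driver.

    relock_cycles is an assumed cost to re-acquire frequency and phase lock
    after the whole generator has been switched off.
    """
    relock_cycles: int
    loop_running: bool = True     # oscillator and lock loops powered (lock held)
    output_enabled: bool = True   # driver for the loaded clock line powered

    def enter_low_power(self, gate_output_only: bool) -> None:
        self.output_enabled = False    # stop driving the clock-line load
        if not gate_output_only:
            self.loop_running = False  # full shutdown: lock is lost

    def exit_low_power(self) -> int:
        """Re-enable the output; return cycles to wait before the clock is usable."""
        wait = 0 if self.loop_running else self.relock_cycles
        self.loop_running = True
        self.output_enabled = True
        return wait


gen = ClockGenerator(relock_cycles=1000)  # 1000 is an arbitrary illustrative value
gen.enter_low_power(gate_output_only=True)
print(gen.exit_low_power())  # 0: lock was never lost
gen.enter_low_power(gate_output_only=False)
print(gen.exit_low_power())  # 1000: lock must be re-acquired
```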
In addition, multiple clock domains are provided in one embodiment, which are separately synchronized. This, for example, allows clock domains not in use to be powered down. Also, simultaneous synchronization among multiple clock domains will permit transfers between more than two clock domains at the same time.