The present invention relates to techniques for synchronizing a signal across multiple clock domains. More particularly, the present invention relates to techniques and circuits for efficiently and reliably capturing an input signal synchronized with a first clock having a first clock frequency and for producing an output signal synchronized with a second clock having a second clock frequency different from the first clock frequency.
As integrated circuits (ICs) become more complex, designers are constantly looking for ways to pack more functionality into the IC chip. A higher level of functionality typically necessitates multiple clock domains, with each clock domain handling a different subcircuit of the IC. Generally speaking, one can expect multiple asynchronous clocks in an IC having a moderate to high level of complexity.
One of the challenges in working with multiple clock domains is the need to synchronize a signal across multiple clock domains. Take for example a control signal. A pulse on a control signal that is synchronized with one clock domain may be employed to control other circuitry clocked by another clock domain having a different frequency and/or phase.
When synchronizing a signal across two different clock domains, two possibilities exist. In the first case, the first clock domain is the slower of the two, and in the second case, the first clock domain is the faster clock domain. When a signal needs to be passed from a slower clock domain to a faster clock domain, one typically expects the faster clock domain, with its higher frequency, to have no difficulty capturing pulses on the signal that has been synchronized using the slower clock domain. FIG. 1 illustrates this case wherein IN_CLK (102) is the slower clock signal and SYNC_CLK (104) is the faster clock signal. A signal IN_PULSE (106) that is synchronized with respect to the slower IN_CLK (102) is also shown. Since the clock signal SYNC_CLK (104) is faster than the clock signal IN_CLK (102), one expects no difficulty in using the clock signal SYNC_CLK (104) in capturing pulse 110 of the signal IN_PULSE (106) to produce the pulse 112 on the output signal SYNC_PULSE (108).
In the second case, as mentioned, the second clock domain is the slower one. In this case, it is more difficult to reliably capture the signal, and a simple latching circuit that merely latches the state of the input signal using the second clock will typically not suffice. The difficulties involved in capturing a signal using a slower clock are illustrated below in connection with FIG. 2A and FIG. 2B. In FIGS. 2A and 2B, the faster first clock, the input signal, and the slower second clock are the same except that in the case of FIG. 2B, there is a phase shift with respect to the slower second clock. As will be seen, unless a specialized circuit is employed, the phase shift can, by happenstance, cause a simple latching arrangement to miss the input pulse on the input signal entirely, thereby producing erroneous results.
Referring now to FIG. 2A, there is shown a clock IN_CLK 202, which has a higher frequency than the slower clock SYNC_CLK 204. A pulse 206 is shown on signal IN_PULSE 208, which is synchronized with respect to the faster clock IN_CLK 202. As shown, upon a rising edge 210 of the slower clock SYNC_CLK 204, the high state of signal IN_PULSE 208 is detected, giving rise to a rising edge 212 of the output signal SYNC_PULSE 214. Upon a rising edge 216 of the slower clock SYNC_CLK 204, the low state of IN_PULSE 208 is detected, resulting in a falling edge 218 of the output signal SYNC_PULSE 214. As it happens, the pulse 206 on signal IN_PULSE 208 can be captured by the slower clock SYNC_CLK 204 in the example of FIG. 2A (as evidenced by the pulse between reference numbers 212 and 218 on the SYNC_PULSE 214 signal).
Consider now the example of FIG. 2B. Again, FIG. 2B shows the clock IN_CLK 202 as in FIG. 2A, as well as the signal IN_PULSE 208 with its pulse 206. There is also shown a slower clock SYNC_CLK 254. However, the slower clock SYNC_CLK 254 of FIG. 2B has a different phase relationship with respect to the faster clock IN_CLK 202 and the pulse 206 on the signal IN_PULSE 208, as compared with the clock signal SYNC_CLK 204 of FIG. 2A. In this case, upon a rising edge 256 of the slower clock SYNC_CLK 254 of FIG. 2B, the signal IN_PULSE 208 is still low. Upon the next rising edge 258 of the slower clock SYNC_CLK 254, one sees that the pulse 206 on signal IN_PULSE 208 has come and gone. When rising edge 258 happens, the signal IN_PULSE 208 is again low. Thus, no pulse is produced on the output signal SYNC_PULSE 260 of FIG. 2B despite the presence of a pulse on the input signal IN_PULSE 208. If a simple latching arrangement is employed for the synchronization task, the slower clock SYNC_CLK 254 has simply failed to capture the pulse on signal IN_PULSE 208.
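The contrast between FIG. 2A and FIG. 2B can be illustrated with a small numerical sketch. The following is not part of the described circuits; it is a minimal model, assuming an idealized integer time grid, in which a simple latching arrangement samples the input level at each rising edge of the sampling clock. The function names and all numeric values (periods, phases, pulse position) are hypothetical and chosen only for illustration.

```python
def pulse_level(t, start, width):
    """Return 1 while the input pulse is high (start <= t < start + width), else 0."""
    return 1 if start <= t < start + width else 0


def sample_on_rising_edges(period, phase, start, width, t_end):
    """Sample the input at each rising edge of a clock with the given period and phase."""
    samples = []
    t = phase
    while t < t_end:
        samples.append(pulse_level(t, start, width))
        t += period
    return samples


def pulse_captured(period, phase, start, width, t_end=20):
    """True if at least one rising edge of the sampling clock sees the pulse high."""
    return any(sample_on_rising_edges(period, phase, start, width, t_end))


# Assumed values: the pulse is high for time units 4..5 (one period of a fast
# clock of period 2), and the slower sampling clock has a period of 6.
print(pulse_captured(period=6, phase=5, start=4, width=2))  # True: an edge at t=5 lands inside the pulse
print(pulse_captured(period=6, phase=1, start=4, width=2))  # False: edges at t=1 and t=7 straddle the pulse
```

With the same pulse and the same slow-clock period, only the phase differs between the two calls, which is the situation the two figures describe. When the sampling clock is the faster one (as in FIG. 1), for example a period of 2 sampling a pulse of width 6, every phase in this model yields a capture.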
In the prior art, a variety of different techniques has been proposed and employed to ensure that an input signal can be reliably synchronized across different clock domains. One of the more popular circuits for synchronizing a signal between two different clock domains is shown in prior art FIG. 3. The prior art circuit of FIG. 3 is best understood with reference to the timing diagram of FIG. 4.
Referring now to FIG. 3, there is shown a prior art circuit 300 for synchronizing a signal IN_PULSE 302 across two different clock domains: from a faster clock IN_CLK 304 to a slower clock SYNC_CLK 306. There are shown four cascaded D flip-flops Q0, Q1, Q2, and Q3. The input signal IN_PULSE is inputted into an input terminal of an OR gate 382, the output of which is inputted into the enable input of the first cascaded D flip-flop Q0. This first cascaded D flip-flop Q0 is clocked by the clock signal IN_CLK 304 as shown.
Initially, a multiplexer 320 is selected by signal CLEAR_Q0 to provide a “1” at the data input D of the first cascaded D flip-flop Q0. Upon a rising edge 422 (see FIG. 4) of signal IN_CLK 304, the signal IN_PULSE 302 goes high. At the next rising edge 424 of clock IN_CLK 304, D flip-flop Q0 latches the high data value provided at its data input, and thus causes Q0 output to go high starting with rising edge 426 thereof. Note that since D flip-flop Q0 is clocked by the faster clock IN_CLK 304, it is assured that the pulse on signal IN_PULSE 302 can be captured since signal IN_PULSE 302 is originally in the domain of the faster clock IN_CLK 304.
The high value at Q0 output is propagated to the outputs of subsequent cascaded D flip-flops Q1, Q2, and Q3. That is, the output of one D flip-flop is fed into the data input of the next cascaded D flip-flop. Since these D flip-flops Q1, Q2, and Q3 are clocked by the slower clock SYNC_CLK 306, it can be seen that the rising edge 428 on Q1 output follows the rising edge 430 of the clock SYNC_CLK 306, the rising edge 432 on Q2 output follows the rising edge 434 of the clock SYNC_CLK 306, and the rising edge 436 on Q3 output follows the rising edge 438 of the clock SYNC_CLK 306.
The value at Q2 output is fed into one input of an AND gate 340. The other input of AND gate 340 receives the inverted value of Q3 output. At the time Q2 output goes high at rising edge 432 thereof, the value of Q3 output is low. Consequently, AND gate 340 will output a high. Since the output of AND gate 340 is the desired output signal SYNC_PULSE 342, this transition from low to high is seen at the rising edge 444 of output signal SYNC_PULSE 342.
When Q3 output goes high starting with rising edge 436 thereof, AND gate 340 will output a low, which is seen by the transition at falling edge 446 of output signal SYNC_PULSE 342. Thus, irrespective of the width of the pulse in the input signal IN_PULSE 302, that pulse is captured in the slower clock domain SYNC_CLK 306.
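The way the cascaded flip-flops Q1, Q2, and Q3 and AND gate 340 turn a latched level into a single pulse in the SYNC_CLK domain can be sketched behaviorally. The sketch below is a simplified model, not the circuit itself: it assumes the Q0 output is stable at every SYNC_CLK rising edge (metastability is not modeled), and the function name `sync_chain` is hypothetical.

```python
def sync_chain(q0_per_edge):
    """Model Q1..Q3 clocked by SYNC_CLK and AND gate 340 (Q2 AND NOT Q3).

    q0_per_edge lists the Q0 level seen at each successive SYNC_CLK rising
    edge. The return value lists SYNC_PULSE just after each of those edges.
    """
    q1 = q2 = q3 = 0
    sync_pulse = []
    for q0 in q0_per_edge:
        # All three flip-flops update on the same edge from their pre-edge inputs.
        q1, q2, q3 = q0, q1, q2
        sync_pulse.append(q2 & (1 - q3))
    return sync_pulse


# Q0 has been latched high and stays high for five SYNC_CLK edges.
print(sync_chain([1, 1, 1, 1, 1]))  # [0, 1, 0, 0, 0]
```

In this model the output is high for exactly one SYNC_CLK period: it rises when Q2 goes high (second edge) and falls when Q3 follows (third edge), regardless of how long Q0 remains high.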
The signal SYNC_PULSE 342, which is the output of AND gate 340, is fed back into a series of cascaded D flip-flops P0, P1, P2, and P3 through an OR gate 380. Cascaded D flip-flops P0, P1, P2, and P3 are employed to reset the D flip-flops Q0, Q1, Q2, and Q3 back to their ready state in order to prepare D flip-flops Q0, Q1, Q2, and Q3 to service the next pulse on signal IN_PULSE 302.
As shown in FIG. 3, the output of AND gate 340 is inputted into an input terminal of OR gate 380. The output of this OR gate 380 is inputted into the enable input of the first cascaded D flip-flop P0. A multiplexer 350 is controlled by the Q2 output (which has been latched high since rising edge 432 thereof), thereby furnishing a high value to the data input D of the first cascaded D flip-flop P0. The value of signal SYNC_PULSE 342 (and therefore the output of OR gate 380, which is inputted into the enable input of D flip-flop P0) is also high when rising edge 438 occurs on signal SYNC_CLK 306 before AND gate 340 pulls the signal SYNC_PULSE 342 low. The high enable input of D flip-flop P0 (from the high state of signal SYNC_PULSE 342), in combination with the high state of data input D of D flip-flop P0 (from multiplexer 350, which is selected by output Q2) and a rising edge 438 of clock SYNC_CLK 306, causes D flip-flop P0 to latch the high value at its data input D to its P0 output. This is seen by the rising edge 448 on P0 output.
This P0 output value is subsequently cascaded to D flip-flops P1, P2, and P3. Since these D flip-flops P1, P2, and P3 are clocked by the faster clock IN_CLK 304, it can be seen that the rising edge 450 on P1 output follows the rising edge 452 of the clock IN_CLK 304, the rising edge 454 on P2 output follows the rising edge 456 of the clock IN_CLK 304, and the rising edge 458 on P3 output follows the rising edge 460 of the clock IN_CLK 304.
The value at P2 output is fed into one input of an AND gate 370. The other input of AND gate 370 receives the inverted value of P3 output. At the time P2 output goes high at rising edge 454 thereof, the value of P3 output is low. Consequently, AND gate 370 will output a high. This transition from low to high is seen at the rising edge 462 of the CLEAR_Q0 signal.
The high state of the CLEAR_Q0 signal when P2 output goes high is seen at the select input of multiplexer 320, which causes multiplexer 320 to furnish a low value to the data input D of D flip-flop Q0. This CLEAR_Q0 signal is also inputted into one input terminal of an OR gate 382, the output of which is inputted into the enable terminal of D flip-flop Q0. Since D flip-flop Q0 is clocked by clock IN_CLK 304, the next rising edge 460 of clock IN_CLK 304 causes the low value at the data input D of D flip-flop Q0 to be latched at Q0 output (since the high CLEAR_Q0 enables D flip-flop Q0 via OR gate 382). This is seen by the falling edge 466 of Q0 output.
This low value of Q0 output is cascaded to D flip-flops Q1, Q2, and Q3 respectively with each subsequent rising edge of clock SYNC_CLK 306 (since D flip-flops Q1, Q2, and Q3 are clocked by clock SYNC_CLK 306). Thus, it can be seen that the falling edge 468 on Q1 output follows the rising edge 470 of the clock SYNC_CLK 306, the falling edge 472 on Q2 output follows the rising edge 474 of the clock SYNC_CLK 306, and the falling edge 476 on Q3 output follows the rising edge 478 of the clock SYNC_CLK 306. D flip-flops Q1–Q3 are now reset back to their ready state.
Earlier, when P3 output went high starting with rising edge 458 thereof, AND gate 370 output a low, which is seen by the transition at falling edge 464 of the CLEAR_Q0 signal. This low state of the CLEAR_Q0 signal causes multiplexer 320 to provide a high value to the data input D of D flip-flop Q0. D flip-flop Q0 is thus reset back to its ready state, with a high value at its data input D and awaiting the next pulse on its enable input.
To complete the resetting process, the D flip-flops P0–P3 are reset next. Since D flip-flop P0 is clocked by the slower clock SYNC_CLK 306, at the moment rising edge 478 occurs on SYNC_CLK 306, the value of Q2 output is already low (after falling edge 472 on Q2). This low value of Q2 output causes multiplexer 350 to provide a low value to the data input of D flip-flop P0. Furthermore, the low value of Q2 output causes the OR gate 380 to output a high (since inverted Q2 or Q2! will be high). The high output of OR gate 380 enables D flip-flop P0, causing the low data value at the data input of D flip-flop P0 to be latched at P0 output. This is evidenced by the falling edge 482 of P0 output signal.
The low state of P0 output is cascaded to D flip-flops P1, P2, and P3. Since these D flip-flops P1, P2, and P3 are clocked by the faster clock IN_CLK 304, it can be seen that the falling edge 484 on P1 output follows the rising edge 486 of the clock IN_CLK 304, the falling edge 488 on P2 output follows the rising edge 490 of the clock IN_CLK 304, and the falling edge 492 on P3 output follows the rising edge 494 of the clock IN_CLK 304.
After the output of D flip-flop P3 is reset to the low state (i.e., after falling edge 492 on P3 output), the circuit 300 of FIG. 3 is reset and ready to service the next pulse on input signal IN_PULSE 302.
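The complete capture-and-reset sequence described above can be checked with a behavioral model. The following is a sketch under stated assumptions, not a gate-accurate reproduction of FIG. 3: time advances on an integer grid, the default clock periods and phase are hypothetical values chosen so that IN_CLK and SYNC_CLK edges never coincide, metastability is not modeled, and the function name `simulate_circuit_300` and its parameters are introduced here for illustration only. The gate and multiplexer behavior follows the textual description of elements 320, 340, 350, 370, 380, and 382.

```python
def simulate_circuit_300(in_period=2, sync_period=6, sync_phase=1,
                         pulse_start=1, pulse_end=3, t_end=60):
    """Behavioral model of the FIG. 3 synchronizer on an integer time grid.

    IN_CLK rises at multiples of in_period; SYNC_CLK rises at
    sync_phase + k * sync_period. Returns the number of SYNC_CLK edges at
    which SYNC_PULSE is seen high, plus a per-tick trace of (t, Q0..Q3, P0..P3).
    """
    q = [0, 0, 0, 0]   # outputs of Q0..Q3
    p = [0, 0, 0, 0]   # outputs of P0..P3
    sync_pulse_edges = 0
    trace = []
    for t in range(t_end):
        in_pulse = 1 if pulse_start <= t < pulse_end else 0
        if t % in_period == 0:                        # rising edge of IN_CLK
            clear_q0 = p[2] & (1 - p[3])              # AND gate 370
            enable_q0 = in_pulse | clear_q0           # OR gate 382
            d_q0 = 0 if clear_q0 else 1               # multiplexer 320
            q0_next = d_q0 if enable_q0 else q[0]
            p[1], p[2], p[3] = p[0], p[1], p[2]       # P1..P3 shift on IN_CLK
            q[0] = q0_next
        if t >= sync_phase and (t - sync_phase) % sync_period == 0:  # rising edge of SYNC_CLK
            sync_pulse = q[2] & (1 - q[3])            # AND gate 340
            enable_p0 = sync_pulse | (1 - q[2])       # OR gate 380
            d_p0 = q[2]                               # multiplexer 350, selected by Q2
            p0_next = d_p0 if enable_p0 else p[0]
            q[1], q[2], q[3] = q[0], q[1], q[2]       # Q1..Q3 shift on SYNC_CLK
            p[0] = p0_next
            sync_pulse_edges += sync_pulse
        trace.append((t, tuple(q), tuple(p)))
    return sync_pulse_edges, trace


count, trace = simulate_circuit_300()
print(count)          # 1: a single SYNC_PULSE is produced for the single input pulse
print(trace[-1][1:])  # ((0, 0, 0, 0), (0, 0, 0, 0)): all flip-flops are back in the ready state
```

Under these assumed timings the model produces one output pulse for each of the SYNC_CLK phases 1, 3, and 5, and every flip-flop output goes high once and low once per captured pulse.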
Although circuit 300 of FIG. 3 can reliably synchronize an input signal from one clock domain to another clock domain, there are disadvantages in terms of size, speed, and delay. In terms of size, there is a total of 8 D flip-flops (Q0–Q3 and P0–P3), 2 multiplexers (320 and 350), 2 OR gates (380 and 382), and 2 AND gates (340 and 370), or a total of 14 different sub-circuits. As circuit designers are constantly trying to pack more functionality into a finite-size IC, the high gate count is disadvantageous, both in terms of real estate consumption and in terms of power consumption (since power consumption rises as the gate count increases).
In terms of speed, the circuit 300 of FIG. 3 needs a total of 3 SYNC_CLK periods (referenced by rising edges 430, 434, and 438), followed by 3 IN_CLK periods (referenced by rising edges 452, 456, and 460), followed by 3 SYNC_CLK periods (referenced by rising edges 470, 474, and 478), followed by 2 more IN_CLK periods (referenced by rising edges 486 and 490), or a total of 6 SYNC_CLK periods and 5 IN_CLK periods, in order to capture a single pulse on input signal IN_PULSE 302 and to reset itself to be ready to capture the next pulse on input signal IN_PULSE 302. The long delay period waiting for circuit 300 to complete its capture-and-reset cycle negatively impacts performance.
In terms of power consumption, besides the high level of power consumption due to the high gate count, circuit 300 also toggles all 8 of its flip-flops high and then low each time a pulse on the input signal IN_PULSE 302 is synchronized. That is, the outputs of the D flip-flops Q0–Q3 and P0–P3 go high and then are reset to the low state each time a pulse on input signal IN_PULSE 302 is captured. In ICs which are constrained in terms of power consumption and/or heat generation, the high level of power consumed to toggle these 8 D flip-flops (Q0–Q3 and P0–P3) high and then low for each pulse captured is a significant disadvantage.
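To give a sense of scale, the period count stated above can be converted into an elapsed time for a given pair of clocks. The sketch below simply applies the stated totals (6 SYNC_CLK periods and 5 IN_CLK periods); it ignores the phase alignment between the two clocks, so it is a rough estimate rather than an exact figure. The function name and the example clock periods (10 ns and 40 ns) are hypothetical.

```python
def capture_and_reset_ns(in_period_ns, sync_period_ns,
                         sync_periods=6, in_periods=5):
    """Estimate the capture-and-reset time of circuit 300 in nanoseconds.

    Uses the period totals stated in the text and ignores any waiting time
    caused by the phase relationship between IN_CLK and SYNC_CLK.
    """
    return sync_periods * sync_period_ns + in_periods * in_period_ns


# Assumed example: IN_CLK period of 10 ns (100 MHz), SYNC_CLK period of 40 ns (25 MHz).
print(capture_and_reset_ns(in_period_ns=10, sync_period_ns=40))  # 290
```

In this example the slower clock accounts for 240 ns of the 290 ns total, which shows that the cost of the capture-and-reset cycle is dominated by the six periods spent in the SYNC_CLK domain.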
In view of the foregoing, there are desired improved circuits and methods for reliably synchronizing an input signal across multiple clock domains. Compared to the prior art, the improved circuits and methods preferably perform the synchronizing task with a greater degree of efficiency in terms of gate count, delay, and power consumption.