High performance mesochronous synchronizers are used to synchronize data between two clocks that run at the same frequency (they originate from the same source) but have an unknown phase relationship. Mesochronous synchronizers for source-synchronous links transfer data from the domain of the clock that is sent with the data (the "data strobe") into the receiving chip's core clock domain. The data strobe and the receiving chip's clock are mesochronous because the data strobe is created from the sending chip's core clock, which is generated from the same source (a crystal oscillator) as the receiving chip's clock.
The unknown phase relationship between the data strobe and the receiving chip's clock is caused by a number of factors, including: 1) the latency of data traveling from the sending chip to the receiving chip, 2) the clock skew between the sending chip and the receiving chip, and 3) process, voltage, and temperature (PVT) variations. The latency from the sending chip to the receiving chip is primarily due to package delays, board trace lengths, and similar effects. Because this latency is arbitrary, when the data strobe arrives at the receiving chip the system designer does not know where its clock edge lies relative to the clock edge of the receiving chip's core clock.
The unknown phase relationship between the data strobe and the receiving chip's clock is also caused by clock skew between the two chips. The clock on the sending chip generates the data strobe. Thus, if the sending chip clock and the receiving chip clock have an unknown phase relationship due to clock skew, so will the data strobe and the receiving chip clock, even before the data strobe is sent onto the link.
The third factor causing the unknown phase relationship between the data strobe and the receiving chip's clock is PVT variation. Boards, and the chips on them, can vary in speed because of variations in the manufacturing process, and different parts of the same board or chip can differ for the same reason. Thus, even the same circuit on different chips of the same design can have different timing, and these timing differences add to the phase uncertainty.
For a source-synchronous link, a phase-locked loop (PLL) or delay-locked loop (DLL) at the receiving chip's interface typically compares the incoming data strobe with the receiving chip's clock and produces an analog error signal that represents the phase relationship between the two. Other PLLs or DLLs use this error signal to shift the timing of the incoming data bits by the same amount. The receiving chip's clock can then sample the data with minimal delay and without risk of metastability.
PLL or DLL circuitry works well for distributing the chip's core clock to different locations on the chip so that all locations see substantially the same core clock signal. However, one of the main problems with PLL or DLL circuitry is its complexity. This complexity increases the likelihood of manufacturing faults and design errors, and it makes the synchronizer hard to test with traditional methodologies without degrading synchronizer performance.
Conventional testing methodologies can reduce synchronizer performance. One methodology requires adding a device, for example a multiplexer, to the data path, which adds latency to the functional path. Alternatively, the synchronizer could be tested by hanging an observation flip flop off the node under test, but the flip flop's capacitance increases the load on that node. The increased loading can increase latency, which is undesirable and may reduce system performance.
Another problem with DLL or PLL circuitry is the chip area the synchronizer requires. Ideally, though impractically, the receiving chip would have a PLL or DLL synchronizer for each bit at which a clock signal is received. This would consume a large amount of chip area, so designers typically trade clock-timing accuracy, and with it performance, against the chip area needed for the implementation.
Today's high performance systems require large bandwidth. To achieve that bandwidth in a reasonable area, synchronizers must be small and simple. Bandwidth can be increased by bit-slicing logical data across different chips, in which case the synchronizers must deliver all parts of a data packet to the chip cores in lockstep. However, PLL and DLL circuitry cannot easily be used in a system that must synchronize data across multiple bit-sliced interfaces, where the data must be synchronized by different chip clocks in lockstep. Conventionally, the different chips would have to communicate with one another because of 1) clock skew between the chips and 2) varying trace lengths from the sending chip(s) to the receiving chip(s), which make it unknown when data in different slices arrive relative to one another and what the phase relationship of their strobes is. Communication between chips would increase link latency, system complexity, and PLL or DLL circuit complexity.
What is needed is a highly reliable synchronizer circuit that provides excellent synchronization without complicated PLL or DLL circuitry, is simple to test, adapts easily to systems that use bit-sliced data, and does not require a large chip area.
The present invention provides a highly reliable synchronizer that provides excellent synchronization without complicated PLL or DLL circuitry, is simple to test, adapts easily to systems that use bit-sliced data, and does not require a large chip area. The synchronizer consists of a first stage, a data capture circuit preferably composed of a pair of master-slave flip-flops, electrically coupled to a second stage, a data interface circuit that preferably includes a FIFO composed of N transparent latches electrically coupled to a multiplexer. Because the design is simple, the synchronizer is smaller, faster, easier to test, and less prone to design errors and manufacturing faults.
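To make the two-stage structure concrete, the following is a minimal behavioral sketch in Python. It models only the order in which symbols move through the stages, not their timing. The class and method names (`TwoStageSynchronizer`, `strobe_edge`, `core_edge`) are hypothetical, and the assumption that the two capture flops load alternately on successive strobe cycles is an interpretation of the "pair of master-slave flip-flops", not a detail stated here.

```python
from typing import List, Optional


class TwoStageSynchronizer:
    """Hypothetical order-only model of the two-stage synchronizer.

    Stage 1: two parallel data capture flops (assumed to load alternately).
    Stage 2: a FIFO of N transparent latches whose outputs feed an N:1 mux.
    Timing (clock edges, setup/hold) is deliberately not modeled.
    """

    def __init__(self, n_latches: int = 6) -> None:
        self.capture: List[Optional[int]] = [None, None]        # stage-1 flops
        self.latches: List[Optional[int]] = [None] * n_latches  # stage-2 FIFO
        self.strobe_count = 0  # selects which capture flop loads next
        self.write_ptr = 0     # FIFO write pointer (data strobe domain)
        self.read_ptr = 0      # mux select (receiving chip's core clock domain)

    def strobe_edge(self, symbol: int) -> None:
        """One data strobe cycle: capture a symbol, then move it into the FIFO."""
        flop = self.strobe_count % 2
        self.capture[flop] = symbol
        self.strobe_count += 1
        self.latches[self.write_ptr] = self.capture[flop]
        self.write_ptr = (self.write_ptr + 1) % len(self.latches)

    def core_edge(self) -> Optional[int]:
        """One core clock cycle: the mux selects a latch and the core registers it."""
        out = self.latches[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.latches)
        return out


sync = TwoStageSynchronizer(n_latches=6)
for symbol in (10, 11, 12):
    sync.strobe_edge(symbol)
print([sync.core_edge() for _ in range(3)])  # [10, 11, 12]
```

The sketch omits full and empty checks; a real FIFO controller would also have to keep the read pointer from overtaking the write pointer.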
The synchronizer has low latency because of 1) its simple design and 2) its clocking scheme. The design preferably includes two stages: a first stage of two data capture flip flops connected in parallel, and a second stage of N transparent latches electrically coupled to a multiplexer. In the preferred embodiment, data therefore passes through only two stages of sequential elements and a multiplexer before being clocked into the receiving chips' clock domains, so the synchronizer is both fast and small. The latency of the synchronizer can be as small as the clock-to-Q (clk-q) delay of a master-slave data capture flop plus the delay through a transparent latch and the delay through the multiplexer.
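The best-case latency described above is a simple sum of three delays. A sketch of that arithmetic follows; the function name is hypothetical, the delay values are placeholders chosen only to illustrate the calculation (they are not figures from this specification), and any wait for the next core clock edge is not included.

```python
def min_synchronizer_latency(t_clk_to_q: float, t_latch: float, t_mux: float) -> float:
    """Lower bound on synchronizer latency (same units as the inputs):
    clk-q of the master-slave capture flop + transparent-latch delay + mux delay.
    Does not include any wait for the receiving chip's next core clock edge."""
    return t_clk_to_q + t_latch + t_mux


# Hypothetical delays in picoseconds, for illustration only.
print(min_synchronizer_latency(t_clk_to_q=80.0, t_latch=60.0, t_mux=50.0))  # 190.0
```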
Regarding the clocking scheme, the transparent latches in the FIFO are clocked on the clock edge opposite the edge that clocks the first-stage flip flop. This makes data available to the receiving chip's clock domain sooner, which reduces the time the receiving chip wastes waiting for data. Clocking the transparent latches this way also gives more tolerance in data-to-strobe matching between the data capture flip flops and the FIFO, because the latching edge of the FIFO latches naturally falls in the middle of the data's valid window. That valid window is twice as long as it was at the input of the data capture flops, which provides additional tolerance.
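The doubled valid window and mid-window latching edge can be checked with a small timing sketch. It rests on assumptions not stated in the text: the two parallel capture flops load on alternate strobe cycles (for example, from a half-rate clock derived from the strobe), so each flop output holds one symbol for two strobe periods, and the FIFO latch closes on the opposite edge of that half-rate clock. The function name and the numbers (in arbitrary time units) are hypothetical.

```python
def fifo_latch_margins(strobe_period: float, t_clk_to_q: float, t_flop_edge: float = 0.0):
    """Hypothetical timing sketch for one capture flop feeding a FIFO latch.

    Assumptions (not from the source): the two parallel capture flops load on
    alternate strobe cycles, so each flop output holds one symbol for
    2 * strobe_period. The FIFO latch closes on the opposite edge of the
    half-rate clock, one strobe period after the flop's clocking edge.
    clk-q is used for both the earliest and latest output change
    (contamination delay is not modeled separately).

    Returns (margin_before, margin_after): time from the start of the flop's
    valid window to the latching edge, and from the latching edge to its end.
    """
    valid_start = t_flop_edge + t_clk_to_q                    # new symbol appears
    valid_end = t_flop_edge + 2 * strobe_period + t_clk_to_q  # flop reloads
    latch_edge = t_flop_edge + strobe_period                  # opposite edge
    return latch_edge - valid_start, valid_end - latch_edge


before, after = fifo_latch_margins(strobe_period=1000.0, t_clk_to_q=100.0)
print(before, after, before + after)  # 900.0 1100.0 2000.0
```

Under these assumptions the window seen by the latch spans two strobe periods instead of one, and the latching edge sits near its middle, offset only by the flop's clk-q delay.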
The simple, low latency synchronizer described by the present invention was designed for source-synchronous links going to multiple bit-sliced chips. Low latency bit-slicing is achieved with a FIFO in the synchronizer and a synchronization signal called the global frame clock (GFC). The second stage of the synchronizer, which implements the FIFO, lets each data symbol wait for the other data in its logical packet so that all symbols can be sent to their respective chip cores in lockstep. The GFC marks a unique core clock cycle on each chip, which allows the synchronizer control logic on every chip to pull data from its FIFO on the same clock edge. Because the GFC is distributed to all the chips on the link, none of the chips need to communicate directly, so bit-sliced data can be synchronized much faster than if the chips had to communicate. The synchronizer design can be used with any number of bit-sliced interfaces.
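The GFC-based lockstep scheme can be illustrated with a cycle-level sketch. It assumes, for illustration only, that time is counted in core clock cycles common to all chips, that each slice's data lands in its FIFO at its own fixed offset, and that the GFC-marked cycle is no earlier than the latest arrival. The function name and arrival offsets are hypothetical.

```python
from collections import deque
from typing import List


def lockstep_delivery(arrival_cycles: List[int], gfc_cycle: int, n_packets: int) -> List[List[int]]:
    """Hypothetical cycle-level model of GFC-aligned reads across bit slices.

    arrival_cycles[s] is the core clock cycle at which packet 0 lands in
    slice s's FIFO; packet k lands k cycles later. Every chip begins pulling
    from its FIFO on the core cycle marked by the GFC (gfc_cycle) and then
    pulls one entry per cycle, without any chip-to-chip communication.
    Returns, for each read cycle, the packet index each slice delivers.
    """
    if max(arrival_cycles) > gfc_cycle:
        raise ValueError("GFC-marked cycle must not precede the latest slice arrival")
    fifos = [deque() for _ in arrival_cycles]
    delivered = []
    for cycle in range(gfc_cycle + n_packets):
        # Write side: each slice's data lands in its own FIFO at its own time.
        for s, first in enumerate(arrival_cycles):
            k = cycle - first
            if 0 <= k < n_packets:
                fifos[s].append(k)
        # Read side: all slices pull on the same GFC-aligned core cycles.
        if cycle >= gfc_cycle:
            delivered.append([fifo.popleft() for fifo in fifos])
    return delivered


# Three slices whose data arrive 0, 2 and 3 cycles after the earliest one.
print(lockstep_delivery([0, 2, 3], gfc_cycle=4, n_packets=3))
# [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
```

In this model the slice whose data arrives earliest must buffer the most entries before the GFC-marked cycle, so the FIFO depth has to cover the spread of arrival times across slices.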
An additional advantage of the synchronizer of the present invention is that the design can easily be used for double-pumped source-synchronous links by duplicating the same logic and clocking the two copies on opposite edges of the link strobe. This flexibility lets twice as much data flow through the link without doubling the complexity of the synchronizer.
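The duplication for a double-pumped link can be sketched by splitting the incoming symbol stream between two copies of the synchronizer FIFO, one per strobe edge. This assumes, for illustration, that even-indexed symbols arrive on the rising edge and odd-indexed symbols on the falling edge; the function name is hypothetical, and each copy is reduced to a plain queue.

```python
from collections import deque
from typing import List, Sequence


def double_pumped_transfer(symbols: Sequence[str]) -> List[List[str]]:
    """Hypothetical sketch of a double-pumped link feeding two synchronizer copies.

    Assumption: even-indexed symbols arrive on the rising strobe edge and
    odd-indexed symbols on the falling edge. Each edge has its own copy of the
    synchronizer, modeled here as one FIFO per edge. Each core clock cycle
    pulls one symbol from each copy, so two symbols reach the core per cycle.
    """
    rising_copy: deque = deque()
    falling_copy: deque = deque()
    for i, symbol in enumerate(symbols):
        (rising_copy if i % 2 == 0 else falling_copy).append(symbol)

    per_core_cycle = []
    while rising_copy:
        pair = [rising_copy.popleft()]
        if falling_copy:
            pair.append(falling_copy.popleft())
        per_core_cycle.append(pair)
    return per_core_cycle


print(double_pumped_transfer(list("ABCDEF")))  # [['A', 'B'], ['C', 'D'], ['E', 'F']]
```

Each copy is identical to the single-edge design; only the strobe edge that clocks it differs, which is why throughput doubles while the per-copy logic stays the same.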
A further advantage of the synchronizer of the present invention is that its two stages of sequential elements in the link strobe domain decouple the electrical needs of the high-speed link from the synchronization needs of a bit-sliced link, so that data capture and synchronization can each be achieved without compromising the other. Electrically, the link needs as light a load as possible (as few flip flops as possible) close to the receiver. The first stage of the synchronizer only captures data coming from the link, so it needs only enough flip flops to register the data, which keeps the load on the data small. The second stage of the synchronizer contains the transparent latches that make up the synchronizer's FIFO.
However, synchronization is not straightforward with bit-sliced data (data that must be synchronized with one another arrive at very different times, clock skew and other variations are large, trace lengths are long, etc.), so the synchronizer may need a large number of latches in its FIFO. Because the FIFO presents no load to the link, it can contain as many latches as are needed to address all the synchronization issues of a bit-sliced interface. In addition, any extra complexity in the synchronizer circuit and the FIFO is decoupled from the sensitive data capture flops.
Finally, because the synchronizer is made up of everyday logic elements, it is relatively easy to test. Testing can occur without complicating the circuits or adding loading or latency to the functional paths. In the preferred embodiment, an eight-to-one multiplexer selects among six transparent latches. A test input can be connected to one of the two remaining unused multiplexer inputs, which bypasses the synchronizer. This allows the core of the chip to be tested with no effect on synchronizer latency and only minor changes to the FIFO control logic. Other bypass modes are also possible.
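The test bypass can be modeled behaviorally as an eight-to-one mux whose first six inputs are the FIFO latches and whose seventh input carries test data. Which spare input is used for test, and the names below, are assumptions made for illustration; the point shown is that test mode changes only the select logic, not the data path.

```python
from typing import Optional, Sequence

N_LATCHES = 6   # FIFO latches feeding mux inputs 0..5
TEST_INPUT = 6  # hypothetical: test input wired to mux input 6
# Mux input 7 is left unused in this sketch.


def mux8(inputs: Sequence[Optional[int]], select: int) -> Optional[int]:
    """Behavioral 8:1 multiplexer."""
    if len(inputs) != 8 or not 0 <= select < 8:
        raise ValueError("an 8:1 mux needs 8 inputs and a select in 0..7")
    return inputs[select]


def synchronizer_output(latches: Sequence[Optional[int]], test_value: Optional[int],
                        read_ptr: int, test_mode: bool) -> Optional[int]:
    """Only the select logic changes in test mode; the data path is the same mux."""
    if len(latches) != N_LATCHES:
        raise ValueError("expected six FIFO latches")
    inputs = list(latches) + [test_value, None]
    select = TEST_INPUT if test_mode else read_ptr % N_LATCHES
    return mux8(inputs, select)


latches = [10, 11, 12, 13, 14, 15]
print(synchronizer_output(latches, test_value=99, read_ptr=2, test_mode=False))  # 12
print(synchronizer_output(latches, test_value=99, read_ptr=2, test_mode=True))   # 99
```

In this model functional data passes through exactly one mux in both modes, which is consistent with the bypass adding no latency to the synchronizer path.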
A further understanding of the nature and advantages of the invention described herein may be realized by reference to the remaining portion of the specification and the attached drawings.