The present invention relates to an arrangement and a method for synchronising data to a local clock. The invention is useful in transferring data between sub-systems. The invention incorporates a self-tested, self-synchronous two-phase input port by means of which a line or an element of parallel data is tested for data read failure using two different phases or edges of the local clock. If a data read failure is detected on one of the two phases, the other phase is selected for reading the data.
As the clock frequency on silicon chips increases, the clock phase becomes more difficult to calculate or predict. To avoid data read failure, global clock synchronisation is commonly used to keep a system working synchronously. However, global synchronisation, for example as implemented with a balanced clock tree, has many drawbacks. First, it needs more metal layers, resulting in high costs. Second, the power dissipation of the clock distribution network is very large; in some state-of-the-art microprocessor designs the clock network consumes 18% to 40% of the total power. In addition, systems do not scale well because of timing constraints. Furthermore, a PLL or DLL is needed to compensate for the propagation delay of the local clock driver, and significant effort is required to reduce delay and skew. As systems scale and clock frequencies rise, physical limits will eventually be reached in future high-performance ULSI designs unless global synchronisation can be avoided.
The present invention relates to self-tested self-synchronisation implemented with a two-phase input port for high performance ULSI design. An input signal with unknown delay can be correctly latched without suffering from data read failure.
The idea of the method is to use the same clock frequency, but with an arbitrary local phase, in each sub-system, and to select automatically a clock edge for sampling data so that error-free parallel data transfer is achieved. The self-synchronisation may be accomplished by inserting a test signal, the error status of which is used to select a clock edge or clock phase that gives error-free parallel data transfer between sub-systems. By this method, global clock synchronisation is avoided, so there is no need for a balanced clock distribution tree or for skew reduction techniques. The invention therefore achieves a significant simplification. The power consumed by the clock distribution is reduced because there is no need to use wide metal wires to shorten the delay, and distributed clock drivers in each sub-system become more suitable.
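The reason two clock edges suffice can be illustrated with a small timing sketch in Python. It is not part of the disclosed arrangement: the names `EDGES`, `edge_is_safe` and `safe_edges`, the normalisation of time to one clock period, and the unsafe window of 0.1 period around each data transition are all assumptions made for illustration. Under that model, the rising and falling edges lie half a period apart, so for any window narrower than a quarter period at least one of them samples the data safely.

```python
# Hypothetical timing model: all times are fractions of one clock period.
EDGES = {"rising": 0.0, "falling": 0.5}  # sampling instants of the two edges

def edge_is_safe(delay, edge, window):
    """True if the sampling edge is at least `window` away from the data transition."""
    d = (EDGES[edge] - delay) % 1.0  # circular offset within one period
    return min(d, 1.0 - d) >= window

def safe_edges(delay, window=0.1):
    """List the edges that sample the data outside the assumed unsafe window."""
    return [edge for edge in EDGES if edge_is_safe(delay, edge, window)]
```

In this sketch the delay is passed in explicitly only to show the timing margin. The method itself does not need to know the delay, since the choice of edge is inferred from the error status of the test signal.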
Thus, the present invention relates to an arrangement for synchronising an incoming stream of data to a local clock.
According to the invention, the arrangement comprises a data read means for reading parallel elements of the data stream using one of two different phases or edges of the local clock, a data read error detecting means arranged to sample at least one element of the data stream using the two different phases or edges of the local clock, and a decision making means. If the data read error detecting means detects a data read error using one of the two different phases or edges of the local clock, the other phase or edge of the local clock is selected by the decision making means for reading the parallel elements.
Preferably, said one element of the data stream is a special test signal having a fixed data pattern.
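The interplay of the three means can be sketched as a behavioural simulation in Python. This is a minimal sketch under stated assumptions, not the disclosed circuit: the function names, the alternating 0/1 test pattern carried on lane 0 of each word, the worst-case jitter model, and the default choice of the rising edge are all hypothetical. For brevity the sketch tests only the default edge, whereas the arrangement described above samples the test element on both edges.

```python
import math

EDGES = {"rising": 0.0, "falling": 0.5}  # sampling instants, in clock periods

def read_words(words, delay, edge, jitter=0.1):
    """Data read means: latch one parallel word per cycle on the given edge.

    Word i is assumed valid on the bus from time i + delay to i + 1 + delay.
    """
    latched = []
    for k in range(len(words) - 1):
        # Assumed worst-case jitter: sampling instants alternate late/early.
        t = k + 1 + EDGES[edge] + (jitter if k % 2 == 0 else -jitter)
        latched.append(words[math.floor(t - delay)])
    return latched

def has_read_error(latched):
    """Error detecting means: lane 0 carries an alternating test pattern,
    so two equal consecutive test bits indicate a data read failure."""
    bits = [word[0] for word in latched]
    return any(a == b for a, b in zip(bits, bits[1:]))

def select_edge(words, delay):
    """Decision making means: use the other edge if the default edge fails."""
    if has_read_error(read_words(words, delay, "rising")):
        return "falling"
    return "rising"

# Each word is (test bit, payload); the payload values are arbitrary.
words = [(i % 2, 100 + i) for i in range(10)]
delay = 0.05  # unknown to the receiver in practice; fixed here for the demo
edge = select_edge(words, delay)
payloads = [word[1] for word in read_words(words, delay, edge)]
```

With a delay of 0.05 period the data transition falls inside the jitter range of the rising edge, so the test lane repeats a bit, the falling edge is selected, and the payloads 101 to 109 are read in order.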
The invention also relates to a method for synchronising an incoming stream of data to a local clock.