To maximize performance in new system-on-chip (SoC) designs, each circuit or section of the chip is designed to run at a certain frequency, which often differs from the frequency of other circuits in the system. This allows the design of each circuit to trade performance and area against frequency. For example, a particular circuit such as a CPU (central processing unit) core may require high performance and will therefore need to run at a high clock frequency. However, this may require greater chip area than a lower frequency design. On the other hand, another circuit, such as a memory interface, may not need to run at such a high clock frequency, and could therefore be designed to take up less chip area.
Each circuit in the system must be able to communicate with other circuits, and for data to be passed successfully between circuits, the data must be retimed. For fast-changing asynchronous signals (signals with no clock relationship), this requires a method such as the Valid-Ack protocol. According to this protocol, a valid signal is sent from one circuit, for example IP1, to the other circuit, for example IP2. This valid signal is retimed in IP2's clock domain using a certain number of resynchronizers (generally D-type flip-flops), which clock the signal at IP2's clock frequency. The number of resynchronizers required depends on the frequencies of the two clocks and on the characteristics of the D-type flip-flops. In most situations, however, two flip-flops are sufficient to avoid metastability problems.
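The retiming step described above can be illustrated with a small behavioural model. This is only a sketch: the class name `Resynchronizer` and its `stages` parameter are hypothetical, the model works at the level of whole clock edges, and it does not attempt to model metastability itself, only the delay that the chain of flip-flops introduces.

```python
class Resynchronizer:
    """Behavioural model of a chain of D-type flip-flops clocked in the
    receiving (IP2) clock domain. Hypothetical name, for illustration only."""

    def __init__(self, stages=2):
        # One stored bit per flip-flop, all reset to 0.
        self.regs = [0] * stages

    def clock(self, async_in):
        """Apply one receiving-domain clock edge and return the retimed output."""
        # Each flip-flop takes the value of the one before it; the first
        # flip-flop samples the asynchronous input.
        self.regs = [async_in] + self.regs[:-1]
        return self.regs[-1]


# The input goes high before the first edge and stays high.
sync = Resynchronizer(stages=2)
outputs = [sync.clock(1) for _ in range(3)]
print(outputs)  # [0, 1, 1]: the output rises on the second edge
```

With two stages, the retimed output only follows the input on the second clock edge of the receiving domain, which is the source of the added delay.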
Once the valid signal has been retimed and detected, IP2 is able to latch any data sent with the valid signal. It then sends back an Ack (Acknowledge) signal which indicates to IP1 that it has received and sampled the data. IP1 is then able to change the data and send a new valid signal to IP2 to indicate that new data is available.
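The handshake can be sketched as a cycle-level simulation. Several assumptions are made here that the text does not state: both circuits run at the same frequency with aligned clock edges, the handshake is a two-phase (toggle) variant in which every change of level on valid or ack marks an event, and each direction uses two resynchronizers. The function name `transfer` and all variable names are hypothetical.

```python
def shift(chain, value):
    """Clock a chain of flip-flops once: new value in, oldest value out."""
    return [value] + chain[:-1]


def transfer(words, stages=2):
    """Send words from IP1 to IP2 using a toggle-style valid/ack handshake.

    Returns (received_words, arrival_cycles).
    """
    pending = list(words)
    received, arrival = [], []
    valid, ack, data = 0, 0, None
    valid_sync = [0] * stages  # retimes valid into IP2's clock domain
    ack_sync = [0] * stages    # retimes ack into IP1's clock domain
    cycle = 0
    while len(received) < len(words):
        valid_seen, ack_seen = valid_sync[-1], ack_sync[-1]
        next_valid, next_ack = valid, ack
        # IP2: a retimed valid that differs from ack means new data is waiting.
        if valid_seen != ack:
            received.append(data)
            arrival.append(cycle)
            next_ack = valid_seen
        # IP1: once the retimed ack matches valid, the data may be changed.
        if ack_seen == valid and pending:
            data = pending.pop(0)
            next_valid = valid ^ 1
        # All flip-flops update together on the clock edge.
        valid_sync = shift(valid_sync, valid)
        ack_sync = shift(ack_sync, ack)
        valid, ack = next_valid, next_ack
        cycle += 1
    return received, arrival


words, arrival = transfer(["A", "B", "C"])
print(words)    # ['A', 'B', 'C']
print(arrival)  # [3, 9, 15]
```

Under these assumptions each word is latched six cycles after the previous one, even though the data bus itself could change every cycle.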
This Valid-Ack protocol has two distinct problems. Firstly, the latency of the signal can be quite high (typically six cycles). Secondly, and more importantly, the data transfer bandwidth is much lower than the respective frequencies of the circuits, because moving each block of data takes the number of clock cycles required by the protocol, which is typically six. This leads to poor system performance.
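The bandwidth penalty can be made concrete with a short calculation. The function name `effective_bandwidth` and the example figures (a 100 MHz clock and a 32-bit data block) are assumptions chosen purely for illustration. Only the figure of six cycles per transfer comes from the text.

```python
def effective_bandwidth(clock_hz, bits_per_block, cycles_per_transfer=6):
    """Return the data bandwidth in bits per second when each block of
    data occupies the link for cycles_per_transfer clock cycles."""
    if cycles_per_transfer <= 0:
        raise ValueError("cycles_per_transfer must be positive")
    return clock_hz / cycles_per_transfer * bits_per_block


# Hypothetical figures: 100 MHz clock, 32-bit data block.
peak = effective_bandwidth(100e6, 32, cycles_per_transfer=1)
handshake = effective_bandwidth(100e6, 32)  # six cycles per block
print(f"peak:      {peak / 1e6:.0f} Mbit/s")
print(f"handshake: {handshake / 1e6:.0f} Mbit/s")
```

With these example figures the handshake delivers one sixth of the bandwidth that a transfer on every clock cycle would give.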
The use of resynchronizers between clock domains also adds latency to the system. For example, if two resynchronizers are used, a delay of up to two cycles of the receiving clock is added every time a signal is retimed across the clock boundary. This is clearly disadvantageous: the latency of the system is increased and, to some extent, its bandwidth is reduced.