As electronic circuit manufacturing processes shrink, the ratio of wire delay to gate delay keeps increasing. An integrated circuit (IC), or chip, has operations in which distant parts of the chip communicate with each other. This communication can now take much longer than the clock cycle required for even a relatively complex local operation. To address this issue, designers can run the entire design at the speed determined by the slowest global interconnect, or run the design at one speed and the global interconnect at a different (slower) speed. Previous editions of the International Technology Roadmap for Semiconductors (ITRS) accepted that global signals can be clocked more slowly than local signals, and distinguished between local and global clock speeds. This may work for applications that are optimized for high density or low power, but it is not a feasible alternative for high-performance designs.
Another alternative is to switch entirely to a latency-insensitive design, including an asynchronous design. However, this has several problems. A latency-insensitive design does not help performance in the case of significant wire delay. Also, the throughput of each operation is slowed down to round-trip transaction speeds, even for operations that could potentially be pipelined. Large chips can be built in a reasonable amount of time by re-using intellectual property (IP) designs. However, most IP is not available in latency-insensitive, much less asynchronous, forms. Finally, even if a latency-insensitive design works with any combination of delays, some delay combinations may be much faster than others. Designers are unable to determine which combinations have less delay than others, so optimizing the design to reduce latency is a problem.
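The throughput penalty of round-trip transactions can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a request/acknowledge handshake in which each transfer waits for one full round trip over the wire; the function names and the delay values are hypothetical and chosen only for illustration.

```python
def handshake_throughput(one_way_delay_ns: float) -> float:
    """Transfers per ns when every transfer waits for a full round trip.

    Assumption: one request plus one acknowledge crosses the wire per transfer.
    """
    round_trip_ns = 2 * one_way_delay_ns
    return 1.0 / round_trip_ns


def pipelined_throughput(clock_period_ns: float) -> float:
    """Transfers per ns when a new value is launched on every clock cycle."""
    return 1.0 / clock_period_ns


# Hypothetical numbers: a 4 ns global wire and a 1 ns local clock period.
wire_delay_ns = 4.0
clock_period_ns = 1.0

slowdown = pipelined_throughput(clock_period_ns) / handshake_throughput(wire_delay_ns)
print(f"round-trip transfers are {slowdown:.0f}x slower than pipelined transfers")
```

Under these assumed numbers the handshake completes one transfer every 8 ns, while a pipelined wire could accept a new value every 1 ns.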
A third alternative is to pipeline the global interconnect. While this may not reduce the latency, it does allow the throughput to increase with clock speed. With this approach, conventional tools designed for synchronous digital systems can be used. Because the clock frequency is determined by the slowest local operation rather than the interconnect speed, this approach achieves an optimized throughput. Such aggressive pipelining is one of the main reasons that custom designs achieve much higher clock speeds than application-specific integrated circuit (ASIC) designs while using the same process. The gains from this technique can be dramatic.
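The trade-off between clocking the whole design at the interconnect speed and pipelining the interconnect can be sketched numerically. The example below is a simplified model rather than a timing analysis: it ignores flip-flop overhead such as setup and clock-to-output time, and the function names and delay values are assumptions made for illustration.

```python
import math


def single_clock(local_delay_ns: float, wire_delay_ns: float) -> dict:
    """Whole design clocked by the slowest path, which may be the global wire."""
    period = max(local_delay_ns, wire_delay_ns)
    return {"period_ns": period, "stages": 1, "wire_latency_ns": period}


def pipelined(local_delay_ns: float, wire_delay_ns: float) -> dict:
    """Clock set by the slowest local operation; the wire is split into stages."""
    # Each stage must fit within one clock period (flip-flop overhead ignored).
    stages = math.ceil(wire_delay_ns / local_delay_ns)
    return {
        "period_ns": local_delay_ns,
        "stages": stages,
        "wire_latency_ns": stages * local_delay_ns,
    }


# Hypothetical numbers: 1 ns for the slowest local operation, 4 ns global wire.
slow = single_clock(1.0, 4.0)
fast = pipelined(1.0, 4.0)

print(slow)  # period 4.0 ns, 1 stage, 4.0 ns to cross the wire
print(fast)  # period 1.0 ns, 4 stages, 4.0 ns to cross the wire
print("throughput gain:", slow["period_ns"] / fast["period_ns"])  # 4.0
```

In this model the time to cross the wire is 4 ns in both cases, but the pipelined design accepts a new value every 1 ns instead of every 4 ns.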
Even though this technique is highly effective, it is seldom used. The main reason is that pipelining the interconnect changes the cycle-level behavior of the design, which requires extensive manual rework for which there is very little tool support. Inserting clocked elements into a design is difficult and time-consuming, because only some locations in the design are legal for this operation. Automatic tools, such as placers and synthesis tools, are unable to determine the locations where the designer could insert clocked elements. Furthermore, the impact of inserting clocked elements into the design is difficult for tools to determine. For example, there are no tools that attempt to optimize the placement and potential insertion of clocked elements. The results of any insertion decisions should be communicated to the designer and to other tools; however, no tools currently exist to automate this process.
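The change in cycle-level behavior can be seen in a small cycle-by-cycle model. This is an illustrative sketch only: `simulate_wire` is a hypothetical helper that models a wire carrying a given number of flip-flops as a shift register, and it is not taken from any design tool.

```python
from collections import deque


def simulate_wire(inputs, num_registers):
    """Return the value seen at the far end of the wire on each clock cycle.

    The wire carries `num_registers` flip-flops in series. None marks cycles
    in which no launched value has reached the far end yet.
    """
    if num_registers == 0:
        # Unpipelined wire: the value arrives in the cycle it was launched.
        return list(inputs)

    pipeline = deque([None] * num_registers)
    outputs = []
    for value in inputs:
        pipeline.append(value)               # new value enters the wire
        outputs.append(pipeline.popleft())   # value launched num_registers cycles ago
    return outputs


print(simulate_wire(["a", "b", "c", "d"], 0))  # ['a', 'b', 'c', 'd']
print(simulate_wire(["a", "b", "c", "d"], 2))  # [None, None, 'a', 'b']
```

With two flip-flops on the wire, every value arrives two cycles later than before, so any logic that expected the value in the original cycle has to be reworked to match.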
Theoretically, then, design roadmaps could estimate an achievable clock speed for a microprocessing unit (MPU) by assuming that designers will use interconnect pipelining, but they fail to provide a practical method of automatically performing the pipelining during the design process. The effects of placing clocked elements to form a pipeline are therefore investigated manually. To insert, change, or delete clocked elements such as flip-flops, a designer searches through the design and makes each change by hand, for example by inserting flip-flops in the register-transfer level (RTL) design. If the designer wants to compare several possible schemes, each scheme is coded by hand before the comparison. As the design becomes firmer, some clocked elements may be identified as unnecessary, and the designer removes them by hand.
Consequently, designers seldom use interconnect pipelining at all, because of the logistical difficulty of doing so with conventional methods. For example, designers rarely try different pipeline schemes, since adding or deleting the flip-flops by hand is too laborious. Likewise, during process migration or re-design, the clocked elements are seldom changed.