1. Technical Field
The various embodiments described herein are related to application specific integrated circuits (ASICs), and more particularly to the design of various ASICs.
2. Related Art
Continuing advances in semiconductor device fabrication technology have yielded a steady decline in the size of process nodes. For example, 22 nanometer (nm) process nodes were introduced in 2012 and were quickly succeeded by 14 nm fin field-effect transistor (FinFET) process nodes in 2014, while 5 nm process nodes are projected for 2020.
The decrease in process node size allows a growing number of intellectual property (IP) cores or IP blocks to be placed on a single ASIC chip. That is, modern ASIC designs often spread numerous process nodes across a comparatively large silicon die, and include combinations of IP blocks and logic functions. At the same time, modern technology also requires increased connectivity and large data transfers between various IP blocks. In addition, modern ASIC chips frequently include multiple clock domains in order to leverage multi-core implementations. Thus, one or more clock signals may need to be distributed across the chip in a manner that minimizes clock skew. For instance, the edge of the clock signal received at a logic block located near a clock source should be aligned with the edges received at more distant logic blocks. To simplify data exchange among heterogeneous IPs, data is exchanged using a shared communication protocol (e.g., AMBA AXI), which makes use of Master and Slave interfaces.
Conventionally, a balanced clock signal distribution (i.e., timing closure) is achieved by inserting buffers. For example, pursuant to a traditional ASIC design flow, after floor planning and placing various IP blocks, a clock tree (i.e., a clock distribution network) may be synthesized and buffers may be added along the signal path from a clock source to various IP blocks according to the clock tree. In fact, timing closure for a clock signal that is distributed over a large and complex ASIC design typically requires the strategic placement of numerous buffers. Moreover, the distribution of a clock signal is also highly susceptible to both systematic and random variations. In particular, proper timing closure must account for the effects of on-chip variations that arise from different process, voltage, and temperature (PVT) conditions and operation modes, which would otherwise introduce additional clock skew. As such, the most laborious and time-consuming aspect of conventional ASIC design tends to be clock alignment. Clock tree synthesis and timing closure generally require significant manual intervention. In addition, the mechanisms (i.e., buffers) used to balance the clock across an ASIC chip generally consume a majority of the power in any conventional ASIC design. If the distance between two IPs is too long to guarantee timing closure, one or more protocol pipeline stages (e.g., AMBA register slices) are inserted between them to relax timing between the two IP ports, which further complicates top-level clock routing and balancing and significantly increases the area used.
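By way of illustration only, the conventional buffer-based balancing described above can be sketched with a simple delay model. The sketch below is a hypothetical example rather than any particular design flow: the function names, the delay values in picoseconds, and the percentage derate used to stand in for on-chip variation are all assumptions chosen for readability. It shows that padding the near path with buffers can cancel the nominal skew, while a variation that speeds up one path and slows the other reintroduces skew.

```python
# Illustrative model only: all names and numbers are assumptions,
# not values taken from any real ASIC design or tool.

def arrival_time_ps(wire_delay_ps, buffer_delays_ps, derate_pct=100):
    """Clock arrival time at a sink: wire delay plus inserted buffer
    delays, scaled by a derate percentage that models on-chip variation."""
    return (wire_delay_ps + sum(buffer_delays_ps)) * derate_pct / 100


def clock_skew_ps(arrivals_ps):
    """Skew is the spread between the latest and earliest clock arrival."""
    return max(arrivals_ps) - min(arrivals_ps)


# Unbalanced tree: a sink near the clock source and a distant sink.
near = arrival_time_ps(50, [])
far = arrival_time_ps(350, [])
skew_unbalanced = clock_skew_ps([near, far])

# Conventional balancing: pad the near path with three 100 ps buffers.
near_padded = arrival_time_ps(50, [100, 100, 100])
skew_balanced = clock_skew_ps([near_padded, far])

# On-chip variation: the padded path runs 5% fast, the long wire 5% slow.
near_ocv = arrival_time_ps(50, [100, 100, 100], derate_pct=95)
far_ocv = arrival_time_ps(350, [], derate_pct=105)
skew_ocv = clock_skew_ps([near_ocv, far_ocv])

print(skew_unbalanced, skew_balanced, skew_ocv)
```

In this simplified model the balanced tree needs three additional buffers for a single near sink, which hints at why buffer count, and the power those buffers consume, grows quickly as the number of sinks and the die size increase.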
Therefore, what is needed are an apparatus and a method that overcome the significant problems found in the conventional approach to modern ASIC design described above.