With the advent of the so-called Ultra Deep Submicron (UDSM) technologies, lithography scaling has created feasibility issues with respect to the currently used design flows and tools. These issues, depending on the perspective of the analysis, have been labelled Productivity Gap, Predictability Gap, or EDA (Electronic Design Automation) Gap.
Essentially, there appears to be an increasing difference between what it is possible to design in a given technology generation and what is reasonably affordable to design. In particular, these problems are addressed in the article by L. Pileggi, H. Schmit et al. entitled: “Exploring Regular Fabrics to Optimize the Performance-Cost Tradeoff”, DAC 2003.
For instance, as CMOS technology scales to finer feature sizes, it can be verified that complexity of integration grows exponentially. A dramatic increase in the number of physical parameters to be controlled leads to higher cost and lower accuracy of the design models and tools on which designers base their assumptions when designing new products.
In particular, UDSM technology appears to be more severely affected than earlier technologies by factors such as process variations, wire coupling, thermal variability, static and dynamic power integrity, electro-magnetic interference and others. Those factors impact timing analysis, making the modeling of the timing behavior of a circuit realized according to the UDSM technology, and thus the timing closure of the corresponding designs, more and more unreliable.
In addition to this, new challenges are imposed by layout printability: faithful reproduction of the layout shapes on silicon becomes more complex and less reliable with every scaling step, and thus it becomes extremely critical for the new UDSM technology. Optical Proximity Correction (OPC) techniques are usually applied in order to address this issue, but they are only effective to a given extent.
In particular, printability variations cause parametric failures, resulting in variations of gate strengths and clock skews, and thus in timing errors that are not detectable by the standard design and verification flows, since these flows usually analyze the planned layout but not its deviations during the manufacturing process.
Still worse, most of the faults induced by the above indicated factors are not detected by the traditional fault models commonly used by automatic test pattern generation (ATPG) tools. All the above has a dramatic impact on yields and on the EDA tools designed to increase them, and thus, ultimately, on the integration costs of a circuit realized according to the more recent technologies, in particular the UDSM technology.
Yield problems, printability rules and the need for timing predictability force layout designers to adopt restrictive rules when designing a circuit, the so-called Design for Manufacturing (DFM) rules. Design regularity has often been suggested as the most suitable approach to address manufacturability issues.
Regular, repetitive design approaches, such as the approaches that fall under the broad label of Structured ASICs (S-ASIC), show inherent advantages over standard-cell based design flows, as described for instance in the articles by B. Zahiri entitled: “Structured ASICs: Opportunities and Challenges”, ICCD 2003, and by Kun-Cheng Wu, Yu-Wen Tsai entitled: “Structured ASIC: Evolution or Revolution?”, ISPD 2004. The underlying concept behind such Structured ASICs is as follows: although there is a variety of alternative architectures, they are all based on a fundamental element called “tile” or “module”. The tile contains a small amount of generic logic implemented as gates, multiplexers and/or a lookup table. Depending on the particular architecture, the tile may also contain hardwired sequential elements (e.g. flip-flops, small SRAMs).
An array of tiles is then prefabricated across the face of the chip or in a specific region of it. As a consequence, the majority of the layout mask layers are also prefabricated. This means that the transistors forming the core logical functions of each tile (gates, multiplexers, etc.) are already available and wired together. A large part of the local and global interconnections has also been implemented.
As a consequence, the customization of the above described logical functions towards a final product for a given application is achieved by appropriately designing a reduced set of metallization layers and via connections. In particular, only a few remaining via/metallization layers are to be specified in the manufacturing flow when using an S-ASIC in order to customize the desired functionality of the final product.
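As a rough illustration of the tile concept described above, the following Python sketch models a tile as a small lookup table whose contents are fixed once by a via configuration. The class name `Tile`, the 2-input lookup table size, and the encoding of the vias as a tuple of bits are assumptions made for illustration only; they are not taken from any of the cited architectures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tile:
    """Hypothetical prefabricated tile: a 2-input LUT customized by vias."""

    # One bit per LUT entry; each bit stands in for the presence (1)
    # or absence (0) of a via on the customization layers.
    via_bits: Tuple[int, int, int, int]

    def evaluate(self, a: int, b: int) -> int:
        # The two inputs select one LUT entry: index = 2*a + b.
        return self.via_bits[(a << 1) | b]


def truth_table(tile: Tile) -> List[int]:
    """Outputs for inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    return [tile.evaluate(a, b) for a in (0, 1) for b in (0, 1)]


# The same prefabricated tile, customized in two different ways.
and_tile = Tile(via_bits=(0, 0, 0, 1))
xor_tile = Tile(via_bits=(0, 1, 1, 0))
```

In this toy model the transistors and most of the wiring (the `evaluate` logic) are identical for every tile, and only `via_bits` differs between products, which mirrors the idea that a few via/metallization layers carry the whole customization.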
Structured ASIC solutions, especially at the full-chip scale, are widely used. For instance, in its Hardcopy program, Altera offers the possibility to convert a design mapped on its FPGA families to a metal and via-programmable support, thus ensuring a smooth transition from flexible prototyping platforms to high performance hardwired solutions. This approach is described for instance in U.S. Pat. Nos. 7,030,646, 7,243,315 and 7,064,580.
LSI Logic has taken a similar path, as described for instance in U.S. Pat. Nos. 6,954,917 and 6,690,194, by providing a mask-programmed Structured ASIC approach called RapidChip, based on a logic gate array fabric. Complementing the Altera approach, LSI provides a smooth transition flow between RapidChip designs and their standard-cell based solutions. Other semiconductor manufacturers have also formalized their Structured ASIC product portfolio. It is very common to complement Structured ASIC manufacturing products with embedded hardwired units such as Memories, specific DSP acceleration units or Microprocessors (Platform ASIC).
The relevant advantage of a Structured ASIC based design is that this approach significantly reduces Non-Recurring Engineering (NRE) costs (i.e. the one-time cost of researching, designing, and testing a new product) and implementation issues, as described by F. Campi et al. in the article entitled: “Sustainable (re-)configurable solutions for the high-volume SoC market”, IPDPS 2008. In particular, using Structured ASICs, implementation costs are significantly lowered since the number of masks to be redesigned for each product is reduced by roughly two-thirds, and design costs are also reduced because many critical design issues, such as clock distribution or scan-chain insertion, are often already handled on the prefabricated logic.
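The mask reduction mentioned above can be turned into a back-of-the-envelope calculation. The following Python sketch is only illustrative: the function names, the total mask count of 30 and the cost per mask are hypothetical values chosen for the example, and only the "roughly two-thirds" reduction comes from the text.

```python
def customized_masks(total_masks: int, reduction: float = 2 / 3) -> int:
    """Masks that still have to be redesigned for each product.

    `reduction` is the fraction of masks that are prefabricated and
    shared; the default reflects the "roughly two-thirds" figure.
    """
    return round(total_masks * (1 - reduction))


def mask_cost_per_product(total_masks: int, cost_per_mask: float,
                          reduction: float = 2 / 3) -> float:
    """Per-product mask cost, assuming every mask costs the same."""
    return customized_masks(total_masks, reduction) * cost_per_mask


# Hypothetical figures: a 30-mask process at 50,000 cost units per mask.
full_custom = mask_cost_per_product(30, 50_000, reduction=0.0)
structured = mask_cost_per_product(30, 50_000)
```

Under these assumed numbers, 10 of the 30 masks remain product specific, so the per-product mask cost drops to one third of the full-custom figure. Real mask sets have very uneven per-layer costs, which this sketch ignores.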
Even more significant implications apply to manufacturability issues: while a standard cell library may include a few hundred different cells, whose position and connection in a given silicon region vary completely in different design implementations, Structured ASIC design can focus on small and localized regions that are regularly repeated. On those regions, investment can be concentrated, from the manpower and tools points of view, in order to maximize performance while retaining manufacturability.
Moreover, unlike in standard cell design, the placement of Structured ASIC cells is known in advance and completed by distributed buffering, thus providing a regular pattern that greatly eases timing characterization of the final design. On the other hand, Structured ASIC-based design induces overheads that may not be acceptable for all segments of the semiconductor market. In fact, depending on the chosen technology orientation, Structured ASIC approaches may impose a 1.3× to 2× multiplicative factor in area occupation and a 1.5× to 3× dividing factor in performance. For computationally intensive applications this may not be acceptable, as it may severely limit overall system performance or exceed the maximum area specifications.
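To make the overhead ranges quoted above concrete, the following Python sketch applies them to a standard-cell baseline and checks the worst case against a specification. The function names, the baseline figures (10 mm², 600 MHz) and the specification limits are hypothetical; only the 1.3× to 2× area factor and the 1.5× to 3× performance divisor come from the text.

```python
from typing import Tuple

AREA_FACTOR = (1.3, 2.0)    # multiplicative area overhead (best, worst)
PERF_DIVISOR = (1.5, 3.0)   # performance dividing factor (best, worst)


def sasic_estimate(area_mm2: float, freq_mhz: float
                   ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return ((best_area, worst_area), (worst_freq, best_freq))."""
    area = (area_mm2 * AREA_FACTOR[0], area_mm2 * AREA_FACTOR[1])
    freq = (freq_mhz / PERF_DIVISOR[1], freq_mhz / PERF_DIVISOR[0])
    return area, freq


def meets_spec_worst_case(area_mm2: float, freq_mhz: float,
                          max_area_mm2: float, min_freq_mhz: float) -> bool:
    """True only if the worst-case area and frequency both satisfy the spec."""
    (_, worst_area), (worst_freq, _) = sasic_estimate(area_mm2, freq_mhz)
    return worst_area <= max_area_mm2 and worst_freq >= min_freq_mhz


# Hypothetical block: 10 mm^2 and 600 MHz when built from standard cells.
area_range, freq_range = sasic_estimate(10.0, 600.0)
```

With these assumed figures the block would occupy between about 13 and 20 mm² and run between 200 and 400 MHz, which shows why a computationally intensive block with a tight frequency floor may not tolerate the conversion.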
In summary, while a full-scale Structured ASIC approach is very appealing for low to medium volume market segments, it is clearly not acceptable for other segments, for example high-end signal processing Systems-on-Chip (SoCs). In particular, in this market segment, hybrid chips have been used, wherein the standard cell-based design is enhanced with one (or more) mask-programmable regions, as described in the article by L. Cali' et al. entitled: “Platform IC with Embedded Via Programmable Logic for Fast Customization”, CICC04, which describes a Structured ASIC to be embedded in an SoC architecture to provide application-specific customization.
Also known are approaches using Structured ASICs to be embedded in hybrid Systems-on-Chip, as described for instance in U.S. Pat. Nos. 6,331,790, 6,580,289, 6,873,185, 7,248,071, 6,014,038, 6,943,415 and 6,690,194. In particular, U.S. Pat. No. 6,331,790 describes a design method for a Structured ASIC using SRAMs and thus a configuration bitstream. In this case, the need to store and handle a configuration bitstream imposes an overhead in area and control that may not match the requirements of different semiconductor product market sectors.
Of course, the two options (Structured ASIC and hybrid approaches) can be merged or complemented in many ways. In particular, the designers determine a cost function based on three independent parameters:
1. what parts of the design are timing critical (so they may be designed with a costly standard optimized full-mask layout approach to meet specifications);
2. what parts of the design can be considered fixed over a large spectrum of customizations (so they can be designed with a costly standard optimized full-mask approach as the cost can be amortized over large volumes); and
3. how much area overhead can be afforded in the design (so that significant portions of the design, not falling in the two above categories, can be designed with lower optimization effort and/or exploiting mask-programmable technologies).
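The three parameters listed above can be read as a simple partitioning rule. The following Python sketch is one possible interpretation, not a method taken from the cited works: the `Block` fields, the `partition` function, the greedy ordering and the default area factor of 1.5 (picked from within the 1.3× to 2× range quoted earlier) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    name: str
    area: float                  # area as standard cells, arbitrary units
    timing_critical: bool        # parameter 1
    fixed_across_products: bool  # parameter 2


def partition(blocks: List[Block], area_budget: float,
              area_factor: float = 1.5) -> Tuple[Dict[str, str], float]:
    """Assign each block to 'full_mask' or 'mask_programmable'.

    Returns the assignment and the total area used.
    """
    plan: Dict[str, str] = {}
    used = 0.0
    # Parameters 1 and 2: these blocks justify a full-mask layout.
    for b in blocks:
        if b.timing_critical or b.fixed_across_products:
            plan[b.name] = "full_mask"
            used += b.area
    # Parameter 3: the rest go mask-programmable while the area
    # overhead still fits the budget (greedy, in list order).
    for b in blocks:
        if b.name in plan:
            continue
        overhead_area = b.area * area_factor
        if used + overhead_area <= area_budget:
            plan[b.name] = "mask_programmable"
            used += overhead_area
        else:
            plan[b.name] = "full_mask"
            used += b.area
    return plan, used


blocks = [
    Block("dsp_core", 4.0, timing_critical=True, fixed_across_products=False),
    Block("bus", 2.0, timing_critical=False, fixed_across_products=True),
    Block("codec", 2.0, timing_critical=False, fixed_across_products=False),
    Block("ui_ctrl", 2.0, timing_critical=False, fixed_across_products=False),
]
plan, used_area = partition(blocks, area_budget=12.0)
```

With a budget of 12 area units both non-critical blocks fit as mask-programmable logic; tightening the budget to 11 forces the last one back to a full-mask implementation. A real flow would weigh mask, NRE and time-to-market costs together rather than apply a fixed greedy order.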
The aspects to be optimized are mainly linked to mask costs, NRE costs, TTM (Time to Market) requirements, and manufacturability. More precisely, since the above factors are strictly correlated, the aim is to determine the best ratio between the obtained manufacturability margin and the corresponding NRE cost.
A significant added value of the mask programmable hardware, in particular in the high volume domain, is the opportunity to alter the above ratio in the direction of a higher manufacturability per cost unit. In particular, the design complexity of high volume Systems-on-Chip is such that this result could be obtained at negligible or very low performance/area overhead by carefully managing the parameters outlined above.
In the field of Systems-on-Chip design, various design methodologies are used in order to minimize design costs and shorten time to market in the deployment of a given DSP accelerator. On the one hand, carefully tuned and specifically verified pre-laid out macros for IP (Intellectual Property) reuse are commonly used. In particular, reusable IP macros for System-on-Chip design can be fully hardwired logic blocks which may be either analog or digital (e.g. SRAMs, PLLs, I/O controllers, Arithmetic Units). In other cases, some design customization is allowed by the user and it is deployed at software level (Microcontrollers, Digital Signal Processors), or by exploiting run-time reconfigurable hardware units (Field Programmable Gate Arrays or FPGAs, Coarse Grained Reconfigurable Architectures).
On the other hand, design-time reconfiguration is largely exploited at the Register Transfer Level (RTL) of the design, in particular via Hardware Description Language (HDL) constructs, such as constants and generics, or via EDA tools, such as Synopsys' CoreTools. Also known are the application specific or configurable processors, i.e. processor architectures that can be customized to a given application domain by “instruction set metamorphosis”, adapting their pipelining and adding function units configured to accelerate the specific functionalities required by the application. These known processor architectures are described for instance in U.S. Pat. Nos. 5,870,588, 6,988,154, 7,010,558, 6,701,515, 6,760,888, 6,477,683, 6,941,548 and in U.S. patent application Ser. No. 10/196,423. In these documents, different methodologies are described for defining the RTL architecture of a processor, their effectiveness being put at risk by the manufacturability and timing analysis issues described above.
Finally, U.S. Pat. No. 7,200,735 describes a hybrid processor, in which the base processor includes a custom logic circuit and the configurable logic circuit includes standard cell or gate-array logic circuits, the hardwired control section being described at RTL level, relying on traditional design flows for physical implementation.
It is however remarked that, if deployed on aggressive technology nodes, this known approach may lead to physical issues, encountering pitfalls and uncertainties that affect, for instance, design flows for Ultra Deep Submicron technologies. Moreover, designers exploiting such an approach may be forced to entirely restart an implementation flow for each utilization of the same RTL, with the related TTM issues, and design and verification costs.