The present invention relates to memory for metal configurable integrated circuits. An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use (Ref-1). For example, a chip designed solely to run a cell phone is an ASIC. Intermediate between ASICs and industry standard integrated circuits are application specific standard products (ASSPs). As feature sizes have shrunk and design tools have improved over the years, the maximum complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over 100 million gates. Modern ASICs often include entire 32-bit processors, memory blocks such as ROM, RAM, EEPROM and Flash, analog components, high-speed I/Os and other large building blocks. Such an ASIC is often termed an SoC (system-on-a-chip). Designers of digital ASICs use a hardware description language (HDL), such as Verilog or VHDL, to describe the functionality of ASICs. ASIC designs require rigorous software testing, incur very high mask non-recurring-engineering (NRE) costs, and need lengthy design, manufacturing and debug turn-around-time (TAT) to get the design working in silicon. Smaller feature sizes and higher levels of chip integration make the task very complex.
The first ASICs used gate array technology (Ref-2). Ferranti produced the first gate array, termed the ULA (Uncommitted Logic Array), around 1980. In gate array design, the transistors and other active devices (i.e. logic cells having inputs and outputs) are predefined in array form, and wafers containing these unconnected circuits are held in stock prior to metallization. The physical design process then defines the interconnections of the final device. All the inputs and outputs of the logic cells in the gate array are interconnected using anywhere from one or two to as many as nine metal layers. These metal layers include metal lines as well as via geometries between metal layers. Uehara et al. and Orbach et al. in U.S. Pat. Nos. 4,197,555 & 4,933,738 respectively (incorporated herein by reference) disclosed via programmability to connect or disconnect adjacent metal layers. This prior art demonstrates that via geometries between two or more layers can be used to customize gate array products. Or-Bach et al. in U.S. Pat. No. 6,476,493 (incorporated herein by reference) disclosed M7 and the via between M6 & M7 to customize gate arrays. Gate arrays reduce NRE and TAT as only a few masks are needed to produce a device. Gate arrays are inherently inefficient: 100% of the gates can never be utilized because the limited metal choices, compared to standard cell ASICs, make the interconnect inefficient. In an array (or sea) of gates, software limitations in the RTL to gate conversion, metal availability, and metal & via interconnect rules all contribute to the gate utilization inefficiency.
Field-programmable gate arrays (FPGAs) are the modern-day technology for quickly building a breadboard or prototype from standard parts (U.S. Pat. Nos. 4,870,302; 6,134,173; 6,448,808; 6,992,503; etc., incorporated herein by reference). By altering a “bitstream”, programmable logic blocks and programmable interconnects of the same FPGA can be modified to implement many different designs for fast debug and in-system verification. Compared to ASICs, FPGAs are grossly inefficient: very poor in gate density, performance and power. Benchmarks in Ref-3 show that FPGAs are 40 times lower in gate density (which translates to very high cost), 3 times slower in performance (which limits usage) and 15 times higher in power consumption (a problem for hand-held products). In comparison, prior art gate arrays are only about 3 times less efficient than ASICs in gate density (Tab-4 of Ref-4 shows 3 to 6 times worse). For smaller gate count designs and/or lower production volumes, FPGAs may be more cost effective than an ASIC design even in production. The non-recurring engineering cost of an ASIC can run into the millions of dollars, while the cycle time to get working silicon can be over 6 months. As IC fabrication process geometries get smaller, the mask costs escalate, and FPGA and gate array alternatives become more attractive.
Disadvantages of FPGAs compared to ASICs include the extremely high overhead of programmable resources that occupy silicon real estate. A 1-million gate FPGA requires about 20 million extra SRAM configuration memory bits (30 million gates) to configure the device, and about 40 million extra MUX elements (10 million gates) to provide connectivity choices. Compared to ASICs with tens of billions of metallization choices, 20 million configuration choices are grossly inadequate. It is easy to see the need for about 40 extra gates per useful gate of logic in an FPGA. When logic area is large, the wire distances grow, and “RC” loading grows as the square of the distance, leading to poor performance and higher power consumption. The repeating logic block in the FPGA and gate array has better design-for-manufacturing, which results in better yields. The larger granularity of the FPGA repeating logic block makes global routing easier, as the local routing and logic configuration are pre-handled within the logic block. In contrast, in a sea of gates standard cell or gate array ASIC design that uses much smaller grain cells (compared to the FPGA), one needs local connectivity to build logic gates and global connectivity to interconnect logic. Software tools have no concept of local versus global routing. Custom wiring becomes cumbersome, requires expensive tools to extract wire delays, and takes much longer to close timing constraints. Limiting the customization to a few metal layers, especially to upper metal layers such as in U.S. Pat. No. 6,476,493, poses two major challenges. First, a gate array cell library (typically more than one cell) must lend itself to a compact sea of cell placement with customizability at the pre-selected levels. This difficulty and inefficiency have made gate array products nearly obsolete in the modern era.
The second is a shadowing effect surrounding a used gate: there is a dead zone in which gates cannot be wired, because the wiring choices are limited when only upper level wires are available to customize gates. This second difficulty results in lower gate utilization efficiency, even if the first obstacle is overcome. Hence, the density, performance and power of gate arrays are about two or more process nodes worse than standard cells, compared to a disparity of about four or more process nodes for FPGAs. Converting an FPGA configuration “bitstream” as in U.S. Pat. No. 6,633,182 (incorporated herein by reference) to vias of a gate array simplifies the design cycle, but lacks the interconnect choices to achieve high gate density.
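The FPGA overhead figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python: the constants are the figures stated in the text, while the function names and the uniform-wire assumption behind the quadratic RC scaling are illustrative assumptions rather than part of the disclosure.

```python
# Figures quoted in the text for a 1-million-gate FPGA.
USEFUL_GATES = 1_000_000
SRAM_GATE_EQUIV = 30_000_000  # gate-equivalent area of ~20 million SRAM bits
MUX_GATE_EQUIV = 10_000_000   # gate-equivalent area of ~40 million MUX elements


def overhead_gates_per_useful_gate(useful_gates, *overhead_gate_equivalents):
    """Return the overhead, in gate equivalents, spent per useful logic gate."""
    return sum(overhead_gate_equivalents) / useful_gates


def relative_rc(distance_ratio):
    """Relative wire RC when the wire length is scaled by distance_ratio.

    Assumption (not from the source): an unbuffered wire of uniform width,
    so resistance and capacitance each scale linearly with length and
    their product scales with the square of the length.
    """
    return distance_ratio ** 2


overhead = overhead_gates_per_useful_gate(
    USEFUL_GATES, SRAM_GATE_EQUIV, MUX_GATE_EQUIV
)
print(overhead)          # 40.0 extra gates per useful gate
print(relative_rc(2.0))  # 4.0: doubling the distance quadruples RC
```

The 40:1 ratio is simply the sum of the two quoted gate-equivalent overheads divided by the useful gate count.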
FIG. 1 shows the prior art of constructing gate arrays. Since their inception in the early 1980s, gate arrays have been constructed as an array of transistor cells 101 as shown in FIG. 1A. Each cell 101 comprises a plurality of inputs such as 102, and at least one output such as 103. CMOS transistors include NMOS constructed in a P-well, and PMOS constructed in an N-well 113. Rules related to wells and well strapping require ground bus 111 and power bus 112 connections to transistors and wells. The gate array cell may be a NAND gate, in which case input 102 couples to a gate electrode, while output 103 couples to a diffusion node. Cell 101 could be a mix of different gates having equal dimensions. In this example, horizontal metal tracks 105 and vertical metal tracks 104 and associated via layers are used to customize and complete interconnects. When the customization is complete, metal interconnects run from an origin to a destination along allocated tracks, and go up and down through vias to avoid congestion. Each wire delay must be extracted from the post-routed placement, and simulated to see if the timing constraints of the design are met. The presence or absence of neighboring signals and wires impacts the time constant “RC” of such wires. Traditional gate arrays provide an array of cells, wherein inputs and outputs are interconnected by customizing via and/or metal layers. As the area of a NAND gate 101 is smaller than the total area of metal above the cell needed to interconnect these cells, not all cells 101 can be utilized; thus the gate density per mm² of silicon is reduced in these types of devices.
Further drawbacks in prior art gate arrays are discussed with respect to FIG. 1B. U.S. Pat. No. 6,476,493 discloses via M6M7 (via between metal 6 and metal 7) and a metal 7 (M7) customizable gate array. In FIG. 1B, a plurality of M5 121, M6 122 and M7 123 provide connectivity to the underlying fabric. They traverse orthogonal directions typical of gate arrays and ASICs. In a 7-layer metal ASIC, there are 7 metal layers and 6 via layers (not counting the contact layer between active/poly and metal 1) to offer interconnect customizability. When only via M6M7 and M7 are customizable, every programmable M1, M2, M3, M4 line/node (not shown) and every M5 line/node (shown) must be pre-coupled to a dedicated M6 node, line segment or line, which significantly reduces the M1-M5 programmable interconnect density. The reduction is non-linear, as every M1 node needs M2-M5 connectivity (thus further reducing available M2-M5 density for routing) to get up to M6. This is a first major drawback: a reduction in useful gate density due to lack of interconnect. The second major drawback is the unpredictability of timing delay between two nodes. As an example, in an ASIC, the interconnect delay between nodal pairs (131, 137) and (141, 147) is identical, because both pairs are the same distance apart. In a gate array, the tracks leading to M6 are pre-connected. In connecting nodes 131-137, tracks 132-136 must be chosen. The RC loading is not determined by distance; in this instance the lengths of the pre-assigned tracks 132-135 all add to the delay. Furthermore, the length of M7 needed to couple the two M6 nodes depends on how far apart the two M6 nodes are. In this example, connecting nodes 141-147 is done using metal tracks 142-146. As metal line lengths 132>142, 134>144, and 135>145, the timing delay is grossly mismatched between two identically distanced logic blocks. 
This disparity causes significant timing closure problems for computer automated tools, and requires expensive “RC” extraction tools and trial-and-error iterations.
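The timing mismatch described for FIG. 1B can be illustrated numerically. The following is a minimal sketch, assuming a distributed-RC wire model in which delay is approximated as half the product of total resistance and total capacitance, with the same per-unit-length resistance and capacitance on every layer and no via resistance. All segment lengths and electrical constants are hypothetical values chosen only to satisfy the stated relations 132>142, 134>144 and 135>145; none of them come from the source.

```python
# Hypothetical per-unit-length wire parameters (assumptions, not from the source).
R_PER_UM = 0.1      # ohms per micron
C_PER_UM = 0.2e-15  # farads per micron


def path_delay(segment_lengths_um, r=R_PER_UM, c=C_PER_UM):
    """Estimate delay of series wire segments as 0.5 * R_total * C_total.

    This is the usual distributed-RC approximation for a uniform line;
    it ignores via resistance, driver strength and coupling to neighbors.
    """
    total_length = sum(segment_lengths_um)
    return 0.5 * (r * total_length) * (c * total_length)


# Hypothetical segment lengths in microns, keyed by the reference numerals
# of FIG. 1B. Both node pairs are assumed to be the same distance apart.
path_131_137 = {132: 40, 133: 10, 134: 60, 135: 50, 136: 30}
path_141_147 = {142: 10, 143: 10, 144: 20, 145: 15, 146: 30}

delay_a = path_delay(path_131_137.values())
delay_b = path_delay(path_141_147.values())
ratio = delay_a / delay_b

print(f"delay 131-137: {delay_a:.3e} s")
print(f"delay 141-147: {delay_b:.3e} s")
print(f"ratio: {ratio:.2f}")
```

Under these assumed lengths the first path is a little over twice as long as the second, so its estimated delay is several times larger even though the endpoints are equally far apart. This is the kind of disparity that forces post-route extraction in such a fabric.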
The high NRE cost of designing an ASSP product is amortized over the plurality of users using the IC, whereas one individual user may not be able to justify the design cost. With an ASIC or SoC, a single user can justify the NRE costs through projected large usage volume and return on investment, but such opportunities are rare. If the projected usage volume is not realized, the ASIC/SoC investment is a loss. ASICs and ASSPs can be categorized as custom-wire technologies, whereas an FPGA can be categorized as a generic-wire technology. In custom-wire technologies, the wire delays are highly optimized, leading to low power and high performance, but at the cost of very lengthy design cycles, use of expensive tools and high mask NRE. In generic-wire technologies, the wires are pre-fabricated and wire delays are very poor, leading to high power and poor performance, but with the benefit of short design cycles, use of inexpensive tools and low NRE. Thus, low volume applications use FPGAs, high volume multi-user applications use ASSPs and high volume single user applications use ASICs.
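The volume trade-off described above can be expressed as a simple break-even calculation. The following is a minimal sketch, assuming a linear cost model (total cost = NRE + unit cost × volume). The dollar figures are hypothetical placeholders chosen for illustration, not values from the source.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost of ownership under a linear model: NRE plus per-unit cost."""
    return nre + unit_cost * volume


def break_even_volume(low_nre, high_unit_cost, high_nre, low_unit_cost):
    """Volume above which the high-NRE, low-unit-cost option becomes cheaper."""
    return (high_nre - low_nre) / (high_unit_cost - low_unit_cost)


# Hypothetical figures (assumptions for illustration only).
FPGA_NRE, FPGA_UNIT = 0.0, 50.0         # generic-wire: low NRE, high unit price
ASIC_NRE, ASIC_UNIT = 2_000_000.0, 10.0  # custom-wire: high NRE, low unit price

volume = break_even_volume(FPGA_NRE, FPGA_UNIT, ASIC_NRE, ASIC_UNIT)
print(volume)  # 50000.0 units

# Below the break-even volume the FPGA is cheaper; above it the ASIC is.
print(total_cost(FPGA_NRE, FPGA_UNIT, 10_000) < total_cost(ASIC_NRE, ASIC_UNIT, 10_000))
print(total_cost(ASIC_NRE, ASIC_UNIT, 100_000) < total_cost(FPGA_NRE, FPGA_UNIT, 100_000))
```

If the projected volume falls short of the break-even point, the NRE is not recovered, which is the loss scenario described for ASIC/SoC investments.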
When a third party IC supplier designs an ASSP for a wide target audience, the final IC does not give any single user an advantage over another. The user has to buy the common ASSP device and design their end product to generate sufficient differentiation to compete in that marketplace. It is difficult to generate high-value differentiation in the final system using the same IC. FPGAs and ASICs provide this differentiation. However, the very high NRE associated with ASICs and the very high unit price of FPGAs make that choice economically very difficult. Embedding an FPGA or gate array core within an ASIC/SoC provides programmability; however, the gross inefficiencies of FPGA/gate array fabrics are significant, and diminish the ASIC/SoC value.
For the reasons discussed, it is seen that improved configurable fabrics are highly desirable for ASIC/SoC products. Ease of design and productivity are equally important for design efficiency. Previously validated legacy cores (requiring no work) and new cores (requiring significant new work) must be handled through easy-to-use tool flows. These new cores need design debug, and they evolve and often change during the design cycle. While FPGAs offer this flexibility, they are overkill for the fully verified legacy blocks that are already characterized, and they are a poor substitute for the new cores due to poor performance and silicon utilization.