The present invention relates to programmable structured arrays for semiconductor integrated circuits.
Traditionally, application specific integrated circuit (ASIC) devices have been used in the integrated circuit (IC) industry to reduce cost, enhance performance or meet space constraints. The generic class of ASIC devices falls under a variety of sub classes such as Custom ASIC, Standard cell ASIC, Gate Array and Field Programmable Gate Array (FPGA) where the degree of user allowed customization varies. In this disclosure the word ASIC is used only in reference to Custom and Standard Cell ASICs, and reference to remaining ICs such as FPGA and Gate Arrays will be by their sub-classification. FPGA devices include Programmable Logic Devices (PLD) and Complex Programmable Logic Devices (CPLD), while Gate Array devices include Laser Programmable Gate Arrays (LPGA), Mask Programmable Gate Arrays (MPGA) and a new class of devices known as Structured ASIC or Structured Arrays.
The design and fabrication of ASICs can be time consuming and expensive. The customization involves a lengthy design cycle during the product definition phase and high Non Recurring Engineering (NRE) costs during the manufacturing phase. In the event of finding a logic error in the custom or semi-custom ASIC during the final test phase, the design and fabrication cycle has to be repeated. Such lengthy correction cycles further aggravate the time to market and engineering cost. As a result, ASICs serve only specific applications and are custom built for high volume and low cost. The high cost of masks and unpredictable device lifetime shipment volumes have caused ASIC design starts to fall precipitously in the IC industry. ASICs offer no device for off-the-shelf verification and no user customization capability, and require a full custom mask set for fabrication.
A Gate Array customizes pre-defined modular blocks at a reduced NRE cost by synthesizing the module connections with a software model similar to the ASIC. The Gate Array has an array of non programmable functional modules fabricated on a semiconductor substrate. To interconnect these modules to a user specification, multiple layers of wires are used during a synthesis process. The level of customization may be limited to a single metal layer, or single via layer, or multiple metal layers, or multiple metal and via layers. The goal is to reduce the customization cost to the user, and provide the customized product faster. As a result, the customizable layers are designed to be the top most metal and via layers of a semiconductor fabrication process. This is an inconvenient location to customize wires. The customized transistors are located at the substrate level of the Silicon. All possible connections have to come up to the top level metal. The complexity of bringing up connections is a severe constraint for these devices. Structured ASICs fall into larger module Gate Arrays. These devices, discussed in Or-Bach U.S. Pat. No. 6,331,789 and How et al. U.S. Pat. Nos. 6,242,767 and 6,613,611, have varying degrees of complexity in the structured cell and varying degrees of complexity in the custom interconnection. The absence of silicon for design verification and design optimization results in multiple spins and lengthy design iterations for the end user. The Gate Array evaluation phase is no different from that of an ASIC. The advantage over ASIC is in a lower upfront NRE cost for the fewer customization layers, tools and labor. Gate Arrays offer no device for off-the-shelf verification, offer metallization based user customization during synthesis, and require a partial custom mask set for fabrication.
In recent years there has been a move away from custom, semi-custom and Gate Array ICs toward field programmable components whose function is determined not when the integrated circuit is fabricated, but by an end user “in the field” prior to use. Off the shelf FPGA products greatly simplify the design cycle and are fully customized by the user. These products offer user-friendly software to fit custom logic into the device through programmability, and the capability to tweak and optimize designs to improve silicon performance. Provision of this programmability is expensive in terms of silicon real estate, but reduces design cycle time, time to solution (TTS) and upfront NRE cost to the designer. FPGAs offer the advantages of low NRE costs, fast turnaround (designs can be placed and routed on an FPGA in typically a few minutes), and low risk since designs can be easily amended late in the product design cycle. It is only for high volume production runs that there is a cost benefit in using the other two approaches. Compared to an FPGA, an ASIC and a Gate Array both have hard-wired logic connections, identified during the chip design phase. An ASIC has no multiple logic choices, and both ASICs and Gate Arrays have no configuration memory to customize logic. This yields a large chip area and product cost saving for these approaches to design. Smaller die sizes also lead to better performance. A full custom ASIC has customized logic functions which take fewer gates compared to Gate Array, PLD and FPGA configurations of the same functions. Thus, an ASIC is significantly smaller, faster, cheaper and more reliable than an equivalent gate-count PLD or FPGA. A Gate Array is also smaller, faster and cheaper compared to an equivalent FPGA. The trade-off is between time-to-market (PLD and FPGA advantage) versus low cost and better reliability (ASIC advantage).
A Gate Array falls in the middle with an improvement in the ASIC NRE cost at a moderate penalty to product cost and performance. The cost of Silicon real estate for programmability provided by the PLD and FPGA compared to ASIC and Gate Array contribute to a significant portion of the extra cost the user has to bear for customer re-configurability in logic functions.
In a PLD and an FPGA, a complex logic design is broken down to smaller logic blocks and programmed into logic blocks provided in the FPGA. Logic blocks contain multiple smaller logic elements. Logic elements facilitate sequential and combinational logic design implementations. Combinational logic has no memory and outputs reflect a function solely of present inputs. Sequential logic is implemented by inserting memory into the logic path to store past history. Current PLD and FPGA architectures include transistor pairs, NAND or OR gates, multiplexers, look-up-tables (LUTs) and AND-OR structures in a basic logic element. In a PLD the basic logic element is labeled a macro-cell. Hereafter the terminology FPGA will include both FPGAs and PLDs, and the terminology logic element will include both logic elements and macro-cells. Granularity of an FPGA refers to the logic content of the basic logic element. Smaller blocks of a complex logic design are customized to fit into the FPGA grain. In fine-grain architectures, a small basic logic element is enclosed in a routing matrix and replicated. These offer easy logic fitting at the expense of complex routing. In coarse-grain architectures, many basic logic elements are combined with local routing and wrapped in a routing matrix to form a large logic block. The larger logic block is then replicated with global routing. Larger logic blocks make the logic fitting difficult and the routing easier. A challenge for FPGA architectures is to provide easy logic fitting (like fine-grain) and maintain easy routing (like coarse-grain).
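The distinction between combinational and sequential logic in a LUT based logic element can be illustrated with a minimal Python sketch. The class names LUT and RegisteredLUT, the bit ordering (first input is the least significant index bit) and the reset state of 0 are assumptions made for illustration only and do not describe any particular commercial device.

```python
class LUT:
    """k-input look-up table: 2**k configuration bits indexed by the inputs."""

    def __init__(self, k, bits):
        if len(bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k bits")
        self.k = k
        self.bits = list(bits)

    def eval(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        index = 0
        for position, value in enumerate(inputs):
            # Assumed ordering: inputs[0] is the least significant index bit.
            index |= (value & 1) << position
        return self.bits[index]


class RegisteredLUT:
    """Sequential element: the LUT output is stored and fed back as input 0."""

    def __init__(self, lut):
        self.lut = lut
        self.q = 0  # stored state, assumed to reset to 0

    def clock(self, *inputs):
        # On each clock, the next state is a function of the stored state
        # and the present inputs.
        self.q = self.lut.eval(self.q, *inputs)
        return self.q


# Combinational: the output depends only on the present inputs.
xor2 = LUT(2, [0, 1, 1, 0])

# Sequential: the same XOR LUT with feedback behaves as a toggle flip-flop.
toggle = RegisteredLUT(xor2)
```

In this sketch, calling xor2.eval with the same inputs always gives the same output, whereas successive calls to toggle.clock(1) alternate because the stored bit records past history.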
A logic element used in Gate Arrays is called a structured cell or a module. These cells can also include transistor pairs, NAND or OR gates, MUXs and LUTs. To include sequential logic designs, the structured cell may also include flip-flops. An exemplary logic element, or a structured cell, or a module, described in Ref-1 (Seals & Whapshott) is shown in FIG. 1A. The logic element has a built in D-flip-flop 105 for sequential logic implementation. In addition, elements 101, 102 and 103 are 2:1 MUX's controlled by one input signal for each MUX. Input S1 feeds into 101 and 102, while inputs S1 and S2 feed into OR gate 104, and the output from the OR gate feeds into 103. Element 105 is the D-Flip-Flop receiving Preset, Clear and Clock signals. Ignoring the global Preset & Clear signals, eight inputs feed the logic block, and one output leaves the logic block. All two-input, most 2-input and some 3-input variable functions are realized in the logic block and latched to the D-Flip-Flop. Inputs and outputs for the Logic Element or Logic Block are selected from the programmable Routing Matrix. An exemplary routing matrix containing logic elements as described in Ref-1 is shown in FIG. 1B. Each logic element 112 is as shown in FIG. 1A. The 8 inputs and 1 output from logic element 112 in FIG. 1B are routed to 22 horizontal and 12 vertical interconnect wires that have programmable via connections 110. These connections 110 may be anti-fuses or pass-gate transistors controlled by SRAM memory elements. The user selects how the wires are connected during the design phase, and programs the connections in the field. FPGA architectures for various commercially available devices are discussed in Ref-1 (Seals & Whapshott) as well as Ref-2 (Sharma). A comprehensive thesis on FPGA routing architecture is provided in Ref-3 (Betz, Rose & Marquardt).
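How a MUX based logic element of this kind realizes two-input functions can be sketched in Python as follows. FIG. 1A is not reproduced here, so the pin names (a0, a1, sa, b0, b1, sb, s0, s1) and the use of an independent select pin on each first-level MUX are assumptions chosen only so that the model has eight inputs; the D-Flip-Flop 105 is omitted, and the helper realize is hypothetical.

```python
from itertools import product


def mux2(d0, d1, sel):
    """2:1 multiplexer: passes d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def logic_module(a0, a1, sa, b0, b1, sb, s0, s1):
    """Combinational part of an eight-input MUX module (pin names assumed)."""
    m101 = mux2(a0, a1, sa)            # first-level MUX (element 101)
    m102 = mux2(b0, b1, sb)            # first-level MUX (element 102)
    return mux2(m101, m102, s0 | s1)   # OR gate 104 selects in MUX 103


def realize(table):
    """Map a two-input truth table {(x, y): out} onto the module.

    The data pins are tied to constants taken from the truth table, x drives
    both first-level selects, y drives one OR input and the other is tied low.
    """
    return lambda x, y: logic_module(
        table[(0, 0)], table[(1, 0)], x,
        table[(0, 1)], table[(1, 1)], x,
        y, 0,
    )


def all_two_input_functions_fit():
    """Check every one of the 16 two-input functions against the module."""
    points = list(product((0, 1), repeat=2))
    for outputs in product((0, 1), repeat=4):
        table = dict(zip(points, outputs))
        f = realize(table)
        if any(f(x, y) != table[(x, y)] for x, y in points):
            return False
    return True
```

Under these assumptions the three MUXes act as a 4:1 selector over constant data pins, which is why every two-input function fits; wider functions require variables on the data pins and are not enumerated in this sketch.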
A Gate Array routing matrix is shown in FIG. 1C with the same logic element 122 as shown in FIG. 1A. The 8 inputs and 1 output of logic element 122 in FIG. 1C are hard wired into an array of logic and a multiplicity of potential connections are brought up to a lower metal layer, below the customizable metal layer. Two orthogonal metal layers are shown in FIG. 1C with either one being the top metal (say dotted lines) and the other the metal below the top metal (say solid lines). The top metal mask has to be customized to complete all the logic connections as specified by the design to connect the plurality of logic elements 122. This is achieved by laying top metal in the pre-defined tracks and connecting them to lower metal at via connections 121 accordingly. Pads are similarly connected to top metal as assigned in the design. Circle 121 represents a pre-defined top metal connection to a lower metal. To account for inefficiency in track utilization, excess wires at a higher chip area compared to an ASIC must be provided. The X-Y connection matrix may be completed by a single custom mask of the top metal in theory, but a multi-metal customization is more practical to achieve. Solid lines and via connections in FIG. 1C pre-exist and do not change during the one mask customization. Inputs and outputs of logic elements 122 are connected to synthesized dotted metal lines and customized in top metal to complete the interconnection. Clock skews and inefficient utilization of metal tracks complicate the design and increase the NRE cost. Access to all metal layers for the customization makes synthesis and fixing clock skews easier at the expense of higher mask costs and longer fabrication delay. Some commercial Gate Array solutions offer four metal layers to customize the interconnect as it is difficult to get all the possible logic element connections into the top metal layer.
FPGA architectures are discussed in Hartmann U.S. Pat. No. 4,609,986, Carter U.S. Pat. No. 4,706,216, Turner et al. U.S. Pat. No. 4,761,768, Freemann U.S. Pat. No. 4,870,302, ElGamal et al. U.S. Pat. No. 4,873,459, Freemann et al. U.S. Pat. Nos. 5,488,316 & 5,343,406, Tsui et al. U.S. Pat. No. 5,835,405, Trimberger et al. U.S. Pat. No. 5,844,422, Cliff et al. U.S. Pat. No. 6,134,173, Mendel U.S. Pat. No. 6,275,065, Young et al. U.S. Pat. No. 6,448,808, and Sugibayashi et al. U.S. Pat. No. 6,515,511. These patents disclose specialized routing blocks to connect logic elements in FPGA's and macro-cells in PLD's. In all cases the routing block is programmed to define inputs and outputs for the logic blocks, while the logic block is programmed to perform a specific logic function.
Four exemplary methods of programmable point to point connections, synonymous with programmable switches, between node A and node B are shown in FIG. 2. These form connections 110 in FIG. 1B where node A is located in a first wire and node B is located in a second wire. A configuration circuit to program the connection is not shown in FIG. 2. All the patents listed under FPGA architecture use one or more of these basic programmable connections. In FIG. 2A, a conductive fuse link 210 connects A to B. It is normally connected, and passage of a high current or exposure to a laser beam will blow the conductor open. In FIG. 2B, a capacitive anti-fuse element 220 separates A from B. It is normally open, and passage of a high current will pop the insulator, shorting the two terminals. Fuse and anti-fuse are both one time programmable due to the non-reversible nature of the change. In FIG. 2C, a pass-gate device 230 connects A to B. The gate signal S0 determines the nature of the connection, on or off. This is a non destructive change. The gate signal is generated by manipulating logic signals, or by configuration circuits that include memory. The choice of memory varies from user to user. In FIG. 2D, a floating-pass-gate device 240 connects A to B. Control gate signal S0 couples a portion of its voltage to the floating gate. Electrons trapped in the floating gate determine an on or off state for the connection. Hot-electron injection and Fowler-Nordheim tunneling are two mechanisms for injecting charge into floating-gates. When high quality insulators encapsulate the floating gate, trapped charge stays for over 10 years. These provide non-volatile memory. EPROM, EEPROM and Flash memory employ floating-gates and are non-volatile. Anti-fuse and SRAM based architectures are widely used in commercial FPGA's, while EPROM, EEPROM, anti-fuse and fuse links are widely used in commercial PLD's.
Volatile SRAM memory needs no high programming voltages, is freely available in every logic process, is compatible with standard CMOS SRAM memory, lends itself to process and voltage scaling and has become the de-facto choice for modern day very large FPGA device construction.
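The behavioral differences among the four programmable connections of FIG. 2 can be summarized in a short Python sketch. The class and method names are hypothetical, electrical detail is ignored, and the floating-gate polarity (trapped electrons hold an n-channel device off) and the erase operation are assumptions made for illustration.

```python
class Fuse:
    """FIG. 2A style link: normally connected, blown open once."""

    def __init__(self):
        self._blown = False

    def program(self):
        self._blown = True        # irreversible: no method restores the link

    def connected(self):
        return not self._blown


class AntiFuse:
    """FIG. 2B style element: normally open, shorted once."""

    def __init__(self):
        self._shorted = False

    def program(self):
        self._shorted = True      # irreversible: the insulator stays ruptured

    def connected(self):
        return self._shorted


class PassGate:
    """FIG. 2C style device: the gate signal S0 sets the state, non destructively."""

    def __init__(self, s0=0):
        self.s0 = s0

    def connected(self):
        return bool(self.s0)


class FloatingPassGate:
    """FIG. 2D style device: trapped charge sets the state and is non-volatile."""

    def __init__(self):
        self.trapped = False

    def inject(self):
        self.trapped = True       # e.g. hot-electron injection

    def erase(self):
        self.trapped = False      # assumed erasable, as in EEPROM or Flash

    def connected(self, s0=1):
        # Assumed polarity: trapped electrons keep the device off.
        return bool(s0) and not self.trapped
```

Only the pass-gate and floating-pass-gate models can return to an earlier state, which mirrors the one time programmable nature of the fuse and anti-fuse.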
A volatile six transistor SRAM based configuration circuit is shown in FIG. 3A. The SRAM memory element can be any one of 6-transistor, 5-transistor, full CMOS, R-load or TFT PMOS load based cells to name a few. Two inverters 303 and 304 connected back to back form the memory element. This memory element is a latch. The latch can be constructed as full CMOS, R-load, PMOS load or any other. Power and ground terminals for the inverters are not shown in FIG. 3A. Access NMOS transistors 301 and 302, and access wires GA, GB, BL and BS provide the means to configure the memory element. Applying zero and one on BL and BS respectively, and raising GA and GB high enables writing zero into device 301 and one into device 302. The output S0 delivers a logic one. Applying one and zero on BL and BS respectively, and raising GA and GB high enables writing one into device 301 and zero into device 302. The output S0 delivers a logic zero. The SRAM construction may allow applying only a zero signal at BL or BS to write data into the latch. The SRAM cell may have only one access transistor 301 or 302. The SRAM latch will hold the data state as long as power is on. When the power is turned off, the SRAM bit needs to be restored to its previous state from an outside permanent memory. In the literature for programmable logic, this second non-volatile memory is also called configuration memory. Upon power up, an external or an internal CPU loads the external configuration memory to internal configuration memory locations. All of the FPGA functionality is controlled by the internal configuration memory. The SRAM configuration circuit in FIG. 3A controlling a logic pass-gate is illustrated in FIG. 3B. Element 350 represents the configuration circuit. The S0 output directly driven by the memory element in FIG. 3A drives the pass-gate 310 gate electrode. In addition to the S0 output and the memory cell, power, ground, data in and write enable signals in 350 constitute the SRAM configuration circuit.
Write enable circuitry includes GA, GB, BL, BS signals shown in FIG. 3A.
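The write, hold and power-up restore behavior described for FIG. 3A and FIG. 3B can be modeled with a minimal Python sketch. The class names SramCell and ConfigurableSwitches are hypothetical, the unknown power-up state is represented by None as a modeling assumption, and only the two-sided write with complementary BL and BS values is modeled.

```python
class SramCell:
    """Volatile latch with access wires GA, GB, BL and BS and output S0."""

    def __init__(self):
        self.s0 = None  # assumption: state is unknown until first written

    def write(self, bl, bs, ga=1, gb=1):
        # Per the description: BL=0, BS=1 with GA and GB high gives S0=1,
        # and BL=1, BS=0 gives S0=0. Other combinations leave the latch as is.
        if ga and gb and bl != bs:
            self.s0 = bs

    def power_off(self):
        self.s0 = None  # volatile: the stored bit is lost


class ConfigurableSwitches:
    """Pass-gates whose gate electrodes are driven by SRAM cell outputs."""

    def __init__(self, nonvolatile_bits):
        # Stands in for the outside permanent configuration memory.
        self.nonvolatile_bits = list(nonvolatile_bits)
        self.cells = [SramCell() for _ in self.nonvolatile_bits]

    def power_up(self):
        # Load external configuration into the internal SRAM locations.
        for cell, bit in zip(self.cells, self.nonvolatile_bits):
            cell.write(bl=1 - bit, bs=bit)

    def power_off(self):
        for cell in self.cells:
            cell.power_off()

    def connected(self, index):
        # A pass-gate conducts only when its S0 is a known logic one; an
        # unknown state is reported as not connected in this model.
        return self.cells[index].s0 == 1
```

For example, ConfigurableSwitches([1, 0, 1]) reports no connections until power_up is called, then reports the first and third pass-gates as connected, and loses that state again after power_off.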
Structured ASIC described in U.S. Pat. No. 6,331,789 contains SRAM based 3-input LUTs to enhance logic flexibility similar to FPGAs described in U.S. Pat. Nos. 4,706,216, 4,870,302, 5,488,316, 5,343,406, 5,844,422 and 6,134,173. LUT programmability at silicon substrate level may reduce the number of wires required to connect all the modules at an upper metal layer. Packing logic into 3-input or 4-input pre-fabricated LUTs is fairly inefficient and costly compared to the logic element shown in FIG. 1A. Clock skew and track inefficiencies are still encountered during simulation and difficult to fix with these devices. Once the metal is hard-wired to a suitable logic placement, the structured ASIC is very inflexible to design tweaks and changes. Module function, module placements and wire connections all change during a timing or cost driven optimization of a design. When the wires are fixed, there is no method to change the module placement in the fixed module locations. Thus programmable LUT based structured cells add little value over hard-wire structured cells shown in FIG. 1. They both provide no off-the-shelf emulation device, like an FPGA, where the customer can change and tweak a design in real Silicon. Such an emulation device could be plugged into a system debug board and further used for early design wins and provided to customers as first samples.
What is desirable is to have a programmable version of a structured ASIC device at the beginning of a design cycle. The user can program such an off-the-shelf device, and place logic and routing at an optimal location to improve timing or cost of said design. The flexibility is further enhanced when the logic element contains programmable elements such as LUTs. For an emulation device, the cost of programmability is not a concern if such a device lends itself to easy design porting to a hard-wire low cost version once the design is finalized. Such a conversion has to keep the timing of the original design intact to avoid valuable re-engineering time and cost. Such a conversion should lower the end product cost to be competitive with an equivalent standard cell ASIC cost for design opportunities that forecast fairly significant volumes. These programmable structured ASIC devices will target applications that are cost sensitive, have short life cycles and demand volumes larger than for typical FPGA designs and lower than for typical ASIC designs.