The present invention relates to devices and software related to programmable logic devices (FPGA/MPGA) and application specific integrated circuits (ASSP/ASIC).
Traditionally, integrated circuit (IC) devices such as custom, semi-custom, or application specific integrated circuit (ASIC) devices have been used in electronic products to reduce cost, enhance performance or meet space constraints. However, the design and fabrication of custom or semi-custom ICs can be time consuming and expensive. The customization involves a lengthy design cycle during the product definition phase and high Non Recurring Engineering (NRE) costs during manufacturing phase. To absorb design modifications or in the event of finding a logic error in the custom or semi-custom IC during final test phase, the design and fabrication cycles may have to be repeated. Lengthy emulation and prototyping cycles further aggravate the time to market and NRE costs. As a result, ASICs serve only specific applications and are custom built for high volume and low cost.
Another type of semi custom device called a Gate Array (includes Platform ASIC and Structured ASIC) customizes modular blocks at a reduced NRE cost by synthesizing the design using a software model similar to the ASIC. Structured ASICs provide a larger modular block compared to Gate Arrays, and may or may not provide pre instituted clock networks to simplify the design effort. In both, a software tool has to undergo a tedious iteration between a trial placement and ensuing wire “RC” extraction for timing closure. In sub-micron process technologies, wire “RC” delays are very complex and difficult to predict. The missing silicon level design verification Gate Arrays result in multiple spins and lengthy design iterations, further exacerbating a quick design solution. Most users need the iterative tweaking of designs to perfect their design.
In recent years there has been a move away from custom or semi-custom ICs toward field programmable components whose function is determined not when the integrated circuit is fabricated, but by an end user “in the field” prior to use. Off the shelf, generic Programmable Logic Device (PLD) or Field Programmable Gate Array (FPGA) products greatly simplify the design cycle. These products offer user-friendly software to fit custom logic into the device through programmability, and the capability to tweak and optimize designs to improve silicon performance. As the wire “RC” delays are pre-characterized, the users are able to achieve complex placements and timing closures very quickly and very accurately. The flexibility of this programmability or alterability is expensive in terms of silicon real estate, but reduces design cycle and upfront NRE cost to the designer. In this disclosure the terms FPGA and PLD are used interchangeably to mean programmable devices.
FPGAs (includes PLDs) offer the advantages of low non-recurring engineering costs, fast turn around (designs can be placed and routed on an FPGA in typically a few minutes to few hours), and low risk since designs can be easily amended late in the product design cycle. It is only for high volume production runs that there is a cost benefit in using the more traditional ASIC approaches. Compared to PLD and FPGA, an ASIC has hard-wired logic connections, identified during the chip design phase. ASIC has no multiple logic choices, no multiple routing choices and no configuration memory to customize logic and routing. This is a large chip area and cost saving for the ASIC—the FPGA silicon area may be 10 to 40 times the ASIC area due to these programmable overheads. Smaller ASIC die sizes lead to better performance and better reliability. A full custom ASIC also has customized logic functions which may require fewer gates compared to PLD and FPGA implementations of the same logic functions. Thus, an ASIC is significantly smaller, faster, cheaper and more reliable than an equivalent gate-count FPGA. The trade-off is between time-to-market (FPGA advantage) versus low cost and better reliability (ASIC advantage). The cost of Silicon real estate for programmability provided by the FPGA compared to ASIC determines the extra cost the user has to bear for customer re-configurability of logic functions and routing between logic modules. Programmability includes configuration memory and MUX overhead in FPGAs.
The 10 to 40× silicon area disadvantage lead to significant cost and performance disparity between the ASIC and the FPGA. A significant portion of silicon real estate overhead is consumed by the programmable interconnects in an FPGA (including associated configuration memory). Removing routing to reduce silicon overhead makes an FPGA unusable. A 3D FPGA with better logic gate silicon density improvement over 2D FPGA has been disclosed in the IDS references, especially in application Ser. Nos. 10/267,483, 10/267,484 and 10/267,511. Such techniques may reduce the ratio of FPGA to ASIC logic gate silicon area ratio to 2 to 10 times. Reducing the FPGA logic area penalty improves the value of FPGA compared to ASIC. When the Si area ratio reaches a threshold (the threshold determined by the life-time volume needs of the device), it would eliminate the need for ASIC designs, and the FPGA design will become the new standard for system design.
A complex logic design is broken down to smaller logic blocks and programmed into logic elements or logic blocks provided in the FPGA. Logic elements offer sequential and combinational logic design implementations. Combinational logic has no memory and outputs reflect a function solely of present inputs. Sequential logic is implemented by inserting memory into the logic path to store past history. Current FPGA architectures include transistor pairs, NAND or OR gates, multiplexers, look-up-tables (LUTs) and AND-OR structures as a basic logic element. In a conventional FPGA, the basic logic element is labeled a macro-cell. Hereafter the terminology logic element will include logic elements, macro-cells, arithmetic logic units and any other basic logical unit used to implement a portion of a logic function. Granularity of a FPGA refers to logic content (small or large) of a basic logic element. The complex logic design is broken down to fit the custom FPGA grain. In fine-grain architectures, a small basic logic element is enclosed in a routing matrix and replicated. These offer easy logic fitting at the expense of complex routing. In course-grain architectures, many basic logic elements are wrapped with local routing into a logic block with larger functionality, which is then replicated. The logic block replication utilizes a global routing technique. Larger logic blocks make the logic fitting difficult and the routing easier. A challenge for FPGA architectures is to provide easy logic fitting (like fine-grain) and maintain easy routing (like course-grain).
Inputs and outputs for the Logic Element, Logic Unit or Logic Block are selected from the programmable Routing Matrix. A routing wire is dedicated to each. An exemplary routing matrix containing logic elements described in Ref-1 (Seals & Whapshott) is shown in FIG. 1. In that example, the inputs and outputs from Logic Element 101-104 are routed to 22 horizontal and 12 vertical interconnect wires with programmable via connections. These connections may be fuses, anti-fuses or SRAM controlled pass-gate transistors comprising a Connect state and a Disconnect state. One output of element 101 is shown coupled to one of the inputs to element 104 in darker lines: in that vertical wire #3 is used to complete the coupling. One output of element 103 is also shown coupled to one of the inputs to element 104 in darker lines: in that vertical wire #8 is used to complete the coupling. Thus every input and every output occupies one or more dedicated wires to complete the coupling. Thus the number wires, wire segments, programmable connection, and Si area required for the connectivity grows rapidly with the number of logic elements N within the fabric.
The logic element having a built in D-flip-flop used with FIG. 1 routing as described in Ref-1 is shown in FIG. 2. In that, elements 201, 202 and 203 are 2:1 MUX's controlled by one input signal each. Element 204 is an OR gate while 205 is a D-Flip-Flop. Without global Preset & Clear signals, eight inputs feed the logic block, and one output leaves the logic block. These 9 wires are shown in FIG. 1 with programmable connectivity. Thus 9 wires must be assigned to connect the logic element shown in FIG. 2. All 2-input, all 3-input and some 4-input variable functions are realized in the logic block and latched to the D-Flip-Flop. FPGA architectures for various commercially available devices are discussed in Ref-1 (Seals & Whapshott) as well as Ref-2 (Sharma). A comprehensive thesis on FPGA routing architecture is provides in Ref-3 (Betz, Rose & Marquardt) and Ref-4 (Lemieux & Lewis).
Routing block wire structure defines how logic blocks are connected to each other. Adjacent logic elements as well as die opposite corner logic elements may require connections. Wire signals are driven by output buffers attached to logic elements, and the drive strength does not change on account of wire length. Longer wires may require repeaters to rejuvenate the signals periodically. Buffers and repeaters consume a large Si area and are very expensive. The wire delays become unpredictable as the wire lengths are randomly chosen during the Logic Optimization to best fit the design into a given FPGA. FPGA's also incur lengthy run times during timing driven optimization of partitioned logic. As FPGA's grow bigger in die size, the number of wire segments and wire lengths to connect logic increase. Wire delays can dominate chip performance. Wire delays grow proportional to square of the wire length, and inverse distance to neighboring wires. Maximum chip sizes remain constant at mask dimension of about 2 cm per side, while metal wire spacing is reduced with technology scaling. A good timing optimization requires in depth knowledge of the specific FPGA fitter, the length of wires segments, and relevant process parameters; a skill not found within the design house doing the fitting. In segmented wire architectures, expensive fixed buffers are provided to drive global signals on selected lines. These buffers are too few as they are too expensive, and only offer unidirectional data flow. Predictable timing is another challenge for FPGA's. This would enhance place and route tool capability in FPGA's to better fit and optimize timing critical logic designs. More wires exacerbate the problem, while fewer wires keep the problem tractable, reducing FPGA cost.
Prior art FPGA architectures are discussed in detail in the IDS references cited in this Application. These patents disclose specialized routing blocks to connect logic elements in FPGA's and macro-cells in PLD's. In all IDS citations a fixed routing block is programmed to define inputs and outputs for the logic blocks, while the logic block performs a specific logic function. Such dedicated interconnect wires drive the cost of FPGAs over equivalent functionality ASICs. User specification to program the FPGA is held in FPGA configuration memory, which is coupled to logic in the FPGA. User specification to program a volatile FPGA is also duplicated in an external memory chip—however data from that memory chip is retrieved and loaded to on chip volatile configuration memory to configure the FPGA. Thus IDS cited FPGAs incur a huge penalty for on-chip configuration memory and MUXs that are needed for programmability. Some further require an expensive off-chip boot ROM to hold configuration data. Thus configuration memory expense is twice for SRAM based FPGAs.
Four methods of programming point to point connections, synonymous with programmable switches and programmable cross-bar points, between A and B are shown in FIG. 3. A configuration circuit to program the connection is not shown. All the patents listed in IDS use one or more of these basic connections to configure logic elements and programmable interconnects. The user implements the decision by programming a memory bit. This kind of configuration is different from a software instruction as the memory bit is physically generating a control signal to actively implement the decision. In FIG. 3A, a conductive fuse link 310 connects A to B. It is normally connected, and passage of a high current or a laser beam will blow the conductor open. In FIG. 3B, a capacitive anti-fuse element 320 disconnects A to B. It is normally open, and passage of a high current will pop the insulator to short the terminals. Fuse and anti-fuse are both one time programmable due to the non-reversible nature of the change. In FIG. 3C, a pass-gate device 330 connects A to B. The gate signal S0 determines the nature of the connection, on or off. This is a non destructive change. The gate signal is generated by manipulating logic signals, or by configuration circuits that include memory. The choice of memory varies from user to user. In FIG. 3D, a floating-pass-gate device 340 connects A to B. Control gate signal S0 couples a portion of that to floating gate. Electrons trapped in the floating gate determines on or off state of the connection. Hot-electrons and Fowler-Nordheim tunneling are two mechanisms to inject charge onto floating-gates. When high quality insulators encapsulate the floating gate, trapped charge stays for over 10 years. These provide non-volatile memory. EPROM, EEPROM and Flash memory employ floating-gates and are non-volatile. Anti-fuse and SRAM based architectures are widely used in commercial FPGA's, while EPROM, EEPROM, anti-fuse and fuse links are widely used in commercial PLD's. Volatile SRAM memory needs no high programming voltages, is freely available in every logic process, is compatible with standard CMOS SRAM memory, lends to process and voltage scaling and has become the de-facto choice for modern very large FPGA devices. Unfortunately they need an external expensive boot-ROM to save configuration data.
A volatile six transistor SRAM based configuration circuit is shown in FIG. 4A. The SRAM memory element can be any one of 6-transistor, 5-transistor, full CMOS, R-load or TFT PMOS load based cells to name a few. Two inverters 403 and 404 connected back to back forms the memory element. This memory element is a latch. The latch can be full CMOS, R-load, PMOS load or any other. Power and ground terminals for the inverters are not shown in FIG. 4A. Access NMOS transistors 401 and 402, and access wires GA, GB, BL and BS provide the means to configure the memory element. Applying zero and one on BL and BS respectively, and raising GA and GB high enables writing zero into device 401 and one into device 402. The output S0 delivers a logic one. Applying one and zero on BL and BS respectively, and raising GA and GB high enables writing one into device 401 and zero into device 402. The output S0 delivers a logic zero. The SRAM construction may allow applying only a zero signal at BL or BS to write data into the latch. The SRAM cell may have only one access transistor 401 or 402.
The SRAM latch will hold the data state as long as power is on. When the power is turned off, the SRAM bit needs to be restored to its previous state from an outside permanent memory (ROM). The outside memory is not coupled to programmable logic to configure the logic, and the data retrieval is identical to microprocessors retrieving external DRAM memory data to store and use in local cache. In the literature for programmable logic, this second non-volatile memory is also called configuration memory, and should not be confused with the applicant's definition of configuration memory that is coupled to programmable logic.
The SRAM configuration circuit in FIG. 4A controlling logic pass-gate as shown in FIG. 3C is illustrated in FIG. 4B. Element 450 represents the configuration circuit. The S0 output directly driven by the memory element in FIG. 4A drives the pass-gate gate electrode. In addition to S0 output and the latch, power, ground, data in and write enable signals in 450 constitutes the SRAM configuration circuit. Write enable circuitry includes GA, GB, BL, BS signals shown in FIG. 4A. An SRAM based switch is shown in FIG. 4B, where pass-gate 410 can be a PMOS, NMOS, or CMOS transistor pair. NMOS is preferred due to its higher conduction. The gate voltage S0 on NMOS transistor 410 gate electrode determines an ON or OFF connection: S0 having a logic level one completes the point to point connection, while a logic level zero keeps the nodes disconnected. That logic level is generated by a configuration circuit 450 coupled to the gate of NMOS transistor 410. The symbol used for the programmable switch comprising the SRAM device and the pass-gate is shown in FIG. 4C as the cross-hatched circle 460. SRAM memory data can be changed anytime in the operation of the device, altering an application and routing on the fly, thus giving rise to the concept of reconfigurable computing in FPGA devices.
A programmable MUX utilizes a plurality of point to point switches. FIG. 5 shows three different MUX based programmable logic constructions. FIG. 5A shows a programmable 2:1 MUX. In the MUX, two pass-gates 511 and 512 allow two inputs I0 and I1 to be connected to output O. A configuration circuit 550 having two complementary output control signals S0 and S0′ provides the programmability. When S0=1, S0′=0; I0 is coupled to O. When S0=0, S0′=1; I1 is coupled to O. With one memory element inside 550, one input is always coupled to the output. If two bits were provided inside 550, two mutually exclusive outputs S0 and S1 could be generated. That would allow neither I0 nor I1 to be coupled to O, if such a requirement exists in the logic design. FIG. 5B shows a programmable 4:1 MUX controlled by 2 memory elements. A similar construction when the 4 inputs I0 to I3 are replaced by 4 memory element outputs S0 to S3, and the pass-gates are controlled by two inputs I0 & I1 is called a 4-input look up table (LUT). The 4:1 MUX in FIG. 5B operate with two memory elements 561 and 562 contained in the configuration circuit 560 (not shown). Similar to FIG. 5A, one of I0, I1, I2 or I3 is connected to O depending on the S0 and S1 states. For example, when S0=1, S1=1, I0 is coupled to O. Similarly, when S0=0 and S1=0, I3 is coupled to O. A 3 bit programmable 3:1 MUX is shown in FIG. 5C. Point D can be connected to A, B or C via pass-gates 531, 533 or 532 respectively. Memory elements 571, 572 and 573 contained in a configuration circuit 570 (not shown) control these pass-gate input signals. Three memory elements are required to connect D to just one, any two or all three points. In reconfigurable computing, data in memory elements 571, 572 and 573 can be changed on the fly to alter connectivity between A, B, C and D as desired.
In the IDS reference citations, three dimensional concepts to construct building blocks in 3D FPGAs are disclosed. In a first aspect, 3D FPGA's reduce silicon area by positioning configuration memory above the programmable logic content. In a second aspect, an expensive user programmable RAM memory is first used to target a complex design into a programmable device, and when the design is frozen, the RAM is replaced by an inexpensive mask programmable ROM memory. In a third aspect, a thin film transistor comprising majority carrier conduction is used to construct 3-dimensional configuration circuits. Thin film SRAM memory has better alpha particle immunity over bulk SRAM. In a fourth aspect, a 3-dimensional thin-film transistor SRAM memory element is used to program programmable logic. In a fifth aspect MUXs are stacked over logic and configuration memory is stacked over MUXs to significantly reduce Silicon footprint. One or more of the disclosures, used individually or in conjunction with other disclosures demonstrate a significant improvement to 3D programmable logic devices over conventional 2D programmable logic devices.