The present invention relates to programmable interconnect structures. Specifically, it relates to area-efficient bidirectional buffers used to route signals efficiently in programmable logic devices.
Traditionally, integrated circuit (IC) devices such as custom, semi-custom, or application specific integrated circuit (ASIC) devices have been used in electronic products to reduce cost, enhance performance or meet space constraints. However, the design and fabrication of custom or semi-custom ICs can be time consuming and expensive. The customization involves a lengthy design cycle during the product definition phase and high Non-Recurring Engineering (NRE) costs during the manufacturing phase. If a logic error is found in the custom or semi-custom IC during the final test phase, the design and fabrication cycle has to be repeated. Such lengthy correction cycles further increase the time to market and the engineering cost. As a result, ASICs serve only specific applications and are custom built for high volume and low cost.
Another type of semi-custom device, called a Gate Array, customizes modular blocks at a reduced NRE cost by synthesizing the design using a software model similar to that of the ASIC. The missing silicon-level design verification results in multiple spins and lengthy design iterations. Structured ASICs fall under Gate Arrays with larger modules.
In recent years there has been a move away from custom or semi-custom ICs toward field programmable components whose function is determined not when the integrated circuit is fabricated, but by an end user “in the field” prior to use. Off the shelf, generic Programmable Logic Device (PLD) or Field Programmable Gate Array (FPGA) products greatly simplify the design cycle. These products offer user-friendly software to fit custom logic into the device through programmability, and the capability to tweak and optimize designs to improve silicon performance. The flexibility of this programmability is expensive in terms of silicon real estate, but reduces design cycle and upfront NRE cost to the designer.
FPGAs offer the advantages of low non-recurring engineering costs, fast turnaround (designs can typically be placed and routed on an FPGA in a few minutes), and low risk, since designs can be easily amended late in the product design cycle. It is only for high volume production runs that there is a cost benefit in using the more traditional approaches. Compared to a PLD or an FPGA, an ASIC has hard-wired logic connections, identified during the chip design phase. An ASIC has no multiple logic choices and no configuration memory to customize logic. This is a large chip area and cost saving for the ASIC. Smaller ASIC die sizes lead to better performance. A full custom ASIC also has customized logic functions which require fewer gates than PLD and FPGA configurations of the same functions. Thus, an ASIC is significantly smaller, faster, cheaper and more reliable than an equivalent gate-count PLD or FPGA. The trade-off is between time-to-market (PLD and FPGA advantage) and low cost and better reliability (ASIC advantage). The cost of the silicon real estate used for programmability in the PLD and FPGA, compared to the ASIC, determines the extra cost the user has to bear for customer re-configurability of logic functions.
In a PLD and an FPGA, a complex logic design is broken down into smaller logic blocks and programmed into logic blocks provided in the FPGA. Smaller logic elements allow sequential and combinational logic design implementations. Combinational logic has no memory, and its outputs are a function solely of the present inputs. Sequential logic is implemented by inserting memory into the logic path to store past history. Current PLD and FPGA architectures include transistor pairs, NAND or OR gates, multiplexers, look-up-tables (LUTs) and AND-OR structures in a basic logic element. In a PLD the basic logic element is labeled a macro-cell. Hereafter the terminology FPGA will include both FPGAs and PLDs, and the terminology logic element will include both logic elements and macro-cells. Granularity of an FPGA refers to the logic content of a basic logic element. Smaller blocks of a complex logic design are customized to fit into the FPGA grain. In fine-grain architectures, a small basic logic element is enclosed in a routing matrix and replicated. These offer easy logic fitting at the expense of complex routing. In coarse-grain architectures, many basic logic elements are combined with local routing and wrapped in a routing matrix to form a logic block. The logic block is then replicated with global routing. Larger logic blocks make the logic fitting difficult and the routing easier. A challenge for FPGA architectures is to provide easy logic fitting (like fine-grain) and maintain easy routing (like coarse-grain).
Inputs and outputs for the Logic Element or Logic Block are selected from the programmable Routing Matrix. An exemplary routing matrix containing logic elements described in Ref-1 (Seals & Whapshott) is shown in FIG. 1. In that example, the inputs and outputs from the Logic Element are routed to 22 horizontal and 12 vertical interconnect wires with programmable via connections. These connections may be anti-fuses or pass-gate transistors controlled by SRAM memory elements. The logic element having a built-in D-flip-flop used with the FIG. 1 routing, as described in Ref-1, is shown in FIG. 2. In that figure, elements 201, 202 and 203 are 2:1 MUXes controlled by one input signal each. Element 204 is an OR gate while 205 is a D-Flip-Flop. Without global Preset & Clear signals, eight inputs feed the logic block, and one output leaves the logic block. These 9 wires are shown in FIG. 1 with programmable connectivity. All two-input, most 2-input and some 3-input variable functions are realized in the logic block and latched to the D-Flip-Flop. FPGA architectures for various commercially available devices are discussed in Ref-1 (Seals & Whapshott) as well as Ref-2 (Sharma). A comprehensive thesis on FPGA routing architecture is provided in Ref-3 (Betz, Rose & Marquardt) and Ref-4 (Lemieux & Lewis).
The routing block wire structure defines how logic blocks are connected to each other. Neighboring logic elements have short wire connections, while logic blocks at opposite corners of the die have long wire connections. All wires are driven by a fixed, pre-designed logic element output buffer, and the drive does not change with wire length. The wire delays become unpredictable because the wire lengths are randomly chosen during Logic Optimization to best fit the design into a given FPGA. FPGAs also incur lengthy run times during timing-driven optimization of partitioned logic. As FPGAs grow bigger in die size, the wire lengths increase and wire delays dominate chip performance. Wire delay grows in proportion to the square of the wire length and inversely with the distance to neighboring wires. Chip sizes remain constant at a mask dimension of about 2 cm per side, while metal wire spacing is reduced with technology scaling. Good timing optimization requires in-depth knowledge of the specific FPGA fitter, the length of wire segments, and relevant process parameters, a skill not found within the design house doing the fitting. In segmented wire architectures, fixed buffers are provided to drive global signals on selected lines. These buffers are too few, too expensive, and only offer unidirectional data flow. Predictable timing is another challenge for FPGAs. Achieving it would enhance the capability of place and route tools to better fit and optimize timing-critical logic designs in FPGAs.
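The quadratic dependence of wire delay on wire length noted above can be illustrated with a short numerical sketch. It assumes an idealized model in which delay equals a constant times the square of the length, and it treats the buffer delay as a separate additive term; the function names, the constant k and the example lengths are hypothetical and are not taken from any referenced device.

```python
def wire_delay(length, k=1.0):
    """Delay of an unbuffered wire under the assumed model: k * length**2."""
    return k * length ** 2


def segmented_delay(length, segments, k=1.0, buffer_delay=0.0):
    """Total delay when the wire is cut into equal segments.

    Each segment is assumed to be re-driven by a buffer, so the quadratic
    term applies per segment; (segments - 1) buffers are inserted.
    """
    segment_length = length / segments
    return segments * wire_delay(segment_length, k) + (segments - 1) * buffer_delay


if __name__ == "__main__":
    # Doubling the length quadruples the delay of an unbuffered wire.
    print(wire_delay(10.0), wire_delay(20.0))
    # Splitting the same wire into 4 buffered segments (ideal buffers).
    print(segmented_delay(10.0, 4))
```

Under these assumptions, splitting a wire into equal buffered segments divides the quadratic term by the number of segments, which is why the wire length chosen by the fitter and the placement of buffers affect timing so strongly.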
FPGA architectures are discussed in U.S. Pat. Nos. 4,609,986, 4,706,216, 4,761,768, 4,783,763, 4,870,302, 4,873,459, 5,343,406, 5,488,316, 5,739,713, 5,835,405, 5,844,422, 6,134,173, 6,137,308, 6,239,613, 6,275,065, 6,331,789, 6,448,808, 6,515,511, 6,630,842, 6,747,482, 6,781,408, 6,812,737 and US Publication Numbers 2002/0186044 and 2003/0085733. These patents disclose specialized routing blocks to connect logic elements in FPGA's and macro-cells in PLD's. In all cases the routing block is programmed to define inputs and outputs for the logic blocks, while the logic block performs a specific logic function.
Four methods of making programmable point-to-point connections, synonymous with programmable switches, between A and B are shown in FIG. 3. A circuit to program the connection is not shown. All the patents listed above use one or more of these basic connections. In FIG. 3A, a conductive fuse link 310 connects A to B. It is normally connected, and passage of a high current or a laser beam will blow the conductor open. In FIG. 3B, a capacitive anti-fuse element 320 disconnects A from B. It is normally open, and passage of a high current will pop the insulator to short the terminals. Fuse and anti-fuse are both one-time programmable due to the non-reversible nature of the change. In FIG. 3C, a pass-gate device 330 connects A to B. The gate signal S0 determines the nature of the connection, on or off. This is a non-destructive change. The gate signal is generated by manipulating logic signals, or by configuration circuits that include memory. The choice of memory varies from user to user. In FIG. 3D, a floating-pass-gate device 340 connects A to B. A portion of the control gate signal S0 is coupled to the floating gate. Electrons trapped in the floating gate determine the on or off state of the connection. Hot-electron injection and Fowler-Nordheim tunneling are two mechanisms to inject charge onto floating gates. When high quality insulators encapsulate the floating gate, trapped charge stays for over 10 years. These provide non-volatile memory. EPROM, EEPROM and Flash memory employ floating gates and are non-volatile. Anti-fuse and SRAM based architectures are widely used in commercial FPGAs, while EPROM, EEPROM, anti-fuse and fuse links are widely used in commercial PLDs. Volatile SRAM memory needs no high programming voltages, is freely available in every logic process, is compatible with standard CMOS SRAM memory, lends itself to process and voltage scaling, and has become the de facto choice for modern very large FPGA devices.
A volatile six-transistor SRAM based configuration circuit is shown in FIG. 4A. The SRAM memory element can be any one of 6-transistor, 5-transistor, full CMOS, R-load or TFT PMOS load based cells, to name a few. Two inverters 403 and 404 connected back to back form the memory element. This memory element is a latch. The latch can be full CMOS, R-load, PMOS load or any other. Power and ground terminals for the inverters are not shown in FIG. 4A. Access NMOS transistors 401 and 402, and access wires GA, GB, BL and BS provide the means to configure the memory element. Applying zero and one on BL and BS respectively, and raising GA and GB high enables writing zero into device 401 and one into device 402. The output S0 delivers a logic one. Applying one and zero on BL and BS respectively, and raising GA and GB high enables writing one into device 401 and zero into device 402. The output S0 delivers a logic zero. The SRAM construction may allow applying only a zero signal at BL or BS to write data into the latch. The SRAM cell may have only one access transistor 401 or 402. The SRAM latch will hold the data state as long as power is on. When the power is turned off, the SRAM bit needs to be restored to its previous state from an outside permanent memory. In the literature for programmable logic, this second non-volatile memory is also called configuration memory. The SRAM configuration circuit in FIG. 4A controlling a logic pass-gate as shown in FIG. 3C is illustrated in FIG. 4Ba. Element 450 represents the configuration circuit. The S0 output directly driven by the memory element in FIG. 4A drives the pass-gate electrode. In addition to the S0 output and the latch, the power, ground, data in and write enable signals in 450 constitute the SRAM configuration circuit. Write enable circuitry includes the GA, GB, BL and BS signals shown in FIG. 4A. The symbol used for the programmable switch comprising the SRAM device and the pass-gate is shown in FIG. 4Bb as the cross-hatched circle 460.
A programmable MUX utilizes a plurality of point-to-point switches. FIG. 5 shows three different MUX based programmable logic constructions. FIG. 5A shows a programmable 2:1 MUX. In the MUX, two pass-gates 511 and 512 allow two inputs I0 and I1 to be connected to output O. A configuration circuit 550 having two complementary output control signals S0 and S0′ provides the programmability. When S0=1, S0′=0; I0 is coupled to O. When S0=0, S0′=1; I1 is coupled to O. With one memory element inside 550, one input is always coupled to the output. If two bits were provided inside 550, two mutually exclusive outputs S0 and S1 could be generated. That would allow neither I0 nor I1 to be coupled to O, if such a requirement exists in the logic design. FIG. 5B shows a programmable 4:1 MUX controlled by 2 memory elements. A similar construction, in which the 4 inputs I0 to I3 are replaced by 4 memory element outputs S0 to S3 and the pass-gates are controlled by two inputs I0 and I1, is called a 4-input look-up table (LUT). The 4:1 MUX in FIG. 5B operates with two memory elements 561 and 562 contained in the configuration circuit 560 (not shown). Similar to FIG. 5A, one of I0, I1, I2 or I3 is connected to O depending on the S0 and S1 states. For example, when S0=1 and S1=1, I0 is coupled to O. Similarly, when S0=0 and S1=0, I3 is coupled to O. A 3-bit programmable 3:1 MUX is shown in FIG. 5C. Point D can be connected to A, B or C via pass-gates 531, 533 or 532 respectively. Memory elements 571, 573 and 572 contained in a configuration circuit 570 (not shown) control these pass-gate input signals. Three memory elements are required to connect D to just one, any two or all three points.
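The selection behavior of the 2:1 MUX, the 4:1 MUX and the LUT construction described above can be sketched behaviorally as follows. This is only an illustrative model, not a circuit description. The text states that S0=1, S1=1 selects I0 and that S0=0, S1=0 selects I3; the mapping used here for the two mixed states is an assumption, and all function names are hypothetical.

```python
def mux2(i0, i1, s0):
    """2:1 MUX of FIG. 5A: S0=1 couples I0 to O, S0=0 couples I1 to O."""
    return i0 if s0 == 1 else i1


# (S0, S1) -> index of the selected input.
# (1, 1) -> I0 and (0, 0) -> I3 follow the text; the two mixed
# entries are an assumed ordering chosen for illustration only.
SELECT_4TO1 = {(1, 1): 0, (0, 1): 1, (1, 0): 2, (0, 0): 3}


def mux4(inputs, s0, s1):
    """4:1 MUX of FIG. 5B: two memory bits choose one of four inputs."""
    return inputs[SELECT_4TO1[(s0, s1)]]


def lut(memory_bits, i0, i1):
    """LUT construction: the same pass-gate tree, but the four data
    inputs are memory element outputs and the logic inputs I0, I1
    drive the select lines."""
    return mux4(memory_bits, i0, i1)


if __name__ == "__main__":
    # Program the LUT as an AND gate: output is 1 only when I0=1 and I1=1.
    and_bits = [1, 0, 0, 0]
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, lut(and_bits, a, b))
```

The same structure therefore serves either as a routing element, when the memory bits drive the select lines, or as a logic element, when the memory bits supply the data.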
FPGAs and ASICs require buffers to improve signal propagation delay in long wires. This is shown in FIG. 6A, where the incoming signal at point A in the wire is buffered by inverters 610 and 620 in series. The two inverters are sized appropriately to drive a long segment of wire starting at node B of the wire. The buffer may drive more than one wire. A programmable bi-directional buffer from U.S. Pat. No. 4,870,302, shown in FIG. 6Ba, has two such back-to-back buffers gated by two pass-gate logic elements 630 and 640. Unlike the full CMOS signal drive at point B in FIG. 6A, the buffers in FIG. 6Ba have many drawbacks: (i) the area requirement for two back-to-back buffers, (ii) a threshold voltage (Vt) drop when passing the power supply voltage (Vcc) level, (iii) the need to boost the pass-gate signal level above Vcc to avoid the Vt drop, (iv) the larger area of a CMOS pass-gate if one is used to avoid the Vt drop, (v) pass-gate ON resistance impacting signal delay, and (vi) the very wide pass-gate (hence large area) needed to minimize ON resistance. The symbol used in this disclosure for the dual buffer structure in FIG. 6Ba is shown in FIG. 6Bb, wherein two back-to-back elements 645 are shown. Each element 645 represents the buffer and the pass-gate controlled by the SRAM device shown in FIG. 6Ba. Either a single SRAM bit or two SRAM bits may be used in FIG. 6Bb to control the two buffers. With one-bit control, as shown in FIG. 6Ba, one of the paths in the buffer is always activated. With two SRAM bit controls, both buffers can be de-activated to tri-state the wires. The two buffers consume a very large Si area due to the very wide transistors needed to drive data quickly. Often, uni-directional wires with single buffers are provided in FPGAs (that have hundreds of thousands of wires) to reduce the cost associated with adding dual buffers on every wire.
That restriction is counterproductive for the software tools that provide routing for randomly placed logic blocks, as each wire has a predefined direction for data flow, and routing choices are restricted. An inexpensive programmable buffer that eliminates these drawbacks is highly desirable for FPGAs. None of the prior teachings demonstrates how to implement programmable buffers to overcome these deficiencies.
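The one-bit and two-bit control options described for the dual buffer structure of FIG. 6B can be modeled behaviorally as below. This is a minimal sketch under stated assumptions: the polarity of the control bits is not given in the text and is chosen arbitrarily here, enabling both directions at once is simply treated as an invalid configuration, and the function names are hypothetical.

```python
def one_bit_buffer(a, b, s0):
    """Single SRAM bit: one of the two paths is always active.

    Assumed polarity (not from the source): s0 == 1 drives A -> B,
    s0 == 0 drives B -> A. Returns (driven_node, value).
    """
    return ("B", a) if s0 == 1 else ("A", b)


def two_bit_buffer(a, b, s_ab, s_ba):
    """Two SRAM bits: each direction is enabled independently.

    Returns (driven_node, value), or None when both buffers are
    de-activated and the wires are tri-stated.
    """
    if s_ab and s_ba:
        # Assumption: both directions on at once is not a useful setting.
        raise ValueError("both directions enabled")
    if s_ab:
        return ("B", a)
    if s_ba:
        return ("A", b)
    return None  # tri-state


if __name__ == "__main__":
    print(one_bit_buffer(1, 0, s0=1))
    print(two_bit_buffer(1, 0, s_ab=0, s_ba=0))
```

A uni-directional wire corresponds to fixing the direction at design time, which removes the choice a router would otherwise make per wire.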
A useful measure of a programmable circuit is its gate count compared to that of an equivalent application specific circuit. SRAM based programmable pass-gates have to absorb the transistor overhead of the SRAM memory element. This can be easily seen in the 4 point switch in FIG. 6C, discussed in Ref-3 (Betz, Rose & Marquardt) and U.S. Pat. No. 4,870,302. The switch in FIG. 6C is a simple extension of the 3:1 MUX to 4 points. An ASIC will connect two points with a direct connection inside the circle. This programmable alternative has 6 wide pass-gate devices (such as 652) and 6 SRAM devices (such as 651). The SRAM (similar to FIG. 4A) overhead is 36 transistors, while the pass-gate overhead is 6 transistors. Such an overhead is extremely uneconomical for modern FPGAs that require some level of reasonable cost parity with an ASIC. In most programmable devices, after the user has finalized the logic design, it is rarely or never changed. For such designs, a conversion from programmable to application specific is highly desirable. The referenced usages do not lend themselves to an easy, economical conversion.
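The transistor overhead quoted for the 4 point switch can be reproduced with a small calculation. The sketch assumes one pass-gate and one SRAM bit for every pair of points, and six transistors per SRAM bit as in FIG. 4A; extending the count to an arbitrary number of points in this way is an assumption for illustration, and the function name is hypothetical.

```python
def switch_overhead(points, transistors_per_sram=6):
    """Count devices in a fully connected programmable switch.

    Assumes one pass-gate plus one SRAM bit per pair of points.
    """
    switches = points * (points - 1) // 2  # number of point pairs
    sram_transistors = switches * transistors_per_sram
    return {
        "pass_gates": switches,
        "sram_bits": switches,
        "sram_transistors": sram_transistors,
        "total_transistors": sram_transistors + switches,
    }


if __name__ == "__main__":
    # 4 point switch of FIG. 6C: 6 pass-gates, 6 SRAM bits, 36 SRAM transistors.
    print(switch_overhead(4))
    # 3 point case, matching the three memory elements of the 3:1 MUX.
    print(switch_overhead(3))
```

For the 4 point case this gives 42 transistors in total for a function that an ASIC implements with a direct wire.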
FPGAs comprise bundles of wires spanning the X and Y directions of the FPGA device, each bundle connecting pre-arranged programmable logic blocks. The wires are often segmented to be of a certain length. At the termination points on either end, each wire is provided with a Bridge connection, such as in FIG. 6C, to connect the wire to a plurality of choices. In most cases the signals have to be buffered at these junctions. An integration of the buffer structure shown in FIG. 6B with the bridge in FIG. 6C is shown in FIG. 6D. There are 12 buffers, 12 pass-gate devices, and 12 SRAM bits to make this circular bi-directional buffered bridge connection, which is extremely costly in Si real estate. In the bridge in FIG. 6D, if there are N ports, (N²-N) buffers are needed to construct the full bridge, which is a quadratic relationship. Many such buffered Bridge connections are discussed by Lemieux (Ref-4, pages 123-124), where the aim is to reduce the components necessary to build an efficient Bridge. A second embodiment of a Bridge is shown in FIG. 6E (Lemieux, Ref-4, page 124, FIG. 6.17e), which comprises 4 buffers, 14 pass-gates and 14 SRAM bits. With FIG. 6E, if there are N ports in the bridge, only N buffers are needed. Here the trade-off is to reduce the number of buffers at the expense of adding pass-gates and SRAM bits. The most effective solution has the least Si area consumption and the best signal transit delay through the bridge.
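The difference between the quadratic and linear buffer counts of the two bridge styles can be tabulated with a short sketch. It uses only the two relationships stated above, N squared minus N buffers for the full bridge of FIG. 6D and N buffers for the bridge of FIG. 6E; the function names are hypothetical, and pass-gate and SRAM counts are deliberately left out because no general formula for them is given.

```python
def full_bridge_buffers(ports):
    """Buffers in the FIG. 6D style bridge: one per ordered pair of ports."""
    return ports * ports - ports


def shared_bridge_buffers(ports):
    """Buffers in the FIG. 6E style bridge: one per port."""
    return ports


if __name__ == "__main__":
    print("ports  full  shared")
    for n in (2, 3, 4, 6, 8):
        print(f"{n:5d} {full_bridge_buffers(n):5d} {shared_bridge_buffers(n):7d}")
```

For the 4 port case this reproduces the 12 buffers of FIG. 6D and the 4 buffers of FIG. 6E, and the gap widens quickly as ports are added.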
What is desirable is an inexpensive, fast and timing-predictable routing block to connect logic elements. These routing connections need to facilitate both short and long wire connections while preserving timing in a predictable and calculable manner. It is also beneficial to have the ability to program the data flow direction, and to have this configurability integrated into the configuration circuits. When long wires are used, repeaters are inserted along wire segments to restore signal integrity and improve signal delay. It is extremely cost ineffective to use two back-to-back buffers to provide bidirectional data flow. A technique that uses a single bi-directional buffer would save a very large Si area and cost for programmable devices that use hundreds of thousands of wires. Much more efficient bridges that consume less Si real estate are needed for FPGAs. Furthermore, the drawbacks discussed earlier for bi-directional wires must be eliminated to improve fitting. Such a routing block should have reasonable cost parity with ASICs and also lend itself to an easy application specific design conversion for the user, preserving the original timing characteristics of the circuit during the conversion.