1. Field of the Invention
The present invention relates to a switch block for FPGA architectures.
2. Description of the Related Art
As is well known, technology scaling is leading to an exponential increase in integrated circuit leakage current, such that below 90 nm static power could be the dominant factor in energy consumption.
In particular, configurable structures such as FPGA (Field-Programmable Gate Array) architectures are affected by this problem more heavily than other devices such as ASICs, since they require many more transistors to support their main feature, reconfigurability.
Indeed, configurable logic structures have become a valid alternative to ASICs because of the provided software programmability which reduces the design cycle, while density and running frequency greatly increase. It is well known that this flexibility is achieved at the cost of a larger silicon area occupation to accommodate the logic blocks that realize reconfigurability.
However, as technology scales, the area constraint is becoming less restrictive, while the large number of integrated transistors in FPGA architectures still causes such architectures to consume more energy than ASICs.
Since reconfigurable computing is a promising technology for wireless applications where systems need to support a variety of changing communication protocols, the power consumption constraint is becoming the main issue that could prevent FPGA architectures from being widely used in this field.
A typical configuration of an FPGA architecture is schematically shown in FIG. 1 and globally indicated with 1. The FPGA architecture 1 essentially comprises a plurality of programmable logic elements 2 arranged in a matrix-like configuration, commonly referred to as a gate array, each of such programmable logic elements 2 being connected, by means of a plurality of local connections 3, to an interconnection network 4, which in turn comprises a plurality of horizontal interconnection lines 4a and vertical interconnection lines 4b.
As schematized in the figure, each programmable logic element 2 of the gate array essentially comprises one or more computational blocks 5 such as look-up tables, ALUs, etc., having a plurality of inputs and being connected to an output through a multiplexer 6 having in turn an input connected to a memory element 7.
In particular, the interconnection network 4 allows the FPGA architecture 1 to be reconfigured, thereby changing its operation.
FIG. 2 schematically shows an FPGA architecture 1, depicted in island style, comprising a switch matrix 9 of switch blocks for connecting a plurality of connection lines.
In particular, the figure shows how the programmable logic element 2 is connected to a horizontal connection block 8a and to a vertical connection block 8b in turn connected to the switch matrix 9, which in turn comprises a plurality of switch blocks 10.
When power consumption of an FPGA architecture 1 is considered, it is immediately evident that a large part of the device area is often left completely unused when a specific circuit is mapped, so that the power it consumes is wasted.
Several studies have been conducted on dynamic power reduction for FPGA architectures. In particular, the problem of leakage consumption has been addressed, with an evaluation of different Hw/Sw techniques, by Anderson et al. in the article entitled: “Active leakage Power Optimization for FPGA”, FPGA2004, Feb. 22-24, 2004 as well as by Rahman et al. in the article entitled: “Evaluation of Low-Leakage Design Techniques for Field-Programmable Gate Arrays”, FPGA2004, Feb. 22-24, 2004.
As a matter of fact, the fraction of power consumption due to leakage current in FPGA architectures is rapidly increasing as technology advances. This is mainly due to the threshold voltage scaling which leads to an exponential increase in the subthreshold leakage.
Since leakage generates static power consumption which depends on the number of integrated transistors, FPGA architectures will be suffering from this problem even more than other devices.
As already pointed out, most of the transistors provided for flexibility purposes in an FPGA are left unused when implementing a circuit. These parts of the configurable device do not contribute dynamic power consumption but still increase energy dissipation through their static subthreshold current.
Since the percentage of total power dissipation due to leakage depends on the number of unused resources, in FPGA architectures the leakage power consumption can become relevant and energy efficiency can be deeply affected.
A common way to tackle this problem is to use high threshold transistors, since leakage depends exponentially on the threshold voltage. In fact, the leakage current of a high threshold transistor is about two orders of magnitude lower than that of a low threshold transistor equivalent in terms of area and working conditions. However, this technique significantly affects delays and can be used only for non-timing-critical circuits.
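The exponential dependence noted above can be illustrated numerically. The following is a minimal sketch, not taken from the cited works: it assumes the usual first-order subthreshold model, in which the leakage current falls by one decade for every subthreshold swing S of threshold voltage increase. The function name `leakage_ratio`, the 100 mV/decade swing and the 200 mV threshold difference are illustrative assumptions, chosen only to reproduce a ratio of about two orders of magnitude.

```python
def leakage_ratio(delta_vth_mv: float, swing_mv_per_decade: float = 100.0) -> float:
    """Ratio between the subthreshold leakage of a low threshold transistor
    and that of a high threshold transistor of equal size.

    delta_vth_mv        -- assumed threshold difference Vth - Vtl, in millivolts
    swing_mv_per_decade -- assumed subthreshold swing S, in millivolts per decade

    First-order model (an assumption): I_leak is proportional to 10 ** (-Vt / S).
    """
    return 10.0 ** (delta_vth_mv / swing_mv_per_decade)


if __name__ == "__main__":
    # Assumed values: a 200 mV threshold difference and S = 100 mV/decade.
    print(leakage_ratio(200.0))
```

Under these assumed values the ratio is 100, i.e. two orders of magnitude; a smaller swing or a larger threshold difference would increase it further.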
When adopting the above described dual threshold approach to the design of a circuit, as described for example by Wei et al. in the article entitled: “Design and Optimization of Low Voltage High Performance Dual Threshold”, DAC1998, 1998, an analysis of the criticality of the blocks composing the circuit needs to be carried out. In particular, in the case of an FPGA architecture, it can be noticed that configuration memory elements, usually RAM cells, are not directly involved with signal transmission delay. Therefore all such static RAM cells can be implemented using slow high threshold transistors in order to save energy.
On the other hand, switch blocks, connect blocks, logic blocks and lookup-tables (LUTs) contribute to delays in signal propagation. Therefore these blocks should be carefully studied in order to use high-speed low threshold (Vtl) transistors on the critical delay paths, and high threshold (Vth) transistors for the other components.
This technique has been extensively applied in the design of a PiCoGA configurable device, as described by Lodi et al. in the article entitled: “A Pipelined Configurable Gate Array for Embedded Processors”, Proceedings of the 11th ACM/SIGDA International Symposium on FPGAs, February 2003. Such a PiCoGA configurable device has been implemented on silicon in 0.13 μm STMicroelectronics technology.
Though the above described dual threshold approach has been adopted, it has been verified that considerable power dissipation due to leakage still remains, which has been measured to be more than 25 mW for a PiCoGA configurable device area occupation of 11 mm². This is a static consumption, and scaling to technology below 100 nm will considerably increase it.
Transistors inside the logic block, LUTs and input connect blocks have smaller size since they drive only short local wires. On the other hand, switch blocks and output connect blocks drive routing wires crossing over several tiles, therefore the parasitic capacitive load involved is considerable. Since signal propagation through programmable interconnections is responsible for most delays in FPGA architectures, large buffers inside switch blocks are necessary to avoid a significant degradation of timing performance.
A switch block 10 realized according to the known designs is shown in FIG. 3. In particular, this figure shows a circuit schematic corresponding to a traditional implementation of a tri-state buffered switch optimized for delays, also indicated as Switch0.
The switch block 10 realizes the connection between a first line L0, a second line L1, a third line L2 and a fourth line L3, as shown in FIG. 4.
To do this, the switch block 10 comprises:
a first pass-transistor N0 connected between the first line L0 and a first internal node net0;
a second pass-transistor N1 connected between the second line L1 and the first internal node net0;
a third pass-transistor N2 connected between the third line L2 and the first internal node net0; and
a fourth pass-transistor N6 connected between the fourth line L3 and a second internal node net2.
In the example shown in FIG. 4, the pass-transistors N0-N2 and N6 are of the NMOS type.
The switch block 10 also includes a first inverter P1-N4 and a second inverter P2-N5, inserted between the first and second internal nodes, net0 and net2, and interconnected at a third internal node net1.
In particular, the first inverter comprises a PMOS transistor P1 and an NMOS transistor N4 connected, in series to each other, between a first and a second voltage reference, in particular a supply voltage reference VDD and ground GND. The first inverter transistors P1 and N4 have their gate terminals connected to each other and to the first internal node net0 and the common drain terminals connected to the third internal node net1.
In the same manner, the second inverter has a PMOS transistor P2 and an NMOS transistor N5 connected, in series to each other, between the supply voltage reference VDD and ground GND, the second inverter transistors P2 and N5 having their gate terminals connected to each other and to the third internal node net1 and the common drain terminals connected to the second internal node net2.
Finally, the switch block 10 includes a pull down transistor N3 inserted between the first internal node net0 and ground GND and a pull up transistor P0 inserted between the supply voltage reference VDD and the first internal node net0, the pull up transistor P0 having its gate terminal connected to the third internal node net1.
In particular, the pull down transistor N3 is of the NMOS type and the pull up transistor P0 is of the PMOS type, both being high-threshold or Vth transistors (as indicated by a thicker gate line in the figure), all other transistors being low-threshold or Vtl transistors.
According to this design, only Vtl transistors are in the signal path of the switch block 10 in order to minimize delays.
In order to analyze the leakage current of inactive circuit elements, the switch block 10 has to be turned off. All the pass-transistors are turned off and the pull down transistor N3 is turned on, so that the first and second internal nodes net0 and net2 are pulled down. In particular, the pull down transistor N3 is driven by a driving signal, which corresponds to the inverted driving signal of the fourth pass-transistor N6.
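The off state described above can be checked with a simple logic-level model. The following is a minimal sketch under stated assumptions: the inverters are treated as ideal, logic levels are reduced to 0 and 1, and the names `inverter` and `switch0_off_state` are hypothetical helpers introduced only for illustration.

```python
def inverter(level: int) -> int:
    """Ideal CMOS inverter acting on logic levels 0 and 1 (an idealization)."""
    return 1 - level


def switch0_off_state() -> dict:
    """Logic levels inside the switch block when it is turned off."""
    # All pass-transistors are off: their gate signals are low.
    pass_gates = {"N0": 0, "N1": 0, "N2": 0, "N6": 0}
    # N3 is driven by the inverted driving signal of N6, so it is turned on.
    n3_gate = inverter(pass_gates["N6"])
    # With N3 conducting and no pass-transistor driving it, net0 is tied to GND.
    net0 = 0
    net1 = inverter(net0)  # first inverter P1-N4
    net2 = inverter(net1)  # second inverter P2-N5
    # The PMOS pull up P0 conducts only when its gate (net1) is low.
    p0_conducts = net1 == 0
    return {"n3_gate": n3_gate, "net0": net0, "net1": net1,
            "net2": net2, "p0_conducts": p0_conducts}


if __name__ == "__main__":
    print(switch0_off_state())
```

In this model net0 and net2 are both low while net1 is high, so the pull up transistor P0 does not conduct and does not oppose the pull down transistor N3.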
The leakage current associated with the switch block 10 is composed of a fixed contribution, due to subthreshold current from the transistor P1 to the transistor N4 of the first inverter and from the transistor P2 to the transistor N5 of the second inverter, and of a variable contribution. This variable contribution is due to the leakage current passing through the pass-transistors N0, N1, N2 and N6, and depends on the difference of voltage levels between source and drain of the pass-transistors themselves and thus on the external signals applied to the lines L0-L3.
There are five possible configurations for the external signals applied to the lines L0 . . . L3, each of them assuming a value 0 or a value equal to VDD−Vtl, VDD being the supply voltage reference value and Vtl the threshold voltage value of a low-threshold transistor.
It can be seen that the leakage power consumption of the switch block 10 is highest when all the external lines assume a high value (difference at the terminals of pass-transistors equal to VDD−Vtl), and it is lowest in the opposite condition (no difference at the terminals of pass-transistors implies no leakage current).
A configuration choice providing a low value on the first and second internal nodes net0 and net2 minimizes the average leakage power due to the variable component.
In fact, if these nodes were instead pulled up, the difference between internal nodes (net0 and net2) and external lines (L0 . . . L3) would be equal to Vtl in the best condition, while the difference would be the full logic swing (VDD) in the worst configuration.
Since the probability of having a high or low value on the external lines L0 . . . L3 is equal, on average the pull-down configuration of the switch block 10 shown in FIG. 3 reduces the leakage dissipation variable component, while the fixed one is nearly the same in both cases.
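The averaging argument above can be illustrated by enumerating the external line values. The following is a rough sketch under several assumptions that are not in the original text: the supply and threshold values (VDD = 1.2 V, Vtl = 0.3 V) are arbitrary, the summed source-drain voltage difference across the four pass-transistors is used as a crude proxy for the variable leakage component (the real dependence is not linear), and the alternative pull-up case is modeled by placing both internal nodes at VDD. The names `total_drop`, `pull_down` and `pull_up` are hypothetical.

```python
from itertools import product

VDD = 1.2  # assumed supply voltage in volts (illustrative only)
VTL = 0.3  # assumed low threshold voltage in volts (illustrative only)


def total_drop(lines, internal):
    """Sum of |external - internal| over the four pass-transistors.

    Used here only as a crude proxy for the variable leakage component.
    """
    return sum(abs(v - internal) for v in lines)


# Each external line L0..L3 is either 0 or VDD - Vtl, with equal probability.
levels = (0.0, VDD - VTL)
configs = list(product(levels, repeat=4))  # 16 combinations

pull_down = [total_drop(c, 0.0) for c in configs]  # internal nodes at GND
pull_up = [total_drop(c, VDD) for c in configs]    # internal nodes at VDD (assumed alternative)

avg_down = sum(pull_down) / len(configs)
avg_up = sum(pull_up) / len(configs)

# Distinct proxy values in the pull-down case, one per number of high lines.
distinct_down = sorted(set(round(d, 9) for d in pull_down))

if __name__ == "__main__":
    print(avg_down, avg_up, len(distinct_down))
```

Under these assumptions the average summed difference is 2(VDD−Vtl) with the internal nodes pulled down and 2(VDD+Vtl) with them pulled up, so the pull-down choice gives the lower average. The sixteen line combinations collapse into five distinct values in the pull-down case, one for each possible number of high lines (zero to four).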
The traditional scheme of the switch block 10 as shown in FIG. 3 can be modified in order to save leakage current while keeping good timing performance, such modification being based on the consideration that signal delay is mainly due to the transition time of the two inverters when the input signal changes.
A modified switch block 10 (also indicated as Switch0−Vth pass-trans) can thus be obtained by using Vth transistors also for the pass-transistors N0, N1, N2 and N6, while Vtl transistors are used for the inverters.
It is also possible to modify the switch block 10 in order to minimize the leakage current by using all Vth transistors (also indicated as Switch0−all Vth trans). This solution drastically reduces the leakage power consumption, but at the cost of introducing many slow transistors on the signal path.
So, other solutions need to be found in order to apply reconfigurable computing to low-power portable environments.