A Field-Programmable Gate Array (FPGA) is a programmable integrated circuit consisting of logic and routing resources, with the capacity to implement various customer-designed hardware circuits. These programmable devices have been used since the 1980s in a wide range of applications, from embedded systems to parallel high-performance computing. The tremendous growth in transistor density and the increasing power density in nano-CMOS have led to the end of Dennard scaling. Further integration of CMOS technology in FPGAs is resulting in aggressive growth of the inactive fraction of the silicon die, also referred to as Dark Silicon.
However, the smaller footprint of logic resources relative to their high power consumption results in a power density greater than routing resources can accommodate, and may lead to ‘hot spots’ or other thermal challenges such as leakage-temperature positive feedback, performance degradation, and intensified aging.
The major contributors to high FPGA power consumption are the logic resources, in particular K-input look-up tables (LUTs), which serve as the primary blocks responsible for implementing an application's functionality. K-input LUTs (K-LUTs) are logic elements that can implement all possible K-input functions. Hence, any application can be mapped onto K-LUTs, provided that adequate resources are available.
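The statement that a K-LUT can implement every K-input function can be illustrated with a minimal sketch that models a LUT as a table of 2^K configuration bits addressed by its inputs. The class name `LUT`, its methods, and the example functions are hypothetical and chosen only for illustration; they do not describe any particular device.

```python
from itertools import product


class LUT:
    """Hypothetical software model of a K-input look-up table."""

    def __init__(self, k, func):
        self.k = k
        # One configuration bit per input combination: 2**k bits in total.
        # product() enumerates inputs in ascending binary order, first input as MSB.
        self.bits = [int(bool(func(*inputs)))
                     for inputs in product((0, 1), repeat=k)]

    def evaluate(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        # The inputs form an address into the configuration bits (MSB first).
        address = 0
        for bit in inputs:
            address = (address << 1) | int(bool(bit))
        return self.bits[address]


# Two illustrative functions programmed into LUTs of different sizes.
majority3 = LUT(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
xor4 = LUT(4, lambda a, b, c, d: a ^ b ^ c ^ d)
```

Because the stored bits are the truth table itself, a K-LUT can be configured for any of the 2^(2^K) possible K-input functions, at the cost of 2^K configuration bits per LUT.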
Although FPGA devices built from large-input LUTs achieve better overall performance, increasing the number of LUT inputs comes at the expense of a larger area footprint and greater power consumption, owing to the larger, less efficient structure. Furthermore, the propagation delay of the LUT increases linearly with the number of inputs, offsetting the gains obtained from large-input LUTs. Hence, LUTs with more than six inputs are rarely used. Among the various LUT configurations, 4-LUTs yield designs with the smallest area. However, the non-uniform distribution of the functions used in applications leads to poor logic utilization of 4-LUTs.
Various alternative architectures have been proposed, based on either power gating of unused resources or low-leakage transistor manufacturing processes. Such architectures can reduce static power, but suffer from performance overhead. For example, low-leakage manufacturing processes that have been exploited include variable transistor gate length, triple gate oxide, and multiple-Vth transistors employed in interconnect pass transistors and configuration memory cells. However, such techniques cannot be applied to all chip resources, due to the performance loss associated with high-threshold transistors. In addition, these techniques may not be cost-effective because of the added complexity of manufacture and fabrication.
Other structures employ static (offline) or dynamic (online) power gating of unused logic and routing resources. However, such structures suffer from a large ‘wake-up’ (power-on) current. This current is drawn from the power rails, and can lead to register content instability, functional errors, greater power overhead, and longer wake-up time. In addition, the idle period must be long enough to offset these overheads. Moreover, application behavior on these structures is unpredictable in interactive or input-dependent use.
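The requirement that the idle period be long enough to offset the gating overheads can be made concrete with a simple break-even estimate. The sketch below assumes a first-order energy model that is not taken from the text: gating a block costs a fixed overhead energy (wake-up current plus switching of the sleep transistor) and reduces leakage power from an ungated to a gated level. The function and parameter names are hypothetical.

```python
def break_even_idle_time(e_overhead_j, p_leak_on_w, p_leak_gated_w):
    """Shortest idle period (seconds) for which gating saves net energy.

    Assumed model: energy saved = (p_leak_on_w - p_leak_gated_w) * idle time,
    energy spent = e_overhead_j per gate/wake-up cycle.
    """
    saved_power_w = p_leak_on_w - p_leak_gated_w
    if saved_power_w <= 0:
        raise ValueError("gating must reduce leakage power to ever pay off")
    return e_overhead_j / saved_power_w


def gating_pays_off(idle_s, e_overhead_j, p_leak_on_w, p_leak_gated_w):
    """True if an idle period of idle_s seconds exceeds the break-even time."""
    return idle_s > break_even_idle_time(e_overhead_j, p_leak_on_w, p_leak_gated_w)


# Illustrative numbers only (not measured data): 2 nJ overhead,
# 1.1 mW ungated leakage, 0.1 mW residual leakage when gated.
t_be = break_even_idle_time(2e-9, 1.1e-3, 0.1e-3)
```

Under this assumed model, any idle period shorter than `t_be` costs more energy than it saves, which is why unpredictable idle periods undermine dynamic power gating.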
There is, therefore, a need for a logic unit with a simplified structure that provides high performance and reliability with a reduced number of cells, and with reduced static and dynamic power dissipation. There is also a need for a power allocation mechanism for power gating unused cells and modules in the logic unit. There is a further need for a method of efficiently mapping logic functions to the simplified logic unit structure, such that FPGAs or other programmable logic devices (PLDs) of greater complexity can be built from the simplified structure.