As semiconductor technology continues to approach practical limits on further increases in clock speed, architects are increasingly focusing on parallelism in processor architectures to obtain performance improvements. At the chip level, multiple processing cores are often disposed on the same chip, functioning in much the same manner as separate processor chips, or to some extent, as completely separate computers. In addition, even within cores, parallelism is employed through the use of multiple execution units that are specialized to handle certain types of operations. Pipelining is also employed in many instances so that certain operations that may take multiple clock cycles to perform are broken up into stages, enabling other operations to be started prior to completion of earlier operations. Multithreading is also employed to enable multiple instruction streams to be processed in parallel, enabling more overall work to be performed in any given clock cycle.
These various techniques for improving execution unit performance, however, do not come without a cost. Parallelism adds complexity, often requiring a greater number of logic gates, which increases both the size and the power consumption of such execution units. When these techniques are coupled with the general desire to increase performance through other techniques, such as increased switching frequency, the power consumption of complex, high performance execution units continues to increase, despite efforts to reduce such power consumption through process improvements. Excessive power consumption is a particular concern for portable or battery-powered devices, but more generally it presents issues for nearly all electronic circuits due to the generation of heat, which often requires elaborate cooling systems to ensure that a circuit does not overheat and fail.
Chip-wide control over power consumption is often used in electronic circuits such as those used in laptop computers or other portable devices, typically by throttling down the clock frequency of the circuit to reduce power consumption and the generation of heat. In addition, power consumption may also be reduced in some instances by temporarily shutting down unused circuits on a chip, including, for example, entire execution units. In all of these instances, however, throttling back the power consumption of the circuit usually results in lower performance of the chip. Furthermore, the circuit characteristics that define the overall power consumption of such circuits, e.g., cycle time, voltage, logic area, capacitance, etc., are most often designed to meet a maximum performance target.
Particularly in complex System on Chip (SOC) designs, increasingly complex logic circuitry is being incorporated into individual chips, and in many instances, it costs more power per bit to move the bit from memory to the central processing unit (CPU) than it does to perform the desired computation. As a result, improved power reduction mechanisms are required for moving data both on and off chip. Additionally, many features once unique to digital signal processors (DSPs) are increasingly being implemented on general purpose processors to reduce cost by eliminating the need for separate DSP chips in a system and to increase performance by eliminating the need to move data between a DSP chip and the CPU.
However, many algorithms more traditionally performed by DSPs, e.g., Fast Fourier Transforms (FFT), do not perform as well using traditional general purpose processors or CPUs. Although some features added to more recent general purpose processor designs, e.g., SIMD execution units and predication, have significantly improved performance, the power consumption of general purpose processing units performing these algorithms is still typically much higher than that of DSP chips specifically tailored for those algorithms. This is primarily because general purpose processing units typically incorporate large blocks of logic, such as multiple cache memories, multiple threads of execution, multiple execution units, etc., that are intended to improve performance generally for most workloads. However, for many DSP algorithms, this logic does very little to improve performance, and thus the additional power consumption of this logic is often effectively wasted when executing such DSP algorithms in a general purpose processor.
Therefore, a continuing need exists in the art for improved manners of reducing power consumption in an integrated circuit, particularly in connection with executing DSP algorithms and the like.