Many devices use integrated processors, such as microprocessors and digital signal processors, with complex arrangements of logic for performing data processing functions in accordance with program instructions. Applications that require digital processing of multimedia data, such as video, audio or graphics, are becoming increasingly popular with consumers. Processing such information, however, is computationally intensive and has led to processor architectures that are particularly suited to processing such data.
Multimedia data typically includes a considerable amount of “parallel” data. Data is “parallel” when the individual units of data are not dependent on one another. Processing one unit of data is therefore independent of processing another unit; that is, it need not wait for the processing of any other unit to be completed. As a result, a number of such independent data processing operations can be performed at the same time. This property of certain types of data, particularly the common forms of multimedia data, has led to the creation of parallel processors, which can manipulate multiple units of data simultaneously. Parallel processing of multimedia data often substantially increases overall processing speed.
A number of different architectures and instruction types have been developed for parallel data processing, particularly for multimedia applications. Single Instruction, Multiple Data (SIMD) processors, for example, process data in parallel. Multimedia processing using SIMD instructions reduces the overall number of instructions required to execute a particular program task and improves performance by operating on multiple data elements in parallel. Although the processor executes a single stream of instructions, the SIMD execution of those instructions concurrently processes multiple data streams.
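As a rough illustration of the SIMD idea, the Python sketch below emulates one packed addition on a 64-bit word that holds eight independent 8-bit data units, and checks it against eight separate scalar additions. It uses the software SIMD-within-a-register (SWAR) masking trick as a stand-in for SIMD hardware; the function names and the 8-bit and 64-bit sizes are illustrative assumptions, not taken from any particular processor's instruction set.

```python
def pack_lanes(values, lane_bits):
    """Pack small unsigned integers into one word, element 0 in the lowest lane."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * lane_bits)
    return word

def unpack_lanes(word, lane_bits, count):
    """Split a packed word back into its lanes, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(count)]

def simd_add_u8x8(a, b):
    """One emulated SIMD operation: add eight 8-bit lanes of two 64-bit words.

    Each lane wraps modulo 256 independently (hypothetical helper, SWAR emulation).
    """
    high = 0x8080808080808080            # top bit of every 8-bit lane
    low = ~high & 0xFFFFFFFFFFFFFFFF     # all other bits
    # Adding with lane top bits cleared keeps every carry inside its own lane;
    # XOR then restores each lane's top bit without producing a carry-out.
    return ((a & low) + (b & low)) ^ ((a ^ b) & high)

def scalar_add_u8(xs, ys):
    """Reference: one 8-bit addition per element, i.e. eight separate operations."""
    return [(x + y) & 0xFF for x, y in zip(xs, ys)]

xs = [1, 2, 3, 250, 100, 0, 255, 7]
ys = [1, 2, 3, 10, 100, 0, 1, 8]
packed = simd_add_u8x8(pack_lanes(xs, 8), pack_lanes(ys, 8))
print(unpack_lanes(packed, 8, 8))  # [2, 4, 6, 4, 200, 0, 0, 15]
```

Because no lane depends on any other, the single packed operation yields the same eight results as the scalar loop, which is the independence property that SIMD execution relies on.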
Many applications of processors, including highly parallel data processing devices such as SIMD processors, place severe constraints on the power that the processor circuitry can consume. For example, portable devices such as cell phones, PDAs (personal digital assistants) and handheld video games run on battery power supplies. Yet these devices include sophisticated microprocessors and, in some cases, use co-processors for multimedia-related processing. Processor designs for such applications warrant careful control of power consumption, typically to extend the life of each charge of the battery power supply.
The architecture of a processor establishes a “width” of the data path through the processor, that is, the maximum amount of data that can be processed at one time. Parallel processing designs, such as SIMD processor architectures, are typically scaled to provide a data path width that corresponds to the maximum amount of parallel data that the device can process in a given cycle. Current SIMD processors can process up to 128 bits of data at a time, which means that the overall width of their data path is 128 bits. At any given time, however, parallel portions of the processor may be processing smaller units of the data.
Although other sizes are known, common parallel processors today offer a 64-bit or a 128-bit wide data path. The data path is constructed of parallel processing elements, which can be configured to handle data of different widths. A 128-bit data path, for example, can be broken up into smaller sections; that is, the processor can process sections of the 128-bit data that are 8, 16, 32 or 64 bits long, as specified by the SIMD instructions written for the particular application. Using instructions that operate on 8-bit data, for example, a processor with a 128-bit wide data path can process sixteen 8-bit data units in parallel. Conversely, with a 64-bit data path, if an instruction requires 128 bits, the data may be divided into two 64-bit sections and the instruction executed sequentially on both sections. The processing of each 64-bit section may, of course, itself entail parallel processing, e.g. of eight 8-bit data units. By dividing the processing in this way, the 64-bit wide data path can handle the 128-bit instruction, although the divided processing takes longer because it requires two passes instead of one.
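The sketch below models this lane configuration in software. It is an emulation under assumed names (`lane_masks`, `partitioned_add` and `add128_on_64bit_path` are hypothetical helpers, and addition stands in for an arbitrary lane-wise SIMD operation): a data path whose lanes can be 8, 16, 32 or 64 bits wide performs a 128-bit operation either in one pass over a 128-bit path or in two sequential passes over a 64-bit path, and the two results are compared.

```python
import random

def lane_masks(lane_bits, width):
    """Return (high, low): a mask of each lane's top bit and a mask of all other bits."""
    high = 0
    for i in range(width // lane_bits):
        high |= 1 << (i * lane_bits + lane_bits - 1)
    full = (1 << width) - 1
    return high, full & ~high

def partitioned_add(a, b, lane_bits, width):
    """Add two width-bit operands as width // lane_bits independent lanes, wrapping per lane."""
    if width % lane_bits:
        raise ValueError("lane width must divide the data path width")
    full = (1 << width) - 1
    a, b = a & full, b & full
    high, low = lane_masks(lane_bits, width)
    # Summing with lane top bits cleared keeps every carry inside its lane;
    # XOR then restores each lane's top bit without a carry-out.
    return ((a & low) + (b & low)) ^ ((a ^ b) & high)

def add128_on_64bit_path(a, b, lane_bits):
    """Run one 128-bit operation as two sequential passes over a 64-bit data path."""
    mask64 = (1 << 64) - 1
    lo = partitioned_add(a & mask64, b & mask64, lane_bits, 64)                  # pass 1
    hi = partitioned_add((a >> 64) & mask64, (b >> 64) & mask64, lane_bits, 64)  # pass 2
    return (hi << 64) | lo

rng = random.Random(1)
a, b = rng.getrandbits(128), rng.getrandbits(128)
for lane_bits in (8, 16, 32, 64):
    one_pass = partitioned_add(a, b, lane_bits, 128)  # 128-bit data path
    two_pass = add128_on_64bit_path(a, b, lane_bits)  # 64-bit data path
    print(f"{lane_bits:2d}-bit lanes: {128 // lane_bits:2d} lanes on 128 bits,",
          f"same result: {one_pass == two_pass}")
```

The two-pass version is valid here because, for lane widths up to 64 bits, no lane straddles the boundary between the two 64-bit halves, so each half can be processed without information from the other; the price is the second pass.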
These operations allow optimum utilization of the parallel resources of the processor. Still, there are times when not all processing resources are needed for a particular processing application. Many functions or applications of the processor simply do not require the full processing capability of the processor device, or require it only for a very limited time. In the 128-bit data path processor example, an application or a portion of it may require only 64-bit data processing for substantial periods of time, for example because there is limited data parallelism, the amount of data to process is small, or high speed is not needed. If the elements providing the 128-bit wide data path are all fully powered continuously, however, the unused parallel elements consume power unnecessarily.
A parallel processor could be designed with a lower degree of parallelism than required for some applications, in order to be more efficient for applications that do not require the higher degree of parallelism. Although this compromise reduces power consumption for applications requiring less parallelism, it results in wasted power and poor performance when more parallelism is required.
Hence, low-power applications for parallel processors still create a need for a technique to selectively control power to a parallel element of a SIMD processor or the like, so as to effectively reduce power consumption.