1. Field
This disclosure relates generally to processor arrays such as single instruction multiple data (SIMD) arithmetic and logical unit (ALU) arrays, and very long instruction word (VLIW) computing machines.
2. Description
Imaging workloads, such as camera input, print, and display imaging workloads, are typically processed using VLIW and SIMD computer processors. Alternatively, a system on a chip (SOC) may implement SIMD using single instruction multiple thread (SIMT) processors. An SIMT processor includes SIMD units running in parallel. Such systems are typically configured to use ALU arrays of a specific width to accommodate the particular machine instruction being processed. As used herein, the width of a processor refers to the number of lanes in that processor. A lane includes one ALU and at least one register. Computing machines may have different instruction widths for processing vectors or data using a single instruction with multiple data, otherwise known as SIMD processing. Generally, an SIMD processing unit may include lanes that perform various operations, such as floating point calculations and integer calculations. An integer SIMD lane may also be referred to as an ALU lane, as the hardware for an integer SIMD lane and an ALU is nearly identical.
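By way of illustration only, the lane and width terminology above can be modeled in software. The following Python sketch is not part of the disclosed hardware; the names Lane and SimdUnit, the three-entry register file, and the register indices are hypothetical choices made for this example. It shows a unit whose width equals its number of lanes, with each lane holding one ALU operation path and its own registers, and a single instruction applied to every lane.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Lane:
    """One lane: a single ALU plus at least one register (assumed: three)."""
    registers: List[int] = field(default_factory=lambda: [0, 0, 0])

    def execute(self, op: Callable[[int, int], int],
                dst: int, src_a: int, src_b: int) -> None:
        # The lane's ALU applies the operation to its own registers only.
        self.registers[dst] = op(self.registers[src_a], self.registers[src_b])


class SimdUnit:
    """A SIMD unit whose width is the number of lanes it contains."""

    def __init__(self, width: int) -> None:
        if width <= 0:
            raise ValueError("width must be positive")
        self.lanes = [Lane() for _ in range(width)]

    @property
    def width(self) -> int:
        return len(self.lanes)

    def load(self, reg: int, values: List[int]) -> None:
        # Place one data element in the given register of each lane.
        if len(values) > self.width:
            raise ValueError("more data elements than lanes")
        for lane, value in zip(self.lanes, values):
            lane.registers[reg] = value

    def execute(self, op: Callable[[int, int], int],
                dst: int, src_a: int, src_b: int) -> None:
        # Single instruction, multiple data: the same operation in every lane.
        for lane in self.lanes:
            lane.execute(op, dst, src_a, src_b)

    def read(self, reg: int) -> List[int]:
        return [lane.registers[reg] for lane in self.lanes]
```

For example, a four-lane unit loaded with [1, 2, 3, 4] in register 0 and [10, 20, 30, 40] in register 1 produces [11, 22, 33, 44] in register 2 after a single add instruction.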
However, because many instructions do not occupy the full width of the processor, SIMD processors may be under-utilized for parts of some workloads. For example, SIMD processors are typically under-utilized when processing imaging workloads. Accordingly, a portion of the available processing power is not used to perform any processing, while the unused portion of the processor remains in an active state that consumes power and generates heat. The additional heat generated by the unused portion of the processor must also be dissipated. For mobile devices in particular, SIMD under-utilization reduces valuable battery life because power is consumed both in powering the unused portion of the processor and in cooling it. One approach to reducing the additional power consumption involves SIMD compilers generating executables that fill the SIMD ALU lanes when possible. However, this approach still leaves many of the available SIMD lanes under-utilized.
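The under-utilization described above can be quantified with a simple calculation. The following Python sketch is offered as an illustration only; the function name lane_utilization and the instruction widths in the usage example are hypothetical and are not measurements from any actual workload. It assumes each instruction occupies the processor for one issue slot and that every lane remains powered during that slot.

```python
from typing import Sequence


def lane_utilization(processor_width: int,
                     instruction_widths: Sequence[int]) -> float:
    """Return the fraction of lane slots that perform useful work.

    processor_width: number of lanes in the SIMD processor.
    instruction_widths: number of lanes each issued instruction occupies
        (hypothetical values supplied by the caller).
    """
    if processor_width <= 0:
        raise ValueError("processor_width must be positive")
    used = 0
    for width in instruction_widths:
        if not 0 <= width <= processor_width:
            raise ValueError("instruction width exceeds processor width")
        used += width
    # Every lane is available (and, per the assumption above, powered)
    # for every issued instruction.
    total = processor_width * len(instruction_widths)
    return used / total if total else 0.0
```

For example, on an assumed 16-lane processor, four instructions occupying 16, 4, 4, and 8 lanes use 32 of 64 available lane slots, so lane_utilization(16, [16, 4, 4, 8]) returns 0.5. Under the stated assumptions, the remaining half of the lane slots stay active without performing any processing.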