Many image and video processing techniques include operations in which a number of pixel values are averaged. These pixel-averaging operations may be used, for example, in filtering and image estimation. The averaging may be performed on the pixel values of a number of neighboring pixels. It may also be performed on the values of the same pixel at different times, e.g., across successive frames. Because such operations are repeated for many pixels, they may be computationally intensive.
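As a rough illustration of the two kinds of averaging described above, the following Python sketch computes a rounded average over a 2x2 neighborhood (spatial averaging) and a rounded per-pixel average of two frames (temporal averaging). The function names, the choice of a 2x2 neighborhood, and the round-half-up convention (adding half the divisor before shifting) are assumptions made for illustration only.

```python
def average_neighborhood(image, row, col):
    """Rounded average of the 2x2 block whose top-left pixel is image[row][col]."""
    total = (image[row][col] + image[row][col + 1]
             + image[row + 1][col] + image[row + 1][col + 1])
    return (total + 2) >> 2  # add half of 4, then divide by 4 with a right shift


def average_frames(prev_frame, curr_frame):
    """Rounded average of each pixel position across two frames of equal size."""
    return [[(p + c + 1) >> 1 for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]


image = [[10, 20],
         [30, 41]]
print(average_neighborhood(image, 0, 0))       # 25 (101 / 4 = 25.25)
print(average_frames([[0, 255]], [[1, 255]]))  # [[1, 255]]
```

Each output pixel needs several additions plus a shift, which is why this work adds up quickly over a full image or video stream.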
To meet the computational load and data-throughput requirements of performing a large number of averaging operations, processors used for image and video processing may support SIMD (Single-Instruction/Multiple-Data) operations. In a SIMD operation, a single instruction is sent to a number of processing elements, each of which performs the same operation on different data.
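The SIMD idea can be approximated in software by packing four 8-bit pixels into one 32-bit word and processing all four lanes with a short chain of ordinary integer operations, a technique often called SWAR (SIMD within a register). The Python sketch below is such an emulation, not a description of any particular processor's instruction set, and the helper names are hypothetical. It computes the lane-wise rounded-up average (x + y + 1) >> 1 using the identity x + y = 2(x & y) + (x ^ y), and it clears the low bit of every byte before shifting so that no bit moves across a lane boundary.

```python
LANE_MASK_NO_LSB = 0xFEFEFEFE  # low bit of each byte cleared so ">> 1" cannot cross lanes


def pack4(pixels):
    """Pack four 8-bit pixel values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for lane, value in enumerate(pixels):
        word |= (value & 0xFF) << (8 * lane)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def simd_average(a, b):
    """Lane-wise (x + y + 1) >> 1 for four packed bytes.

    Since x + y = 2 * (x & y) + (x ^ y), the rounded-up average equals
    (x | y) - ((x ^ y) >> 1). No lane borrows from its neighbor because
    (x | y) >= (x ^ y) >> 1 holds in every lane.
    """
    return (a | b) - (((a ^ b) & LANE_MASK_NO_LSB) >> 1)


a = pack4([0, 255, 10, 100])
b = pack4([1, 255, 20, 51])
print(unpack4(simd_average(a, b)))  # [1, 255, 15, 76]
```

One sequence of word-wide operations yields four averages at once, which is the same "one instruction, many data elements" trade-off that dedicated SIMD hardware makes in a single instruction.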
One type of SIMD operation used in some image processing algorithms is a four-pixel averaging operation. A SIMD arithmetic logic unit (ALU) that produces the average values may perform four addition-and-averaging operations on four sets of pixel values simultaneously, producing four 8-bit pixel average values. A 40-bit SIMD adder may be used to execute this instruction on 8-bit values. The 40-bit SIMD adder includes two dummy bits for each byte: one dummy bit controls whether a carry from an addition is blocked or propagated, and the other controls a shifting operation. The shifting operation may be needed only for the four-pixel averaging operation, whereas other instructions that use the SIMD adder may require only a 36-bit adder, i.e., one with a single dummy bit per byte (4 × (8 + 1) = 36 bits). A 40-bit SIMD adder may have a larger layout than a 36-bit SIMD adder and require additional structure to accommodate the extra dummy bits, consuming limited chip area to support a single instruction.
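The role of a carry-control dummy bit, and the extra headroom a four-pixel average needs, can be modeled with a single wide integer add. The Python sketch below is an illustrative software model under stated assumptions, not the adder circuit described here, and all names in it are hypothetical. Each lane is an 8-bit byte followed by dummy bits. With one dummy bit per byte (a 36-bit word), setting that bit to 0 in both operands absorbs the byte's carry-out, so the lanes stay independent; setting it to 1 in one operand and 0 in the other passes the carry on to the next byte, so the same adder performs an ordinary 32-bit add. For the four-pixel average, the sketch simply uses two extra bits per lane (a 40-bit word) as headroom for a sum of four bytes, which can reach 4 × 255 = 1020 and therefore needs 10 bits, and then shifts right by 2 and masks each lane. It does not reproduce how the described adder uses its second dummy bit to control the shift.

```python
LANES = 4


def pack(values, lane_bits, dummy=0):
    """Pack four bytes into lanes of lane_bits bits.

    The bit just above each byte is set to `dummy`; any remaining lane bits are 0.
    """
    word = 0
    for lane, value in enumerate(values):
        word |= ((value & 0xFF) | (dummy << 8)) << (lane_bits * lane)
    return word


def lanes(word, lane_bits, data_bits):
    """Read the low data_bits bits of each lane."""
    mask = (1 << data_bits) - 1
    return [(word >> (lane_bits * lane)) & mask for lane in range(LANES)]


def wide_add(a, b, lane_bits):
    """A single binary add, truncated to the adder width (LANES * lane_bits bits)."""
    return (a + b) & ((1 << (LANES * lane_bits)) - 1)


def four_pixel_average(p0, p1, p2, p3):
    """Lane-wise (p0 + p1 + p2 + p3 + 2) >> 2 using 10-bit lanes in a 40-bit word."""
    total = pack(p0, 10)
    for operand in (p1, p2, p3, [2] * LANES):  # last operand adds the rounding constant 2
        total = wide_add(total, pack(operand, 10), 10)
    # Each lane now holds at most 1022 < 1024, so no carry crossed a lane boundary.
    return lanes(total >> 2, 10, 8)  # shift every lane right by 2, keep the 8-bit average


# 36-bit model: 9-bit lanes (8 data bits + 1 carry-control dummy bit).
x, y = [200, 100, 255, 1], [100, 100, 1, 255]

# Carry blocking: both dummy bits are 0, so each byte's carry-out stops in its dummy bit.
blocked = wide_add(pack(x, 9), pack(y, 9), 9)
print(lanes(blocked, 9, 9))  # [300, 200, 256, 256]

# Carry propagation: dummy bits 1 + 0 forward each carry into the next byte (a 32-bit add).
chained = wide_add(pack(x, 9, dummy=1), pack(y, 9, dummy=0), 9)
reference = (int.from_bytes(bytes(x), "little") + int.from_bytes(bytes(y), "little")) & 0xFFFFFFFF
print(lanes(chained, 9, 8), list(reference.to_bytes(4, "little")))  # [44, 201, 0, 1] [44, 201, 0, 1]

# 40-bit model: four-pixel rounded average in all four lanes at once.
print(four_pixel_average([0, 255, 10, 1], [0, 255, 20, 2],
                         [0, 255, 30, 2], [1, 255, 41, 2]))  # [0, 255, 25, 2]
```

In this model the 9-bit lanes are enough for two-operand adds with carry control, while the four-operand sum is what forces each lane to 10 bits, mirroring the 36-bit versus 40-bit width difference discussed above.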