The x86 instruction set includes a PSADBW instruction. The PSADBW instruction takes two 64-bit input operands, each arranged as eight packed unsigned byte integers. One of the operands is the minuend operand of a subtraction operation and the other operand is the subtrahend operand of the subtraction operation. The PSADBW instruction generates an unsigned 16-bit result which is the sum of the absolute values of the eight differences obtained by subtracting each unsigned byte integer of the subtrahend from the corresponding unsigned byte integer of the minuend. This particular result must be computed in various common applications, such as multimedia audio, video, or graphics applications, or scientific applications.
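The result described above can be illustrated with a minimal software reference model in C. This is only a sketch of the arithmetic, not the hardware implementation; the function name psadbw_ref and its array-based parameters are hypothetical and chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference model of the PSADBW result (illustrative only).
 * Each operand is modeled as eight unsigned bytes rather than as a
 * packed 64-bit register. */
uint16_t psadbw_ref(const uint8_t minuend[8], const uint8_t subtrahend[8])
{
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        /* Widen to int so the difference may be negative. */
        int diff = (int)minuend[i] - (int)subtrahend[i];
        sum += (uint16_t)abs(diff);
    }
    /* The largest possible sum is 8 * 255 = 2040, which fits in 16 bits. */
    return sum;
}
```

For example, with minuend bytes {10, 0, 255, 7, 3, 100, 50, 1} and subtrahend bytes {3, 5, 0, 7, 9, 90, 60, 2}, the absolute differences are 7, 5, 255, 0, 6, 10, 10, and 1, and the result is 294.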
One approach to implementing the PSADBW instruction in a microprocessor is to generate the differences of the first and second packed operands, then take the absolute values of the differences, and then serially add the absolute values. However, this approach has the drawback of requiring a relatively large number of processor clock cycles to generate the result, particularly because the additions are performed serially, so that each addition must wait for the result of the previous one. Therefore, what is needed is a fast apparatus for performing the PSADBW instruction.
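The three stages of the serial approach described above can be sketched in C as follows. This is a software illustration under stated assumptions, not a description of any particular hardware: the function name psadbw_serial and the serial_adds counter are hypothetical, and the counter reports only the number of dependent additions, not actual clock cycles.

```c
#include <stdint.h>

/* Hypothetical model of the serial approach (illustrative only). */
uint16_t psadbw_serial(const uint8_t a[8], const uint8_t b[8], int *serial_adds)
{
    int16_t diff[8];
    uint16_t mag[8];

    /* Stage 1: generate the eight differences. */
    for (int i = 0; i < 8; i++)
        diff[i] = (int16_t)(a[i] - b[i]);

    /* Stage 2: take the absolute value of each difference. */
    for (int i = 0; i < 8; i++)
        mag[i] = (uint16_t)(diff[i] < 0 ? -diff[i] : diff[i]);

    /* Stage 3: accumulate serially. Each addition depends on the
     * previous partial sum, so the additions cannot overlap. */
    uint16_t sum = mag[0];
    *serial_adds = 0;
    for (int i = 1; i < 8; i++) {
        sum = (uint16_t)(sum + mag[i]);
        (*serial_adds)++;
    }
    return sum;
}
```

In this sketch the eight differences and the eight absolute values within a stage are independent of one another, whereas the final stage forms a chain of seven dependent additions, which is the portion the passage above identifies as the main source of delay.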