1. Field of the Invention
The present invention relates to the field of microprocessors; more particularly, the present invention relates to a method and apparatus for computing a sum of absolute differences.
2. Description of Related Art
A sum of absolute differences is used in many applications, including video applications such as Moving Picture Experts Group (MPEG) encoding.
One method of computing a packed sum of absolute differences (PSAD) of packed data A having eight byte elements A0 . . . A7 and packed data B having eight byte elements B0 . . . B7 is to compute Ai − Bi and Bi − Ai for each value of i from 0 to 7, select the results that are non-negative, and add the non-negative results together. One implementation uses sixteen adders (two adders for each pair of byte elements), eight muxes (to select the non-negative value from each pair of results) and an adder tree to sum the non-negative results.
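The straightforward approach can be modeled in software as a minimal sketch. It assumes a 64-bit packed operand holding unsigned byte elements, with element 0 in the least significant byte. The names `unpack_bytes`, `adder_tree`, and `psad` are hypothetical and only mirror the sixteen subtractions, eight mux selections, and adder tree described above.

```python
def unpack_bytes(packed, count=8):
    """Split a packed integer into `count` unsigned bytes, element 0 in the low byte (assumed layout)."""
    return [(packed >> (8 * i)) & 0xFF for i in range(count)]


def adder_tree(values):
    """Sum values pairwise, level by level, as a binary adder tree would."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd level with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def psad(packed_a, packed_b):
    a = unpack_bytes(packed_a)
    b = unpack_bytes(packed_b)
    selected = []
    for ai, bi in zip(a, b):
        d_ab = ai - bi  # first adder of the pair: Ai - Bi
        d_ba = bi - ai  # second adder of the pair: Bi - Ai
        # mux: keep whichever of the two results is non-negative
        selected.append(d_ab if d_ab >= 0 else d_ba)
    return adder_tree(selected)


print(psad(0x0102030405060708, 0x0807060504030201))  # prints 32
```

In the example, the per-byte absolute differences are 7, 5, 3, 1, 1, 3, 5 and 7, which sum to 32.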
The more devices a circuit uses, the more silicon area it requires, and the cost of a semiconductor device is generally proportional to the silicon area it uses. Therefore, it is desirable to reduce the number of devices used to perform the PSAD instruction.
One method of computing a PSAD with fewer devices is to use the same device to operate serially on multiple data elements. For example, one adder may compute A0 − B0 and B0 − A0 sequentially, another may compute A1 − B1 and B1 − A1 sequentially, and so on. This reduces the number of adders (and thus the silicon area) used, but increases the time required to compute a PSAD.
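The area/time trade-off of the serial approach can be illustrated with a simplified model that counts adders and sequential passes rather than gate delays. This is a sketch under that assumption; `psad_serial` and its returned counts are hypothetical. Each of the eight lane adders is used twice, once per pass, so the model reports eight adders and two passes, compared with sixteen adders and one pass for the fully parallel version.

```python
def unpack_bytes(packed, count=8):
    """Split a packed integer into `count` unsigned bytes, element 0 in the low byte (assumed layout)."""
    return [(packed >> (8 * i)) & 0xFF for i in range(count)]


def psad_serial(packed_a, packed_b):
    """Return (sum of absolute differences, number of adders, number of sequential passes)."""
    a = unpack_bytes(packed_a)
    b = unpack_bytes(packed_b)
    lanes = len(a)  # one adder per pair of byte elements
    # Pass 1: adder i computes Ai - Bi for every lane.
    first = [a[i] - b[i] for i in range(lanes)]
    # Pass 2: the same adder i is reused to compute Bi - Ai.
    second = [b[i] - a[i] for i in range(lanes)]
    # Muxes select the non-negative result of each lane.
    selected = [d1 if d1 >= 0 else d2 for d1, d2 in zip(first, second)]
    passes = 2
    return sum(selected), lanes, passes


result, adders, passes = psad_serial(0x0102030405060708, 0x0807060504030201)
print(result, adders, passes)  # prints 32 8 2
```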
What is needed is a method and apparatus to reduce the amount of silicon area required to implement a PSAD instruction without increasing the time required to compute the PSAD.
A method and apparatus that adds each one of multiple elements of a packed data together to produce a result is described. According to one such method and apparatus, each of a first set of portions of partial products is produced using a first set of partial product selectors in a multiplier, each of the first set of portions of the partial products being zero. Each of the multiple elements is inserted into one of a second set of portions of the partial products using a second set of partial product selectors, each of the second set of portions of the partial products being aligned. The multiple elements are then added together to produce the result, which includes a field holding the sum of the multiple elements.
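The summary can be pictured with a simplified software model. It is a sketch under stated assumptions, not the claimed circuit. The row width, the field offset, and the names `build_partial_products`, `adder_tree`, and `sum_elements` are hypothetical, and an ordinary pairwise adder tree stands in for the multiplier's partial-product adder array (real multipliers typically use carry-save compression). Each partial product row is zero everywhere except one field, where a byte element is inserted at the same bit position in every row. Summing the rows therefore accumulates all elements in that aligned field.

```python
ROW_WIDTH = 64     # assumed partial-product width in bits (hypothetical)
FIELD_OFFSET = 16  # assumed bit position at which every element is aligned (hypothetical)
FIELD_WIDTH = 11   # wide enough for eight bytes: 8 * 255 = 2040 < 2**11

assert FIELD_OFFSET + FIELD_WIDTH <= ROW_WIDTH


def build_partial_products(elements):
    rows = []
    for e in elements:
        if not 0 <= e <= 0xFF:
            raise ValueError("each element must be an unsigned byte")
        # First set of selectors: every portion outside the field is forced to zero.
        row = 0
        # Second set of selectors: the element is inserted into the aligned field.
        row |= e << FIELD_OFFSET
        rows.append(row)
    return rows


def adder_tree(rows):
    """Pairwise sum of the rows, truncated to ROW_WIDTH bits at each level."""
    level = list(rows)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)
        level = [(level[i] + level[i + 1]) & ((1 << ROW_WIDTH) - 1)
                 for i in range(0, len(level), 2)]
    return level[0] if level else 0


def sum_elements(elements):
    total = adder_tree(build_partial_products(elements))
    # The sum of the elements sits in the aligned field of the result.
    return (total >> FIELD_OFFSET) & ((1 << FIELD_WIDTH) - 1)


print(sum_elements([7, 5, 3, 1, 1, 3, 5, 7]))  # prints 32
```

Feeding the eight absolute differences of a PSAD into such a model yields the PSAD in the result field, which illustrates why existing multiplier hardware could, in principle, replace a dedicated adder tree.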