Many digital data processors, including most DSPs and multimedia processors, use binary fixed-point arithmetic, in which operations are performed on integers, fractions, or mixed numbers in unsigned or two's complement binary format. DSP and multimedia applications often require that the processor be configured to perform saturating arithmetic or wrap-around arithmetic on binary numbers.
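By way of illustration, the same bit pattern can denote an integer, a fraction, or a mixed number, in either unsigned or two's complement format, depending only on where the binary point is assumed to lie and whether the leading bit is treated as a sign bit. The following minimal Python sketch uses a hypothetical helper, `to_fixed_value`, which is not taken from any particular processor. It decodes one 4-bit pattern under several such interpretations.

```python
def to_fixed_value(bits: int, width: int, frac_bits: int, signed: bool) -> float:
    """Interpret the low `width` bits of `bits` as a binary fixed-point number.

    `frac_bits` is the number of bits to the right of the binary point, and
    `signed` selects two's complement rather than unsigned format.
    (Hypothetical helper for illustration only.)
    """
    bits &= (1 << width) - 1              # keep only the low `width` bits
    if signed and bits & (1 << (width - 1)):
        bits -= 1 << width                # sign bit set: value is bits - 2**width
    return bits / (1 << frac_bits)        # scale by 2**-frac_bits


pattern = 0b1001
print(to_fixed_value(pattern, 4, 0, signed=False))  # 9.0    unsigned integer 1001
print(to_fixed_value(pattern, 4, 0, signed=True))   # -7.0   two's complement integer 1001
print(to_fixed_value(pattern, 4, 3, signed=True))   # -0.875 two's complement fraction 1.001
print(to_fixed_value(pattern, 4, 1, signed=False))  # 4.5    unsigned mixed number 100.1
```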
In saturating arithmetic, computation results that are too large in magnitude to be represented in a specified number format are saturated to the most positive or most negative representable number. A result that is too large to be represented is said to overflow. For example, in a decimal number system with 3-digit unsigned numbers, the addition 733+444 produces a saturated result of 999, since the true result of 1177 cannot be represented with just three decimal digits. The saturated result, 999, corresponds to the most positive number that can be represented with three decimal digits. Saturation is useful because it reduces the error that occurs when results cannot be correctly represented, and it preserves sign information.
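The decimal saturation example above can be expressed as a short sketch. The function name `saturating_add_unsigned` and its parameters are hypothetical. Because both operands are unsigned, only the upper bound needs to be enforced.

```python
def saturating_add_unsigned(a: int, b: int, digits: int = 3, base: int = 10) -> int:
    """Add two unsigned `digits`-digit base-`base` numbers, saturating on overflow.

    (Hypothetical helper for illustration only.)
    """
    max_value = base ** digits - 1    # most positive representable value, e.g. 999
    return min(a + b, max_value)      # clamp any result that overflows


print(saturating_add_unsigned(733, 444))  # 999: the true sum 1177 overflows
print(saturating_add_unsigned(123, 456))  # 579: no overflow, result unchanged
```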
In wrap-around arithmetic, results that overflow are wrapped around, such that any digits that cannot fit into the specified number representation are simply discarded. For example, in a decimal number system with 3-digit unsigned numbers, the addition 733+444 produces a wrap-around result of 177. Since the true result of 1177 is too large to represent, the leading 1 is discarded and a result of 177 is produced. Wrap-around arithmetic is useful because, if the true final result of several wrap-around operations can be represented in the specified format, the final result will be correct, even if intermediate operations overflow.
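Discarding the digits that do not fit is the same as reducing the result modulo 10^3, which is why a chain of wrap-around operations yields the correct final result whenever that result is representable. The minimal sketch below assumes the same 3-digit unsigned decimal format; the helper name `wrap` is hypothetical. It also contrasts the outcome with saturation, which does not have this property.

```python
MODULUS = 10 ** 3  # 3-digit unsigned decimal numbers: 000 through 999


def wrap(value: int) -> int:
    """Discard the digits that do not fit, i.e. reduce modulo 10**3 (hypothetical helper)."""
    return value % MODULUS


print(wrap(733 + 444))              # 177: the leading 1 of 1177 is discarded

# The intermediate result overflows, but the final true result (677) fits.
step1 = wrap(733 + 444)             # 177
step2 = wrap(step1 - 500)           # 677, equal to 733 + 444 - 500
print(step2, 733 + 444 - 500)       # 677 677

# With saturation, the intermediate clamp to 999 is not undone later.
saturated = min(733 + 444, 999) - 500
print(saturated)                    # 499
```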
As indicated above, saturating arithmetic and wrap-around arithmetic are often utilized in binary number systems. For example, in a two's complement fractional number system with 4-bit numbers (one sign bit and three fraction bits), the true result of the two's complement addition 0.101+0.100 (0.625+0.500) is 1.125, which cannot be represented. The saturated result is 0.111 (0.875), which corresponds to the most positive two's complement number that can be represented with four bits. If wrap-around arithmetic is used, the same addition produces the result 1.001 (−0.875).
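The binary example above can be reproduced by treating each 4-bit fraction as an integer scaled by 2^3, so that the representable range is −8 to 7 (−1.0 to 0.875). The sketch below makes that assumption; the names `saturate`, `wrap` and `to_binary` are hypothetical. Saturation clamps the true sum, while wrap-around keeps the low four bits and reinterprets them as a two's complement value.

```python
WIDTH = 4                           # total bits, including the sign bit
FRAC_BITS = 3                       # bits to the right of the binary point
MIN_INT = -(1 << (WIDTH - 1))       # -8 -> 1.000 (-1.0)
MAX_INT = (1 << (WIDTH - 1)) - 1    #  7 -> 0.111 (0.875)


def saturate(x: int) -> int:
    """Clamp an exact scaled sum to the 4-bit two's complement range."""
    return max(MIN_INT, min(MAX_INT, x))


def wrap(x: int) -> int:
    """Keep the low 4 bits and reinterpret them as two's complement."""
    x &= (1 << WIDTH) - 1
    return x - (1 << WIDTH) if x > MAX_INT else x


def to_binary(x: int) -> str:
    """Format a scaled value as s.fff, e.g. -7 -> '1.001'."""
    bits = format(x & ((1 << WIDTH) - 1), f"0{WIDTH}b")
    return bits[0] + "." + bits[1:]


a, b = 0b0101, 0b0100               # 0.101 (0.625) and 0.100 (0.500)
for name, op in (("saturating", saturate), ("wrap-around", wrap)):
    r = op(a + b)                   # the true scaled sum is 9, i.e. 1.125
    print(f"{name}: {to_binary(r)} ({r / (1 << FRAC_BITS)})")
# saturating: 0.111 (0.875)
# wrap-around: 1.001 (-0.875)
```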
Additional details regarding these and other conventional aspects of digital data processor arithmetic can be found in, for example, B. Parhami, “Computer Arithmetic: Algorithms and Hardware Designs,” Oxford University Press, New York, 2000 (ISBN 0-19-512583-5), which is incorporated by reference herein.
Since DSP and multimedia applications typically require both saturating arithmetic and wrap-around arithmetic, it is useful for a given processor to support both of these types of arithmetic.
The above-cited U.S. patent application Ser. No. 10/841,261 discloses an efficient mechanism for controllable selection of saturating or wrap-around arithmetic in a digital data processor.
It may also be desirable in many applications to configure a given DSP, multimedia processor or other type of digital data processor to perform dot products or other types of vector multiply and reduce operations. Such operations occur frequently in digital signal processing and multimedia applications. By way of example, second and third generation cellular telephones that support the GSM (Global System for Mobile communications) or EDGE (Enhanced Data rates for Global Evolution) standards make extensive use of vector multiply and reduce operations, usually with saturation after each individual multiplication and addition. However, since saturating addition is not associative, changing the order in which the additions are performed can change the result. The individual multiplications and additions needed for the vector multiply and reduce operation are therefore typically performed in sequential order using separate instructions, which reduces performance and increases code size.
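The lack of associativity can be demonstrated with three operands. The sketch below assumes 16-bit two's complement integers and uses a hypothetical helper, `sat_add`. Grouping the same additions in two different ways gives two different saturated results.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat_add(a: int, b: int) -> int:
    """16-bit two's complement addition with saturation (hypothetical helper)."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


x, y, z = 30000, 10000, -10000
left = sat_add(sat_add(x, y), z)    # x + y saturates to 32767, then 32767 - 10000
right = sat_add(x, sat_add(y, z))   # y + z = 0, then 30000 + 0
print(left, right)                  # 22767 30000
```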
A number of techniques have been proposed to facilitate vector multiply and reduce operations in a digital data processor. These include, for example, the parallel multiply add (PMADD) operation provided in MMX technology, as described in A. Peleg and U. Weiser, “MMX Technology Extension to the Intel Architecture,” IEEE Micro, Vol. 16, No. 4, pp. 42-50, 1996, and the multiply-sum (VMSUM) operation in AltiVec technology, as described in K. Diefendorff et al., “AltiVec Extension to PowerPC Accelerates Media Processing,” IEEE Micro, Vol. 20, No. 2, pp. 85-95, March 2000. These operations, however, fail to provide the full range of functionality that is desirable in DSP and multimedia processors. Moreover, these operations do not guarantee sequential semantics; that is, they do not guarantee that the computational result will be the same as that produced by the corresponding sequence of individual multiplication and addition instructions.
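To illustrate what sequential semantics means for a saturating dot product, the sketch below compares a strictly left-to-right multiply-accumulate with a balanced pairwise (tree) reduction of the same saturated products. It assumes 16-bit two's complement integers with saturation after every multiplication and addition. The pairwise order is a hypothetical stand-in for any non-sequential evaluation order and is not intended to model PMADD or VMSUM exactly; the function names are likewise hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat(x: int) -> int:
    """Clamp an exact integer result to the 16-bit two's complement range."""
    return max(INT16_MIN, min(INT16_MAX, x))


def dot_sequential(a, b):
    """Multiply and accumulate strictly left to right, saturating after every step."""
    acc = 0
    for x, y in zip(a, b):
        acc = sat(acc + sat(x * y))
    return acc


def dot_pairwise(a, b):
    """Same saturated products, but summed as a balanced tree (a reordering)."""
    terms = [sat(x * y) for x, y in zip(a, b)]
    while len(terms) > 1:
        pairs = [sat(terms[i] + terms[i + 1]) for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:          # carry an odd leftover term to the next level
            pairs.append(terms[-1])
        terms = pairs
    return terms[0] if terms else 0


a = [200, 200, -200, -200]
b = [100, 100, 100, 100]            # products: 20000, 20000, -20000, -20000
print(dot_sequential(a, b))         # -7233
print(dot_pairwise(a, b))           # -1
```

The exact dot product is 0, yet the two evaluation orders disagree with each other because each intermediate saturation depends on which partial sums are formed. A processor that guarantees sequential semantics must reproduce the first result.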
Accordingly, techniques are needed which can provide improved vector multiply and reduce operations, with guaranteed sequential semantics, in a digital data processor.