In digital signal processing applications one of the most frequently used functions is the sum-of-products:
\[ \mathrm{SoP}(S_i, C_i, n) = \sum_{i=0}^{n-1} S_i \cdot C_i \]
wherein SoP is the sum of products, n is the number of products, i is a counter value, S_i is the i-th of n samples of a quantized signal, and C_i is the i-th of n coefficients (e.g. filter or transformation coefficients).
DSPs (Digital Signal Processors) and also some standard microprocessors have dedicated instructions for fast and efficient sum-of-products calculations. A very commonly used instruction is a “MAC” (Multiply-and-Accumulate) instruction which combines the inner loop multiply and add operation into a single instruction.
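The inner loop that a MAC instruction collapses into a single operation can be sketched as follows. This is a minimal illustration, not the behavior of any particular processor: the function names `mac` and `sum_of_products` are hypothetical, and Python integers are unbounded, so no register width limit is modeled here.

```python
def mac(acc, s, c):
    """Model one multiply-and-accumulate step: return acc + s * c."""
    return acc + s * c


def sum_of_products(samples, coeffs):
    """Accumulate the products of paired samples and coefficients."""
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")
    acc = 0
    for s, c in zip(samples, coeffs):
        # One MAC per product: the multiply and the add are a single step.
        acc = mac(acc, s, c)
    return acc
```

For instance, `sum_of_products([1, 2, 3], [4, 5, 6])` accumulates 4 + 10 + 18.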
Operands of microprocessor instructions are represented by a limited number of bits. The limit is defined by the register width of the microprocessor hardware. For integer operands this limit defines the maximum range of values that can be represented. In digital signal processing operands represent quantized analog signals and the operand size limit defines the precision or in other words the quality of the analog signal approximation.
General purpose microprocessors very often have the same size limit for source and destination operands, set by the width of their general purpose registers. For example, microprocessor architectures that follow modern RISC (Reduced Instruction Set Computer) concepts may have a large set of equally sized general purpose registers that are used for both the source and the destination operands of compute instructions.
Multiplying two n-bit integers generates a product of 2·n bits for unsigned numbers and of 2·n−1 bits for signed numbers. Depending on the source operand sizes, the result of a multiply instruction may therefore not fit completely in a general purpose register of the same size as the source registers. Because a sequence of products is added up, sum-of-products calculations may generate results having even more bits than a single product. For example, a sum-of-products calculation over 16 products, with Si and Ci being signed 16-bit values, generates a result of 2·16−1+4=35 bits, where the 4 additional bits (log2 of 16) account for the accumulation of the 16 products.
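The bit-growth arithmetic above can be checked with a short sketch. The helper names `signed_bits` and `sop_result_bits` are hypothetical. The sketch assumes two's-complement operands restricted to the symmetric range (for 16 bits, −32767 to 32767), because the single corner case in which both operands take the most negative value needs one more bit than 2·n−1.

```python
def signed_bits(value):
    """Smallest two's-complement width (in bits) that can hold value."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def sop_result_bits(operand_bits, num_products):
    """Rule from the text: 2*w - 1 bits per signed product, plus
    ceil(log2(num_products)) growth bits for the accumulation."""
    growth = (num_products - 1).bit_length()  # equals ceil(log2(n)) for n >= 1
    return 2 * operand_bits - 1 + growth


# Worst-case sums for 16 products of signed 16-bit operands in the
# symmetric range (assumption stated above).
largest = 16 * 32767 * 32767
smallest = 16 * (-32767) * 32767
```

Under these assumptions both `signed_bits(largest)` and `signed_bits(smallest)` agree with `sop_result_bits(16, 16)`.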
A common type of multiply instruction found in the instruction sets of many general purpose microprocessors stores the low order bits of the product in the destination register. This type of multiply instruction is often used to support high level languages, but is not very well suited for DSP computations, because the results of both single multiply instructions and sum-of-products calculations can overflow. For DSP computations, “multiply-high” instructions that store the high order bits of the product may be used instead. With these, the results of single multiply instructions cannot overflow. However, the precision of the result operand is reduced because the low order bits of the products are discarded.
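The difference between the two instruction types can be illustrated with a small model. This is a sketch under assumptions: a 32-bit register width, signed two's-complement operands, and hypothetical function names `mul_low` and `mul_high` that do not correspond to any specific instruction set.

```python
WIDTH = 32  # assumed register width in bits


def to_signed(value, bits=WIDTH):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_low(a, b):
    """Keep only the low-order WIDTH bits of the product (may overflow)."""
    return to_signed(a * b)


def mul_high(a, b):
    """Keep only the high-order WIDTH bits of the 2*WIDTH-bit product."""
    return to_signed((a * b) >> WIDTH)
```

With `a = b = 100000` the exact product is 10,000,000,000. `mul_low` wraps around and returns a value unrelated in magnitude to the true product, while `mul_high` returns 2, which is the true product divided by 2^32 with the fractional part discarded.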
With increasing gate density and associated decreasing cost of digital circuits, some modern general purpose microprocessors provide fast multiply operations and in principle could also be used for applications typically executed on a DSP. However, due to the width of the destination registers, the precision of sum-of-products calculations remains limited.
DSPs are a special class of microprocessors. Typically, DSPs contain accumulator registers with an extended width to avoid loss of precision in single product and sum-of-products calculations. In general purpose control and compute applications, the extended width registers of DSPs provide little benefit. Additionally, the irregular register sizes and the different sizes of source and destination operands and registers complicate the programming model or register set and limit the efficiency (code density and performance) of DSPs in general purpose control and compute programs.
Hence, general purpose microprocessors may not be very well suited for DSP applications and DSPs may not be very well suited for general purpose control and compute applications. For applications with mixed requirements microprocessor architectures with high efficiency and performance in both categories would be beneficial.
However, general purpose microprocessors may be used for DSP algorithms despite the precision versus register size problem.
In an example approach suitable for most general purpose microprocessors, source operand sizes may be chosen small. By using source operands of small size (bit width), the results of sum-of-products calculations may be prevented from overflowing. For example, in video or graphics applications, samples are typically 8 to 12 bit values and coefficients 12 to 16 bit values. The length n of sum-of-products calculations for video/graphics is small, in the range of 2 to 8. A processor with 32-bit registers can therefore correctly calculate sums of products of this type. However, resource efficiency is low when using 32×32-bit multiplications, since in video/graphics applications typical output samples have 8-bit precision and 16 to 20 bits are sufficient for intermediate calculations.
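Why small operands are safe on a 32-bit machine can be shown by modeling a register that wraps around on overflow. The sketch assumes signed two's-complement operands and a multiply instruction that keeps the low 32 bits of each product; `wrap32` and `sop32` are hypothetical helper names.

```python
def wrap32(value):
    """Reduce value to a signed 32-bit two's-complement register image."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sop32(samples, coeffs):
    """Sum of products where every intermediate result lives in 32 bits."""
    acc = 0
    for s, c in zip(samples, coeffs):
        acc = wrap32(acc + wrap32(s * c))
    return acc


# Video-style case: 8 products of 12-bit samples and 16-bit coefficients.
small = sop32([2047] * 8, [32767] * 8)
exact_small = 8 * 2047 * 32767

# Audio-style case: 16 products of 16-bit samples and coefficients.
large = sop32([32767] * 16, [32767] * 16)
exact_large = 16 * 32767 * 32767
```

In the first case the 32-bit result equals the exact sum, because the worst-case magnitude stays below 2^31. In the second case the exact sum needs 35 bits, so the wrapped 32-bit result differs from it.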
Using multiply-high instructions is another example approach, provided by some microprocessors that have additional multiply instructions which store the high order bits of products in the destination register. This concept is also used for MAC (Multiply-and-Accumulate) instructions. With this concept, operands are treated as fixed point numbers with the radix point to the left of the most significant bit. In principle, this approach may be used for DSP algorithms. For small operands, the data paths (multipliers, registers, Arithmetic Logic Unit (ALU)) can be split into multiple smaller pieces to enable SIMD (Single Instruction Multiple Data) operations. However, the least significant bits of products, which must be calculated anyway to obtain the most significant bits, are discarded and do not contribute to the precision of sum-of-products calculations.
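The precision loss described above can be made concrete by comparing two ways of accumulating. The sketch assumes 16-bit operands, truncation (no rounding) of each product to its high-order 16 bits, and hypothetical function names; real multiply-high MAC instructions may round or scale differently.

```python
def sop_mul_high(samples, coeffs, width=16):
    """Accumulate only the high-order bits of each product, as a
    multiply-high MAC would: low-order bits are dropped per product."""
    acc = 0
    for s, c in zip(samples, coeffs):
        acc += (s * c) >> width
    return acc


def sop_full_then_shift(samples, coeffs, width=16):
    """Accumulate full-precision products and drop the low-order bits
    once at the end, as an extended-width accumulator would allow."""
    acc = 0
    for s, c in zip(samples, coeffs):
        acc += s * c
    return acc >> width


samples = [255] * 16
coeffs = [255] * 16
```

Each product here is 65025, which is below 2^16, so every multiply-high result is 0 and the accumulated sum is 0. Accumulating at full precision first gives 1040400, which scales to 15.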
Yet another approach is based on two concatenated general purpose registers for multiply/MAC destination operands, which may be a typical solution for many DSP algorithms. However, the available number of destination registers is reduced by half.
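A register-pair accumulator can be modeled as two 32-bit halves with a carry propagated from the low half to the high half. This is a sketch under assumptions: 32-bit general purpose registers, signed two's-complement arithmetic, and hypothetical helper names `mac_pair` and `pair_to_int`.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mac_pair(hi, lo, s, c):
    """Add s * c to the 64-bit accumulator held in the pair (hi, lo).

    hi and lo are unsigned 32-bit register images; the updated pair
    is returned."""
    product = (s * c) & MASK64          # 64-bit two's-complement image
    total_lo = lo + (product & MASK32)  # add the low halves
    carry = total_lo >> 32              # carry out of the low register
    new_lo = total_lo & MASK32
    new_hi = (hi + (product >> 32) + carry) & MASK32
    return new_hi, new_lo


def pair_to_int(hi, lo):
    """Interpret the pair (hi, lo) as a signed 64-bit integer."""
    value = (hi << 32) | lo
    return value - (1 << 64) if value >> 63 else value


# Accumulate 16 products of signed 16-bit operands in one register pair.
hi, lo = 0, 0
for _ in range(16):
    hi, lo = mac_pair(hi, lo, 32767, 32767)
```

After the loop the pair holds the exact 35-bit sum, at the cost of occupying two general purpose registers for a single destination operand.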
US 2002/0178203 A1 shows that, instead of general purpose registers, additional dedicated accumulator registers may be used. The programming model for the microprocessor may include one or more dedicated accumulator registers for extended precision sum-of-products calculations. To make use of the extended precision, special multiply and MAC instructions are provided that specify an accumulator register as destination. At the end of a sum-of-products sequence, a separate instruction transfers the accumulator content (typically with optional shifting, rounding and clipping) to a general purpose register. However, these extra instructions at the end of each sum-of-products sequence decrease performance, especially for short sequences. The programming model of the processor becomes more complex and the opcode map, i.e. the map for the portions of a machine language instruction that specify the operation to be performed, requires extra space for the multiply and MAC instructions that specify an accumulator