High-performance digital signal processing (DSP) applications can conveniently be implemented using programmable logic devices. To this end, Stratix™ programmable logic devices from Altera Corporation include DSP blocks, in the form of high-performance embedded DSP units, which are optimized for applications such as rake receivers, orthogonal frequency division multiplexing transceivers, and image processing.
One defining feature of a DSP block is the bit length of the words which it handles. For example, a 16 bit DSP architecture stores data in the form of 16 bit words, and allows easy manipulation of such 16 bit words.
However, although a 16 bit architecture is sufficient for many applications, and therefore is in common use, there are a significant number of applications for which it is insufficient. For example, when using a digital signal processor to perform inversion of a matrix, 16 bit words may not provide enough precision to calculate the coefficients of the resulting matrix with the required accuracy.
In such circumstances, a floating point digital signal processor can be used to obtain the result to the required accuracy, but such processors are expensive and inconvenient. Alternatively, a 16 bit architecture can be used to perform the required operations, but this is a slow process. To illustrate this, two multiplicands, each of up to 32 bits, can each be divided into two 16 bit words: a most significant word and a least significant word. Each of the two words forming the first multiplicand must then be multiplied in turn by each of the two words forming the second multiplicand, so that four multiplication operations are required. The result of multiplying the most significant words of the two multiplicands must then be shifted 32 bit positions to the left, while the two results of multiplying the most significant word of one multiplicand by the least significant word of the other multiplicand must be added together and shifted 16 bit positions to the left. The result of multiplying the two least significant words is not shifted. Finally, these intermediate results must be added together to form the final result. This means that, if a 16 bit multiplication occupies one clock cycle of the digital signal processor, a 32 bit multiplication occupies nine clock cycles or more, depending on the data moving and shifting capabilities of the digital signal processor.
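The decomposition described above can be sketched in a few lines of Python. This is an illustration only, not a model of any particular device: it assumes unsigned operands, and the names mul16 and mul32_via_16 are hypothetical helpers introduced here. The function mul16 stands in for a single 16 bit hardware multiplication, and the remaining shifts and additions are the extra work that makes the 32 bit operation slow.

```python
MASK16 = 0xFFFF  # selects the least significant 16 bit word


def mul16(a, b):
    """Stand-in for one 16 bit x 16 bit unsigned hardware multiplication."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b  # result fits in 32 bits


def mul32_via_16(x, y):
    """Multiply two unsigned 32 bit values using four 16 bit multiplications."""
    assert 0 <= x < 2**32 and 0 <= y < 2**32

    # Divide each multiplicand into a most and a least significant word.
    x_hi, x_lo = x >> 16, x & MASK16
    y_hi, y_lo = y >> 16, y & MASK16

    # Four partial products, one per pair of words.
    hi_hi = mul16(x_hi, y_hi)
    hi_lo = mul16(x_hi, y_lo)
    lo_hi = mul16(x_lo, y_hi)
    lo_lo = mul16(x_lo, y_lo)

    # Shift the partial products into position and add them together.
    return (hi_hi << 32) + ((hi_lo + lo_hi) << 16) + lo_lo
```

For any pair of unsigned 32 bit inputs, mul32_via_16(x, y) should equal the direct product x * y. Signed operands would need additional sign handling, which this sketch leaves out.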