DSP engines must perform mathematical computations quickly. In exchange, compromises are made in the precision of certain calculations. For example, a 16-bit DSP engine is generally restricted to 16-bit mathematical operations. Nevertheless, 32-bit operations can be supported by the hardware and implemented through appropriate programming. To this end, many 16-bit DSP engines provide much larger accumulators, such as 40-bit accumulators, and other hardware that can accommodate higher precision. These hardware structures, in combination with a multiplier, can be used to perform higher-precision multiplications, such as 32×32-bit multiplications, in a 16-bit DSP engine. However, such operations can slow down processing significantly, in particular when many high-precision multiplications are required. Fast Fourier transform (FFT) operations, for example, require many such multiplications and may therefore require substantial processing time. Dedicated 32-bit multipliers require a significant amount of chip real estate and would thus increase cost. Moreover, new instructions would be required to operate such additional hardware.
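To illustrate why a 32×32-bit multiplication is costly when only a 16-bit multiplier is available, the following is a minimal sketch in C of the conventional decomposition into 16×16-bit partial products. It is not taken from any particular DSP engine: the function names `mul16x16` and `mul32x32_via_16` are hypothetical, the operands are assumed to be unsigned, and a 64-bit integer stands in for the wide accumulator.

```c
#include <stdint.h>

/* Hypothetical stand-in for a native 16x16 -> 32-bit unsigned multiply. */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Sketch: unsigned 32x32 -> 64-bit multiply built from four 16x16
 * partial products. A 64-bit variable plays the role of the wide
 * accumulator (an assumption made for illustration only). */
uint64_t mul32x32_via_16(uint32_t x, uint32_t y)
{
    uint16_t xl = (uint16_t)(x & 0xFFFFu);  /* low half of x  */
    uint16_t xh = (uint16_t)(x >> 16);      /* high half of x */
    uint16_t yl = (uint16_t)(y & 0xFFFFu);  /* low half of y  */
    uint16_t yh = (uint16_t)(y >> 16);      /* high half of y */

    uint64_t acc = mul16x16(xl, yl);            /* weight 2^0  */
    acc += (uint64_t)mul16x16(xh, yl) << 16;    /* weight 2^16 */
    acc += (uint64_t)mul16x16(xl, yh) << 16;    /* weight 2^16 */
    acc += (uint64_t)mul16x16(xh, yh) << 32;    /* weight 2^32 */
    return acc;
}
```

In this sketch, a single 32×32-bit product takes four 16×16-bit multiplications plus the associated shifts and accumulations, which suggests how the cost grows when an algorithm such as an FFT needs many of them.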
What is needed is an improved DSP math capability in existing DSP cores without having to change the instruction set and with minimal changes to existing hardware.