Digital signal processors (DSPs) are special-purpose processors utilized for digital signal processing. Signals are often converted from analog form to digital form, manipulated digitally, and then converted back to analog form for further processing. Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and efficiently on a set of data.
DSPs thus often incorporate specialized hardware to perform operations that are commonly required for math-intensive processing applications, such as addition, multiplication, multiply-accumulate (MAC), and shift-accumulate. A Multiply-Accumulate architecture, for example, recognizes that many common data processing operations involve multiplying two numbers together, adding the resulting value to another value and then accumulating the result. Such basic operations can be efficiently carried out utilizing specialized high-speed multipliers and accumulators.
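The multiply-accumulate pattern described above can be illustrated with a minimal sketch. The function name `mac_dot` and the dot-product use case are assumptions chosen for illustration; a DSP would perform each loop step in dedicated hardware, not in software.

```python
def mac_dot(samples, coeffs):
    """Dot product written as a chain of multiply-accumulate (MAC) steps.

    Hypothetical helper for illustration only: each loop iteration is the
    operation a hardware MAC unit would typically complete in one step.
    """
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c  # multiply two numbers, add the product to the accumulator
    return acc


result = mac_dot([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6
```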
DSPs, however, generally do not provide specialized instructions to support non-linear mathematical functions, such as exp, log, cos, 1/x and x^K. Increasingly, however, there is a need for non-linear arithmetic operations in processors. A non-linear function is any function in which the variable(s) to be solved for cannot be written as a linear sum of independent components. If supported at all, a DSP supports a non-linear function by using a large look-up table (LUT). An exemplary LUT may store on the order of 2,000 16-bit values, and thus require 32 kilobits of random access memory (RAM). The LUT is typically implemented in a separate dedicated SRAM (so that data and the non-linear LUT can be accessed at the same time to achieve improved performance).
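A minimal sketch of the look-up table approach follows, using the exemplary figures of 2,000 entries of 16 bits each. The choice of 1/x as the tabulated function, the input range [1, 2), the unsigned 16-bit scaling, and all function names are assumptions made for illustration; they do not describe any particular DSP.

```python
LUT_ENTRIES = 2000   # exemplary table depth from the text
VALUE_BITS = 16      # exemplary word width from the text
SCALE = (1 << VALUE_BITS) - 1  # assumed unsigned fixed-point scale (65535)


def build_lut(func, lo, hi, entries=LUT_ENTRIES):
    """Tabulate func at evenly spaced points in [lo, hi), quantized to 16 bits.

    Assumes func maps [lo, hi) into [0, 1] so each value fits the scale.
    """
    step = (hi - lo) / entries
    return [round(func(lo + i * step) * SCALE) for i in range(entries)]


def lookup(table, x, lo, hi):
    """Approximate func(x) by reading the nearest-below table entry."""
    index = int((x - lo) * len(table) / (hi - lo))
    index = max(0, min(len(table) - 1, index))  # clamp to valid range
    return table[index] / SCALE


def lut_size_bits(entries=LUT_ENTRIES, bits=VALUE_BITS):
    """Storage needed for one table, in bits."""
    return entries * bits


recip_table = build_lut(lambda x: 1.0 / x, 1.0, 2.0)
approx = lookup(recip_table, 1.5, 1.0, 2.0)  # close to 1/1.5
size = lut_size_bits()                       # 2,000 * 16 = 32,000 bits
```

The accuracy of such a table depends directly on its depth, which is why a plain LUT tends to be large.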
In cases where the DSP is based on VLIW (Very Long Instruction Word) or SIMD (Single Instruction Multiple Data) architectures with N issue slots, the memory size becomes even larger. The LUT must be replicated N times because each issue slot must be able to read different values in the look-up table simultaneously, as the values of the data in each issue slot may be different. This replication of memory results in an even greater silicon area. For example, assuming a LUT in a 4-way vector co-processor, a memory size of 128 Kb is required (32 Kb × 4). In addition, if different non-linear functions are required for different parts of a program being executed, the various LUTs must be loaded into memory, thereby significantly increasing latency and potentially reducing performance.
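The replication cost can be checked with a short sketch. The helper names and the model of one private table copy per issue slot are illustrative assumptions; the entry count, word width, and slot count are the exemplary values from the text.

```python
def replicated_lut_bits(entries, bits_per_entry, issue_slots):
    """Total LUT storage when every issue slot holds its own table copy."""
    return entries * bits_per_entry * issue_slots


def parallel_lookup(copies, indices):
    """Model of N simultaneous reads: slot k reads index k from its own copy.

    Separate copies are assumed because each slot may need a different
    entry in the same cycle.
    """
    return [copies[k][i] for k, i in enumerate(indices)]


single_bits = replicated_lut_bits(2000, 16, 1)    # one table: 32,000 bits
four_way_bits = replicated_lut_bits(2000, 16, 4)  # 4-way: 128,000 bits

# Toy 8-entry table replicated for 4 slots, each slot reading a different index.
toy_table = [i * i for i in range(8)]
copies = [list(toy_table) for _ in range(4)]
values = parallel_lookup(copies, [0, 3, 5, 7])
```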
A need therefore exists for a digital signal processor having an instruction set that supports an x^K function using a look-up table of reduced size.