A digital signal processor ("DSP") needs several types of operations to accomplish desired tasks. These operations are performed on data elements, operands, and the like, and typically include mathematical operations, logic operations, shifting operations, and other data manipulation operations. For example, these operations might include adding, subtracting, multiplying, dividing, selecting, combining, arithmetic shifting, logical shifting, and the like. These operations may be identified in program instructions and may be executed by functional units, execution units, processing elements, or the like.
One such functional unit that is often utilized by DSPs is a multiply-accumulate ("MAC") unit. A MAC unit multiplies two or more operands together and adds the product to a value already stored in an accumulator. The accumulated value may be a fraction or an integer, real or complex, and may be positive, negative, or zero. The accumulator is a register that is at least wide enough to hold the largest product produced by the multiplier, and it can be used as a source or a destination register for operations.
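The multiply-accumulate operation described above can be illustrated with a minimal C sketch. The names `acc_t`, `mac_step`, and `mac_dot` are hypothetical, and the widths (16-bit operands, a 64-bit accumulator) are assumptions chosen only so that the accumulator is wider than the largest product; they are not taken from any particular DSP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator type: 64 bits, wider than the 32-bit product
   of two 16-bit operands, so repeated accumulation has headroom. */
typedef int64_t acc_t;

/* One MAC step: multiply two operands and add the product to the
   value already held in the accumulator. */
acc_t mac_step(acc_t acc, int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* always fits in 32 bits */
    return acc + product;
}

/* Repeated MAC steps over two arrays (a dot product). The accumulator
   is passed in and returned, so it acts as both source and destination. */
acc_t mac_dot(acc_t acc, const int16_t *x, const int16_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        acc = mac_step(acc, x[i], y[i]);
    }
    return acc;
}
```

Calling `mac_dot` with a nonzero starting value models the case where the accumulator already holds a result from an earlier operation.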
In trying to achieve faster processing while minimizing the physical size and the power requirements of the processor, a design problem arises: whether to dedicate accumulator registers to each of the MAC units or to make several accumulator registers available for arbitrary use by all of a processor's MAC units. In addition, it is important that the configuration between accumulator registers and MAC units facilitate different types of operations, such as scalar and vector operations, which are typically required by different types of DSP programming models.
More specifically, there are several types of programming models for digital signal processors that use MAC units. A first type of programming model is an instruction parallelism model, which is defined by its ability to simultaneously execute different instructions. This model uses a horizontal approach to parallelism in which several instructions are included in a long instruction word that is fetched and executed every cycle. This model may be embodied in a very long instruction word ("VLIW") model or a super-scalar model, among others. Instruction parallelism models are very effective in telecommunication applications.
Data parallelism models are a second type of model and can simultaneously execute multiple operations of a single instruction, where each operation is performed on different data. Data parallelism models utilize vector operations and are embodied in a single instruction multiple data ("SIMD") model. Data parallelism models are very efficient in block-based applications such as image processing, filtering applications, and multimedia applications.
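As a rough illustration of the data parallelism described above, the following C sketch emulates a single vector MAC instruction. The name `vector_mac` and the lane count `LANES` are hypothetical, and the loop only models the result; actual SIMD hardware would perform the lanes concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical number of MAC units (lanes) */

/* Emulates one SIMD vector MAC: the same multiply-accumulate operation
   is applied in every lane, each lane using its own data elements and
   its own accumulator. */
void vector_mac(int64_t acc[LANES], const int16_t a[LANES],
                const int16_t b[LANES])
{
    assert(acc != NULL && a != NULL && b != NULL);
    for (int lane = 0; lane < LANES; lane++) {
        acc[lane] += (int32_t)a[lane] * (int32_t)b[lane];
    }
}
```

In this sketch each lane has its own accumulator, which is only one of the possible arrangements between MAC units and accumulator registers.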
In super-scalar and SIMD processors, the accumulator registers are typically either dedicated to specific MAC units, or only a very small number of accumulator registers are available for all MAC units. These configurations may increase the processing time, consume too much physical chip space, or increase power consumption.
A more efficient configuration between MAC units and accumulator registers is needed, especially for processors that include both horizontal (instruction) parallelism and vertical (data) parallelism.