1. Field of the Invention
The present invention relates to an apparatus and method for performing multiply-accumulate (MAC) operations.
2. Background of the Invention
Multiply-accumulate (MAC) operations are used frequently in data processing systems. A MAC operation can take the form of A+B*C or A−B*C. The multiplication operation B*C is typically performed multiple times for different values of B and C, with each multiplication result then being added to (or subtracted from) the running accumulate value A.
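By way of illustration only, the running accumulation described above can be sketched in software as follows. This is a minimal model, assuming plain Python integers in place of hardware registers; the function name mac and the flag subtract are hypothetical and are not taken from any particular instruction set.

```python
def mac(acc, b_values, c_values, subtract=False):
    """Fold a sequence of products B*C into the running accumulate value A.

    With subtract=False each step computes A + B*C; with subtract=True
    each step computes A - B*C.
    """
    for b, c in zip(b_values, c_values):
        product = b * c
        acc = acc - product if subtract else acc + product
    return acc


# Three iterations starting from a running accumulate value of 10.
result_add = mac(10, [1, 2, 3], [4, 5, 6])                 # 10 + 4 + 10 + 18
result_sub = mac(10, [1, 2, 3], [4, 5, 6], subtract=True)  # 10 - 4 - 10 - 18
```

In this sketch the choice between the A+B*C form and the A−B*C form is fixed for the whole sequence, which mirrors the use of a single instruction type for back-to-back iterations.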
Dedicated MAC circuits are often provided within a data processing system to optimise performance of back-to-back MAC operations. Such MAC circuitry may be provided within a scalar processing circuit, where sequences of multiply-accumulate instructions are executed one after the other to execute the required iterations of the multiply-accumulate operation. However, one known approach for accelerating the performance of such an operation is to employ a SIMD (Single Instruction Multiple Data) approach. In accordance with the SIMD approach, multiple data elements are placed side-by-side within a register, and then the required operation is performed in parallel on those data elements within multiple lanes of parallel processing. Considering the operations required to generate a single multiply-accumulate result, it will be appreciated from the above discussion that a plurality of separate multiply operations are required, and by using SIMD data processing circuitry, a plurality of those required multiplications can be performed in parallel to increase the throughput of the multiply-accumulate operation. As with the scalar circuitry, within SIMD data processing circuitry, a dedicated MAC circuit can be provided for optimising the performance of multiply-accumulate operations.
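The lane-based arrangement described above can be modelled, purely as an illustrative sketch, with Python lists standing in for SIMD registers. The lane count of four, the names simd_mac and dot_product_simd, and the final reduction across lanes are all assumptions made for the example rather than features of any specific circuit.

```python
LANES = 4  # hypothetical number of parallel lanes


def simd_mac(acc_lanes, b_lanes, c_lanes):
    """Model one SIMD multiply-add: every lane computes A + B*C at once."""
    return [a + b * c for a, b, c in zip(acc_lanes, b_lanes, c_lanes)]


def dot_product_simd(b, c):
    """Accumulate B*C products LANES at a time, then sum across the lanes."""
    if len(b) != len(c) or len(b) % LANES != 0:
        raise ValueError("inputs must have equal length, a multiple of LANES")
    acc = [0] * LANES
    for i in range(0, len(b), LANES):
        acc = simd_mac(acc, b[i:i + LANES], c[i:i + LANES])
    return sum(acc)


b = [1, 2, 3, 4, 5, 6, 7, 8]
c = [8, 7, 6, 5, 4, 3, 2, 1]
total = dot_product_simd(b, c)  # two modelled SIMD steps instead of eight scalar ones
```

In this model eight multiplications are covered by two modelled instructions, which is the throughput benefit referred to above.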
One type of digital signal processing operation which frequently uses MAC operations is a filter operation. When using a dedicated MAC unit, it is possible to perform complex filter operations relatively quickly. One particular example of where such filter operations are used is when comparing a received radio signal with a pilot signal, the pilot signal being used as a known reference signal by the receiver. The pilot signal will typically alternate between a known positive amplitude and a known negative amplitude to define a known pilot waveform. When performing such a filter operation, it is necessary to perform multiply-add and multiply-subtract operations throughout the filtering operation, with the form of the pilot signal at any particular point in time determining whether a multiply-add or a multiply-subtract operation is required.
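As a hedged illustration of why both multiply-add and multiply-subtract operations arise, the following sketch correlates received samples against a pilot waveform of constant magnitude. The names correlate_with_pilot, pilot_signs and amplitude, and the sample values, are hypothetical and chosen only for the example.

```python
def correlate_with_pilot(samples, pilot_signs, amplitude):
    """Accumulate sample*amplitude, adding or subtracting per the pilot sign."""
    acc = 0
    for sample, sign in zip(samples, pilot_signs):
        if sign > 0:
            acc += sample * amplitude   # multiply-add step
        else:
            acc -= sample * amplitude   # multiply-subtract step
    return acc


samples = [3, -1, 4, -2]        # illustrative received samples
pilot_signs = [+1, -1, +1, -1]  # pilot alternates between positive and negative
amplitude = 5                   # known pilot magnitude
correlation = correlate_with_pilot(samples, pilot_signs, amplitude)
```

The branch taken at each step depends only on the pilot waveform, so the order of multiply-add and multiply-subtract steps changes whenever the pilot changes.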
When seeking to perform such an operation within a scalar MAC circuit, it would be possible to construct a sequence of multiply-add instructions and multiply-subtract instructions that, when executed, perform the required operation. However, a decision as to the exact sequence of multiply-add and multiply-subtract operations required needs to be made at compile time rather than at run time, and hence the sequence can only be constructed for a particular pilot signal. Such an approach hence lacks flexibility, and is undesirable from a code density point of view.
More typically, it may be desired to perform such a filter operation within a SIMD MAC circuit. In such situations, a multiply-add instruction can be issued to cause a plurality of multiply-add operations to be performed in parallel within multiple lanes of parallel processing, and similarly a multiply-subtract instruction can be issued to cause a plurality of multiply-subtract operations to be performed in parallel within multiple lanes of parallel processing. To perform the earlier-described filter operations, it will typically be necessary to use one form of instruction, for example a multiply-add instruction, and then to encode as a vector of data elements the amplitude of the pilot signal at predetermined time intervals, i.e. directly identifying both positive and negative amplitude values. This requires a lot of space in memory to encode the pilot information, and those large vectors of information would need to be recalculated each time the pilot signal is changed. Furthermore, even if the pilot signal stayed the same, but its amplitude changed due to variations in reception characteristics, the vectors within the memory would need to be reprogrammed.
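The cost of encoding the pilot as a vector of signed amplitudes can be illustrated with the following sketch, which uses only a multiply-add step. The helper names encode_pilot_vector and filter_multiply_add_only, and all numeric values, are assumptions made for illustration.

```python
def encode_pilot_vector(pilot_signs, amplitude):
    """Store one signed amplitude per time interval, as described above."""
    return [sign * amplitude for sign in pilot_signs]


def filter_multiply_add_only(samples, pilot_vector):
    """Use only multiply-add; the sign is carried by the stored element."""
    acc = 0
    for sample, pilot in zip(samples, pilot_vector):
        acc += sample * pilot
    return acc


pilot_signs = [+1, -1, -1, +1]
samples = [2, 7, 1, 8]

vector_a = encode_pilot_vector(pilot_signs, 3)  # amplitude 3
vector_b = encode_pilot_vector(pilot_signs, 5)  # same waveform, amplitude 5

# Count how many stored elements differ after the amplitude change.
rewritten = sum(1 for old, new in zip(vector_a, vector_b) if old != new)

result_a = filter_multiply_add_only(samples, vector_a)
result_b = filter_multiply_add_only(samples, vector_b)
```

In this sketch every stored element changes when only the amplitude changes, even though the sequence of signs is identical, which reflects the reprogramming burden noted above.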
If alternatively it were decided to try to make use of both the multiply-add instruction and the multiply-subtract instruction, this would typically give rise to some data rearrangement issues and also code density problems.
Hence, whilst it is desirable to use a dedicated MAC circuit to perform filter operations, it is difficult to perform certain types of filter operation efficiently using known MAC techniques.
An alternative approach, which could be taken when one of the multiplication data elements (for example the element C referred to earlier) has a constant magnitude and only its sign changes (as is in fact the case for the earlier described pilot signal), would be to perform a sequence of additions and subtractions within an adder unit, followed by a single multiplication of the result by the constant amplitude value. Within such adder units, it is known to provide a predicated add/subtract instruction, which will perform either an addition of two numbers or a subtraction of two numbers, dependent on a predicate value provided in the instruction. Whilst such an instruction would provide flexibility for performing a sequence of addition and subtraction operations, adder circuitry is not typically optimised in hardware for back-to-back accumulation of the results for a sequence of add and subtract operations, and hence would not provide the performance that dedicated MAC circuitry would provide. Furthermore, the running accumulate value within an adder unit is typically held in the same size register as the input operands, and accordingly a great deal of care would be needed to ensure that the running accumulate value did not overflow or underflow. This would typically require the use of various evaluate and shift type operations throughout the accumulate process, further impacting performance. In contrast, in a MAC unit, the accumulate value is typically held in a register that is larger than the input operands, thereby allowing a higher precision accumulate value to be maintained without risk of overflow or underflow. Furthermore, when using an adder circuit, then following the required additions, it would be necessary for a separate multiply instruction to be executed to perform the required multiplication to complete the filter operation.
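The adder-based alternative, and the overflow concern associated with it, can be sketched as follows. The 16-bit operand width, the wrap-around overflow behaviour, and the names wrap_int16, predicated_add_sub and add_sub_then_multiply are assumptions chosen for illustration; real adder units may saturate or behave differently.

```python
def wrap_int16(value):
    """Wrap a value to signed 16-bit two's complement (assumed register width)."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


def predicated_add_sub(acc, x, predicate):
    """Add x when the predicate is true, otherwise subtract x."""
    return acc + x if predicate else acc - x


def add_sub_then_multiply(samples, predicates, amplitude, narrow):
    """Run the add/subtract sequence, then apply one final multiplication.

    With narrow=True the running value is held at the assumed operand width;
    with narrow=False it is held in a wider accumulator that cannot overflow.
    """
    acc = 0
    for x, p in zip(samples, predicates):
        acc = predicated_add_sub(acc, x, p)
        if narrow:
            acc = wrap_int16(acc)
    return acc * amplitude


samples = [20000, 5000, 15000, 10000]
predicates = [True, False, True, True]   # add, subtract, add, add
amplitude = 2

wide_result = add_sub_then_multiply(samples, predicates, amplitude, narrow=False)
narrow_result = add_sub_then_multiply(samples, predicates, amplitude, narrow=True)
```

Under these assumptions the running sum reaches 40000, which exceeds the signed 16-bit range, so the narrow result differs from the wide one. Avoiding this in a same-width register would call for the kind of evaluate and shift steps mentioned above.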
Accordingly, it has generally been considered impractical to seek to perform such filter operations within separate adder circuitry, followed by a multiplication within separate multiplier circuitry.
It would therefore be desirable to provide a technique which enabled multiply-accumulate operations to be efficiently performed within multiply-accumulate circuitry, in situations where variable sequences of multiply-add and multiply-subtract operations are required, as for example is the case in the earlier described filter operations.