Arithmetic units are used especially in programmable integrated circuits such as microprocessors or microcontrollers, where they carry out arithmetic operations on operands. A well-known field of application is digital signal processing, in which they are used to apply filters, transfer functions, transforms, etc.
One arithmetic operation, commonly used for the different processing operations, is known as the multiplication-accumulation operation in which a sum of products is computed. If two input operands are called A and B and the contents of an accumulator are called Y, this operation is used to compute: RES=A×B+Y. This arithmetic operation conventionally requires an accumulator to contain the new result RES, which will be used at the next iteration, a multiplier to which the two input operands A and B are applied and an adder that receives, at input, the result of the multiplication A×B and the current contents Y of the accumulator. In practice, the current contents of the accumulator are transferred to the adder via a register. The result of the operation is itself transferred into the accumulator.
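As an illustration only, the multiplication-accumulation operation RES=A×B+Y can be sketched in Python. The function names `mac` and `dot_product` are hypothetical and chosen for this sketch. Unbounded Python integers stand in for the accumulator, so no bit width or overflow behavior is modeled here.

```python
def mac(a, b, y):
    """One multiplication-accumulation step: RES = A*B + Y."""
    return a * b + y


def dot_product(xs, ys):
    """Sum of products, computed by iterating the MAC step."""
    acc = 0  # accumulator contents Y, initially zero
    for a, b in zip(xs, ys):
        # The new result RES is loaded back into the accumulator
        # and serves as Y at the next iteration.
        acc = mac(a, b, acc)
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))  # prints 32
```

Each pass through the loop corresponds to one iteration of the operation described above: the previous accumulator contents are fed back as Y.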
A known multiplier structure comprises a multiplication stage that computes two partial products P1 and P2 from the input operands A and B, and an adder stage that computes the sum of these partial products. The output is the product of the input operands: A×B=P1+P2. This multiplication technique is well known to those skilled in the art.
A microarchitecture for an arithmetic unit that performs multiplication-accumulation operations and uses a multiplier of this kind also includes two cascaded adders. According to the usual meaning given to it in the field considered, the terms “adder” or “full adder” must be understood as a circuit that gives the result of the sum of two operands applied at input. An adder of this kind may, for example, be a carry look-ahead adder or a carry propagation adder.
The first adder computes the sum of the partial products. The second adder adds the current contents of the accumulator to this sum and gives the final result at output. This final result corresponds to the multiplication-accumulation operation. This result is then loaded into the accumulator. The second adder also gives an output carry bit corresponding to the output carry value whose rank corresponds to the most significant bit of the result. In one example, with operands in the two's complement signed format and a 40-bit adder, the result has a signed format on 40 bits, including one sign bit ranked 39, and 39 significant bits ranked 38 to 0, the rank of the most significant bit being the rank 38. The output carry value will thus be referenced C39.
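The two cascaded adders can be modeled with a minimal Python sketch. The names `add_n` and `mac_cascaded` are hypothetical. The sketch assumes that operands are passed as non-negative N-bit patterns (two's complement values already encoded on N bits) and that the output carry is the carry out of the top bit position of the second adder.

```python
def add_n(x, y, n):
    """N-bit adder: returns (sum on n bits, carry out of the top bit)."""
    mask = (1 << n) - 1
    total = (x & mask) + (y & mask)
    return total & mask, total >> n


def mac_cascaded(p1, p2, y, n=40):
    """Standard microarchitecture: two adders in cascade."""
    tmp, _ = add_n(p1, p2, n)      # first adder: sum of partial products
    res, carry = add_n(tmp, y, n)  # second adder: adds accumulator contents
    return res, carry              # carry models C39 when n = 40
```

With n = 40, the returned carry plays the role of C39 in this model. The carry produced by the first adder is discarded, since only the second adder drives the flag bit.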
This output carry value C39, or its complementary value referenced B39, is usually stored in a state register. This state register has a certain number of flag bits that programmers use in logic equations specific to the architecture considered.
One problem of the microarchitecture just described is that the two cascaded adders make the data path particularly long, so that the computation time is not optimized. This is all the more disadvantageous when the operands to be processed are encoded on many bits.
A microarchitecture for an arithmetic unit that is improved in terms of data path uses a carry save adder (CSA) circuit. This carry save adder reduces three inputs to two outputs. Because no carry propagates from one bit position to the next inside it, the total time for computing its outputs is equal to the time needed to compute a single bit. This carry save adder circuit is followed by an adder that gives the final result. The computation time saved is thus equal to the computation time of an adder minus the computation time of the CSA.
In this microarchitecture, the carry save adder circuit thus receives the two partial products P1 and P2 and the current contents Y of the accumulator at input. At output, it gives a “carry” vector and a “result” vector. These two vectors are applied as inputs to the adder which gives the result RES at output.
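A carry save adder followed by a final adder can be sketched as follows. The names `carry_save_add` and `mac_csa` are hypothetical. The sketch assumes non-negative N-bit operand patterns and assumes that the carry vector is shifted left by one rank and truncated to N bits before it reaches the final adder.

```python
def carry_save_add(a, b, c, n):
    """Reduce three n-bit inputs to a (result vector, carry vector) pair."""
    mask = (1 << n) - 1
    # Each result bit is the sum bit of a one-bit full adder.
    result_vec = (a ^ b ^ c) & mask
    # Each carry bit is the majority of the three input bits.
    majority = (a & b) | (a & c) | (b & c)
    # Assumption: carries move up one rank and the top one is dropped.
    carry_vec = (majority << 1) & mask
    return result_vec, carry_vec


def mac_csa(p1, p2, y, n=40):
    """Improved microarchitecture: CSA followed by a single adder."""
    mask = (1 << n) - 1
    result_vec, carry_vec = carry_save_add(p1, p2, y, n)
    total = result_vec + carry_vec
    return total & mask, total >> n  # (RES, output carry of the final adder)
```

Every bit of the two vectors depends only on the three input bits of the same rank, which is why the CSA stage costs the delay of a single bit position regardless of N.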
One problem related to this improved microarchitecture is that the output carry value from the adder which gives the final result is not equal to the output carry value obtained in the first microarchitecture described, with two cascaded adders. This can be understood by considering that the two inputs of the adder which gives the result and the output carry are not the same in the two microarchitectures. In the first microarchitecture, one input of this adder receives the sum of the partial products while the other input receives the current contents of the accumulator. In the improved microarchitecture, the two inputs receive the carry and result vectors given by the carry save adder. Now, each output bit of the adder that gives the final result is prepared from the bits applied to the input of the adder and from internally generated carry values, and the output carry value is none other than the last internally generated carry value. The same result can therefore be obtained at the output of the adder, but with a different output carry value, which will be referenced C39′. This carry value may be called a “compressed carry value”.
To illustrate this problem, let us take the example of the operands E1, E2 and Y, expressed in a four-bit two's complement representation:
E1 = 1 1 1 0 (base 2) = −2 (base 10)
E2 = 0 0 0 1 (base 2) = 1 (base 10)
Y = 1 1 1 0 (base 2) = −2 (base 10)
The adder has a width of N=4 bits. The output carry value is the carry ranked N−1=3, referenced C3. Determining the output carry value C3 given by the second adder of the standard microarchitecture (two cascaded adders):
E1   = 1 1 1 0
E2   = 0 0 0 1
TMP1 = 1 1 1 1   (result of the first adder)
Y    = 1 1 1 0
RES  = 1 1 0 1   (result of the second adder)
C3   = 1         (with output carry value)
Determining the compressed output carry value C′3 given by the adder which gives the final result in the improved microarchitecture (CSA adder and cascaded adder):
E1  = 1 1 1 0
E2  = 0 0 0 1
Y   = 1 1 1 0
Σ   = 0 0 0 1   (CSA adder: result vector)
R   = 1 1 0 .   (and carry vector)
RES = 1 1 0 1   (final adder result)
C′3 = 0         (and compressed output carry value)
It can be seen in this example that the compressed output carry value C′3 does not correspond to the output carry value C3. Thus, the compressed output carry value does not correspond to the definition of the corresponding flag bit in the state register.
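The four-bit example can be reproduced with a short, self-contained Python sketch. All function names are hypothetical. The sketch assumes that the output carry is the carry out of the top bit of the adder that gives the final result, and that the CSA carry vector is shifted left by one rank and truncated to four bits.

```python
N = 4
MASK = (1 << N) - 1


def add_n(x, y):
    """N-bit adder: returns (sum on N bits, carry out of the top bit)."""
    total = (x & MASK) + (y & MASK)
    return total & MASK, total >> N


def cascaded(e1, e2, y):
    """Standard microarchitecture: two adders in cascade."""
    tmp, _ = add_n(e1, e2)
    return add_n(tmp, y)


def with_csa(e1, e2, y):
    """Improved microarchitecture: CSA followed by one adder."""
    result_vec = (e1 ^ e2 ^ y) & MASK
    carry_vec = (((e1 & e2) | (e1 & y) | (e2 & y)) << 1) & MASK
    return add_n(result_vec, carry_vec)


res, c3 = cascaded(0b1110, 0b0001, 0b1110)
res_csa, c3_compressed = with_csa(0b1110, 0b0001, 0b1110)
print(f"cascaded: RES={res:04b} C3={c3}")                  # RES=1101 C3=1
print(f"with CSA: RES={res_csa:04b} C'3={c3_compressed}")  # RES=1101 C'3=0

# The result itself agrees for every triple of four-bit operands.
same_result = all(
    cascaded(a, b, c)[0] == with_csa(a, b, c)[0]
    for a in range(16) for b in range(16) for c in range(16)
)
print(same_result)  # True
```

Under these assumptions, both models give the same result for every operand triple. Only the carry reported by the final adder differs, as in the example above.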