1. Field of the Invention
The present invention relates to long-number calculating units, and particularly to long-number calculating units with a limited carry path.
2. Description of the Related Art
DE 3631992 C2 discloses a cryptography processor for the efficient execution of the public-key method of Rivest, Shamir and Adleman, also known as the RSA method. The modular exponentiation required by this method is calculated using a multiplication look-ahead method and a reduction look-ahead method, for which a three-operand adder is used. The disclosed three-operand adder has a length of 660 bits. An elementary cell consists of several cryptoregisters, a shifter, a half adder, a full adder and a carry-look-ahead element. Four such elementary cells form a four-cell block, with which a carry-look-ahead element is associated. Five such four-cell blocks form a 20-cell block. Overall, the encryption unit consists of 33 such 20-cell blocks (33 × 20 = 660 bits) and a control unit comprising a clock generator for clocking the elementary cells. The carry-look-ahead elements of the four-cell blocks are interconnected in order to detect whether a carry propagates over a larger distance, namely 20 bits. When the propagate signal of a 20-bit block is active, the carry of this 20-bit block depends on the carry at the output of the previous block. When the propagate signal of a 20-bit block is not active, however, any carry present at the output of this block, i.e. at its most significant bit, has been generated within this block and is not influenced by the previous block.
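The block-propagate idea can be illustrated with a short Python sketch. For simplicity it assumes a two-operand adder rather than the three-operand adder of the cited processor, and it models a register as a Python integer; the function name `block_propagates` is hypothetical. A bit position passes an incoming carry on when exactly one operand bit is set ($p_i = a_i \oplus b_i$), and a block passes a carry on when all of its bit positions do.

```python
def block_propagates(a: int, b: int, width: int, block_len: int) -> list[bool]:
    """Return one propagate flag per block, least significant block first.

    Hypothetical two-operand model: bit i propagates a carry when
    a_i XOR b_i == 1, and a block propagates when every bit in it does.
    """
    if width % block_len != 0:
        raise ValueError("width must be a multiple of block_len")
    p = (a ^ b) & ((1 << width) - 1)  # bit-level propagate vector
    all_ones = (1 << block_len) - 1
    return [
        ((p >> (k * block_len)) & all_ones) == all_ones
        for k in range(width // block_len)
    ]


# A 660-bit unit built from 33 blocks of 20 bits, as in the cited processor.
flags = block_propagates(a=(1 << 40) - 1, b=0, width=660, block_len=20)
print(flags[:3])  # [True, True, False]
```

Here the low 40 bits of `a` are ones and `b` is zero, so the two least significant 20-bit blocks would pass a carry straight through, while the third would not.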
It is thus possible to make the clock of the calculating unit, i.e. the rate at which new input operands are fed in, faster than the worst case would allow, in which the carry path extends from the least significant bit to the most significant bit of the whole calculating unit. If the propagate signal of a 20-bit block is activated, the clock of the whole calculating unit is slowed down so that the worst case is covered, i.e. the calculating unit is stopped until a carry from its least significant bit has had time to propagate to its most significant bit.
The cycle time, i.e. the time after which the next input operands are fed into the calculating unit, is thus set so that it is just sufficient to process a carry between directly adjacent blocks. This has the advantage that, independent of the number of digits or elementary cells of the calculating unit, only the time of a block carry has to be considered. If, however, it is determined that the carry of the current block is affected not only by the previous block but also by the block preceding it, the cycle time is lengthened enough to allow for a complete carry path.
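The two-speed clocking rule can be sketched as follows under a deliberately simplified, hypothetical delay model in which a carry needs one time unit per bit. A normal cycle allows a carry to cross the previous and the current block; the slow, worst-case cycle (the panic case discussed below) allows it to cross the whole unit. As an additional assumption, only interior blocks are checked: with a carry-in of 0 the least significant block never passes on an external carry, and a carry passing through the most significant block simply leaves the unit. All function names are illustrative.

```python
def block_propagates(a: int, b: int, width: int, block_len: int) -> list[bool]:
    """Propagate flag per block (a_i XOR b_i == 1 for every bit of the block)."""
    p = (a ^ b) & ((1 << width) - 1)
    all_ones = (1 << block_len) - 1
    return [((p >> (k * block_len)) & all_ones) == all_ones
            for k in range(width // block_len)]


def cycle_time(a: int, b: int, width: int, block_len: int) -> tuple[bool, int]:
    """Return (panic, time units) for one addition in the toy delay model."""
    flags = block_propagates(a, b, width, block_len)
    # Assumption: only a propagating interior block can let a carry cross
    # more than one block boundary (carry-in 0, top carry-out ignored).
    panic = any(flags[1:-1])
    if panic:
        return True, width          # worst case: ripple through the whole unit
    return False, 2 * block_len     # normal case: previous block + current block


print(cycle_time(0x00F, 0x000, width=12, block_len=4))  # (False, 8)
print(cycle_time(0x0F0, 0x000, width=12, block_len=4))  # (True, 12)
```

In the first call only the least significant block propagates, which cannot forward an external carry, so the short cycle suffices; in the second the middle block propagates, so the whole unit has to wait for a full-length carry.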
The advantage of the described concept is that the speed is no longer determined by the length of the calculating unit, but by the length of a block, i.e. by the length of the carry path to be expected, and also by the number of blocks.
The disadvantage of the described method is that, when it is determined that a carry propagates over a distance longer than one block, i.e. when a so-called panic signal is generated, the calculating unit is stopped as a whole in order to cover the worst case. If the block length is chosen short, which as such is desirable since the clock rate can then be increased (the cycle time can be shortened), no significant speed gain results, because panic signals occur more often and the calculating unit as a whole is slowed down by the frequent panic cases.
If, on the other hand, the block length is chosen relatively long in order to reduce the number of panic cases or nearly eliminate them, the cycle time has to be increased as well, since it must be long enough to cover, at most, the case in which a carry is generated in the second least significant bit of the previous block, passes through the previous block and then passes through the current block.
Thus, a short block length permits a higher clock rate but, overall, reduces the performance of the calculating unit because of the sharply increasing number of panic cases. A long block length, on the other hand, requires a long cycle time, i.e. a low clock rate; it reduces the number of panic cases but is undesirable because, at the low clock rate, only a limited number of additions can be performed in a given time.
The following equation, which summarizes the preceding statements, gives the probability of a panic event:

$$P = 1 - \left(1 - 2^{-BL}\right)^{N_{BL} - 2}$$

In the above equation, $P$ is the probability of a panic event, $BL$ is the block length and $N_{BL}$ is the number of blocks.
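Evaluating the formula for a 660-bit unit makes the trade-off between block length and panic frequency concrete. The sketch below computes $P$ directly from the equation and, as an additional hypothetical model that is not taken from the source, estimates the average addition time by assuming a normal cycle of $2 \cdot BL$ bit delays and a panic cycle of $BL \cdot N_{BL}$ bit delays.

```python
def panic_probability(bl: int, nbl: int) -> float:
    """P = 1 - (1 - 2**-BL)**(NBL - 2) for uniformly random operands."""
    return 1.0 - (1.0 - 2.0 ** -bl) ** (nbl - 2)


def expected_delay(bl: int, nbl: int) -> float:
    """Average bit delays per addition in a hypothetical two-speed clock model."""
    p = panic_probability(bl, nbl)
    normal = 2 * bl        # carry through the previous and the current block
    panic = bl * nbl       # carry through the whole calculating unit
    return (1.0 - p) * normal + p * panic


WIDTH = 660
for bl in (4, 10, 15, 20, 33):
    nbl = WIDTH // bl
    print(f"BL={bl:2d}  NBL={nbl:3d}  P={panic_probability(bl, nbl):.3e}  "
          f"delay={expected_delay(bl, nbl):7.2f}")
```

In this toy model, 4-bit blocks panic on almost every addition, so the average time approaches a full 660-bit ripple, while 33-bit blocks almost never panic but pay for a long normal cycle; among the listed lengths, the minimum lies in between.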
A disadvantage of the described calculating unit with a limited carry path is that a subtraction cannot be performed easily. The subtraction

$$c = a - b$$

is typically not calculated directly, but by negating the operand b and then adding the negated operand to a:

$$c = a + (-b).$$
The negation of the operand b is typically achieved by inverting the bits of operand b as stored in its register and adding one (two's complement):

$$c = a + (\overline{b} + 1)$$
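The two's-complement identity can be checked with a minimal Python sketch in which a fixed-width register is modelled as an integer masked to `width` bits; the function name is hypothetical.

```python
def subtract_via_complement(a: int, b: int, width: int) -> int:
    """Compute a - b modulo 2**width as a + (NOT b + 1)."""
    mask = (1 << width) - 1
    b_inverted = ~b & mask                 # invert every bit of b's register
    return (a + b_inverted + 1) & mask     # add one; drop the carry-out


assert subtract_via_complement(10, 3, width=8) == 7
# A negative difference wraps around: 3 - 10 = -7, i.e. 256 - 7 = 249 in 8 bits.
assert subtract_via_complement(3, 10, width=8) == 249
```

The carry-out beyond the register width is discarded, which is exactly what makes the inverted-plus-one form equal to the difference modulo $2^{width}$.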
As long as the LSB of the inverted operand b coincides with the least significant bit of the least significant adder block, the inversion of operand b causes no problem. If, however, a calculating unit is used whose length in bits is greater than the number of bits of the operand to be subtracted, it can and will happen that the LSB of the two operands to be processed no longer coincides with the least significant bit of the least significant adder block, but falls on a higher-order bit of that block or even into a higher-order block. The bits below the current least significant bit are set to 0.
When the register in which operand b is stored is then inverted, the actually insignificant bits below the least significant bit (LSB) of operand b are set to one. Unless countermeasures are taken, this triggers a panic signal, since the propagate signals of all adder blocks below the block containing the least significant bit are equal to 1 (the corresponding bits of a are 0, while those of the inverted b are 1). A subtraction would therefore automatically lead to a “total stop” of the calculating unit.
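The effect can be reproduced in a small, hypothetical 16-bit unit made of four 4-bit blocks, in which both operands occupy only bits 8 to 15, so the lower two blocks hold padding zeros. As an assumption of this toy model, only interior blocks decide whether a panic is raised. Inverting b turns the padding into ones, every lower bit then propagates, and interior block 1 raises the panic flag even though the arithmetic result is correct.

```python
def block_propagates(p: int, width: int, block_len: int) -> list[bool]:
    """Propagate flag per block for a bit-level propagate vector p."""
    all_ones = (1 << block_len) - 1
    return [((p >> (k * block_len)) & all_ones) == all_ones
            for k in range(width // block_len)]


def subtract_and_check_panic(a: int, b: int, width: int,
                             block_len: int) -> tuple[int, bool]:
    """Return (a - b mod 2**width, panic flag) for c = a + (NOT b + 1)."""
    mask = (1 << width) - 1
    b_inverted = ~b & mask              # padding zeros below b's LSB become ones
    flags = block_propagates(a ^ b_inverted, width, block_len)
    panic = any(flags[1:-1])            # assumption: interior blocks decide panic
    return (a + b_inverted + 1) & mask, panic


# Operands use only bits 8..15; bits 0..7 are padding zeros.
result, panic = subtract_and_check_panic(0x7A00, 0x3500, width=16, block_len=4)
print(hex(result), panic)  # 0x4500 True
```

In this model the `+1` enters at bit 0 and has to ripple through all eight padding ones before it reaches bit 8, which is exactly the long carry path that the panic signal reports.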
To prevent this, the known cryptography processor, available for example under the name “Advanced Crypto Engine” (ACE) from Infineon Technologies AG, Sankt-Martin-Str. 53, 81669 Munich, Germany, completely turns off the adder blocks below the block containing the least significant bit of the second operand b. This is done via a specific command, called “Adjust CU” in the ACE.
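How switching off the lower blocks removes the spurious panic can be sketched in a toy model. The function below is a hypothetical illustration, not the ACE's implementation of “Adjust CU”: blocks below `lsb_block` are simply excluded from both the addition and the panic check, the `+1` of the two's complement is injected at the lowest active bit, and only interior active blocks are assumed to decide whether a panic is raised.

```python
def adjusted_subtract(a: int, b: int, width: int, block_len: int,
                      lsb_block: int) -> tuple[int, bool]:
    """Subtract with blocks below lsb_block switched off (hypothetical model).

    Returns (a - b mod 2**width, panic flag). Bits below the lowest active
    block are assumed to be padding zeros in both operands.
    """
    shift = lsb_block * block_len
    active_width = width - shift
    active_mask = (1 << active_width) - 1
    a_hi = a >> shift
    b_hi_inverted = ~(b >> shift) & active_mask     # only active bits are inverted
    all_ones = (1 << block_len) - 1
    p = a_hi ^ b_hi_inverted
    flags = [((p >> (k * block_len)) & all_ones) == all_ones
             for k in range(active_width // block_len)]
    panic = any(flags[1:-1])                        # interior active blocks only
    diff = (a_hi + b_hi_inverted + 1) & active_mask  # +1 enters at lowest active bit
    return diff << shift, panic


print(adjusted_subtract(0x7A00, 0x3500, width=16, block_len=4, lsb_block=2))
# (17664, False), i.e. 0x4500 without a panic
```

In this model the active region behaves like a narrower calculating unit, so the setting is only valid as long as the position of the least significant bit does not change.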
A disadvantage of this solution is that it is only possible within one calculation, i.e. a complete multiplication with a fixed least significant bit. This directly implies that, for look-ahead algorithms, the calculating unit needs an overflow buffer for shifting the operands.
A disadvantage of a calculating unit with an overflow buffer is that there may be an otherwise unused area, which serves only for shifting or as an underflow buffer.