1. Field of the Invention
The present invention relates to calculating units and, in particular, to long-number calculating units for cryptographic tasks.
2. Description of the Related Art
DE 3631992 C2 discloses a cryptography processor for efficient execution of the public key method of Rivest, Shamir and Adleman, also known as the RSA method. The modular exponentiation needed in this method is calculated by using a multiplication look-ahead method and a reduction look-ahead method. For this purpose, a three-operand adder is used. The disclosed three-operand adder has a length of 660 bits. An elementary cell consists of several cryptoregisters, a shifter, a half adder, a full adder and a carry-look-ahead element. Four such elementary cells form a four-cell block, wherein a carry-look-ahead element is associated with the four-cell block. Five such four-cell blocks form a 20-cell block. The encryption unit consists overall of 33 such 20-cell blocks and a control unit, which comprises a clock generator for clocking the elementary cells. The carry-look-ahead elements of the four-cell blocks are connected together in order to detect whether a carry propagates over a larger distance, namely 20 bits. When the propagate signal of a 20-bit block is active, the carry at the output of the considered 20-bit block depends on the carry at the output of the previous block. When, however, the propagate signal of a 20-bit block is not active, a possibly present carry at the output of this block, i.e. at the most significant bit of this block, has been generated within this block and is not influenced by the previous block.
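The meaning of the block propagate signal can be illustrated by a minimal sketch. It is not the circuit of DE 3631992 C2; it assumes, for simplicity, that the three operands have already been reduced to two summands, and the function name block_signals and its parameters are hypothetical.

```python
def block_signals(a: int, b: int, width: int = 660, block: int = 20):
    """Return one (propagate, generate) pair per block, least significant block first.

    Simplifying assumption: two summands a and b instead of three operands.
    """
    mask = (1 << block) - 1
    signals = []
    for i in range(width // block):
        x = (a >> (i * block)) & mask
        y = (b >> (i * block)) & mask
        # Block propagate: every bit position propagates (exactly one summand bit set),
        # so a carry entering the block would ripple through to its carry output.
        propagate = (x ^ y) == mask
        # Block generate: the block produces a carry output even with carry input 0.
        generate = ((x + y) >> block) == 1
        signals.append((propagate, generate))
    return signals


# Lowest block all ones plus zero: the block only passes on an incoming carry.
print(block_signals((1 << 20) - 1, 0)[0])
# Two set most significant bits in the lowest block: the carry is generated inside it.
print(block_signals(1 << 19, 1 << 19)[0])
```

In this sketch the carry output of a block is active when generate is active, or when propagate is active and the carry output of the previous block is active.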
Thus, it is possible to make the clock of the calculating unit, i.e. the rate at which new input operands are fed in, faster than the worst case would allow, the worst case being a carry path that extends from the least significant bit of the whole calculating unit to the most significant bit of the whole calculating unit. If a propagate signal is activated for a 20-bit block, the clock of the whole calculating unit is slowed down such that the worst case is covered, i.e. the calculating unit is stopped until a carry from the least significant bit of the whole calculating unit has propagated to the most significant bit of the whole calculating unit.
The cycle time, i.e. the time after which the next input operands are fed into the calculating unit, is thus set such that it is just sufficient to process the carry between directly adjacent blocks. This has the advantage that, independent of the number of digits or elementary cells of the calculating unit, only the time of a block carry has to be considered. If, however, a determination is made that the carry of the current block is affected not only by the previous block but also by the block preceding the previous block, the cycle time is lengthened so that there is enough time for a complete carry path.
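The timing behavior described above can be modeled by the following minimal sketch. The time values are hypothetical: a normal cycle is assumed to last one time unit, and a panic cycle is assumed to last one time unit per block, so that a carry can traverse the whole calculating unit. As before, two summands are assumed instead of three operands, and the names has_panic and addition_time are illustrative only.

```python
def has_panic(a: int, b: int, width: int = 660, block: int = 20) -> bool:
    """True if any block has an active propagate signal (assumed panic condition)."""
    mask = (1 << block) - 1
    for i in range(width // block):
        x = (a >> (i * block)) & mask
        y = (b >> (i * block)) & mask
        if (x ^ y) == mask:  # all bit positions of this block propagate
            return True
    return False


def addition_time(a: int, b: int, width: int = 660, block: int = 20,
                  t_normal: float = 1.0) -> float:
    """Time for one addition in hypothetical time units."""
    # Assumption: the complete carry path costs one normal cycle per block.
    t_worst = t_normal * (width // block)
    return t_worst if has_panic(a, b, width, block) else t_normal


print(addition_time(0, 0))               # no block propagates: normal cycle
print(addition_time((1 << 20) - 1, 0))   # lowest block propagates: panic cycle
```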
The described concept is advantageous in that the speed is no longer determined by the length of the calculating unit but by the length of a block, i.e. by the length of the carry path to be expected, which depends on the length of a block.
The described method is disadvantageous in that, when it is determined that a carry propagates over a longer distance than one block, i.e. when a so-called panic signal is generated, the calculating unit is stopped as a whole in order to cover the worst case. If, therefore, the length of a block is chosen short, which is desirable as such, since the clock rate can be increased (the cycle time can be shortened), no significant speed gain will occur, since panic signals occur more often, so that the calculating unit as a whole is slowed down by the constantly recurring panic case.
If, however, the length of a block is chosen relatively long in order to decrease the number of panic cases or to almost eliminate them, the cycle time has to be increased as well. It has to be long enough to cover the normal worst case, in which a carry is generated in the second least significant bit of the previous block, passes through the rest of the previous block and then passes almost fully through the current block, for example when a kill parameter is found in the most significant bit of the current block.
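The normal worst case can be made concrete with a short sketch that measures the longest carry ripple of an addition. A block length of 20 bits and two summands are assumed; the function name longest_carry_chain and the chosen operands are hypothetical.

```python
def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Number of consecutive bit positions that a single carry occupies at most."""
    longest = current = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:        # generate: a new carry starts at this position
            current = 1
        elif x ^ y:      # propagate: a carry that is already present moves on
            current = current + 1 if current else 0
        else:            # kill: a carry that is already present ends here
            current = 0
        longest = max(longest, current)
    return longest


block = 20
# Previous block: bits 0..19, current block: bits 20..39.
a = (1 << (2 * block - 1)) - 2   # bits 1..38 set
b = 1 << 1                       # bit 1 set, so bit 1 generates the carry
# Bits 2..38 propagate; bit 39 (most significant bit of the current block) kills.
print(longest_carry_chain(a, b, 2 * block))  # 2 * block - 2 = 38
```

Under these assumptions the normal cycle time has to cover a carry path of almost two block lengths.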
Thus, a short block length leads to a higher clock rate but, all in all, to a reduced performance of the calculating unit due to the heavily increasing number of panic cases. A long block length, which requires a long cycle time, i.e. a low clock rate, leads to a decreasing number of panic cases but is not desirable, since only a limited number of addition processes can be performed in a certain time due to the low clock rate.
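The trade-off can be estimated with a rough model. The sketch below is not taken from DE 3631992 C2. It assumes uniformly random, independent summand bits, which need not hold for the operands of a modular exponentiation. It further assumes a normal cycle time proportional to two block lengths, a panic cycle time proportional to the full width, and a panic whenever at least one block has an active propagate signal. All names and time units are hypothetical.

```python
def panic_probability(width: int, block: int) -> float:
    """Probability that at least one block propagates, for random summand bits."""
    blocks = width // block
    # One bit position propagates with probability 1/2, a whole block with 2**-block.
    return 1.0 - (1.0 - 2.0 ** -block) ** blocks


def expected_time(width: int, block: int, t_bit: float = 1.0) -> float:
    """Expected time per addition in hypothetical units of one bit delay."""
    p = panic_probability(width, block)
    t_normal = 2 * block * t_bit   # carry may cross the previous and the current block
    t_worst = width * t_bit        # carry path over the whole calculating unit
    return (1.0 - p) * t_normal + p * t_worst


for n in (4, 10, 20, 60):
    print(n, panic_probability(660, n), expected_time(660, n))
```

In this model, a block length of 4 bits results in a panic in almost every cycle, whereas a block length of 60 bits almost never panics but has a normal cycle three times as long as with 20-bit blocks. Both come out slower than the 20-bit block length.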