1. Field of the Invention
The present invention relates to calculating units and, in particular, to long number calculating units configurable with respect to their length.
2. Description of Prior Art
DE 3631992 C2 discloses a cryptography processor for efficiently implementing the public key method of Rivest, Shamir and Adleman, which is also known as the RSA method. The modular exponentiation required in this method is calculated using a multiplication look-ahead method and a reduction look-ahead method. For this, a three-operands adder is used. The three-operands adder disclosed has a length of 660 bits. An elementary cell consists of several cryptoregisters, a shifter, a half adder, a full adder and a carry look-ahead element. Four such elementary cells form a four-cells block, a carry look-ahead element being associated with the four-cells block. Five such four-cells blocks form a 20-cells block. The encryption unit consists of a total of 33 such 20-cells blocks and a control unit including a clock generator for clocking the elementary cells. The carry look-ahead elements of the four-cells blocks are interconnected to recognize whether a carry propagates over a greater distance, that is 20 bits. When a propagate signal of the 20-bits block is active, this means that the carry of the 20-bits block considered depends on a carry at the output of the previous block. When the propagate signal of a 20-bits block is, however, not active, this means that any carry present at the output of this block, that is at the most significant bit of this block, has been produced within this block and is not influenced by the previous block.
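The block hierarchy and the propagate signal described above can be illustrated with a minimal sketch. It is not the circuit of the cited document: it assumes a plain two-operand addition instead of the three-operands adder, and all names (cell_propagates, group_and, propagate_20) are hypothetical. A cell propagates a carry when exactly one of its two operand bits is set, a four-cells block propagates when all four of its cells do, and a 20-cells block propagates when all five of its four-cells blocks do.

```python
from typing import List

CELLS_PER_4BLOCK = 4      # four elementary cells form a four-cells block
BLOCKS4_PER_20BLOCK = 5   # five four-cells blocks form a 20-cells block
NUM_20BLOCKS = 33         # 33 such blocks give 660 bits in total
WIDTH = CELLS_PER_4BLOCK * BLOCKS4_PER_20BLOCK * NUM_20BLOCKS  # 660


def cell_propagates(a: int, b: int, width: int = WIDTH) -> List[bool]:
    """Per cell: True when an incoming carry would be passed on (a_i XOR b_i)."""
    return [((a >> i) & 1) != ((b >> i) & 1) for i in range(width)]


def group_and(signals: List[bool], group_size: int) -> List[bool]:
    """Combine consecutive signals: a group propagates only if all members do."""
    return [all(signals[i:i + group_size])
            for i in range(0, len(signals), group_size)]


def propagate_20(a: int, b: int) -> List[bool]:
    """Return one propagate signal per 20-cells block (33 signals)."""
    p_cell = cell_propagates(a, b)
    p4 = group_and(p_cell, CELLS_PER_4BLOCK)      # four-cells blocks
    return group_and(p4, BLOCKS4_PER_20BLOCK)     # 20-cells blocks
```

For example, with a having its 20 least significant bits set and b equal to zero, only the first 20-cells block reports an active propagate signal; with a = b = 1, a carry is generated inside the first block and its propagate signal stays inactive.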
Thus, it is possible to make the clock of the calculating unit, that is the rate at which new input operands are fed, faster than the worst case would allow, the worst case being that in which the carry path runs from the least significant bit of the entire calculating unit to the most significant bit of the entire calculating unit. If a propagate signal for a 20-bits block is activated, the clock of the entire calculating unit is slowed down such that the worst case is taken into account, that is the calculating unit is stopped until a carry has propagated from the least significant bit of the entire calculating unit to the most significant bit of the entire calculating unit.
The cycle time, that is the time after which the next input operands are fed into the calculating unit, is thus adjusted such that it is just sufficient to process the carry of directly neighboring blocks. This has the advantage that, irrespective of the number of digits of the calculating unit, only the time of a block carry has to be taken into account. When it is, however, determined that the carry of the current block is influenced not only by the previous block but also by the block preceding the previous block, the cycle time is made so slow that there is sufficient time for a complete carry path.
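The cycle-time adaptation described above can be sketched as follows. This is a simplified model under stated assumptions, not the control logic of the cited document: it uses two operands instead of three, the function name add_with_adaptive_cycle is hypothetical, and the cycle times (1 and 33 arbitrary units) are placeholders. In the fast case no block propagates, so the carry leaving each block can be computed from that block alone and only one block-to-block carry has to be awaited. If any block propagates, the sketch falls back to the worst case and charges the long cycle.

```python
def add_with_adaptive_cycle(a, b, block_bits=20, num_blocks=33,
                            short_cycle=1, long_cycle=33):
    """Add a and b block-wise; return (sum, cycle_time_used).

    short_cycle and long_cycle are hypothetical time units.
    """
    mask = (1 << block_bits) - 1
    a_blocks = [(a >> (k * block_bits)) & mask for k in range(num_blocks)]
    b_blocks = [(b >> (k * block_bits)) & mask for k in range(num_blocks)]

    # A block propagates when every bit position has exactly one operand bit set.
    propagates = [(x ^ y) == mask for x, y in zip(a_blocks, b_blocks)]
    if any(propagates):
        # Worst case: wait until a carry could ripple across the whole unit.
        return a + b, long_cycle

    # Fast case: each block's carry-out is decided inside the block itself,
    # so it is computed without looking at the block's own carry-in.
    total = 0
    carry_in = 0
    for k in range(num_blocks):
        block_sum = a_blocks[k] + b_blocks[k] + carry_in
        total |= (block_sum & mask) << (k * block_bits)
        carry_in = (a_blocks[k] + b_blocks[k]) >> block_bits
    total |= carry_in << (num_blocks * block_bits)  # carry out of the top block
    return total, short_cycle
```

In this model the fast path yields the same sum as an ordinary addition because a non-propagating block produces the same carry-out whether or not a carry enters it.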
FIG. 4 shows an elementary cell for a bit i of the well-known calculating unit. The elementary cell includes several registers for several input operands, of which only two register cells 110, 112 are shown in FIG. 4. An elementary cell further includes adding means 114 and a register cell for a result which, in FIG. 4, is designated by 116. It follows from the relatively high number of components within an elementary cell, as can be seen from FIG. 4, that such an elementary cell, in its practical realization, has a relatively low height h but a relatively large width d. Since 660 such elementary cells must be stacked one above the other, a narrow high tower nevertheless results. From a manufacturing point of view, chips which are as square as possible are sought, so the narrow high tower is divided into several smaller stacks which are placed next to one another, wherein every other stack is upside down. The information a stack needs from the previous stack is passed between neighboring stacks at the upper and lower ends of the stacks.
Certain cryptographic algorithms can be processed in parallel by means of two calculating units operating in parallel in order to reduce the processing time. Certain algorithms, for example iterative ones, require that the contents of the result register of one calculating unit be loaded into an operand register of the other calculating unit.
Such a situation is illustrated in FIG. 3. In FIG. 3, a first long number calculating unit 91 and a second long number calculating unit 92 are illustrated. Each calculating unit includes a number of elementary cells 90, wherein each elementary cell can be constructed as is shown in FIG. 4. The number of elementary cells in each long number calculating unit is the same and equals n. Depending on the application, calculating units have different lengths. The calculating unit described in DE 3631992 C2 has a length of 660 bits. If two encrypting operations were to be executed in parallel with such units, two 660-bits long number calculating units would be used.
For elliptic curve cryptography, sufficient security is already obtained when secret keys having a length of, for example, 160 bits are used. Such a calculating unit would thus have to have a minimum width of 160 bits. For RSA cryptosystems, there are implementations with a medium security level, in which the modulus has 1024 bits. High-security RSA systems, however, have moduli with 2048 bits. For parallel applications, two 1024-bits calculating units or two 2048-bits calculating units, for example, would have to be connected in parallel.
In order to load the contents of a result register of, for example, the long number calculating unit 1 (reference numeral 91 in FIG. 3) into an input operand register of the long number calculating unit 2 (reference numeral 92 in FIG. 3), a first bus interface (bus IF) 93, a second bus interface (bus IF) 94 and a bus 95 having a width of, for example, 32 bits could be used. The bus interface 93 would thus perform the block-by-block read out of 32-bits blocks from the long number calculating unit 1. The 32-bits blocks are then transmitted one after the other to the bus interface 94 via the bus 95, wherein the bus interface 94 causes the incoming 32-bits blocks to be loaded into the correct elementary cells of the long number calculating unit 2. For a 660-bits calculating unit, more than 20 cycles are required for this, each cycle including the following steps: addressing 32 elementary cells in the source long number calculating unit, reading out the 32 elementary cells in the source long number calculating unit, transmitting the 32 bits via the bus, addressing the 32 elementary cells in the destination long number calculating unit and storing the 32 bits into the addressed 32 elementary cells of the destination long number calculating unit.
The access of a calculating unit to a register of the other calculating unit thus takes place by a previous explicit exchange of operands via the bus system to which the two calculating units are connected. As a standard, this bus has a width of 32 bits. It can, however, also have a width of only 8 bits, depending on the system present. The exchange thus takes a long time in long number calculating units and in particular in serial-parallel long number calculating units. In addition, a security problem often arises since the data transfer can, for example, be seen in the current profile.
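The block-by-block exchange via the bus can be modelled with a short sketch. It only counts transfer cycles and copies bits; addressing, bus arbitration and timing are not modelled, and the function name transfer_via_bus is hypothetical. Each loop iteration stands for one cycle consisting of the steps listed above: addressing and reading a block in the source unit, transmitting it, and addressing and storing it in the destination unit.

```python
def transfer_via_bus(value, length_bits=660, bus_width=32):
    """Copy a result register to an operand register in bus-width blocks.

    Returns (received_value, number_of_cycles).
    """
    mask = (1 << bus_width) - 1
    received = 0
    cycles = 0
    for offset in range(0, length_bits, bus_width):
        # Address and read out one block of elementary cells in the source.
        block = (value >> offset) & mask
        # Transmit the block via the bus, then address and store it
        # in the corresponding cells of the destination unit.
        received |= block << offset
        cycles += 1
    return received, cycles
```

Under these assumptions a 660-bits register needs 21 such cycles on a 32-bits bus and 83 cycles on an 8-bits bus, which illustrates why the exchange becomes slow for long operands.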
It is an object of the present invention to provide a more efficient and more secure calculating unit.