1. Field of the Invention
The present invention relates to calculating units and, in particular, to calculating units for processing an operand having a number of positions, wherein the calculating unit has a number of bit slices equal to an mth part of the number of positions of the operand. In other words, the present invention relates to a calculating unit for processing one or more operands, wherein the calculating unit has fewer bit slices than the one or more operands have positions.
2. Description of the Related Art
FIG. 4 shows a known calculating unit. The calculating unit includes a calculating unit controller 40 and a plurality of bit slices 41a, 41b, 41c, wherein a total of N bit slices and/or bit slice means is illustrated in FIG. 4. The calculating unit, shown only in part in FIG. 4, further includes an external calculating unit bus 42 designed to load data into each bit slice. In the embodiment, data passed from one bit slice to the next, for example carry bits, may be transmitted from one bit slice to the next higher bit slice via slice-internal communication lines. Alternatively, however, this may also be done via the external bus 42.
FIG. 4 also shows, schematically only, an operand 43 having m times N positions, i.e. more positions than the number N of bit slices. In the special case shown in FIG. 4, the factor m is greater than or equal to 2, so that the number of positions of the operand 43 is equal to twice, three times, four times, . . . , m times the number of bit slices N of the calculating unit shown in FIG. 4.
FIG. 5 shows a more detailed illustration of the bit slice i of FIG. 4. In the embodiment shown in FIG. 5, the calculating unit is designed to process a number k of operands, wherein FIG. 5 shows only the register area 50a for the first operand and the register area 50b for the second operand for the bit slice i. Further register areas for further operands are indicated at 50c. 
A bit slice further includes a logic element 51 typically consisting of a number of logic gates to create an adder cell for adding a plurality of operands, particularly in long number calculating units. For this, the logic element includes the gates, necessary for an adder, for generating a sum bit and a carry bit from the two input operand bits and a carry bit from the next lower bit slice, i.e. from the bit slice with the ordinal number i-1. For a more detailed description of bit slices and the way they are stacked one over the other to obtain a calculating unit, see DE 3631992 C1.
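By way of illustration only, the behavior of such an adder cell may be sketched in software as follows. The function names `full_adder` and `ripple_add` are hypothetical and do not appear in the cited document; the sketch assumes a conventional full adder and a least-significant-bit-first ordering, and models only the logical behavior, not the gate-level design.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Return (sum_bit, carry_out) for two operand bits and an incoming carry."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_add(x_bits: List[int], y_bits: List[int], carry_in: int = 0) -> Tuple[List[int], int]:
    """Add two equal-length bit lists, least significant bit first.

    Each loop iteration stands for one bit slice i, which receives the
    carry produced by the bit slice i-1.
    """
    result = []
    carry = carry_in
    for a, b in zip(x_bits, y_bits):
        sum_bit, carry = full_adder(a, b, carry)
        result.append(sum_bit)
    return result, carry
```

For example, `ripple_add([1, 1, 0], [1, 0, 0])` adds 3 and 1 and yields the bits of 4 with no outgoing carry.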
The elements 50a, 50b, 50c, 51 of a bit slice i are connected to each other via a local communication bus for the bit slice i, denoted 52 in FIG. 5. Specifically, each register unit includes a connection to the communication bus 52, wherein the communication bus 52 may extend to the external bus 42 or may be connected to the external bus 42 via slice on/off circuits. In FIG. 5, the bit slice i is further shown controllable via the control means 40, already illustrated with respect to FIG. 4.
As discussed above, the calculating unit shown in FIG. 4 serves for processing an operand or several operands. If only one operand is processed, this could, for example, be an inversion of the operand etc. Typically, however, two or more operands are processed with each other, such as two operands when there is simply an addition, or such as three operands which are added to implement an efficient execution of the modular exponentiation, particularly for cryptographic applications such as the RSA cryptosystem, as disclosed in DE 3631992 C2. In this case, the logic element 51 of FIG. 5 consists of a bitwise three operand adder consisting of a half adder and a downstream full adder for each bit slice.
If the length of the calculating unit, i.e. the number of bit slices N, is greater than or equal to the number of positions of the operand, the operand is typically loaded into the corresponding calculating unit register cells via the external bus 42 in an input cycle. This means that the least significant bit is fed into a register cell of the bit slice 41a, that the next higher bit is fed into the bit slice 41b, that again the next higher bit is fed into the bit slice 41c, and that the ith bit is fed into the bit slice i.
If the operand does not have more positions than there are bit slices, i.e. if the number of positions of the operand is less than or equal to the number of bit slices, an operation may be performed in one cycle, wherein the result of the operation may be stored in a distinct result register of a bit slice or in the register in which the original value had been, if the original value is no longer needed. The result of the operation may then be output to the external bus 42 of FIG. 4 and may, for example, be stored in an external memory. Depending on the request, however, the result may instead be supplied to the logic elements again in a next calculating step via the respective local communication busses of the bit slices, in order to perform a new calculation. This control is performed by the calculating unit control means 40, which is in operative connection with each bit slice to perform corresponding register loading functionalities.
In the above embodiments, each register block 50a, 50b, 50c was assumed to have exactly one single register cell. If for example, in such a calculating unit in which each bit slice has a register cell for each operand, an addition of two operands was performed which both have more positions than there are bit slices, first a first subgroup of positions of the operand would be fed into the adder via the external bus to calculate the first lower positions of the result. These would then be output, wherein the carry of the highest bit slice is stored. Then the next portion of positions would be fed into the bit slice register cells via the external bus to then calculate the next portion of sum bits using the carry bit just stored for the lowest bit slice. This procedure may be repeated until all positions of the operand have been processed.
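The portion-wise addition described above may be sketched as follows. This is a minimal software model only, under the assumption that operands are represented as non-negative integers; the name `add_in_portions` and its parameters are hypothetical. Each loop iteration corresponds to one transfer of operand positions via the external bus.

```python
def add_in_portions(x: int, y: int, n_slices: int, portions: int) -> int:
    """Add x and y using an adder that is only n_slices bits wide.

    The operands are processed in `portions` groups of n_slices positions,
    lowest group first. The carry of the highest bit slice is stored after
    each group and used as the carry input of the lowest bit slice for the
    next group.
    """
    mask = (1 << n_slices) - 1
    carry = 0
    result = 0
    for step in range(portions):
        shift = step * n_slices
        x_part = (x >> shift) & mask      # portion loaded via the external bus
        y_part = (y >> shift) & mask
        total = x_part + y_part + carry
        result |= (total & mask) << shift  # sum bits of this portion are output
        carry = total >> n_slices          # stored carry of the highest bit slice
    return result | (carry << (portions * n_slices))
```

For example, `add_in_portions(0xFFFF, 0x0001, 4, 4)` processes two 16-position operands on a 4-slice unit in four steps and returns `0x10000`.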
This procedure is disadvantageous particularly in that data have to be transmitted via the external bus after each calculating unit cycle.
To overcome this disadvantage, the concept described in FIG. 6 was used, in which each bit slice has a plurality of register cells, in particular m register cells. In particular, FIG. 6 shows the register section of a bit slice i of FIG. 5, i.e. an enlarged illustration of the blocks 50a, 50b including control means 40 and communication bus 52. In the example shown in FIG. 6, the communication bus 52 includes an out communication bus 52a and a back communication bus 52b. Specifically, each register block 50a, 50b now includes a plurality of m register cells: register cells 61a-61d connected in series with respect to the register block 50a, and register cells 62a, 62b, 62c, 62d connected in series with respect to the register block 50b. The register cells 61a, 61b therefore are connected in series, as shown in FIG. 6, such that the output of one register cell is connected to an input of the next register cell.
For latching data into the register cells, the data are stored via the input bus internal to the calculating unit, so that first the ith bit of a certain operand which is fed into the register block 50a is supplied to the register cell 61d. For the following, the calculating unit is assumed to have N bit slices. In a next step, the operand bit for the position N+i is supplied from the external bus to the input bus 52b. This bit is now fed into the register cell 61d, wherein, however, the bit for the position i, stored up to now in the register cell m, is passed on “upwards” into the register cell 3.
For latching in the next bit for the position i+2N, this bit is again supplied from outside and fed into the register cell m 61d. Before that, however, the value currently stored in the register cell m is passed on to the higher register cell 3, and prior to this, the value in the register cell 3 is shifted up into the register cell 2.
Latching in the position i+3N of the operand is performed as follows. The value currently stored in the register cell 61b is shifted up into the register cell 61a. The value currently stored in the register cell 61c is shifted up into the register cell 61b. The value currently stored in the register cell 61d is shifted up into the register cell 61c. Finally, the new value for the position i+3N to be latched in is inserted into the register cell m.
A corresponding procedure occurs when values that have been calculated as intermediate result of the last step are supplied from the logic element i via the bus 52b. They are again fed “from bottom to top” into the serially connected register cells. Accordingly, when register cell contents are to be transported, for example, to the logic element or to the external bus, i.e. “out”, via the output bus 52a, this is done sequentially.
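The latching sequence described above may be sketched as follows. This is an illustrative software model only; the class name `SerialRegisterBlock` and its method are hypothetical, and the list index 0 is assumed to stand for the register cell 1 (61a) while the last index stands for the register cell m (61d).

```python
from typing import List, Optional


class SerialRegisterBlock:
    """Models one register block with m serially connected register cells."""

    def __init__(self, m: int) -> None:
        # cells[0] is register cell 1 (61a), cells[m-1] is register cell m (61d)
        self.cells: List[Optional[str]] = [None] * m

    def latch_in(self, value: str) -> None:
        """Shift every stored value one cell "up", then write to cell m.

        The content of cell 1 is dropped from the block in this model.
        """
        self.cells = self.cells[1:] + [value]


block = SerialRegisterBlock(4)
for position in ["i", "i+N", "i+2N", "i+3N"]:
    block.latch_in(position)
# After four latching steps, cell 1 holds position i and cell m holds i+3N.
```

In this model every call to `latch_in` rewrites all m cells, which mirrors the simultaneous "shifting" of register cell contents in the serial arrangement.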
This procedure is particularly advantageous when the number of register cells is relatively large, i.e., in particular, greater than 2 or 3. This means that very large numbers may be processed with relatively small calculating units. Specifically, consider the exemplary case that operands of a size of up to 1,120 bits are to be processed and that the calculating unit has N=280 bit slices. In such a case, each register block in the bit slice has four register cells connected serially one after the other.
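The relationship between operand length, number of bit slices, and number of register cells may be illustrated as follows. The function names are hypothetical, and zero-based numbering of positions, bit slices, and loading steps is an assumption made only for this sketch.

```python
from typing import Tuple


def register_cells_needed(operand_bits: int, n_slices: int) -> int:
    """Number m of register cells per register block (ceiling division)."""
    return -(-operand_bits // n_slices)


def locate(position: int, n_slices: int) -> Tuple[int, int]:
    """Return (bit_slice, load_step) for an operand position.

    Positions i, N+i, 2N+i, ... all belong to bit slice i and are
    latched in during successive loading steps.
    """
    return position % n_slices, position // n_slices
```

With the figures given above, `register_cells_needed(1120, 280)` yields 4, and `locate(283, 280)` yields bit slice 3 in loading step 1.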
The loading may be performed serially in a distinct loading cycle, while other bits are already processed in the logic element. On the other hand, it is necessary that, to achieve fast operation, new bits are written to all register cells substantially at the same time. In other words, this means that at one time all inputs/outputs of all register cells are opened, as it were, so that the “shifting” of the register cell contents may be performed.
The register bits are typically realized as latches, wherein, for each of the number of m subcycles necessary to process an addition, the concerned register bit is stored in a buffer and all remaining m−1 bits are passed on successively through the sequentially arranged latches, as discussed with respect to FIG. 6. Due to the fact that latching is required (typically somewhere on the path of the output bus 52a or on the path of the input bus 52b), this implementation is problematic, particularly with respect to timing. Furthermore, controlling is complex, which is a problem particularly because only hardwired control options may be used within the bit slices.
A further disadvantage of considerable importance is the fact that significant cross currents are generated when the register cells are operated in the described sequential way. Particularly in CMOS circuits, which are typically used, no or little cross current is generated in the holding state, i.e. when there is no change of a register. However, if a register changes its state, noticeable cross currents are generated which have to be produced by a chip-internal current source. As mentioned above, in the worst case all m register cells of the circuit topology shown in FIG. 6 may change their states. This means that a current supply of a chip has to produce a significant current equal to the sum of all cross currents. Since typical current supplies are implemented via a voltage source with a corresponding voltage regulator, an extraordinarily strong voltage regulation is required for this case of high cross currents. If it is not provided, a voltage drop will occur on the chip. However, such a generously designed voltage regulator which is able to counteract such a voltage drop requires a lot of chip area on the one hand and is more complex in its design as compared to simpler voltage regulators on the other hand.
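The worst case mentioned above, in which all m register cells change their states in a single shifting step, may be illustrated by counting state changes in a simple software model. The function name `toggles_on_shift` is hypothetical, and the model only counts changed cells; it makes no statement about actual current values.

```python
from typing import List, Tuple


def toggles_on_shift(cells: List[int], new_bit: int) -> Tuple[List[int], int]:
    """Shift the cell contents by one cell and count cells that change state.

    cells[0] is register cell 1, cells[-1] is register cell m; the new bit
    is written to register cell m.
    """
    shifted = cells[1:] + [new_bit]
    changed = sum(1 for old, new in zip(cells, shifted) if old != new)
    return shifted, changed


# Alternating contents: every one of the m = 4 cells changes its state.
worst_case = toggles_on_shift([0, 1, 0, 1], 0)
# Identical contents: no cell changes its state.
best_case = toggles_on_shift([1, 1, 1, 1], 1)
```

In this model, an alternating bit pattern causes all four cells of one register block to change state in one step, while a constant pattern causes none to change.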
In summary, the concept shown in FIG. 6 thus results in a more expensive chip, because additional chip area is required in view of control and its complexity, and because expensive voltage regulators are also required on the chip in view of the high cross currents which may potentially occur.
Particularly in the case of mass applications, such as chips for chip cards, the large numbers produced mean that even minor price differences may have the consequence that one product survives while another product does not achieve market acceptance.