1. Field of the Invention
The present invention relates to cryptography techniques and, in particular, to the architecture of cryptography processors used for cryptographic applications.
2. Description of the Related Art
With the increasingly widespread use of cashless money transfer, electronic data transmission via public networks, exchange of credit card numbers via public networks and, generally speaking, the use of so-called smart cards for the purposes of payment, identification or access, there is an increasing need for cryptography techniques. Cryptography techniques include, on the one hand, cryptography algorithms and, on the other hand, suitable processor solutions which execute the calculations specified by the cryptography algorithms. In the past, when cryptography algorithms were executed by means of general-purpose computers, the cost, the calculation time requirement and security with regard to diverse external attacks did not play as decisive a role as they do nowadays, when cryptographic algorithms are increasingly executed on chip cards or special security ICs, for which there are specific requirements. On the one hand, such smart cards must be available in a cost-efficient manner, as they are mass products. On the other hand, they must exhibit high security against external attacks, as they are completely under the control of the potential attacker.
In addition, cryptographic processors must provide considerable calculating capacity, particularly since the security of many cryptographic algorithms, such as the well-known RSA algorithm, fundamentally depends on the length of the keys used. In other words, security increases with the length of the numbers to be processed, since an attack based on trying out all possibilities is rendered impractical by the calculation time it would require.
Expressed in figures, this means that cryptography processors must handle integers which may have a length of, say, 1024 bits, 2048 bits or perhaps even more. As a comparison, processors in a typical PC process 32-bit or 64-bit integers.
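As a rough illustration of the gap between these operand lengths and the word size of a typical PC processor, the following sketch splits a long integer into machine words and adds two such numbers word by word with carry propagation. The function names and the default word size of 32 bits are assumptions made for this illustration only and are not taken from any of the documents cited here.

```python
def to_limbs(value, word_bits=32):
    """Split a non-negative integer into little-endian machine words."""
    mask = (1 << word_bits) - 1
    limbs = []
    while value:
        limbs.append(value & mask)
        value >>= word_bits
    return limbs or [0]


def from_limbs(limbs, word_bits=32):
    """Reassemble an integer from little-endian machine words."""
    return sum(limb << (i * word_bits) for i, limb in enumerate(limbs))


def add_limbs(a, b, word_bits=32):
    """Add two little-endian word lists, propagating the carry word by word."""
    mask = (1 << word_bits) - 1
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        total = carry
        total += a[i] if i < len(a) else 0
        total += b[i] if i < len(b) else 0
        result.append(total & mask)   # low word stays in place
        carry = total >> word_bits    # overflow moves to the next word
    if carry:
        result.append(carry)
    return result


# A 1024-bit operand occupies 32 words of 32 bits (or 16 words of 64 bits).
operand = (1 << 1024) - 1
words = to_limbs(operand)
```

A single long-integer addition thus turns into a chain of dozens of word-sized additions on a short-integer machine, which is one reason dedicated long-integer arithmetic units are of interest.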
High calculating expenditure, however, also signifies a large amount of calculating time, so that the essential requirement on cryptography processors is, at the same time, to achieve a high calculating throughput, so that, for example, an identification, access to a building, a payment transaction or a credit card transfer does not take many minutes, which would be extremely detrimental to market acceptance.
In summary, therefore, it can be stated that cryptography processors must be secure, fast and, therefore, extremely high-performing.
One possibility of increasing the throughput through a processor is to provide a central processing unit with one or several co-processors which work in parallel, such as is the case, for instance, in modern PCs or in modern graphics cards. Such a scenario is depicted in FIG. 7. FIG. 7 depicts a computer circuit board 800 on which a CPU 802, a random-access memory (RAM) 804, a first co-processor 806, a second co-processor 808 as well as a third co-processor 810 are located. CPU 802 is connected with the three co-processors 806, 808 and 810 via a bus 812. In addition, each co-processor may be provided with its own memory, which serves only for operations of the co-processor, i.e., a memory 1 814 for co-processor 1, a memory 2 816 for co-processor 2 as well as a memory 3 818 for co-processor 3.
Furthermore, each chip arranged on the computer circuit board 800 depicted in FIG. 7 is supplied, via its own current and/or voltage supply terminal I1 to I8, with the electrical power required for the electronic components within the individual elements to function. Alternatively, the circuit board may have only one power supply, which is then distributed across the circuit board to the individual chips. In this case, however, the supply lines leading to the individual chips are accessible to an attacker.
The concept for typical computer applications depicted in FIG. 7 is unsuitable for cryptography processors, for several reasons. On the one hand, all elements are designed for short-integer arithmetic, whereas cryptography processors must carry out long-integer arithmetic operations.
In addition, each chip on the computer circuit board 800 has its own current and/or power terminal, which can readily be accessed by an attacker, so as to tap off power profiles or current profiles as a function of time. Tapping off power profiles as a function of time is the basis for a multitude of efficient attacks on cryptography processors. Further background details and/or a more detailed description of various attacks on cryptography processors are given in “Information Leakage Attacks Against Smart Card Implementations of Cryptographic Algorithms and Counter-measures”, Hess et al., Eurosmart Security Conference, 13 to 15 Jun. 2000. As countermeasures, implementations have been proposed in which different operations always require the same amount of time, so that an attacker cannot determine from a power profile whether the crypto processor has executed a multiplication, an addition or something else.
“Design of Long Integer Arithmetical units for Public-Key Algorithms”, Hess et al., Eurosmart Security Conference, 13 to 15 Jun. 2000, describes in detail different calculating operations which must be executable by cryptography processors. In particular, modular multiplication, methods for modular reduction as well as the so-called ZDN method, which is set out in the German Patent DE 36 31 992 C2, are described.
The ZDN method is based on a serial/parallel architecture using look-ahead algorithms, which are executable in parallel, for multiplication and modular reduction, so as to transform the multiplication of two binary numbers into an iterative three-operand addition using look-ahead parameters for multiplication and modular reduction. To this end, modular multiplication is broken down into a serial calculation of partial products. At the outset of the iteration, two partial products are formed and, thereafter, added together while considering modular reduction, so as to obtain an intermediate result. Thereafter, a further partial product is formed and, again, added to the intermediate result while considering modular reduction. This iteration is continued until all digits of the multiplier have been processed. For the three-operand addition, a crypto coprocessor includes an adding unit which carries out, in a current iteration step, the summation of a new partial product to the intermediate result of the preceding iteration step.
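The serial breakdown of a modular multiplication into partial products can be sketched as follows. This is a deliberately simplified bit-serial variant and not the ZDN method itself: it processes exactly one multiplier bit per iteration and reduces by repeated subtraction, whereas the ZDN method uses look-ahead parameters for multiplication and reduction. The function name and the parameter names are assumptions chosen for this illustration.

```python
def interleaved_modmul(multiplier, multiplicand, modulus):
    """Compute (multiplier * multiplicand) mod modulus bit-serially.

    Each iteration combines three operands: the shifted intermediate
    result, a partial product (0 or the multiplicand) and a multiple
    of the modulus that is subtracted for the reduction.
    """
    if multiplier < 0 or modulus <= 0:
        raise ValueError("multiplier must be >= 0 and modulus must be > 0")
    multiplicand %= modulus
    z = 0  # intermediate result, kept in the range [0, modulus)
    for i in reversed(range(multiplier.bit_length())):
        bit = (multiplier >> i) & 1
        # Shift the intermediate result and add the next partial product.
        z = 2 * z + bit * multiplicand
        # z is now below 3 * modulus, so at most two subtractions follow.
        while z >= modulus:
            z -= modulus
    return z


product = interleaved_modmul(123456789, 987654321, 1000003)
```

Because the reduction is interleaved with the accumulation, the intermediate result never grows beyond a few bits more than the modulus, which is what makes the scheme attractive for a fixed-width long-integer adding unit.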
Thus, each co-processor of FIG. 7 could be provided with its own ZDN unit, so as to execute several modular multiplications in parallel and thereby increase throughput for particular applications. However, this solution would be unsuccessful, since an attacker might be able to determine the current profile of each individual chip. The increase in throughput would thus be achieved at the expense of the security of the cryptography computer.
The technical publication “A Design for Modular Exponentiation Coprocessor In Mobile Telecommunication Terminals”, Kato, T., et al., Cryptographic Hardware and Embedded Systems, 2nd International Workshop, 17 to 18 Aug. 2000, Proceedings, Lecture Notes in Computer Science, pages 215-228, shows a design for a co-processor for carrying out modular exponentiation in mobile telecommunication terminals. For carrying out the modular exponentiation, the so-called square-and-multiply algorithm is used. A left-to-right circuit (LRC) and a right-to-left circuit (RLC) are examined. In particular, it is proposed to control a unit for modular squaring and a unit for modular multiplication by a common control unit. Moreover, a further modular multiplication circuit and a further modular squaring circuit are provided, which are also connected to a common control. Alternatively, it is proposed to control three modular multiplication units by a common control. The three modular multiplication units operate in parallel, with two multiplication units performing a right-to-left calculation, while the third multiplication unit performs a dummy calculation. Alternatively, two multiplication units perform a left-to-right calculation, while the third multiplication unit performs a dummy calculation. As a further alternative, a special algorithm is performed by two multiplication units, while the third multiplication unit performs a dummy calculation.
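For orientation, the two scanning directions of the square-and-multiply algorithm can be sketched as follows. This is the textbook form of the algorithm, not a model of the circuits in the cited publication, and it contains no dummy calculations or other countermeasures. The function names are assumptions chosen for this illustration.

```python
def modexp_left_to_right(base, exponent, modulus):
    """Square-and-multiply, scanning exponent bits from the most significant bit."""
    result = 1 % modulus
    base %= modulus
    for i in reversed(range(exponent.bit_length())):
        result = (result * result) % modulus      # always square
        if (exponent >> i) & 1:
            result = (result * base) % modulus    # multiply only for a 1 bit
    return result


def modexp_right_to_left(base, exponent, modulus):
    """Square-and-multiply, scanning exponent bits from the least significant bit."""
    result = 1 % modulus
    power = base % modulus                        # holds base^(2^i) mod modulus
    while exponent:
        if exponent & 1:
            result = (result * power) % modulus   # multiply only for a 1 bit
        power = (power * power) % modulus         # always square
        exponent >>= 1
    return result


left = modexp_left_to_right(7, 65537, 1000003)
right = modexp_right_to_left(7, 65537, 1000003)
```

In the right-to-left form, the multiplication and the squaring within one iteration both read the old value of `power` and do not depend on each other, so a squaring unit and a multiplication unit can in principle work on them at the same time. In the left-to-right form, the multiplication needs the result of the preceding squaring.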
The technical publication “High-Speed RSA Hardware Based On Barret's Modular Reduction Method”, J. Großschaedl, 2nd International Workshop, CHES 2000, Proceedings, Lecture Notes in Computer Science, Vol. 1965, 17 Aug. 2000, pages 191-203, discloses an RSA crypto-chip with an interface/control unit, a multiplier core, and an I/O register with 1056 bits. The multiplier core is a sub-parallel multiplier with diverse registers, a carry-save adder, two carry-lookahead adders in addition to an accumulator and further elements. The interface/control unit provides a 16-bit standard microcontroller interface, via which a data exchange and a command call take place. The control unit controls the multiplier core. The register supports a 16-bit data transfer with the interface unit and a 1056-bit parallel data exchange with the multiplier core.
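The modular reduction method named in the title of this publication replaces the division by the modulus with a multiplication by a precomputed constant followed by a shift. A minimal software sketch of the textbook form of Barrett reduction is given below. It is not a description of the cited chip, and the function names as well as the choice of the shift width (twice the bit length of the modulus) are assumptions made for this illustration.

```python
def barrett_setup(modulus):
    """Precompute the bit length k and the constant mu = floor(4^k / modulus)."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    k = modulus.bit_length()
    mu = (1 << (2 * k)) // modulus   # one division, done once per modulus
    return k, mu


def barrett_reduce(x, modulus, k, mu):
    """Reduce x modulo modulus for 0 <= x < 4^k without dividing by modulus."""
    if not 0 <= x < (1 << (2 * k)):
        raise ValueError("x is outside the supported range")
    q = (x * mu) >> (2 * k)          # estimate of floor(x / modulus), never too large
    r = x - q * modulus
    while r >= modulus:              # the estimate is low by a small amount at most
        r -= modulus
    return r


n = (1 << 127) - 1
k, mu = barrett_setup(n)
reduced = barrett_reduce(123456789 ** 4, n, k, mu)
```

Since the product of two operands below the modulus is always below 4^k, the supported range covers the reduction step needed after each multiplication or squaring of a modular exponentiation.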