1. Field of the Invention
The present invention relates to an information processing unit, more particularly to a method of encrypting and decrypting data to be processed in an information processing unit, and still more particularly to a method of encrypting and decrypting data used within an information processing unit.
2. Description of the Background
Most IC (integrated circuit) cards and household electrical information appliances are equipped with information processing units based on a common structure shown in FIG. 1. These information processing units have only a limited amount of processing power and memory space, such as an 8-bit central processing unit (CPU) E101, on the order of 10 KB of volatile memory (RAM E102), and some amount of nonvolatile memory (EEPROM E103 and ROM E104). Network node devices and routers, however, which generally use cryptographic processing, have a larger amount of processing power and memory space, such as a 32-bit CPU and several hundred megabytes of volatile memory (RAM). The latter also have fewer limitations on system size and maximum power consumption than the former, which cannot increase their processing capabilities by boosting the clock rate of the processor or by adding external hardware.
Adding computer hardware to a variety of electrical information appliances and systems is becoming pervasive, and, accordingly, the storage and use of various information and the exchange of data between computers has come to be performed more frequently. It is increasingly necessary, therefore, to process data that requires protection against leakage to the outside during computer-to-computer data exchanges, such as electronic money, billing information, and private information. Cryptographic techniques are indispensable for processing such information in secrecy.
Typical of the cryptographic systems now being used are DES (Data Encryption Standard) (National Bureau of Standards, Data Encryption Standard, U.S. Department of Commerce, FIPS pub. 46, January 1977) and RSA (named after its inventors, Rivest, Shamir, and Adleman) (R. L. Rivest, A. Shamir, and L. M. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Communications of the ACM 21 (2) (1978), 120-126). The former is a secret-key cryptosystem, and the latter is a public-key cryptosystem. A secret-key cryptosystem uses a common secret key for encryption and decryption and is also referred to as a common-key cryptosystem or a symmetric-key cryptosystem. On the other hand, a public-key cryptosystem uses different keys for encryption and decryption and is also referred to as an asymmetric-key cryptosystem. In general, the cipher used in a secret-key cryptosystem combines 64 to 128 bits of input data with 64 to 128 key bits by substitution of bit relationships and permutation of bit positions, with these operations repeated a plurality of times.
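The combination of key mixing, substitution, and bit permutation over several rounds can be illustrated with a toy 8-bit cipher. This is a minimal sketch for illustration only: it is not DES and is not secure, and the S-box, the rotation amount, the round keys, and all function names are assumptions chosen for this example.

```python
# Toy 8-bit substitution-permutation cipher (illustration only, NOT secure).
# SBOX is an arbitrary 4-bit permutation chosen for this sketch.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
INV_SBOX = [SBOX.index(v) for v in range(16)]


def substitute(byte, box):
    """Replace each 4-bit half of the byte through the given table."""
    return (box[byte >> 4] << 4) | box[byte & 0x0F]


def rotate_left(byte, n):
    """Permute bit positions by rotating the 8-bit value left by n."""
    n %= 8
    return ((byte << n) | (byte >> (8 - n))) & 0xFF


def encrypt(block, round_keys):
    for key in round_keys:
        block ^= key                     # mix in key bits
        block = substitute(block, SBOX)  # substitution
        block = rotate_left(block, 3)    # permutation of bit positions
    return block


def decrypt(block, round_keys):
    for key in reversed(round_keys):
        block = rotate_left(block, 5)        # undo the rotation by 3
        block = substitute(block, INV_SBOX)  # undo the substitution
        block ^= key                         # undo the key mixing
    return block
```

Only XOR, shifts, and lookups in a 16-entry table are used, which is the kind of operation a small 8-bit processor handles quickly.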
The only calculations required in a secret-key cryptosystem are bit operations and reference to relatively small tables, making it possible for even an information processing unit with comparatively modest capabilities to complete the processing in a short time, such as a few milliseconds.
The keys in a public-key cryptosystem are restricted by mathematical relationships that must hold between the encryption and decryption keys, so the keys generated in these systems are likely to be long, such as 1024 bits. In addition, extensive numerical calculations are performed, taking a few hundred milliseconds for an information processing unit with comparatively small processing power, even if a co-processor is used.
A secret-key cryptosystem provides a key shared by the sender and receiver in advance, thereby enabling faster processing; a public-key cryptosystem lays the encryption key open to the public for one side to use to encrypt data while the other side decrypts by using a secret decryption key. This system achieves greater security, but it requires more time for calculation than a secret-key cryptosystem. Therefore, secret-key cryptosystems are often used for cryptographic processing of data used within the same information processing unit, while public-key cryptosystems are used for cryptographic processing of data exchanged among different information processing units.
An information processing unit which stores secret information in the nonvolatile memory device may use a secret-key cryptosystem to encrypt the information and maintain the encrypted information, in case the memory device is taken out and physically analyzed while the system is powered off. Secret information can be kept secure in this way by having the user memorize the cryptographic key in a scrambled form that cannot be unscrambled easily by a third party. Methods using DES and other secret-key cryptosystems are also implemented by disclosed software, such as PGP (Pretty Good Privacy), as cryptographic algorithms for data stored in external storage devices.
For security, however, encryption only of data to be stored in external storage devices is inadequate; it is also necessary to keep data secret within the system unit that performs encryption and decryption of the data. The present invention provides hardware that achieves this goal with comparatively small resources, including just a few registers for holding key data, calculation equipment, and an information processing unit with comparatively small processing power, the small hardware scale also enabling faster processing. An information processing unit according to the present invention can perform processing that is secure against information leakage. Attention was drawn to this problem by a cryptographic analysis method known as DPA (Differential Power Analysis) (see P. Kocher, J. Jaffe, and B. Jun, Differential Power Analysis, Advances in Cryptology, CRYPTO '99, Lecture Notes in Computer Science 1666, Springer-Verlag, pp. 388-397, 1999), presented by P. Kocher in 1998. The disclosure of this method showed the necessity for the protection not only of data stored in external storage devices but also of data being operated on in arithmetic and logic units. DPA is an analysis technique that observes how current consumption varies with the data being operated on to determine the state of a certain bit. The essence of this analysis technique lies in utilization of the correlation between the data being processed by an information processing unit and the corresponding current consumption.
The current consumption of an information processing unit varies with the data being processed. The data to be processed is characterized by two parameters: one indicating its notation and the other indicating its location, such as binary notation in computers and the address in a CPU address space. Conventional information processing units present processed data in a combination of inputs and outputs. Because of the properties of the CMOS chips used in integrated circuits, current consumption differs depending on whether a “1” or a “0” is being processed.
Suppose the current consumption when data x located at address a is processed is expressed in the form c(x, a), and the number of “1's” of data x in binary notation, referred to as its Hamming weight, is expressed as H(x). If the bus width of the information processing unit is w bits, obviously 0≦H(x)≦w. Note that binary notation is also used in accessing address a. Suppose also that the current consumption in processing a “1” is d1, and the current consumption in processing a “0” is d0. If, for example, the widths of the data bus and address bus of the information processing unit are 8 bits, then

$$c(x, a) = (H(x) + H(a))\,d_1 + ((8 - H(x)) + (8 - H(a)))\,d_0 + \alpha + \beta$$

where α is the power consumption added when a specific part of the information processing unit operates, and β is noise caused by the measurement equipment.
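The consumption model can be written out as a short function. This is a minimal sketch of the formula as stated, not a description of any real device; the function names and the numeric values of d1 and d0 in the usage note are assumptions for illustration.

```python
def hamming_weight(value: int, width: int = 8) -> int:
    """Number of 1 bits in the low `width` bits of value, i.e. H(value)."""
    return bin(value & ((1 << width) - 1)).count("1")


def current_consumption(x, a, d1, d0, alpha=0.0, beta=0.0, width=8):
    """c(x, a) for a data bus and an address bus that are both `width` bits wide.

    d1 and d0 are the per-bit currents for a 1 and a 0, alpha is the
    data-independent contribution, and beta is the measurement noise.
    """
    ones = hamming_weight(x, width) + hamming_weight(a, width)
    zeros = 2 * width - ones
    return ones * d1 + zeros * d0 + alpha + beta
```

For example, with the assumed values d1 = 2.0 and d0 = 1.0 and no offset or noise, `current_consumption(0xFF, 0x00, d1=2.0, d0=1.0)` evaluates to 8 × 2.0 + 8 × 1.0 = 24.0, since the data contributes eight ones and the address eight zeros.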
First, α and β, which are elements independent of the data, need to be eliminated. If an information processing unit is kept in a given state and processes two different items of data, x0 and x1, the values of α0 and α1 in the following equations indicating current consumption in processing x0 and x1, respectively, will be the same:

$$c(x_0, a_0) = (H(x_0) + H(a_0))\,d_1 + ((8 - H(x_0)) + (8 - H(a_0)))\,d_0 + \alpha_0 + \beta_0$$

and

$$c(x_1, a_1) = (H(x_1) + H(a_1))\,d_1 + ((8 - H(x_1)) + (8 - H(a_1)))\,d_0 + \alpha_1 + \beta_1$$
Then, if the noise terms β0 and β1 can be eliminated, it is possible to compare data x0 with data x1 by comparing c(x0, a0) and c(x1, a1). Since β is a noise quantity, its mean value is 0. Therefore, β can be eliminated by calculating the mean value of n current consumption measurements c[0] to c[n−1], that is, by dividing their sum by n, if n is sufficiently large. If β0 and β1 are eliminated in this way, then, because α0 and α1 are equal and cancel,

$$c(x_0, a_0) - c(x_1, a_1) = ((H(x_0) + H(a_0)) - (H(x_1) + H(a_1)))\,d_1 + ((H(x_1) + H(a_1)) - (H(x_0) + H(a_0)))\,d_0$$

and if the two items of data are placed at the same address, that is, a0=a1, then

$$c(x_0, a_0) - c(x_1, a_1) = (H(x_0) - H(x_1))\,d_1 - (H(x_0) - H(x_1))\,d_0$$
Furthermore, if d=d1−d0, the equation above can be reduced to

$$c(x_0, a_0) - c(x_1, a_1) = (H(x_0) - H(x_1))\,d$$
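The reduction can be checked numerically. The sketch below assumes the noise has already been averaged away and uses integer currents so that the comparison is exact; the parameter values passed in the usage note are arbitrary, and the function names are hypothetical.

```python
def hamming_weight(value):
    """H(value) for an 8-bit value."""
    return bin(value & 0xFF).count("1")


def current_consumption(x, a, d1, d0, alpha):
    """Noise-free c(x, a) for 8-bit data and address buses."""
    ones = hamming_weight(x) + hamming_weight(a)
    return ones * d1 + (16 - ones) * d0 + alpha


def identity_holds(a, d1, d0, alpha):
    """Check c(x0, a) - c(x1, a) == (H(x0) - H(x1)) * d for every 8-bit pair."""
    d = d1 - d0
    return all(
        current_consumption(x0, a, d1, d0, alpha)
        - current_consumption(x1, a, d1, d0, alpha)
        == (hamming_weight(x0) - hamming_weight(x1)) * d
        for x0 in range(256)
        for x1 in range(256)
    )
```

Calling `identity_holds(0x42, 5, 2, 11)` exercises all 65,536 pairs of data values at one address. Neither the address nor the offset α appears on the right-hand side, so both drop out of the difference.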
It is impractical to track and know which part of the information processing unit is operating at some point in time, so it is difficult to obtain α. It is easier, however, to find the difference d, instead of the absolute values of d1 and d0, by taking differences of data, with α becoming an offset. Consequently, based on this information, it is possible to know the Hamming weight difference between data x0 and x1 from the expression c(x0, a0)−c(x1, a1). Before actual data are inferred, for an 8-bit bus it is useful to check the power consumption of nine data items with Hamming weights of 0 to 8 in advance, for example, as a basic test. As the nine data items, suppose 0, 1, 3, 7, 15, 31, 63, 127, and 255 are used, which are expressed as b′0, b′1, b′11, b′111, b′1111, b′11111, b′111111, b′1111111, and b′11111111 in binary notation, and have Hamming weights 0, 1, 2, 3, 4, 5, 6, 7, and 8. If the difference between the current consumption of each basic test item and that of the data being processed is calculated, the basic test item for which the difference is 0 has the same Hamming weight as the data being processed.
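The basic test can be sketched as a lookup against nine reference measurements. In this sketch a hypothetical noise-free device model, `simulated_current`, stands in for averaged measurements; its address and current parameters are assumptions, as are all function names.

```python
# The nine basic test items 0, 1, 3, 7, ..., 255 with Hamming weights 0 to 8.
REFERENCE_DATA = [(1 << k) - 1 for k in range(9)]


def hamming_weight(value):
    return bin(value & 0xFF).count("1")


def simulated_current(x, a=0x10, d1=3.0, d0=1.0, alpha=7.5):
    """Hypothetical stand-in for an averaged current measurement of data x."""
    ones = hamming_weight(x) + hamming_weight(a)
    return ones * d1 + (16 - ones) * d0 + alpha


def build_reference(measure):
    """Map each Hamming weight 0..8 to the current measured for its test item."""
    return {hamming_weight(x): measure(x) for x in REFERENCE_DATA}


def infer_hamming_weight(observed, reference):
    """Return the weight whose reference current is closest to the observation."""
    return min(reference, key=lambda w: abs(reference[w] - observed))
```

With `reference = build_reference(simulated_current)`, an observation taken while the device processes any data item at the same address is classified by `infer_hamming_weight` without knowing α, d1, or d0 individually.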
Suppose the current consumption observed during a calculation with the data being processed matches that of data item 1 obtained in the basic test. This means the Hamming weight of the data being processed is 1. The 8-bit numeric values with Hamming weight 1 are 1, 2, 4, 8, 16, 32, 64, and 128, so the value of the data being processed must be one of these values. Depending on the architecture of the information processing unit, there are cases in which current consumption may differ depending on the bit positions (0 to 7) in an 8-bit bus. In this case, it is possible to uniquely determine the data being processed by obtaining all 2^8=256 basic test data items in advance and comparing them with the data being processed, one by one. If data can be obtained in this way in the key operation part of a cryptographic processing unit, ciphers can be easily decrypted.
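How far a known Hamming weight narrows the search can be seen by enumerating the candidate values. This is a small sketch; the function name `candidates` is chosen for illustration.

```python
from math import comb


def candidates(weight, width=8):
    """All `width`-bit values whose Hamming weight equals `weight`."""
    return [v for v in range(1 << width) if bin(v).count("1") == weight]


def candidate_count(weight, width=8):
    """Number of such values: the binomial coefficient C(width, weight)."""
    return comb(width, weight)
```

For weight 1 the list is exactly 1, 2, 4, 8, 16, 32, 64, and 128. The worst case for an 8-bit bus is weight 4, which still leaves 70 candidates, so the full table of 256 position-dependent measurements is needed to pin down a single value.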
The elimination of the noise β described above then becomes a problem. In general, an information processing unit is caused to operate on the same data n times, where n is an adequately large number, so that the current consumption patterns c[0] to c[n−1] are obtained and their mean value can be calculated. The mean value of β is 0, and accordingly β can be eliminated. It should be noted that information that tends not to change, such as private information, is more prone to leakage when noise is eliminated by averaging data measured a plurality of times. The value of n cannot be defined easily because it depends on the noise source of the information processing unit and the accuracy of the measurement equipment. However, if the actual value of n cannot be derived, it is sufficient simply to keep repeating the measurement until the noise is eliminated. A possible countermeasure against such data analysis would be to prevent the same operation from being repeated the number of times needed to eliminate noise through averaging.
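The effect of averaging can be simulated as follows. The text states only that the mean of β is 0; the Gaussian noise distribution, the seed, and the function names in this sketch are assumptions made for illustration.

```python
import random
import statistics


def noisy_measurement(true_current, sigma, rng):
    """One measurement c[i]: the true current plus zero-mean noise beta."""
    return true_current + rng.gauss(0.0, sigma)


def averaged_measurement(true_current, n, sigma, rng):
    """Mean of n measurements c[0] to c[n-1] taken on the same data."""
    return statistics.fmean(
        noisy_measurement(true_current, sigma, rng) for _ in range(n)
    )


# Seeded generator so that repeated runs give the same simulated traces.
rng = random.Random(1)
single = noisy_measurement(10.0, 1.0, rng)
mean_of_many = averaged_measurement(10.0, 10_000, 1.0, rng)
```

Under this noise assumption the residual error of the mean shrinks roughly in proportion to 1/√n, which is why data that is processed repeatedly and unchanged is the most exposed.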
Since information processing units operate on electric current, it is impractical to eliminate the correlation between the information being processed and the current consumption. Therefore, data being processed must be encrypted to make it impossible for the analyzers to infer the contents of data. The DES cryptosystem described above and other such cryptographic algorithms can be used, but they take too long, and require too many hardware resources such as registers and volatile memory space for data encryption, to be suitable for use in units with comparatively small processing power which must encrypt and decrypt data on demand. Encryption/decryption units that can perform cryptographic processing with minimal hardware resources and processing time are required.