Embodiments of this invention are in the field of digital logic, and are more specifically directed to programmable logic suitable for use in encryption and decryption according to the Kasumi cipher.
High-speed data communication services are now commonly available for mobile telephone devices. For example, the combination of the well-known “3G” (3rd Generation) mobile services with the increased computational capacity of modern logic circuits has enabled modern high-performance mobile telephones to provide full wireless Internet access (i.e., as opposed to being limited to “mobile” webpages), and wireless downloading and enjoyment of multimedia content.
The 3G mobile wireless services are commonly carried out under a set of standards promulgated by The 3rd Generation Partnership Project (“3GPP”), which is an initiative of the European Telecommunications Standards Institute (“ETSI”). These standards and technical specifications for 3G wireless services include normative encryption and decryption algorithms for confidentiality and integrity (i.e., authentication), such algorithms referred to as the “f8” and “f9” algorithms, respectively. These normative algorithms are described in Universal Mobile Telecommunications System (UMTS): Specification of the 3GPP confidentiality and integrity algorithms; Document 1: f8 and f9 specification, Version 7.0.0 Release 7, ETSI TS 135 201 V7.0.0 (ETSI, 2007), incorporated herein by this reference. As stated in that specification, encryption and decryption under the f8 (confidentiality) and f9 (integrity) algorithms utilize the “Kasumi” block cipher. As known in the art, block ciphers are encryption approaches that encrypt a message by transforming a fixed-length data block of a given size into a fixed-length block of that same size, by applying a key. The key is a specific data block of a particular size, the contents of which are known to the encrypting party and to the decrypting party. The Kasumi block cipher, in its context as applied in the 3GPP f8 and f9 algorithms, is described in Universal Mobile Telecommunications System (UMTS): Specification of the 3GPP confidentiality and integrity algorithms; Document 2: Kasumi specification, Version 7.0.0 Release 7, ETSI TS 135 202 V7.0.0 (ETSI, 2007), incorporated herein by this reference.
In a general sense, the Kasumi cipher is of the class of block ciphers referred to as “Feistel” ciphers. Feistel ciphers are a class of iterated block ciphers in which the encrypted “text” is calculated from its “plaintext” by repeatedly applying the same transformation. In general, Feistel ciphers break the data being encrypted into two halves, and break the “key” into subkeys. In each but the last one of multiple “rounds”, the appropriate transformation function is applied to one half of the input block using a subkey, with the result exclusive-ORed with the other half, and the two halves of the input block are then swapped. The last “round” applies the same transformation, but without the swapping of the end result. Decryption follows the same approach, structurally, but the subkeys are applied in reverse order from the order applied in encryption. The f8 and f9 algorithms apply the Kasumi cipher within different higher-level algorithms from one another.
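By way of illustration only, the generic Feistel structure described above can be sketched in C as follows. The round function toy_round, the function name feistel, and the subkey array are hypothetical placeholders chosen for this sketch; they are not the Kasumi round functions or its key schedule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical round function, for illustration only (not Kasumi's). */
static uint32_t toy_round(uint32_t half, uint32_t subkey)
{
    uint32_t x = half ^ subkey;
    x = (x << 5) | (x >> 27);            /* rotate left by five bits */
    return x * 0x9E3779B1u + subkey;     /* arithmetic wraps modulo 2^32 */
}

/* Generic Feistel pass over a sixty-four bit block.  With reverse == 0 the
 * subkeys are applied in order (encryption); with reverse != 0 they are
 * applied in the opposite order (decryption). */
static uint64_t feistel(uint64_t block, const uint32_t *subkeys,
                        size_t rounds, int reverse)
{
    uint32_t left  = (uint32_t)(block >> 32);
    uint32_t right = (uint32_t)block;

    assert(subkeys != NULL && rounds > 0);
    for (size_t i = 0; i < rounds; i++) {
        size_t k = reverse ? rounds - 1 - i : i;

        /* Transform one half with the subkey, XOR into the other half. */
        right ^= toy_round(left, subkeys[k]);

        /* Swap the halves in every round except the last. */
        if (i + 1 < rounds) {
            uint32_t t = left;
            left = right;
            right = t;
        }
    }
    return ((uint64_t)left << 32) | right;
}
```

In this sketch, calling feistel with reverse set to zero and then again with reverse set to one, using the same subkeys, returns the original block, whatever round function is used; this is the property that lets decryption reuse the encryption structure.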
The data flow of the f8 confidentiality algorithm is illustrated in FIG. 1a. According to the f8 3GPP specification, control word 2 includes various control information including such information as the bearer and direction of the communication, and optional information including the length of the payload bitstream. The f8 algorithm produces a keystream KS from control word 2 that is applied to the input bitstream IBS, which is the input payload data to be encrypted. Control word 2 is applied to Kasumi algorithm instance 50 along with the confidentiality key CK, exclusive-OR modified by a key modifier KM. The output of this first Kasumi algorithm instance 50 is stored in sixty-four bit register A. Keystream KS is then generated in sixty-four bit blocks from the contents of register A, by separate Kasumi algorithm instances 51 through 5N. Kasumi algorithm instance 51 exclusive-ORs the contents of register A with a block count value BLKCNT=0, to produce a first sixty-four bit block of the output keystream KS. Subsequent keystream KS blocks are recursively produced by applying a Kasumi instance to the exclusive-OR of the previous keystream KS block with the result of the exclusive-OR of the contents of register A with the corresponding block count value BLKCNT:
KS_k = KASUMI[A ⊕ BLKCNT ⊕ KS_(k-1)]
where the index k is the block of the output keystream. In function 7, blocks of the keystream KS are each bit-wise exclusive-ORed with a corresponding block of input bitstream IBS to produce the eventual output bitstream OBS.
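The f8 data flow described above can be sketched in C as follows, as a minimal illustration rather than a conforming implementation. The type block_cipher_fn, the stand-in toy_cipher, and the function f8_sketch are hypothetical names introduced for this sketch. The sketch further assumes that the keystream instances use confidentiality key CK without the key modifier, that the block preceding the first keystream block is taken as zero, and that the payload consists of whole sixty-four bit blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface: encrypts one sixty-four bit block under a
 * 128-bit key.  A real system would supply a Kasumi implementation. */
typedef uint64_t (*block_cipher_fn)(uint64_t block, const uint8_t key[16]);

/* Toy stand-in for the block cipher, for illustration only (NOT Kasumi). */
static uint64_t toy_cipher(uint64_t block, const uint8_t key[16])
{
    uint64_t k = 0;
    for (size_t i = 0; i < 16; i++)
        k = ((k << 8) | (k >> 56)) ^ key[i];
    return (block ^ k) * 0x9E3779B97F4A7C15ULL;  /* odd multiplier */
}

/* f8-style keystream mode:
 *   A     = E[CK xor KM](control)
 *   KS_k  = E[CK](A xor BLKCNT xor KS_(k-1)), previous block zero at k = 0
 *   OBS_k = IBS_k xor KS_k                                               */
static void f8_sketch(block_cipher_fn encrypt,
                      const uint8_t ck[16], const uint8_t km[16],
                      uint64_t control,
                      const uint64_t *ibs, uint64_t *obs, size_t nblocks)
{
    uint8_t modified_key[16];
    uint64_t a;
    uint64_t ks = 0;

    assert(encrypt != NULL && ck != NULL && km != NULL);
    for (size_t i = 0; i < 16; i++)
        modified_key[i] = (uint8_t)(ck[i] ^ km[i]);

    a = encrypt(control, modified_key);          /* contents of register A */
    for (size_t blkcnt = 0; blkcnt < nblocks; blkcnt++) {
        ks = encrypt(a ^ (uint64_t)blkcnt ^ ks, ck);
        obs[blkcnt] = ibs[blkcnt] ^ ks;          /* function 7 */
    }
}
```

Because the payload is only exclusive-ORed with the keystream, applying f8_sketch a second time with the same keys and control word recovers the input bitstream; the block cipher itself is never run in its decryption direction.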
FIG. 1b illustrates the data flow of the f9 integrity function according to the 3GPP specification. According to this conventional algorithm, input message 6 includes control information (e.g., the fields COUNT, FRESH, DIRECTION, etc.) and also the payload data (i.e., the field MESSAGE). Input message 6 is parsed into blocks of sixty-four bits each, and the blocks are applied to a corresponding Kasumi algorithm instance 9, along with a corresponding integrity key IK. The output of first Kasumi instance 90 based on first block PS0 of input message 6 is forwarded to an exclusive-OR function 101, for combination with a next block PS1 of input message 6 prior to application to Kasumi instance 91; this output is also bit-wise exclusive-ORed with the output of Kasumi instance 91, by function 121; the output of exclusive-OR function 121 is then applied to next exclusive-OR function 122, for combination with the output of Kasumi instance 92, and so on. This interconnection of Kasumi instances 9 with corresponding exclusive-OR functions 10, 12 continues to the final block PSBLOCKS-1 of input message 6 and its Kasumi instance 9N. The output of final exclusive-OR function 12N associated with Kasumi instance 9N is applied to Kasumi instance 11, along with the exclusive-OR of integrity key IK with key modifier KM, to produce the output message authentication code MAC-I, which is compared against an expected value to determine if the integrity of the message is valid.
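The chaining of the f9 data flow described above can be sketched in C as follows. This is an illustrative sketch only: block_cipher_fn, toy_cipher, and f9_sketch are hypothetical names, the toy cipher is not Kasumi, and the construction and padding of input message 6 and any truncation of the result to the width of MAC-I are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface: encrypts one sixty-four bit block under a
 * 128-bit key.  A real system would supply a Kasumi implementation. */
typedef uint64_t (*block_cipher_fn)(uint64_t block, const uint8_t key[16]);

/* Toy stand-in for the block cipher, for illustration only (NOT Kasumi). */
static uint64_t toy_cipher(uint64_t block, const uint8_t key[16])
{
    uint64_t k = 0;
    for (size_t i = 0; i < 16; i++)
        k = ((k << 8) | (k >> 56)) ^ key[i];
    return (block ^ k) * 0x9E3779B97F4A7C15ULL;  /* odd multiplier */
}

/* f9-style chaining over the blocks PS of the input message:
 *   chain  = E[IK](chain xor PS_i)     (chained cipher instances)
 *   accum  = accum xor chain           (running exclusive-OR)
 *   result = E[IK xor KM](accum)       (final cipher instance)   */
static uint64_t f9_sketch(block_cipher_fn encrypt,
                          const uint8_t ik[16], const uint8_t km[16],
                          const uint64_t *ps, size_t nblocks)
{
    uint8_t modified_key[16];
    uint64_t chain = 0;
    uint64_t accum = 0;

    assert(encrypt != NULL && ik != NULL && km != NULL);
    for (size_t i = 0; i < 16; i++)
        modified_key[i] = (uint8_t)(ik[i] ^ km[i]);

    for (size_t i = 0; i < nblocks; i++) {
        chain = encrypt(chain ^ ps[i], ik);
        accum ^= chain;
    }
    return encrypt(accum, modified_key);
}
```

A receiver would run the same computation over the received message and compare the result against the received code; with an invertible block cipher, a change to the final block of the message always changes the result.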
Each Kasumi instance 5, 9 in the f8 and f9 algorithms is an instance of the well-known Kasumi cipher. FIG. 1c illustrates an example of the Kasumi algorithm, in the form of an eight “round” cipher; as known in the art, the number of such rounds can vary. In the Kasumi instance illustrated in FIG. 1c, a sixty-four bit input word 15 is split into two thirty-two bit halves, namely left half L0 and right half R0. In the first round, left half L0 is applied to FL function FL1, for combination with subkey KL1, and then to FO function FO1, for combination with subkeys KO1, KI1. The output of FO function FO1 is bit-wise exclusive-ORed with right half R0 of input word 15, and the result applied to FO function FO2 in the second round. FO function FO2 combines the result of the first round with subkeys KO2, KI2, and its result is then combined with subkey KL2 by FL function FL2. The output of FL function FL2 is exclusive-ORed with left half L0 of input word 15, and the result applied to the input of FL function FL3 to begin the third round. This operation continues for eight rounds, such that output word 18 is constructed as the concatenation of left half result L8, which is the result of the exclusive-OR of the output of the sixth round and the output of FL function FL8 of the last round, and right half result R8, which is the result of the exclusive-OR of the output of the fifth round and the output of FO function FO7.
FIG. 1d illustrates the conventional operation of the FO function, which is performed in each of the eight rounds of the algorithm of FIG. 1c. Thirty-two bit input word 30 is treated by the FO function as two sixteen-bit halves. Subkeys KO, KI are forty-eight bit subkeys, each of which the FO function subdivides into three sixteen-bit subkeys. The left-hand half of input word 30 is bit-wise exclusive-ORed (XOR function 321) with subkey KO1, and then applied to FI function 311, along with subkey KI1. The output of FI function 311 is exclusive-ORed with the right-hand half of input word 30 by exclusive-OR function 322. The output of XOR function 322 is exclusive-ORed with subkey KO3 (XOR function 324), and the result applied to FI function 313 along with subkey KI3. On the right-hand side, the right-hand half of input word 30 is exclusive-ORed with subkey KO2 (XOR function 322), and the result applied to FI function 312 with subkey KI2. The output of FI function 312 is exclusive-ORed (XOR function 326) with the output of exclusive-OR function 322. Output word 33 is the concatenation of the output of XOR function 326, as its left half, and the exclusive-OR (XOR function 325) of the output of FI function 313 and the output of XOR function 326, the result being the right half of output word 33.
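The FO data flow described above amounts to three small Feistel-like stages over the two sixteen-bit halves, and can be sketched in C as follows. The names fi_fn, toy_fi, and fo_sketch are hypothetical and introduced only for this sketch; toy_fi is a trivial stand-in that merely exclusive-ORs its inputs, and is not the table-based FI function of the cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface for the FI function: sixteen-bit data word and
 * sixteen-bit subkey in, sixteen-bit result out. */
typedef uint16_t (*fi_fn)(uint16_t in, uint16_t ki);

/* Trivial stand-in for FI, for illustration only (NOT the real FI). */
static uint16_t toy_fi(uint16_t in, uint16_t ki)
{
    return (uint16_t)(in ^ ki);
}

/* FO data flow: in each of three stages the current left half is
 * exclusive-ORed with KO_j, passed through FI with KI_j, and
 * exclusive-ORed with the current right half to form the new right half;
 * the old right half becomes the new left half. */
static uint32_t fo_sketch(fi_fn fi, uint32_t in,
                          const uint16_t ko[3], const uint16_t ki[3])
{
    uint16_t left  = (uint16_t)(in >> 16);
    uint16_t right = (uint16_t)in;

    assert(fi != NULL && ko != NULL && ki != NULL);
    for (int j = 0; j < 3; j++) {
        uint16_t mixed = fi((uint16_t)(left ^ ko[j]), ki[j]);
        uint16_t next_right = (uint16_t)(mixed ^ right);
        left = right;
        right = next_right;
    }
    return ((uint32_t)left << 16) | right;
}
```

With the trivial toy_fi and all subkeys zero, fo_sketch returns its input unchanged, which is a convenient check that the three stages are wired as described; any diffusion in the real FO function comes from the FI function itself.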
FIG. 1e illustrates the conventional operation of the FI function, as performed within the FO function of FIG. 1d. The FI function splits a sixteen-bit input word 35 into a more significant nine-bit portion and a less significant seven-bit portion. The nine-bit portion is applied to look-up table S9, which returns a pseudo-random nine-bit value that is bit-wise exclusive-ORed with the seven-bit portion of input word 35 (two leading zeros added), by XOR function 370. This result is exclusive-ORed with nine-bit subkey KI2 (XOR function 371), and the result applied to another instance of nine-bit look-up table S9. The seven-bit portion of input word 35 is applied to look-up table S7 to return a seven-bit pseudo-random number that is exclusive-ORed with a seven-bit subkey KI1 (XOR function 373). This result is exclusive-ORed with the least-significant seven bits of the output of XOR function 370, and the seven-bit result applied to look-up table S7. The output of the second instance of look-up table S7 (with two leading zeros added) is exclusive-ORed with the output of the second instance of look-up table S9 (XOR function 372), and the result becomes the least-significant nine bits of output word 45. Conversely, the seven least-significant bits of the output of XOR function 372 are exclusive-ORed with the output of the second instance of look-up table S7 (XOR function 375), with the result becoming the most-significant seven bits of output word 45.
FIG. 1f illustrates the conventional operation of the FL function, which is performed in each of the eight rounds of the algorithm of FIG. 1c. Thirty-two bit input word 20 (corresponding, for example, to one “half” of input word 15 to the overall Kasumi instance, or to the output of one of the rounds thereof) is split into two sixteen-bit halves, as is thirty-two bit subkey KLi. A left-hand half of input word 20 is applied to the input of bitwise AND function 21, along with a left-hand half KLi,1 of subkey KLi. The output of AND function 21 is rotated left by one bit, by rotate function 23, and applied to one input of exclusive-OR function 24, which performs a bit-wise exclusive-OR with the right-hand half of input word 20. The output of exclusive-OR function 24 is applied to the input of bitwise OR function 25, as is the right-hand half KLi,2 of subkey KLi. The output of OR function 25 is rotated left by one bit, by rotate function 27, and is applied to an input of exclusive-OR function 28. XOR function 28 performs a bit-wise exclusive-OR of the output of rotate function 27 and the original left-hand half of input word 20. Output word 22 is the concatenation of the output of exclusive-OR function 28 and the output of exclusive-OR function 24.
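Expressed directly in terms of sixteen-bit halves, the FL data flow described above can be sketched in C as follows. The helper rol16 and the function fl_by_halves are hypothetical names introduced for this sketch; the comments refer to the functions of FIG. 1f by their reference numerals.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a sixteen-bit value left by one bit. */
static uint16_t rol16(uint16_t x)
{
    return (uint16_t)((x << 1) | (x >> 15));
}

/* FL function on a thirty-two bit input word and thirty-two bit subkey,
 * each treated as two sixteen-bit halves. */
static uint32_t fl_by_halves(uint32_t in, uint32_t kl)
{
    uint16_t left  = (uint16_t)(in >> 16);
    uint16_t right = (uint16_t)in;
    uint16_t kl1   = (uint16_t)(kl >> 16);
    uint16_t kl2   = (uint16_t)kl;

    /* AND 21, rotate 23, XOR 24: new right half. */
    right ^= rol16((uint16_t)(left & kl1));

    /* OR 25, rotate 27, XOR 28: new left half. */
    left ^= rol16((uint16_t)(right | kl2));

    assert((((uint32_t)left << 16) >> 16) == left);  /* halves do not overlap */
    return ((uint32_t)left << 16) | right;
}
```

For example, with input word 0x80000000 and subkey 0x80000001, the AND of the left halves is 0x8000, which rotates to 0x0001 and becomes the new right half; the OR with the right half of the subkey is 0x0001, which rotates to 0x0002, so the result is 0x80020001.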
It has been observed, according to this invention, that the FL function applied in the Kasumi cipher is quite cumbersome, even using modern high-performance programmable logic. To illustrate this, the FL function of FIG. 1f can be expressed in the C programming language as:
B0 = I & KL;
B1 = B0 >> 16;
B2 = B1 >> 15;
B3 = B1 << 1;
B4 = B3 | B2;
B5 = B4 ^ I;
B6 = B5 & 0x0000FFFF;
C0 = B6 | KL;
C1 = C0 & 0x0000FFFF;
C2 = C1 >> 15;
C3 = C1 << 1;
C4 = C3 | C2;
C5 = C4 << 16;
C6 = I ^ C5;
C7 = C6 & 0xFFFF0000;
O = C7 | B6;
In this C code, I refers to thirty-two bit input word 20, O refers to thirty-two bit output word 22, and KL refers to thirty-two bit subkey KLi; all operations are thirty-two bit operations in this code. FIG. 1f correlates the operands in the illustrated FL data flow with the B, C register locations of the C code above. As evident from this C code expression of the conventional approach to the FL function, the number of instructions and machine cycles required to execute the FL function is substantial. Even using modern digital signal processors (DSPs), such as the TMS320C64x family of DSPs, the machine time required to perform these operations can be a limiting factor in the efficiency of the overall system, considering that each block of data must be processed through the f8 and f9 algorithms, both at the transmitter end and also at the receiver. One can tabulate the computational effort for one instance of the FL function as follows:
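For purposes of checking the sixteen operations individually, the listing can be wrapped into a compilable C function, sketched below. The function name fl_listing is hypothetical. The sketch assumes that the operation producing C1 masks the value C0, and that the operation producing C6 uses input word I, since that is what the FL data flow of FIG. 1f calls for.

```c
#include <assert.h>
#include <stdint.h>

/* The sixteen thirty-two bit operations of the FL listing, one per line. */
static uint32_t fl_listing(uint32_t I, uint32_t KL)
{
    uint32_t B0 = I & KL;            /* AND of both halves at once */
    uint32_t B1 = B0 >> 16;          /* left-half AND result, moved down */
    uint32_t B2 = B1 >> 15;          /* bit that wraps around */
    uint32_t B3 = B1 << 1;           /* remaining bits, shifted up by one */
    uint32_t B4 = B3 | B2;           /* rotate; bit 16 may hold a stray bit */
    uint32_t B5 = B4 ^ I;            /* XOR with the input word */
    uint32_t B6 = B5 & 0x0000FFFFu;  /* new right half; stray bits masked */
    uint32_t C0 = B6 | KL;           /* OR with the subkey */
    uint32_t C1 = C0 & 0x0000FFFFu;  /* assumption: masks C0 to sixteen bits */
    uint32_t C2 = C1 >> 15;          /* bit that wraps around */
    uint32_t C3 = C1 << 1;           /* remaining bits, shifted up by one */
    uint32_t C4 = C3 | C2;           /* rotate; bit 16 may hold a stray bit */
    uint32_t C5 = C4 << 16;          /* move up; the stray bit is shifted out */
    uint32_t C6 = I ^ C5;            /* assumption: XOR with input word I */
    uint32_t C7 = C6 & 0xFFFF0000u;  /* new left half */

    assert((C7 & B6) == 0);          /* the two halves do not overlap */
    return C7 | B6;                  /* output word O */
}
```

Much of the instruction count comes from emulating a sixteen-bit rotate with thirty-two bit shifts, ORs, and masks: each of the two rotates costs a shift right, a shift left, and an OR, plus the masking needed to discard bits that stray outside the sixteen-bit half.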
Function (C64x instruction)     Number of executions per FL function
Bitwise AND                     4
Bitwise OR                      4
Bitwise XOR                     2
Unsigned shift right            3
Shift left                      3
Total # of instructions         16
As discussed above, each round of a Kasumi instance includes an FL function, an FO function, and an XOR function. Allowing thirty-three instructions as necessary to perform the combination of the FO and XOR functions, then each Kasumi round will require 33+16=49 instructions to execute. The eight rounds of a Kasumi instance thus require 49*8=392 instructions, or machine cycles, to perform. And for a message of typical length to be processed by the f8 and f9 algorithms described above, 314 Kasumi instances are executed, which amounts to the execution of 314*392=123,088 instructions. Considering that the Kasumi instructions are in the critical data path in conventional 3G wireless communications, this computational effort is a significant load on the computational capacity of the communications hardware, especially in order to process the signals and corresponding data in real time. In addition, considering that these communications systems are intended for wireless, portable applications, and because battery life and thus power consumption are therefore of concern, the power required to carry out such a large number of instructions for each data block is less than optimal.
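The arithmetic above can be restated as a short C sketch, which makes it easy to see how the total scales with the per-FL instruction count. The constant and function names are hypothetical; the figures are those given above, including the allowance of thirty-three instructions for the FO and XOR functions.

```c
#include <assert.h>

enum {
    /* AND + OR + XOR + shift right + shift left, per the table above. */
    FL_INSTRUCTIONS     = 4 + 4 + 2 + 3 + 3,
    FO_XOR_INSTRUCTIONS = 33,   /* allowance stated above */
    KASUMI_ROUNDS       = 8
};

/* Estimated instruction count for a given number of Kasumi instances. */
static unsigned long kasumi_instruction_estimate(unsigned long instances)
{
    assert(FL_INSTRUCTIONS == 16);   /* matches the table total */
    return instances * KASUMI_ROUNDS
         * (unsigned long)(FL_INSTRUCTIONS + FO_XOR_INSTRUCTIONS);
}
```

Under these figures, the FL function accounts for sixteen of the forty-nine instructions in each round, or roughly one third of the estimated total for a message.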