Communication networks and the number of users of such networks continue to increase. On-line sales involving both business-to-business and business-to-consumer transactions over the Internet continue to proliferate. Additionally, the number of people who telecommute continues to grow. Both on-line sales and telecommuting are examples of uses of communication networks that typically involve private and sensitive data that must be protected during its transmission across the different communication networks.
Accordingly, security protocols (e.g., Transport Layer Security (TLS), Secure Sockets Layer (SSL) 3.0, Internet Protocol Security (IPSec), etc.) have been developed to establish secure sessions between remote systems. These security protocols provide a method for remote systems to establish a secure session through message exchange and calculations, thereby giving sensitive data transmitted across the different communication networks a measure of security and/or tamper resistance.
Moreover, different ciphers have been developed to allow for secure communications using these security protocols. RC4 is a stream cipher with variable-size keys that uses byte-oriented operations. RC4 employs a Substitution (S)-box to generate byte values that are subsequently XORed with the plaintext to generate the ciphertext, or XORed with the ciphertext to recover the corresponding plaintext.
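For reference, the byte-oriented RC4 operations described above can be sketched as follows. This is a minimal Python sketch of the publicly documented RC4 algorithm, assuming the usual key-scheduling step (which this section does not describe) to initialize the S-box; the function names `rc4_ksa` and `rc4_crypt` are hypothetical. Because each output byte is the input byte XORed with a keystream byte drawn from the S-box, the same function both encrypts and decrypts.

```python
def rc4_ksa(key: bytes) -> list[int]:
    """Key scheduling (assumed standard RC4): build the 256-entry S-box permutation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """XOR each input byte with the next keystream byte (encrypts or decrypts)."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]          # swap two S-box entries
        k = s[(s[i] + s[j]) % 256]       # keystream byte selected from the S-box
        out.append(byte ^ k)
    return bytes(out)


ct = rc4_crypt(b"Key", b"Plaintext")
print(ct.hex())  # bbf316e8d940af0ad3
```

Running the same call on the ciphertext with the same key returns the original plaintext.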
However, current approaches for RC4 are limited in execution speed by the bottleneck that occurs when accessing the S-box. In particular, a processing unit executes the RC4 operations in conjunction with a memory that stores the S-box needed for those operations, so the processing unit must access the S-box for the different operations. To help illustrate, FIG. 1 includes pseudo-code for a prior art RC4 technique used to generate ciphertext. As shown, FIG. 1 includes pseudo-code 100, which comprises a while loop based on while code statement 102, wherein code statements 104, 106, 108, 110 and 112 are executed within the while loop.
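Since the listing of FIG. 1 is not reproduced here, the following Python sketch is one plausible reconstruction of pseudo-code 100 based only on the statement-by-statement description in this section. The grouping of operations under statements 102 to 112 is an assumption, as is the `& 0xFF` masking that keeps indices within the 256-entry S-box; the function name `rc4_loop` is hypothetical. The S-box is taken as already initialized, and the identity permutation in the usage lines is purely illustrative.

```python
def rc4_loop(S: list[int], i: int, j: int, plain: bytes) -> tuple[bytes, int, int]:
    """Cipher `plain` one byte per iteration, in the shape of pseudo-code 100.

    S is an already initialized 256-entry S-box and is updated in place.
    Returns the ciphertext plus the final i and j so a later call can resume.
    """
    LEN = len(plain)
    cipher = bytearray(LEN)
    l = 0
    while l < LEN:                       # while statement 102
        a = S[(i + 1) & 0xFF]            # statement 104: a = S[i+1]
        i = (i + 1) & 0xFF               # statement 106: i++ ...
        j = (j + a) & 0xFF               #                ... j = j + a
        b = S[j]                         # statement 108: b = S[j]
        S[i] = b                         # statement 110: swap S[i] and S[j]
        S[j] = a
        t = (a + b) & 0xFF               # t = a + b (statement not stated in the text)
        cipher[l] = plain[l] ^ S[t]      # statement 112: cipher[l] = plain[l] ^ S[t]
        l += 1                           #                l++
    return bytes(cipher), i, j


S = list(range(256))                     # identity S-box, for illustration only
ct, i, j = rc4_loop(S, 0, 0, bytes(3))
print(list(ct))  # [2, 5, 7]
```

Each pass through the loop performs the dependent sequence read, add, read, swap, read and XOR on the same S-box, which is why the memory accesses dominate the timing discussed below.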
Additionally, FIG. 2 illustrates a waterfall diagram of the process cycles for an implementation of the pseudo-code shown in FIG. 1, according to the prior art. In particular, waterfall diagram 200 illustrates a partially unrolled implementation of the while loop of pseudo-code 100. Pseudo-code 100 is typically executed in a processor, wherein the S-box used in conjunction with pseudo-code 100 is stored in a memory coupled to the processor. In FIG. 2, the memory is pipelined, has a single read port and a single write port, and takes three cycles to access. Waterfall diagram 200 illustrates the changes in the values of the variables of pseudo-code 100 as well as the timing of the accesses to the memory. Waterfall diagram 200 includes column 202, which lists the operations of pseudo-code 100; accordingly, the rows of waterfall diagram 200 correspond to the operations shown in pseudo-code 100. Waterfall diagram 200 also includes column 204, which illustrates the process cycles for the execution of pseudo-code 100.
Columns 206-216 of waterfall diagram 200 illustrate the changes in the variables of pseudo-code 100 (i, j, a, b, t and l). Column 218 illustrates a temporary variable (temp) used to store temporary results within the processor. The variables i, j and t are used as indexes into the S-box. The variables a and b temporarily store the values retrieved from the S-box based on the index variables i and j, respectively. The variable l is used as an index into the data arrays for both the plaintext and the ciphertext generated therefrom.
Column 220 of waterfall diagram 200 illustrates the first cycle of each read of the memory (that is coupled to the processor executing pseudo-code 100), while column 222 illustrates the second cycle of each such read. Column 224 of waterfall diagram 200 illustrates the writes to the memory.
As shown in FIGS. 1 and 2, current approaches for an RC4 operation generate a single byte of ciphertext per iteration of the while loop of pseudo-code 100. Accordingly, a given iteration of the while loop in FIG. 1 includes (1) a single set of memory accesses to the S-box, (2) a swap of the data retrieved by those memory accesses and (3) the XORing of the plaintext based on the swapped data to generate a single byte of ciphertext. Moreover, as shown by waterfall diagram 200, this approach includes a number of process cycles in which no accesses are made to memory, so the RC4 operation is limited in execution speed by the bottleneck that occurs when accessing the S-box.
Returning to FIG. 1, the LEN variable stores the length of the plaintext (the data to be ciphered). While code statement 102 therefore repeats the loop until a number of bytes equal to the value stored in LEN has been ciphered.
At process cycle zero (shown in column 204 of waterfall diagram 200), the processor executes code statement 104, wherein the value stored in the S-box (S[ ]) at an offset of i+1 is assigned to the variable a; this begins a memory read (as shown in column 220).
At process cycle one, the processor executes a portion of code statement 106, wherein the variable i is incremented (as shown in column 206). Also within process cycle one (as shown in column 222), the second read cycle of the read of S[i] (the location addressed as S[i+1] before the increment) takes place, as two process cycles are needed to retrieve data from the memory.
At process cycle two, the processor executes the remaining portion of code statement 106, assigning the value of j+a to j. In particular, because the read of S[i] occupies the memory for two process cycles (process cycles zero and one), its result cannot be used until the third cycle of the access (process cycle two), at which point j is set to the value of a added to the current value of j. Accordingly, the result of the memory read is added to the current value of j to generate a new value of j (as shown in columns 208 and 210).
At process cycle three, the processor executes code statement 108, wherein the value stored in the S-box (S[ ]) at an offset of j is assigned to the variable b, in a second memory access (as shown in column 220). In particular, this second access to the S-box can begin only three cycles after the prior access to the S-box (process cycle zero), because its offset j depends on the value returned by that prior access.
At process cycle four, none of the code statements in pseudo-code 100 is executed. Rather, in process cycle four, the second cycle associated with the read operation of S[j] is executed (as shown in column 222).
At process cycle five, as part of the swap of the data in the S-box, the processor executes a portion of code statement 110, wherein the value of b is written to the location within the S-box having an offset of i (as shown in column 224). Accordingly, as shown in FIG. 2, a write to memory occurs wherein the location within the S-box having an offset of i is set to the value just read from memory (b having been set to this value in process cycle four). Additionally, at process cycle five, the variable t is assigned the sum of the values of a (S[i]) and b (S[j]); as shown in FIG. 2, t is therefore set to the value of S[i] plus the value just read from memory (S[j]). Also at process cycle five, the processor begins a portion of code statement 112, wherein the value at the location within the array of data to be ciphered (plain[ ]) at an offset of l is read from the memory (as shown in column 220).
At process cycle six, as part of the swap of the data in the S-box, the processor executes another portion of code statement 110, wherein the location within the S-box having an offset of j is set to the value of a (as shown in column 224). Also within process cycle six, the processor begins reading the value at the location within the S-box having an offset of t (as shown in column 220). Additionally, within process cycle six, the second read cycle of the read of the value within the array of data to be ciphered (plain[ ]) at an offset of l takes place (as shown in column 222).
Additionally, the generation of two different portions of ciphertext overlaps, as the second iteration of the while loop of FIG. 1 (denoted by the parenthesized values in column 204) commences during process cycle seven of the first iteration. At process cycle seven (process cycle zero of the subsequent iteration of the while loop of FIG. 1), the temp variable is set to the value at the location within the array of data to be ciphered (plain[ ]) at an offset of l (as shown in column 218, where temp is set to the value read from memory (MR2)). Also at process cycle seven, the retrieval of the value at the location within the S-box having an offset of t completes (as shown in column 222). Additionally, the operation of code statement 104 (reading S[i+1] into a) for the next iteration of the loop in FIG. 1 is performed.
At process cycle eight (process cycle one of the subsequent iteration of the while loop of FIG. 1), the processor generates the byte of ciphertext. In particular, the temp variable is set to the current temp value (plain[l]) XORed with the value at the location within the S-box having an offset of t (S[t]) (as shown in column 218, where temp is XORed with the value read from memory (MR2), which is S[t]). Also during process cycle eight, the operation i++ (code statement 106) of the next iteration of the loop of FIG. 1 is performed.
At process cycle nine (process cycle two of the subsequent iteration of the while loop of FIG. 1), the remaining portion of code statement 112 is executed, wherein the variable l is incremented (as shown in column 216). Moreover, at process cycle nine, the byte of ciphertext that was generated is stored in a data array (cipher[ ]); in particular, the value of temp is stored into the ciphertext array at an offset of l (as shown in column 224). Finally, during process cycle nine, the operation j=j+a (the remainder of code statement 106 for the next iteration of the loop of FIG. 1) is performed. Thus, certain non-speculative operations that contribute to the generation of the next portion of ciphertext are overlapped with the operations that generate the previous portion of ciphertext.
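The per-iteration memory schedule described for FIG. 2 can be checked with a small model. The Python sketch below transcribes the read issues (column 220) and writes (column 224) of one iteration by cycle offset, assumes each read also occupies column 222 in the following cycle, ignores the non-memory operations, and overlays iterations started every `period` cycles. It also enforces the ordering constraint that the next iteration's first S-box read follow the last S-box write of the swap. All names are hypothetical, and the model is a simplification rather than a description of any particular hardware.

```python
from collections import Counter

# One loop iteration, keyed by cycle offset from the iteration's start,
# transcribed from the cycle-by-cycle description of waterfall diagram 200.
READ_ISSUE = {0: "S[i+1]", 3: "S[j]", 5: "plain[l]", 6: "S[t]"}   # column 220
WRITES = {5: "S[i]=b", 6: "S[j]=a", 9: "cipher[l]=temp"}          # column 224
LAST_SBOX_WRITE = 6      # S[j]=a completes the swap
FIRST_SBOX_READ = 0      # a=S[i+1] begins the next iteration


def feasible(period: int, iterations: int = 8) -> bool:
    """Return True if iterations can start every `period` cycles on this memory."""
    # The next iteration's first S-box read must come after the previous swap's last write.
    if period + FIRST_SBOX_READ <= LAST_SBOX_WRITE:
        return False
    issue, stage2, write = Counter(), Counter(), Counter()
    for k in range(iterations):
        base = k * period
        for off in READ_ISSUE:
            issue[base + off] += 1       # single read port: one new read per cycle
            stage2[base + off + 1] += 1  # second read cycle (column 222)
        for off in WRITES:
            write[base + off] += 1       # single write port: one write per cycle
    return all(n <= 1 for c in (issue, stage2, write) for n in c.values())


best = min(p for p in range(1, 11) if feasible(p))
print(best)  # 7
```

Under these assumptions the smallest feasible period is seven cycles, matching the seven-cycle loop body of process cycles 3 through 9(2).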
The loop of FIG. 1 can be implemented in FIG. 2 in a number of different ways. For example, the loop can be implemented to repeat process cycles 3, 4, 5, 6, 7(0), 8(1) and 9(2) until the plaintext has been ciphered into ciphertext. As another example, the process cycles of the next byte of ciphertext could be continued through process cycle 6 of that byte, and the loop could then repeat process cycles 7(0), 8(1), 9(2), 3, 4, 5 and 6.
In a similar manner to the generation of ciphertext through encryption of plaintext, RC4 can also be used to generate plaintext through the decryption of ciphertext. Thus, RC4, whether encrypting or decrypting, translates input text blocks into output text blocks.
Disadvantageously, as illustrated, the bottleneck of this prior art approach to an RC4 operation is the access of data from the S-box stored in memory. The overlapping of the generation of the two different portions of ciphertext is non-speculative in nature. Specifically, to avoid generating inaccurate ciphertext, the write operation to the S-box for the generation of a first portion of ciphertext must complete before the load operation from the S-box for the generation of a second portion of the ciphertext. As shown, three cycles are needed for each read operation to ensure that the data retrieved from the S-box is up-to-date. Accordingly, one byte of plaintext is encrypted into ciphertext every seven process cycles.
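The resulting cycle count can be worked out directly. Assuming the first arrangement described for FIG. 2, in which process cycles 0 to 2 run once as a prologue and the seven-cycle body (process cycles 3 through 9(2)) then repeats once per byte, and counting the final byte's store at its process cycle nine as the end, ciphering N bytes takes roughly:

```latex
T(N) = \underbrace{3}_{\text{cycles 0 to 2}} + \underbrace{7N}_{\text{cycles 3 to 9 per byte}},
\qquad
\lim_{N \to \infty} \frac{N}{T(N)} = \frac{1}{7}\ \text{byte per cycle}.
```

For example, a single byte occupies process cycles 0 through 9 (ten cycles), and each additional byte adds seven cycles.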