The present invention relates in general to encryption processing systems and, more specifically, to an apparatus and method for encrypting and decrypting data using a multiple port memory and parallel read/write operations to two or more ports of the memory during encryption processing.
The RC4 algorithm, developed by RSA Data Security, Inc., is one of the most popular encryption algorithms in the Internet web browser market. The ARCFOUR algorithm is another encryption algorithm that was developed to be fully compatible with the RC4 algorithm and is potentially useful with several security protocols, including, for example, IPSec and TLS. The ARCFOUR algorithm can be used with keys of various lengths, and is often implemented with a 40-bit or 128-bit key. Prior to using the algorithm, a state array is initialized using the key.
The algorithm itself is a stream cipher and operates to encrypt or decrypt one byte of data at a time. After the state array is initialized, the input text is processed one byte at a time: each byte of input text is combined by an XOR logical operation (sometimes referred to herein as “XORed” or “XORing”) with a so-called pseudorandom byte K, which is generated by an algorithm using the state array. The result of this XOR operation is one output data byte, which is in encrypted form if the input byte was plaintext and in decrypted form if the input byte was ciphertext.
More specifically, the ARCFOUR algorithm requires storage of a 256-byte state array and also temporary storage of a key in, for example, a 256-byte key array. The length of the key must be a whole number of bytes, with a maximum length of 256 bytes.
After a new key is loaded into the key array, the state array is initialized. First, the state array is written with values 0 to 255. Then, each location in the state array is modified by the following algorithm, with x and y each initially starting at 0:
Sx = state[x]
Kx = key[(x mod key_length)]
y = (y + Sx + Kx) mod 256
Sy = state[y]
state[y] = Sx
state[x] = Sy
x = (x + 1) mod 256
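The initialization steps above can be sketched in software as follows. This is a minimal illustration, not the hardware implementation discussed in this document. The function name init_state and the key-length check are assumptions introduced here. The loop body follows the listed equations, repeated once for each of the 256 locations.

```python
def init_state(key: bytes) -> list:
    """Return a 256-entry state array initialized from `key` (hypothetical helper)."""
    key_length = len(key)
    if not 1 <= key_length <= 256:
        raise ValueError("key must be 1 to 256 bytes long")

    # First, the state array is written with values 0 to 255.
    state = list(range(256))

    # Then each location is modified once; x and y both start at 0.
    y = 0
    for x in range(256):          # x = (x + 1) mod 256, for 256 iterations
        sx = state[x]
        kx = key[x % key_length]
        y = (y + sx + kx) % 256
        sy = state[y]
        state[y] = sx             # swap state[x] and state[y]
        state[x] = sy
    return state
```

Because every step is a swap, the result is always a permutation of the values 0 to 255. One read and one write pair per location is also why a straightforward implementation needs on the order of 256 iterations to finish initialization.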
The ARCFOUR algorithm for cipher processing a single byte is shown in the following equations. For processing each input byte, three reads from the state array and two writes to the state array are performed.
x = (x + 1) mod 256
Sx = state[x]
y = (y + Sx) mod 256
Sy = state[y]
state[y] = Sx
state[x] = Sy
t = (Sx + Sy) mod 256
K = state[t]
output byte = (input byte) XOR K
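The per-byte equations above can likewise be sketched in software. This is an illustrative model only, under the assumption that x and y both start at 0 after initialization. The names arcfour and init_state are hypothetical and are not taken from this document. The comments mark the three reads and two writes performed for each byte.

```python
def init_state(key: bytes) -> list:
    """Initialize the 256-entry state array from the key (hypothetical helper)."""
    state = list(range(256))
    y = 0
    for x in range(256):
        y = (y + state[x] + key[x % len(key)]) % 256
        state[x], state[y] = state[y], state[x]
    return state


def arcfour(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data`; the same operation serves both directions."""
    state = init_state(key)
    x = 0
    y = 0
    out = bytearray()
    for input_byte in data:
        x = (x + 1) % 256
        sx = state[x]             # read 1
        y = (y + sx) % 256
        sy = state[y]             # read 2
        state[y] = sx             # write 1
        state[x] = sy             # write 2
        t = (sx + sy) % 256
        k = state[t]              # read 3: the pseudorandom byte K
        out.append(input_byte ^ k)
    return bytes(out)


ciphertext = arcfour(b"Key", b"Plaintext")
recovered = arcfour(b"Key", ciphertext)   # applying the cipher again decrypts
```

Since each output byte is the input byte XORed with K, running the function a second time with the same key restores the original data.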
The standard ARCFOUR algorithm, when implemented in a hardware processor, requires three read operations from, and two write operations to, a local memory that is storing the state array, such as, for example, a random access memory (RAM), for each iteration of the algorithm. In prior hardware implementations, six processor clock cycles have typically been required to perform the read, write, and XOR operations necessary to generate each output byte. However, it would be desirable to implement the algorithm in fewer clock cycles so that the throughput of an encryption processing system could be increased.
In addition, in prior hardware implementations, the writing of the key to and the initialization of the state array in the local memory has required a large number of clock cycles to perform. For example, prior processing systems typically require about 256 clock cycles to initialize the 256-byte state array required by the ARCFOUR algorithm. It would be desirable to write the key and initialize the state array in fewer clock cycles so that processor throughput could be increased.
Moreover, when a processor is used to handle different packets, the state of the array is often saved to external memory and restored again to its prior state to process later packets using the same state array (such as may be required for a single security session using the ARCFOUR algorithm). It would be desirable to be able to restore the previous state of the state array to the local memory using fewer clock cycles so that the throughput of the processor could be further increased.
Thus, there is a need for an improved encryption processing system that implements the ARCFOUR algorithm, is able to write a key and initialize a state array, and is able to restore a previous state of the state array, all in fewer clock cycles.