1. Field of the Invention
The present invention relates to improvements in microprocessors for computer systems. More specifically, the present invention relates to a parallel read functional unit for microprocessors, and methods relating thereto.
2. Related Art
An important consideration in the design of modern computer systems is the need to protect data. Efforts in this regard focus on both hardware and software solutions. Symmetric-key cryptography is one solution that can be used to provide data confidentiality on public communication networks such as the Internet. It involves encrypting a plaintext message P using a symmetric-key algorithm (cipher) and a secret key K. The encrypted message (ciphertext) is then sent to the receiver, where it is decrypted using the same cipher and secret key. Symmetric-key ciphers usually have an iterated round structure, where a short sequence of operations (called a round) is repeated on the plaintext block to compute the ciphertext. The input of a round consists of the output of the previous round and one or more subkeys, which are derived from the secret key. Common round operations include table lookups, modular addition (subtraction), logical operations, shifts, rotates, multiplications, and bit permutations.
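The iterated round structure described above can be illustrated with a minimal sketch. The example below is a toy Feistel-style cipher written only for illustration: the table contents, the key schedule, the round function, and all names (TABLE, derive_subkeys, round_function, encrypt_block, decrypt_block) are assumptions and do not correspond to any cipher named in this document. It is not secure and is not the method of the invention.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical 2^8 x 32 lookup table. Real ciphers use fixed, carefully
# designed tables; these values are placeholders.
TABLE = [(i * 0x9E3779B1 + 0x7F4A7C15) & MASK32 for i in range(256)]


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def derive_subkeys(key, rounds):
    """Toy key schedule (an assumption): one 32-bit subkey per round."""
    subkeys = []
    k = key & MASK32
    for r in range(rounds):
        k = (rotl32(k, 5) + 0x9E3779B9 + r) & MASK32
        subkeys.append(k)
    return subkeys


def round_function(half, subkey):
    """One round's operations: modular addition, table lookup, XOR, rotate."""
    x = (half + subkey) & MASK32      # modular addition with the subkey
    x ^= TABLE[x & 0xFF]              # table lookup indexed by the low byte
    return rotl32(x, 7)               # rotate


def encrypt_block(block, key, rounds=8):
    """Encrypt one 64-bit block by repeating the round on its two halves."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in derive_subkeys(key, rounds):
        # The output of each round is the input of the next.
        left, right = right, left ^ round_function(right, k)
    return (left << 32) | right


def decrypt_block(block, key, rounds=8):
    """Invert encrypt_block by applying the subkeys in reverse order."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in reversed(derive_subkeys(key, rounds)):
        left, right = right ^ round_function(left, k), left
    return (left << 32) | right
```

In this sketch, the receiver recovers the plaintext by calling decrypt_block with the same key, which mirrors the use of the same cipher and secret key at both ends.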
On a programmable processor that implements a reduced instruction-set computer (RISC)-like instruction set, table lookups generally consume the greatest fraction of the execution time. Table 1, below, lists some sample symmetric-key ciphers together with their round and table lookup characteristics. For each cipher, Table 1 shows the block size, the typical key size, and the number of rounds.
TABLE 1
Cipher     Block Size (bits)   Key Size (bits)   Num. Rounds   Num. Tables   Table Structure   Num. Lookups
DES        64                  56                16            8             2^6 × 32          128
3DES       64                  112               48            8             2^6 × 32          384
RC4        8                   128               1*            1             2^8 × 8           3 + 2W
Blowfish   64                  128               16            4             2^8 × 32          64
AES-128    128                 128               10            4             2^8 × 32          160
AES-192    128                 192               12            4             2^8 × 32          192
AES-256    128                 256               14            4             2^8 × 32          224
Twofish    128                 128               16            4             2^8 × 32          128
MARS       128                 128               32            2             2^8 × 32          80
As used above, block size represents the amount of data that the cipher can encrypt at a time, and key size relates to the strength of the cipher against cryptanalytic attacks. Data Encryption Standard (DES) and its variant 3DES were the NIST standards for block encryption from 1976 to 2001. 3DES continues to be used extensively in many systems. RC4 is a popular stream cipher, which was originally used in the IEEE 802.11 wireless standard. Blowfish is used in many protocols and applications, for example GPG, SSH, SSLeay, and JAVA cryptography extensions. Advanced Encryption Standard (AES) is the current NIST standard for block encryption. Its key size can be 128, 192, or 256 bits. These are denoted above as AES-128, AES-192, and AES-256, respectively. Twofish and MARS were two of the five finalist ciphers in the AES selection program.
FIG. 1 illustrates how table lookups are typically used in existing symmetric-key ciphers. Table 1 above also summarizes the number and structure of the lookup tables used by each cipher. The notation 2^a × b is used to denote a table with 2^a entries, where each entry is b bits wide. In AES, the input to the ith round is a 128-bit block composed of four 32-bit words. The bytes in these words are labeled b0 to b15. There are four 2^8 × 32 tables, labeled TA-TD. The rightmost byte of each word is used as an index into TA, the next byte is used as an index into TB, and so on, until all tables are accessed four times. The table lookup results and four subkeys are then exclusive-or'ed (XORed) as shown. Of the remaining ciphers, Blowfish, MARS, and Twofish are similar to AES in that they use multiple 2^8 × 32 tables. DES and 3DES use eight 2^6 × 32 tables, while RC4 uses a single 2^8 × 8 table.
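The AES-style table lookup round described above can be sketched in software as follows. This is a minimal illustration under stated assumptions: the table contents are placeholders (real AES tables are derived from the AES S-box), the function names are hypothetical, and the way the sixteen lookup results are grouped into four output words follows the common software formulation of AES rather than the exact wiring of FIG. 1, which is not reproduced here.

```python
MASK32 = 0xFFFFFFFF


def make_placeholder_table(seed):
    """Build a hypothetical 2^8 x 32 table (256 entries of 32 bits each)."""
    return [((i + 1) * seed) & MASK32 for i in range(256)]


# Four 2^8 x 32 tables, named TA-TD as in the text. Contents are placeholders.
TA = make_placeholder_table(0x9E3779B1)
TB = make_placeholder_table(0x85EBCA6B)
TC = make_placeholder_table(0xC2B2AE35)
TD = make_placeholder_table(0x27D4EB2F)


def table_round(words, subkeys):
    """One table-lookup round on four 32-bit words with four 32-bit subkeys.

    The rightmost byte of a word indexes TA, the next byte indexes TB,
    and so on, so each table is read four times per round (16 lookups).
    Assumption: output word j combines bytes taken from words
    j, j+1, j+2, and j+3 (mod 4).
    """
    out = []
    for j in range(4):
        a = TA[words[j] & 0xFF]                       # rightmost byte
        b = TB[(words[(j + 1) % 4] >> 8) & 0xFF]      # next byte
        c = TC[(words[(j + 2) % 4] >> 16) & 0xFF]
        d = TD[(words[(j + 3) % 4] >> 24) & 0xFF]     # leftmost byte
        # XOR the four lookup results with the subkey for this word.
        out.append(a ^ b ^ c ^ d ^ subkeys[j])
    return out
```

On a RISC-like processor, each of the sixteen lookups in such a round requires its own byte extraction, address computation, and load. Sixteen lookups per round over ten rounds gives 160 lookups, which is consistent with the AES-128 entry in Table 1.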
In the past, special instructions for accelerating table lookups in symmetric-key ciphers have been provided for microprocessors. The sbox instruction performs fast lookups of tables located in main memory by accelerating the effective address computations. The CryptoManiac processor uses a similar sbox instruction to read its four 1 kB on-chip caches. However, in both of these approaches, only a single table can be read with each sbox instruction. Other approaches, such as the PAX crypto-processor, provide on-chip lookup tables that can be used to accelerate symmetric-key encryption. However, in such approaches the number of tables and the table widths are not scalable, and multiple sub-opcode fields must be used to specify the number of lookups to be performed, the data size, and the index bytes to be used. Still further, existing approaches contain complex logic circuits, which result in increased circuit area and reduced speed.