Security algorithms may be used to encode or decode data transmitted or received in a computer network through techniques such as compression.
In some instances, a network processor may compress or decompress the data to help protect the integrity and/or privacy of the information being transmitted or received. The data can be compressed or decompressed using a variety of algorithms, such as hash algorithms.
One such hash algorithm is the secure hash algorithm 1 (“SHA-1”). Performing the SHA-1 algorithm can be a laborious and resource-consuming task for many network processors, however, because it requires numerous mathematically intensive computations within a main recursive compression loop. Moreover, the main compression loop may be performed numerous times in order to compress or decompress a particular amount of data.
In general, hash algorithms take a large group of data and reduce it to a smaller representation of that data. Hash algorithms may be used in applications such as security algorithms to protect data from corruption or detection. The SHA-1 algorithm, for example, may reduce groups of 64 bytes of data to 20 bytes of data. Other hash algorithms, such as other members of the SHA family (for example, SHA-256 and SHA-512) and the message digest 5 (“MD5”) algorithm, may also be used to reduce large groups of data to smaller ones. Hash algorithms, in general, can be very taxing on computer system performance because they require intensive mathematical computations in a main compression loop that is performed iteratively to compress or decompress groups of data.
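The size reduction described above can be observed with a short sketch. It assumes Python and its standard `hashlib` module, neither of which is part of the original description; the helper name `digest_sizes` is hypothetical and is introduced only for illustration.

```python
import hashlib


def digest_sizes(message: bytes) -> dict:
    """Return the digest length, in bytes, produced by SHA-1 and MD5.

    Illustrative helper (not from the original text): it shows that the
    output size is fixed no matter how large the input group of data is.
    """
    return {
        "sha1": len(hashlib.sha1(message).digest()),
        "md5": len(hashlib.md5(message).digest()),
    }


# A single 64-byte group and a much larger input yield the same digest sizes.
one_block = bytes(64)
large_input = bytes(64 * 1024)
print(digest_sizes(one_block))
print(digest_sizes(large_input))
```

In this sketch the SHA-1 digest is 20 bytes for both inputs, which corresponds to the 64-byte-to-20-byte reduction mentioned above.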
Adding to the difficulty of performing hash algorithms at high frequencies are the latencies, or “bottlenecks,” that can occur between operations of the algorithm due to data dependencies between the operations. When performing the algorithm on typical processor architectures, the operations must be performed in substantially sequential fashion because typical processor architectures perform the operations of each iteration of the main compression loop on the same logic unit or group of logic units. As a result, if dependencies exist between the iterations of the main loop, a bottleneck forms while later iterations are delayed to allow the hardware to finish processing the earlier operations.
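The dependency between iterations can be made concrete with a minimal software sketch of the SHA-1 main compression loop. It is written in Python purely for illustration and follows the published SHA-1 definition; the function names `rotl`, `sha1_compress`, and `sha1` are assumptions of this sketch and do not describe any particular hardware implementation.

```python
import struct


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1_compress(state: tuple, block: bytes) -> tuple:
    """Run the 80-iteration main loop over one 64-byte block."""
    assert len(block) == 64
    # Message schedule: depends only on the block, not on a..e.
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        # Loop-carried dependency: this iteration's new `a` needs the
        # previous iteration's a, b, c, d and e before it can be computed.
        temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, rotl(b, 30), c, d

    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))


def sha1(message: bytes) -> bytes:
    """Pad the message and apply the compression loop to each 64-byte block."""
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    for i in range(0, len(padded), 64):
        state = sha1_compress(state, padded[i:i + 64])
    return struct.pack(">5I", *state)
```

Each pass through the 80-iteration loop reads the five working values written by the previous pass, so iteration t + 1 cannot begin its final addition until iteration t has produced its result. The message schedule `w`, by contrast, depends only on the input block.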
These bottlenecks can be somewhat alleviated by taking advantage of instruction-level parallelism (“ILP”) among the instructions within the algorithm and performing independent instructions on parallel execution units.
Typical prior art parallel execution unit architectures used to perform hash algorithms have had only marginal success. This is true, in part, because the instruction and sub-instruction operations associated with typical hash algorithms rarely have the ILP necessary to allow truly independent parallel execution. Furthermore, earlier architectures do not typically schedule operations in such a way as to minimize the critical path associated with long dependency chains among various operations.
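The effect of scheduling on the critical path can be illustrated with a small model. The sketch below, in Python, is a hypothetical simplification rather than a description of any actual architecture: it assumes every operation takes one time unit, it models only the computation of the new value of `a` in one SHA-1 style iteration, and the graph names `NAIVE` and `RESCHEDULED` as well as the function `critical_path` are invented for this example.

```python
# Each key is an operation; its list holds the values it must wait for.
# Names that never appear as keys (a, b, c, d, e, k, w) are inputs that
# are assumed to be available at time zero.

# Additions chained strictly left to right.
NAIVE = {
    "rot5": ["a"],
    "f": ["b", "c", "d"],
    "s1": ["rot5", "f"],
    "s2": ["s1", "e"],
    "s3": ["s2", "k"],
    "new_a": ["s3", "w"],
}

# Same six operations, with the additions regrouped so that independent
# terms (k + w, then + e) are summed alongside rot5 and f.
RESCHEDULED = {
    "rot5": ["a"],
    "f": ["b", "c", "d"],
    "kw": ["k", "w"],
    "ekw": ["kw", "e"],
    "s1": ["rot5", "f"],
    "new_a": ["s1", "ekw"],
}


def critical_path(graph: dict, target: str) -> int:
    """Length, in unit-latency operations, of the longest chain ending at target."""
    def depth(node: str) -> int:
        if node not in graph:  # an input, available immediately
            return 0
        return 1 + max(depth(dep) for dep in graph[node])
    return depth(target)


print(critical_path(NAIVE, "new_a"))        # 5
print(critical_path(RESCHEDULED, "new_a"))  # 3
```

Both graphs contain the same number of operations, but under these assumptions regrouping the additions shortens the longest dependency chain from five operations to three, which is the kind of reduction that scheduling around the critical path aims for.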