Media applications have been driving microprocessor development for more than a decade. In fact, most computing upgrades in recent years have been driven by media applications, predominantly within the consumer segments, but also in enterprise segments for entertainment-enhanced education and communication purposes. Nevertheless, future media applications will impose even higher computational requirements. As a result, tomorrow's personal computer (PC) will be even richer in audio-visual effects, as well as being easier to use, and, more importantly, computing will merge with communications.
Accordingly, the display of images, as well as playback of audio and video, have become increasingly popular applications for current computing devices. However, protection of such content becomes increasingly important as such computing devices emerge. For example, encryption algorithms are commonly used to protect the integrity of transmitted content, while error control algorithms are utilized to recover content in the event of lost or corrupted data during transmission. Unfortunately, a very significant number of multimedia, communications and encryption algorithms utilize look-up tables.
As known to those skilled in the art, look-up tables store results of computationally-intensive operations that are calculated before an application starts or when an application is initialized. In addition, some applications access data within the look-up tables in a random pattern. Consequently, it is often difficult to exploit any data level parallelism utilizing, for example, single instruction multiple data (SIMD) instructions. This is because current instructions provide no efficient way to load a register, in response to execution of a single instruction, with data that is stored at randomly located addresses.
FIG. 1 illustrates this data load problem. As depicted in FIG. 1, values in a data storage area (T) 102 cannot be loaded into a data storage device 120, such as a register, utilizing a single instruction if the indices A, B, C and D are offsets for data that is not contiguous in memory. Current algorithms that utilize look-up tables must generate an address before accessing the data storage area 102 in order to load T[A] 104, T[B] 110, T[C] 106 and T[D] 108. For example, in a 32-bit approach, one of the 4 bytes in a 32-bit word is used to access data from a table with 256 values. Some implementations generate a look-up table address by accessing the targeted byte from memory, while other implementations extract the targeted byte from the 32-bit word.
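By way of illustration only, the conventional per-element access pattern described above may be sketched as follows. The sketch is written in Python for readability and does not model any particular instruction set. The table contents, the function names (extract_byte, lookup_four, pack_bytes) and the example word are hypothetical and are not taken from any particular implementation. The sketch assumes the variant in which each targeted byte is extracted from the 32-bit word by shifting and masking.

```python
# Hypothetical 256-entry look-up table; the contents are placeholders only.
TABLE = [(i * 7 + 3) & 0xFF for i in range(256)]


def extract_byte(word, position):
    """Return byte `position` (0 = least significant) of a 32-bit word."""
    return (word >> (8 * position)) & 0xFF


def lookup_four(table, word):
    """Perform four separate table accesses, one per byte of `word`.

    Each element needs its own index extraction and its own load, because
    the four table entries are generally not contiguous in memory.
    """
    results = []
    for position in range(4):
        index = extract_byte(word, position)  # address generation step
        results.append(table[index])          # one scalar load per element
    return results


def pack_bytes(values):
    """Pack four byte values into one 32-bit word, least significant first."""
    word = 0
    for position, value in enumerate(values):
        word |= (value & 0xFF) << (8 * position)
    return word


if __name__ == "__main__":
    word = 0x0A40FF01  # arbitrary example; its bytes serve as the four indices
    looked_up = lookup_four(TABLE, word)
    print(looked_up, hex(pack_bytes(looked_up)))
```

In this sketch, filling one 32-bit destination takes four index extractions, four loads and four insertions, rather than a single load operation.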
Encryption and channel codes are examples of common operations that utilize look-up tables. The Rijndael algorithm was recently selected as the Advanced Encryption Standard (AES), while a very popular channel code is the Reed-Solomon forward error correction code. As known to those skilled in the art, forward error control adds redundant data to network packets to correct transmission errors such as lost packets, which is vital within wireless networks. Rijndael encryption and Reed-Solomon forward error control employ methods that access data from look-up tables in a random address pattern. Both of these algorithms have data level parallelism that can be more effectively exploited with an efficient method of loading a data storage device from random locations in memory.
In both Rijndael encryption and forward error control (FEC) implemented with Reed-Solomon erasure codes, finite field multiply-accumulate is one of the most common operations and accounts for the majority of the computation. As a result, significant improvement is available in architecture designs by improving the performance of such operations. Moreover, memory access is expected to be a performance bottleneck in future processors. Consequently, the inability to load randomly distributed data leads to a significant increase in the number of clock cycles required to execute communication as well as encryption algorithms. In addition, the power required to execute such algorithms is approximately proportional to the number of clock cycles. As such, power consumption presents a significant problem in conventional systems.
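By way of illustration only, a table-based finite field multiply-accumulate of the kind referred to above may be sketched as follows. The sketch is written in Python and assumes the field GF(2^8) with the polynomial 0x11D, a choice commonly used with Reed-Solomon codes; Rijndael uses a different polynomial. The log and antilog tables and the names build_tables, gf_mul, gf_mac and gf_mul_reference are hypothetical and are introduced only for this example.

```python
PRIMITIVE_POLY = 0x11D  # assumed polynomial: x^8 + x^4 + x^3 + x^2 + 1


def build_tables():
    """Precompute antilog (exp) and log tables, with 2 as the generator."""
    exp = [0] * 510  # doubled length so summed logs need no modulo
    log = [0] * 256
    value = 1
    for power in range(255):
        exp[power] = value
        log[value] = power
        value <<= 1
        if value & 0x100:
            value ^= PRIMITIVE_POLY
    for power in range(255, 510):
        exp[power] = exp[power - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a, b):
    """Multiply two field elements using three table look-ups."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]  # indices depend on the data values


def gf_mac(acc, a, b):
    """Multiply-accumulate: addition in GF(2^8) is a bitwise XOR."""
    return acc ^ gf_mul(a, b)


def gf_mul_reference(a, b):
    """Table-free shift-and-XOR multiply, used only to check the tables."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIMITIVE_POLY
        b >>= 1
    return result
```

Because the table indices LOG[a] and LOG[b] depend on the data values, successive multiply-accumulate operations touch table entries in an effectively random order, which is the access pattern that conventional SIMD loads do not serve efficiently.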
Therefore, there remains a need to overcome one or more of the limitations in the above-described, existing art.