Media applications have been driving microprocessor development for more than a decade. In fact, most computing upgrades in recent years have been driven by media applications. These upgrades have occurred predominantly in consumer segments, although significant advances have also been seen in enterprise segments for entertainment-enhanced education and communication. Moreover, future media applications will demand even greater computational power. As a result, tomorrow's personal computing (PC) experience will be even richer in audio-visual effects and easier to use, and, more importantly, computing will merge with communications.
Accordingly, the display of images and the playback of audio and video data, collectively referred to herein as content, have become increasingly popular applications for current computing devices. Transform coding is a popular technique for compressing and decompressing audio, images and video. Discrete transforms used in prior compression techniques, such as the discrete cosine transform (DCT), have relied on floating-point or fixed-point number representations to approximate irrational real-valued coefficients. However, imperfections in these representations may contribute to an inverse transform mismatch when the transform is performed in the integer domain.
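To make the approximation problem concrete, the following C sketch stores the orthonormal 4-point DCT-II basis in fixed point and measures what is lost. The Q12 precision (`FRAC_BITS`), the round-half-away-from-zero rule and the function names are illustrative assumptions, not details taken from any particular codec; the two irrational basis values are written out as decimal constants so that the example needs no math library.

```c
#include <stdint.h>

#define N 4
#define FRAC_BITS 12 /* hypothetical fixed-point precision (Q12) */

/* Irrational entries of the orthonormal 4-point DCT-II basis:
   DCT_A = cos(pi/8)/sqrt(2), DCT_B = cos(3*pi/8)/sqrt(2), to double precision. */
#define DCT_A 0.6532814824381883
#define DCT_B 0.2705980500730985

static const double DCT4[N][N] = {
    {0.5,    0.5,    0.5,    0.5},
    {DCT_A,  DCT_B, -DCT_B, -DCT_A},
    {0.5,   -0.5,   -0.5,    0.5},
    {DCT_B, -DCT_A,  DCT_A, -DCT_B},
};

/* Round a real value to the nearest Q12 integer (ties away from zero). */
static int64_t to_fixed(double v) {
    double scaled = v * (double)(1 << FRAC_BITS);
    return (int64_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Largest absolute error introduced by storing the basis in fixed point. */
double max_fixed_point_error(void) {
    double max_err = 0.0;
    for (int k = 0; k < N; k++) {
        for (int n = 0; n < N; n++) {
            double err = DCT4[k][n] - (double)to_fixed(DCT4[k][n]) / (1 << FRAC_BITS);
            if (err < 0) err = -err;
            if (err > max_err) max_err = err;
        }
    }
    return max_err;
}

/* Squared norm of quantized row k, in units of 2^(-2*FRAC_BITS).
   An exactly orthonormal row would give exactly 1 << (2 * FRAC_BITS). */
int64_t quantized_row_norm2(int k) {
    int64_t sum = 0;
    for (int n = 0; n < N; n++) {
        int64_t q = to_fixed(DCT4[k][n]);
        sum += q * q;
    }
    return sum;
}
```

With these choices every basis value lands within half a quantization step of its true value, yet the rounded rows are no longer exactly orthonormal: row 0 still has squared norm 2^24 = 16777216, while row 1 gives 16777280. Small discrepancies of this kind, handled differently by different implementations, are what can give rise to the inverse transform mismatch described above.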
More recently, integer transforms have been proposed that have integer basis components, which permits the coefficients to be represented exactly by integers. By choosing coefficients that are integer approximations of DCT coefficients, the near-optimal decorrelation properties of the DCT are preserved. Moreover, small integer coefficients may be selected so that the transform can be implemented with shifts, additions and subtractions rather than multiplications, and some adverse effects of rounding may be avoided.
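The 4x4 forward core transform of H.264/AVC is a well-known instance of this idea: its matrix entries are only 1 and ±2, so a butterfly structure computes it with additions and subtractions alone. The C sketch below illustrates that standard transform only; it is not presented as the transform contemplated by this document, the function names are hypothetical, and the scaling and quantization stage that normally follows is omitted. A direct matrix-product version is included so the two can be compared.

```c
#include <stdint.h>

/* H.264/AVC 4x4 forward core transform matrix
   (an integer approximation of a scaled 4-point DCT). */
static const int H264_CORE[4][4] = {
    {1,  1,  1,  1},
    {2,  1, -1, -2},
    {1, -1, -1,  1},
    {1, -2,  2, -1},
};

/* 1-D transform of four samples using only additions and subtractions.
   Doubling is written as d + d (a compiler may emit a shift); this avoids
   left-shifting negative signed values, which is undefined in C. */
static void butterfly4(const int in[4], int out[4]) {
    int s03 = in[0] + in[3], d03 = in[0] - in[3];
    int s12 = in[1] + in[2], d12 = in[1] - in[2];
    out[0] = s03 + s12;
    out[2] = s03 - s12;
    out[1] = (d03 + d03) + d12;
    out[3] = d03 - (d12 + d12);
}

/* Multiplication-free 2-D transform Y = C * X * C^T: row pass, then column pass. */
void core_transform_4x4(int x[4][4], int y[4][4]) {
    int tmp[4][4];
    for (int i = 0; i < 4; i++)
        butterfly4(x[i], tmp[i]);               /* tmp = X * C^T */
    for (int j = 0; j < 4; j++) {
        int col[4] = {tmp[0][j], tmp[1][j], tmp[2][j], tmp[3][j]};
        int out[4];
        butterfly4(col, out);                   /* column j of C * tmp */
        for (int i = 0; i < 4; i++)
            y[i][j] = out[i];
    }
}

/* Reference version using explicit multiplications, for comparison only. */
void core_transform_4x4_matmul(int x[4][4], int y[4][4]) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int acc = 0;
            for (int k = 0; k < 4; k++)
                for (int l = 0; l < 4; l++)
                    acc += H264_CORE[i][k] * x[k][l] * H264_CORE[j][l];
            y[i][j] = acc;
        }
    }
}
```

Both functions produce identical integer outputs, so no rounding occurs inside the transform itself. For a block of all ones, only the DC output is nonzero: `y[0][0]` is 16.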
In some computer systems, processors are implemented to operate on values represented by a large number of bits (e.g., 32 or 64) using instructions that each produce one result. For example, executing an add instruction adds a first 64-bit value to a second 64-bit value and stores the result as a third 64-bit value. However, media applications require the manipulation of large amounts of data that may be represented in a small number of bits. For example, image data and sound data each typically require 8 or 16 bits per value. To improve the efficiency of media applications, some prior art processors provide packed data formats. A packed data format is one in which the bits typically used to represent a single value are divided into a number of fixed-size data elements, each of which represents a separate value. For example, a 64-bit register may be divided into two 32-bit elements, each representing a separate 32-bit value. In addition, these prior art processors provide instructions that manipulate each element of these packed data types separately and in parallel. For example, a packed add instruction adds together corresponding data elements from a first packed data operand and a second packed data operand. Thus, if a multimedia algorithm requires a loop containing five operations that must be performed on a large number of data elements, it is desirable to pack the data and perform these operations in parallel using packed data instructions. In this manner, these processors can process media content more efficiently.
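The defining property of a packed add is that each element wraps independently, with no carry crossing into its neighbour. The C sketch below emulates this in software for four 16-bit elements held in one 64-bit word (a "SIMD within a register" technique); the lane count, element width and function names are assumptions for illustration. Hardware packed add instructions, such as MMX `PADDW`, perform the same operation in a single instruction.

```c
#include <stdint.h>

/* Top bit of each 16-bit element in a 64-bit word. */
#define LANE_HIGH_BITS 0x8000800080008000ULL

/* Pack four 16-bit values into one 64-bit word; element 0 occupies bits 0..15. */
uint64_t pack4x16(const uint16_t v[4]) {
    uint64_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint64_t)v[i] << (16 * i);
    return w;
}

/* Extract element i (0..3) from a packed word. */
uint16_t lane16(uint64_t w, int i) {
    return (uint16_t)(w >> (16 * i));
}

/* Packed add: adds corresponding 16-bit elements modulo 2^16,
   never letting a carry cross into the neighbouring element. */
uint64_t packed_add16(uint64_t a, uint64_t b) {
    /* Add the low 15 bits of every element; each partial sum fits in
       16 bits, so no carry leaves its element. */
    uint64_t low = (a & ~LANE_HIGH_BITS) + (b & ~LANE_HIGH_BITS);
    /* Fold in each element's top bit with XOR, i.e. addition without carry-out. */
    return low ^ ((a ^ b) & LANE_HIGH_BITS);
}
```

For example, adding the packed elements {1, 2, 0xFFFF, 100} and {1, 3, 1, 200} yields {2, 5, 0, 300}: element 2 wraps to zero without disturbing element 3, which is exactly the guarantee a packed add instruction provides.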
Unfortunately, current methods and instructions target the general needs of transforms and are not comprehensive. In fact, many architectures do not provide a means for efficient integer transform calculations across a range of coefficient sizes and data types. In addition, they generally support neither the ordering of data within storage devices such as SIMD registers nor the addition of adjacent values within a register. As a result, current architectures require unnecessary data type conversions, which reduce the number of operations performed per instruction and significantly increase the number of clock cycles required to order data for arithmetic operations.
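Adding adjacent values within a register is a "horizontal" operation, as opposed to the element-by-element packed add. The C sketch below emulates one such operation for eight 16-bit elements, using the same output layout as the x86 SSSE3 `PHADDW` instruction (pair sums of the first operand, then pair sums of the second). It is a scalar model for illustration only; the function name and lane count are assumptions, and the conversion of wrapped sums back to `int16_t` relies on the usual two's-complement behaviour of mainstream compilers.

```c
#include <stdint.h>

#define LANES 8 /* eight 16-bit elements, as in a 128-bit SIMD register */

/* Horizontal pairwise add: out[0..3] are adjacent-pair sums of a,
   out[4..7] are adjacent-pair sums of b. Sums wrap modulo 2^16,
   as they would in a 16-bit SIMD element. */
void hadd_pairs16(const int16_t a[LANES], const int16_t b[LANES], int16_t out[LANES]) {
    for (int i = 0; i < LANES / 2; i++) {
        out[i] = (int16_t)(uint16_t)((uint16_t)a[2 * i] + (uint16_t)a[2 * i + 1]);
        out[i + LANES / 2] =
            (int16_t)(uint16_t)((uint16_t)b[2 * i] + (uint16_t)b[2 * i + 1]);
    }
}
```

Applying the function a second time to its own output reduces each group of four original elements to a single sum (for example, 1 + 2 + 3 + 4 = 10). This is the kind of reduction needed after a row of transform coefficients has been multiplied element-wise with a row of samples, and without a horizontal add it requires extra shuffling of data between elements.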
Therefore, there remains a need to overcome one or more of the limitations of the techniques described above.