1. Field of the Invention
The present invention relates in general to the field of computer systems, and in particular, to an apparatus and method for performing multi-dimensional computations based on an intra-add operation.
2. Description of the Related Art
To improve the efficiency of multimedia applications, as well as other applications with similar characteristics, a Single Instruction, Multiple Data (SIMD) architecture has been implemented in computer systems to enable one instruction to operate on several operands simultaneously, rather than on a single operand. In particular, SIMD architectures take advantage of packing many data elements within one register or memory location. With parallel hardware execution, multiple operations can be performed with one instruction, resulting in significant performance improvement.
Currently, the SIMD addition operation performs only "vertical" or inter-register addition, in which corresponding pairs of data elements, for example, a first element Xn (where n is an integer) from one operand and a second element Yn from a second operand, are added together. An example of such a vertical addition operation is shown in Table 1, where the instruction is performed on the sets of data elements (a₁ and a₂) and (b₁ and b₂) accessed as Source1 and Source2, respectively.
TABLE 1 ##STR1## ##STR2## ##STR3##
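As an illustration only, the vertical addition described above can be modeled in software as an element-wise sum of two packed operands. The sketch below assumes that each packed operand is represented as a Python list; the function name `vertical_add` and the sample values are hypothetical and are not taken from any instruction set.

```python
def vertical_add(source1, source2):
    """Model of a vertical (inter-register) packed add.

    Element n of the result is source1[n] + source2[n]; no elements
    within the same operand are ever added to each other.
    """
    if len(source1) != len(source2):
        raise ValueError("packed operands must hold the same number of elements")
    return [x + y for x, y in zip(source1, source2)]


# Hypothetical two-element operands standing in for (a1, a2) and (b1, b2).
result = vertical_add([1, 2], [10, 20])
```

In this model the result holds a₁+b₁ and a₂+b₂; a sum such as a₁+a₂ cannot be produced without first moving the elements into separate operands.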
Although many applications currently in use can take advantage of such a vertical add operation, a number of important applications require the data elements to be rearranged before the vertical add operation can be applied.
For example, a matrix multiplication operation is shown below: ##EQU1##
To obtain the vector Y as the product of a matrix A and a vector X, instructions are used to: 1) store the columns of the matrix A as packed operands (this typically requires rearrangement of data because the coefficients of the matrix A are stored so that its rows, not its columns, are accessed as packed data operands); 2) store a set of packed operands that each have a different one of the vector X coefficients in every data element; 3) use vertical multiplication as shown in Tables 2A-2D; and 4) use vertical adds as shown in Tables 2E-2G.
TABLE 2A ##STR4## ##STR5## ##STR6##
TABLE 2B ##STR7## ##STR8## ##STR9##
TABLE 2C ##STR10## ##STR11## ##STR12##
TABLE 2D ##STR13## ##STR14## ##STR15##
TABLE 2E ##STR16## ##STR17## ##STR18##
TABLE 2F ##STR19## ##STR20## ##STR21##
TABLE 2G ##STR22## ##STR23## ##STR24##
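The four steps above can be sketched in software as follows. This is a minimal model and not the tabulated instruction sequence itself: it assumes that packed operands are Python lists and that the matrix A is stored row by row, and it accumulates the products one after another, which may differ from the order of the adds in Tables 2E-2G. The names `vertical_mul`, `vertical_add`, and `matvec_vertical` and the sample 4x4 data are hypothetical.

```python
def vertical_mul(source1, source2):
    """Model of a vertical packed multiply: element n times element n."""
    return [x * y for x, y in zip(source1, source2)]


def vertical_add(source1, source2):
    """Model of a vertical packed add: element n plus element n."""
    return [x + y for x, y in zip(source1, source2)]


def matvec_vertical(matrix_rows, vector):
    """Compute Y = A * X using only vertical multiplies and vertical adds."""
    n = len(vector)
    # Step 1: rearrange the row-stored matrix into packed column operands.
    columns = [[row[j] for row in matrix_rows] for j in range(n)]
    # Step 2: build one packed operand per X coefficient, with that
    # coefficient repeated in every data element.
    broadcasts = [[vector[j]] * len(matrix_rows) for j in range(n)]
    # Step 3: one vertical multiply per column.
    products = [vertical_mul(c, b) for c, b in zip(columns, broadcasts)]
    # Step 4: vertical adds to accumulate the partial products.
    total = products[0]
    for partial in products[1:]:
        total = vertical_add(total, partial)
    return total


# Hypothetical 4x4 example.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
X = [1, 0, 2, 1]
Y = matvec_vertical(A, X)
```

Steps 1 and 2 perform no arithmetic; they exist only to arrange the data so that the vertical operations in steps 3 and 4 combine the correct elements.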
Accordingly, there is a need in the technology for a method and operation for increasing code density by eliminating the need for the rearrangement of data elements and the corresponding rearrangement operations.