1. Field of the Invention
The present invention relates in general to the field of computer systems, and in particular, to an apparatus and method for performing multi-dimensional computations based on an intra-add operation.
2. Description of the Related Art
To improve the efficiency of multimedia applications, as well as other applications with similar characteristics, a Single Instruction, Multiple Data (SIMD) architecture has been implemented in computer systems to enable one instruction to operate on several operands simultaneously, rather than on a single operand. In particular, SIMD architectures take advantage of packing many data elements within one register or memory location. With parallel hardware execution, multiple operations can be performed on separate data elements with one instruction, resulting in significant performance improvement.
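The idea of packing several data elements into one register can be illustrated with a minimal Python sketch. It assumes four unsigned 16-bit data elements packed into one 64-bit word, with element 0 in the least significant bits. The helper names (`pack4x16`, `unpack4x16`, `padd16`) are hypothetical. `padd16` uses a software "SIMD within a register" (SWAR) technique to emulate, with one ordinary 64-bit addition, what a packed add instruction does in hardware. It is an illustration of packed data, not a description of the processor disclosed here.

```python
MASK64 = (1 << 64) - 1
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x8000_8000_8000_8000  # top bit of each 16-bit lane


def pack4x16(elements):
    """Pack four 16-bit data elements into one 64-bit word.

    Assumed layout: element 0 occupies the least significant 16 bits.
    """
    if len(elements) != 4:
        raise ValueError("expected exactly four data elements")
    word = 0
    for i, e in enumerate(elements):
        word |= (e & LANE_MASK) << (LANE_BITS * i)
    return word


def unpack4x16(word):
    """Return the four 16-bit data elements of a packed 64-bit word."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(4)]


def padd16(a, b):
    """Add four 16-bit lanes using a single 64-bit addition (SWAR).

    Clearing each lane's top bit before adding keeps carries from crossing
    lane boundaries; XOR then restores the correct top bit. Each lane wraps
    modulo 2**16, like a non-saturating packed add.
    """
    low = ((a & ~HIGH_BITS) + (b & ~HIGH_BITS)) & MASK64  # low 15 bits per lane
    return low ^ ((a ^ b) & HIGH_BITS)                    # fold in the top bits
```

For example, adding the packed elements [1, 2, 3, 0xFFFF] and [10, 20, 30, 2] yields [11, 22, 33, 1]: the last lane wraps modulo 2**16 instead of carrying into a neighboring lane.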
Currently, the SIMD addition operation performs only "vertical" or inter-register addition, in which pairs of data elements are added together: for example, a first element Xn (where n is an integer) from one operand and a second element Yn from a second operand. An example of such a vertical addition operation is shown in FIG. 1, where the instruction is performed on the sets of data elements (X3, X2, X1, and X0) and (Y3, Y2, Y1, and Y0), accessed as Source1 and Source2, respectively, to obtain the result (X3+Y3, X2+Y2, X1+Y1, and X0+Y0).
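The vertical add semantics of FIG. 1 can be modeled in a few lines of Python, using plain lists as stand-ins for packed operands. The function name `vertical_add` and the sample values are illustrative only.

```python
def vertical_add(source1, source2):
    """Vertical (inter-register) add of two packed operands.

    Element n of the result is source1[n] + source2[n]; every sum stays in
    the same element position its inputs came from, as in FIG. 1.
    """
    if len(source1) != len(source2):
        raise ValueError("packed operands must have the same number of elements")
    return [x + y for x, y in zip(source1, source2)]


# Operands listed in FIG. 1 order (element 3 first): (X3, X2, X1, X0) and (Y3, Y2, Y1, Y0).
source1 = [4, 3, 2, 1]
source2 = [40, 30, 20, 10]
result = vertical_add(source1, source2)  # (X3+Y3, X2+Y2, X1+Y1, X0+Y0)
```

No element of the result combines two elements of the same operand; forming X3 + X2, for instance, would first require moving data between element positions.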
Although many applications currently in use can take advantage of such a vertical add operation, a number of important applications require the data elements to be rearranged before the vertical add operation can be used to implement them.
For example, a matrix multiplication operation is shown below.

Matrix A × Vector X = Vector Y

$$
\begin{bmatrix}
A_{14} & A_{13} & A_{12} & A_{11} \\
A_{24} & A_{23} & A_{22} & A_{21} \\
A_{34} & A_{33} & A_{32} & A_{31} \\
A_{44} & A_{43} & A_{42} & A_{41}
\end{bmatrix}
\times
\begin{bmatrix}
X_4 \\ X_3 \\ X_2 \\ X_1
\end{bmatrix}
=
\begin{bmatrix}
A_{14}X_4 + A_{13}X_3 + A_{12}X_2 + A_{11}X_1 \\
A_{24}X_4 + A_{23}X_3 + A_{22}X_2 + A_{21}X_1 \\
A_{34}X_4 + A_{33}X_3 + A_{32}X_2 + A_{31}X_1 \\
A_{44}X_4 + A_{43}X_3 + A_{42}X_2 + A_{41}X_1
\end{bmatrix}
$$
To compute the product of the matrix A and a vector X, yielding the resulting vector Y, instructions are used to: 1) store the columns of the matrix A as packed operands (this typically requires rearranging data, because the coefficients of the matrix A are stored so that its rows, not its columns, are accessed as packed data operands); 2) store a set of operands that each have a different one of the vector X coefficients in every data element; and 3) use vertical multiplication, in which each data element of the vector X (i.e., X4, X3, X2, X1) is first multiplied with the data elements of the corresponding column of the matrix A (for example, X4 with [A14, A24, A34, A44]). The results of the multiplication operations are then added together through three vertical add operations, such as that shown in FIG. 1, to obtain the final result. Such a matrix multiplication operation based on vertical add operations typically requires 20 instructions to implement, an example of which is shown below in Table 1.
Assumptions:
1) X is stored with X1 first and X4 last;
2) the transpose of A is stored with A11 first, A21 second, A31 third, etc.;
3) the following instructions are available:
DUPLS: duplicate once
DUPLD: duplicate twice
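Steps 1) through 3) can be modeled with plain Python lists standing in for packed operands. This is a minimal sketch, not the Table 1 instruction sequence: the helpers `vertical_mul`, `vertical_add`, and `broadcast` are hypothetical stand-ins for a packed multiply, a packed add, and a duplicate-style instruction, and the explicit transpose stands in for the data rearrangement of step 1. Indices start at 0, so `rows[i][j]` holds A(i+1)(j+1) and `x[j]` holds X(j+1).

```python
def vertical_mul(a, b):
    """Element-wise (vertical) multiply of two packed operands."""
    return [p * q for p, q in zip(a, b)]


def vertical_add(a, b):
    """Element-wise (vertical) add of two packed operands."""
    return [p + q for p, q in zip(a, b)]


def broadcast(value, n):
    """Operand holding the same coefficient in every data element (step 2)."""
    return [value] * n


def matvec_vertical(rows, x):
    """Y = A * X using only vertical operations.

    rows: matrix A stored row by row, as its coefficients are normally laid out.
    x:    vector X.
    """
    n = len(x)
    if len(rows) != n or any(len(r) != n for r in rows):
        raise ValueError("expected an n x n matrix and a length-n vector")
    # Step 1: rearrange A so that each column becomes one packed operand.
    columns = [[rows[i][j] for i in range(n)] for j in range(n)]
    # Step 2: one operand per X coefficient, replicated into every element.
    x_operands = [broadcast(xj, n) for xj in x]
    # Step 3: multiply each column by its replicated coefficient ...
    products = [vertical_mul(c, xo) for c, xo in zip(columns, x_operands)]
    # ... then combine the n partial products with n - 1 vertical adds.
    y = products[0]
    for p in products[1:]:
        y = vertical_add(y, p)
    return y
```

For a 4x4 matrix the final loop performs three vertical adds, matching the three add operations described above; the transpose in step 1 is the rearrangement cost that the intra-add approach aims to remove.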
Accordingly, there is a need in the technology for an apparatus and method that efficiently performs multi-dimensional computations based on a "horizontal" or intra-add operation. There is also a need in the technology for a method and operation that increases code density by eliminating the rearrangement of data elements and the corresponding rearrangement operations.
A method and apparatus are described for including, in a processor, instructions that perform intra-add operations on packed data. In one embodiment, an execution unit is coupled to a storage area. The storage area has stored therein first and second packed data operands. In response to receiving a single instruction, the execution unit performs operations on data elements in the first and second packed data operands to generate a plurality of data elements in a packed data result. At least two of the plurality of data elements in the packed data result store the result of an intra-add operation upon the first and second packed data operands.
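An intra-add on four-element packed operands can be sketched in Python as follows. The placement of the sums in the result is an assumption: adjacent pairs within the first operand fill the lower two result elements and adjacent pairs within the second operand fill the upper two, similar to the layout of the SSE3 HADDPS instruction. The sketch then uses this hypothetical `intra_add` helper, together with a hypothetical `vertical_mul` standing in for a packed multiply, to compute the 4x4 matrix-vector product from the row-stored coefficients directly, with no column rearrangement.

```python
def intra_add(source1, source2):
    """Horizontal (intra) add of two packed operands of four elements each.

    Assumed layout: [s1[0]+s1[1], s1[2]+s1[3], s2[0]+s2[1], s2[2]+s2[3]].
    """
    if len(source1) != 4 or len(source2) != 4:
        raise ValueError("this sketch handles four-element operands only")
    return [source1[0] + source1[1], source1[2] + source1[3],
            source2[0] + source2[1], source2[2] + source2[3]]


def vertical_mul(a, b):
    """Element-wise (vertical) multiply of two packed operands."""
    return [p * q for p, q in zip(a, b)]


def matvec_intra(rows, x):
    """Y = A * X for a 4x4 matrix stored row by row, using intra-adds.

    Uses 4 vertical multiplies and 3 intra-adds, and never transposes A.
    """
    if len(rows) != 4 or any(len(r) != 4 for r in rows) or len(x) != 4:
        raise ValueError("expected a 4x4 matrix and a length-4 vector")
    # Each row is already a packed operand, so multiply it by X directly.
    p = [vertical_mul(row, x) for row in rows]
    # First level: pairwise sums within rows 0/1 and within rows 2/3.
    s01 = intra_add(p[0], p[1])
    s23 = intra_add(p[2], p[3])
    # Second level: [sum(p[0]), sum(p[1]), sum(p[2]), sum(p[3])].
    return intra_add(s01, s23)
```

Because each sum is formed within an operand rather than across operands, the rows never need to be rearranged into columns, which is the code-density benefit described above.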