The present invention relates to a processor of an SIMD (Single Instruction Multiple Data) type and, more particularly, relates to an SIMD processor that allows accesses to matrix data to be made with a high degree of flexibility.
When processing of two-dimensional pictures or matrix processing of three-dimensional graphics is to be carried out by using an SIMD processor, it is necessary to rearrange pieces of data in advance so as to make the pieces of data match the processing format of a processing instruction provided for the processor. For example, data arrangement instructions are included in an instruction set called SSE (Streaming SIMD Extensions) and an instruction set called AltiVec. The SSE instruction set has been developed by Intel Corporation as an instruction set oriented toward multimedia applications. The AltiVec instruction set, on the other hand, has been developed by Motorola Inc. for the same applications. In the SSE and AltiVec instruction sets, a variety of data arrangement instructions are defined, as disclosed respectively in the following documents:
“IA-32 Intel(R) Architecture Software Developer's Manual Volume 1: Basic Architecture,” Intel Corporation, 2004; and
“AltiVec Technology Programming Interface Manual,” Motorola Inc., June 1999.
Assume for example that a 4-row 4-column matrix A and a 4-row 4-column matrix B are subjected to inner-product processing and the result of the inner-product processing is put in a 4-row 4-column matrix D. In this case, computations carried out in the inner-product processing are each a computation to find a sum of products as follows:
D[0][0] = A[0][0]×B[0][0] + A[0][1]×B[1][0] + A[0][2]×B[2][0] + A[0][3]×B[3][0]
D[0][1] = A[0][0]×B[0][1] + A[0][1]×B[1][1] + A[0][2]×B[2][1] + A[0][3]×B[3][1]
D[0][2] = A[0][0]×B[0][2] + A[0][1]×B[1][2] + A[0][2]×B[2][2] + A[0][3]×B[3][2]
D[0][3] = A[0][0]×B[0][3] + A[0][1]×B[1][3] + A[0][2]×B[2][3] + A[0][3]×B[3][3]
D[1][0] = A[1][0]×B[0][0] + A[1][1]×B[1][0] + A[1][2]×B[2][0] + A[1][3]×B[3][0]
D[1][1] = A[1][0]×B[0][1] + A[1][1]×B[1][1] + A[1][2]×B[2][1] + A[1][3]×B[3][1]
D[1][2] = A[1][0]×B[0][2] + A[1][1]×B[1][2] + A[1][2]×B[2][2] + A[1][3]×B[3][2]
D[1][3] = A[1][0]×B[0][3] + A[1][1]×B[1][3] + A[1][2]×B[2][3] + A[1][3]×B[3][3]
D[2][0] = A[2][0]×B[0][0] + A[2][1]×B[1][0] + A[2][2]×B[2][0] + A[2][3]×B[3][0]
D[2][1] = A[2][0]×B[0][1] + A[2][1]×B[1][1] + A[2][2]×B[2][1] + A[2][3]×B[3][1]
D[2][2] = A[2][0]×B[0][2] + A[2][1]×B[1][2] + A[2][2]×B[2][2] + A[2][3]×B[3][2]
D[2][3] = A[2][0]×B[0][3] + A[2][1]×B[1][3] + A[2][2]×B[2][3] + A[2][3]×B[3][3]
D[3][0] = A[3][0]×B[0][0] + A[3][1]×B[1][0] + A[3][2]×B[2][0] + A[3][3]×B[3][0]
D[3][1] = A[3][0]×B[0][1] + A[3][1]×B[1][1] + A[3][2]×B[2][1] + A[3][3]×B[3][1]
D[3][2] = A[3][0]×B[0][2] + A[3][1]×B[1][2] + A[3][2]×B[2][2] + A[3][3]×B[3][2]
D[3][3] = A[3][0]×B[0][3] + A[3][1]×B[1][3] + A[3][2]×B[2][3] + A[3][3]×B[3][3]
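The sixteen sums above all follow one pattern: D[i][j] is the sum over k of A[i][k]×B[k][j]. As a minimal scalar reference, and not as part of any related-art instruction sequence, the pattern can be sketched in Python. The function name matmul4 and the sample values are assumptions introduced only for illustration.

```python
def matmul4(A, B):
    """Return D where D[i][j] = sum over k of A[i][k] * B[k][j] (4x4 inputs)."""
    n = 4
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            D[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return D


# Illustrative values only; they do not come from the text.
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

print(matmul4(A, I4) == A)   # True: multiplying by the identity leaves A unchanged
print(matmul4(A, A)[0][0])   # 90 = 1*1 + 2*5 + 3*9 + 4*13
```

A scalar loop of this kind needs 64 multiplications and 48 additions; the SIMD instructions discussed next aim to perform four of the multiplications at a time.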
In order to find every sum of products described above, data-arrangement and operation instructions are defined as follows. In the first place, a data arrangement instruction named MERGEH with a format of ‘MERGEH d, a, b’ is defined as an instruction for carrying out the following operations:
R[d][0] = R[b][2]
R[d][1] = R[a][2]
R[d][2] = R[b][3]
R[d][3] = R[a][3]
In the second place, a data arrangement instruction named MERGEL with a format of ‘MERGEL d, a, b’ is defined as an instruction for carrying out the following operations:
R[d][0] = R[b][0]
R[d][1] = R[a][0]
R[d][2] = R[b][1]
R[d][3] = R[a][1]
In the third place, an operation instruction named DOT with a format of ‘DOT d, a, b’ is defined as an instruction for carrying out the following operation:
R[d][0] = R[a][0]×R[b][0] + R[a][1]×R[b][1] + R[a][2]×R[b][2] + R[a][3]×R[b][3]
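The three definitions above can be modeled directly. The following is a minimal sketch in Python, assuming the register file is represented as a list of 16 four-element lists. The text defines only element 0 of the DOT destination, so leaving elements 1 to 3 unchanged is an assumption of this sketch, as are the function names and sample values.

```python
def mergeh(R, d, a, b):
    """MERGEH d, a, b: interleave elements 2 and 3 of R[b] and R[a] into R[d]."""
    # The new row is built before assignment, so d may be the same register as a or b.
    R[d] = [R[b][2], R[a][2], R[b][3], R[a][3]]


def mergel(R, d, a, b):
    """MERGEL d, a, b: interleave elements 0 and 1 of R[b] and R[a] into R[d]."""
    R[d] = [R[b][0], R[a][0], R[b][1], R[a][1]]


def dot(R, d, a, b):
    """DOT d, a, b: write the 4-element sum of products into R[d][0]."""
    s = sum(R[a][k] * R[b][k] for k in range(4))
    # Assumption: elements 1..3 of R[d] keep their previous values.
    R[d] = [s] + R[d][1:]


# Illustrative register contents only.
R = [[0] * 4 for _ in range(16)]
R[0] = [10, 11, 12, 13]
R[1] = [20, 21, 22, 23]

mergeh(R, 2, 0, 1)
mergel(R, 3, 0, 1)
dot(R, 4, 0, 1)
print(R[2])      # [22, 12, 23, 13]
print(R[3])      # [20, 10, 21, 11]
print(R[4][0])   # 994 = 10*20 + 11*21 + 12*22 + 13*23
```

Note that each merge instruction moves only two elements from each source register, which is why several of them are needed to rearrange a full 4-row 4-column matrix.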
In the above instructions, symbol R is a register file having 16 rows and 4 columns. Then, a matrix A is defined as a matrix having the following values:
R[0][0], R[0][1], R[0][2], R[0][3],
R[1][0], R[1][1], R[1][2], R[1][3],
R[2][0], R[2][1], R[2][2], R[2][3],
R[3][0], R[3][1], R[3][2], R[3][3]
By the same token, a matrix B is defined as a matrix having the following values:
R[4][0], R[4][1], R[4][2], R[4][3],
R[5][0], R[5][1], R[5][2], R[5][3],
R[6][0], R[6][1], R[6][2], R[6][3],
R[7][0], R[7][1], R[7][2], R[7][3]
In the same way, a matrix D is defined as a matrix having the following values:
R[8][0], R[8][1], R[8][2], R[8][3],
R[9][0], R[9][1], R[9][2], R[9][3],
R[10][0], R[10][1], R[10][2], R[10][3],
R[11][0], R[11][1], R[11][2], R[11][3]
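The register assignment described above (matrix A in registers 0 to 3, matrix B in registers 4 to 7, matrix D in registers 8 to 11) can be sketched as a small data-structure helper in Python. The helper names and the zero initialization are assumptions of this sketch. The text does not say how registers 12 to 15 are used, so they are simply left at zero here.

```python
A_BASE, B_BASE, D_BASE = 0, 4, 8   # first register of each matrix, per the text
NUM_REGS, WIDTH = 16, 4            # register file: 16 rows by 4 columns


def make_register_file(A, B):
    """Place row i of A in R[A_BASE + i] and row i of B in R[B_BASE + i]."""
    R = [[0] * WIDTH for _ in range(NUM_REGS)]
    for i in range(4):
        R[A_BASE + i] = list(A[i])
        R[B_BASE + i] = list(B[i])
    return R


def read_matrix(R, base):
    """Read back the 4x4 matrix held in registers base .. base + 3."""
    return [list(R[base + i]) for i in range(4)]


# Illustrative values only.
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
R = make_register_file(A, B)
print(read_matrix(R, A_BASE) == A)   # True
print(read_matrix(R, B_BASE) == B)   # True
```

With this layout, one register holds one matrix row, so an instruction that operates on a whole register sees four elements of the same row at once.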
With the register file R as well as the matrices A, B and D so defined, a program for finding the inner product of the 4-row and 4-column matrices A and B is expressed by the following sequence of instructions:
    MERGEH 12, 0, 2
    MERGEH 13, 1, 3
    MERGEL 14, 0, 2
    MERGEL 15, 1, 3
    MERGEH 0, 12, 13
    MERGEL 1, 12, 13
    MERGEH 2, 14, 15
    MERGEL 3, 14, 15
    DOT 12, 0, 4
    DOT 13, 0, 5
    DOT 14, 0, 6
    DOT 15, 0, 7
    MERGEL 12, 12, 13
    MERGEL 14, 14, 15
    MERGEL 8, 12, 14
    DOT 12, 1, 4
    DOT 13, 1, 5
    DOT 14, 1, 6
    DOT 15, 1, 7
    MERGEL 12, 12, 13
    MERGEL 14, 14, 15
    MERGEL 9, 12, 14
    DOT 12, 2, 4
    DOT 13, 2, 5
    DOT 14, 2, 6
    DOT 15, 2, 7
    MERGEL 12, 12, 13
    MERGEL 14, 14, 15
    MERGEL 10, 12, 14
    DOT 12, 3, 4
    DOT 13, 3, 5
    DOT 14, 3, 6
    DOT 15, 3, 7
    MERGEL 12, 12, 13
    MERGEL 14, 14, 15
    MERGEL 11, 12, 14
Thus, as the above typical program shows, in an application where data arrangement instructions (or data transfer instructions) must be executed before operation instructions in accordance with the related-art technology, as many as 36 instructions need to be executed in order to compute the inner product of two 4-row 4-column matrices.
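The figure of 36 follows from the shape of the program: eight rearrangement instructions at the start, then a group of seven instructions (four DOT and three MERGEL) for each of the four result registers 8 to 11. A minimal Python sketch that rebuilds the instruction sequence from that structure and tallies it is given below. The tuple representation and the names PREAMBLE and row_group are assumptions introduced for illustration.

```python
from collections import Counter

# The first eight instructions of the program, as (opcode, d, a, b) tuples.
PREAMBLE = [
    ("MERGEH", 12, 0, 2), ("MERGEH", 13, 1, 3),
    ("MERGEL", 14, 0, 2), ("MERGEL", 15, 1, 3),
    ("MERGEH", 0, 12, 13), ("MERGEL", 1, 12, 13),
    ("MERGEH", 2, 14, 15), ("MERGEL", 3, 14, 15),
]


def row_group(r):
    """The seven instructions that fill result register 8 + r."""
    dots = [("DOT", 12 + j, r, 4 + j) for j in range(4)]
    packs = [
        ("MERGEL", 12, 12, 13),
        ("MERGEL", 14, 14, 15),
        ("MERGEL", 8 + r, 12, 14),
    ]
    return dots + packs


program = PREAMBLE + [ins for r in range(4) for ins in row_group(r)]
counts = Counter(op for op, _, _, _ in program)

print(len(program))       # 36 = 8 + 4 * 7
print(counts["DOT"])      # 16
print(counts["MERGEH"])   # 4
print(counts["MERGEL"])   # 16
```

Only 16 of the 36 instructions are arithmetic; the other 20 exist solely to rearrange data before and after the DOT operations.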