1. Field of the Invention
The invention relates to a compiler for converting a source program into an object program and to a compiling method, and more particularly to a compiler that optimizes by generating an instruction sequence for collectively processing the components (array elements) of a plurality of array data, and to a corresponding compiling method.
2. Description of the Related Art
Means for performing multimedia processing at high speed include a computer system having a group of instructions called SIMD (Single Instruction Multiple Data Stream) instructions or multimedia extension instructions (hereinafter simply called the "SIMD instruction sequence"). The SIMD instruction sequence is a group of instructions that collectively perform computing on set data called an SIMD type (SIMD type data).
FIG. 17 shows a structure of SIMD data. The SIMD data has a size equal to, or an integral multiple of, the word size. A word is the data size that serves as the basic unit of processing when a processor of a computer transfers data between a primary storage and a register or performs computing between the register and the primary storage. FIG. 17 shows that one SIMD data 170 comprises four SIMD element data 171, 172, 173 and 174. Generally, the SIMD data comprises two, four or eight SIMD element data. The SIMD element data all have the same size, which is generally 8 bits, 16 bits or 32 bits. The number of SIMD element data contained in one SIMD data is called the SIMD parallelism.
The SIMD instruction performs the same computing, such as addition, subtraction or multiplication, on each SIMD element data in the SIMD data. FIG. 18 shows, for example, the operation of an SIMD instruction that adds SIMD data in which each SIMD element data has a size of 16 bits and the SIMD parallelism is "4". In the drawing, a register 1801 and a register 1802 hold the SIMD data operands. Four independent adders (ADD) 1811, 1812, 1813 and 1814 in an SIMD adder 1810 add the respective SIMD element data of the operands, and the computed results are stored into a register 1803.
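The addition of FIG. 18 can be emulated in C as a minimal sketch. The function name simd_add16x4, the packing of element 0 into the low 16 bits, and the wrap-around of each sum within its own element are assumptions made for illustration and are not taken from the drawing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: four 16-bit elements packed into one 64-bit value,
   with element 0 in the low 16 bits. */
enum { SIMD_PARALLELISM = 4, ELEMENT_BITS = 16 };

/* Hypothetical helper that emulates the four independent adders. */
uint64_t simd_add16x4(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    assert(SIMD_PARALLELISM * ELEMENT_BITS == 64);
    for (int lane = 0; lane < SIMD_PARALLELISM; lane++) {
        int shift = lane * ELEMENT_BITS;
        uint16_t x = (uint16_t)(a >> shift);
        uint16_t y = (uint16_t)(b >> shift);
        /* Assumption: the sum wraps within the element, so no carry
           propagates into the neighbouring element. */
        uint16_t sum = (uint16_t)(x + y);
        result |= (uint64_t)sum << shift;
    }
    return result;
}
```

Under these assumptions, a carry produced in one element never reaches the next one, which is what distinguishes this operation from an ordinary 64-bit addition.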
The SIMD instruction sequence is effective for performing the same operation on the individual elements of array data on the main storage, as in a C program 1901 shown in FIG. 19.
Processing according to the SIMD instruction will be described with reference to FIG. 20. A partial array 2001 of an array A placed on the main storage is read in units of words into the register 1801 of the computer by an ordinary load instruction. In the same manner, a partial array 2002 of an array B is read in units of words into the register 1802. By the SIMD add instruction, the contents of the registers 1801 and 1802 are added by the SIMD adder 1810, and the result is stored into the register 1803. Then, the contents of the register 1803 are written in units of words to a partial array 2003 of an array C placed on the main storage by an ordinary store instruction.
Thus, by using the SIMD instruction sequence, the same processing can be performed on a plurality of array elements within the processing of one word, so that the program can be run more quickly. Here, conversion of a program in which the computing of the respective array elements is described sequentially into a program that computes the respective array elements by the SIMD instruction sequence is defined as SIMD conversion.
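SIMD conversion as defined above can be sketched in C as follows. The loop body C[i] = A[i] + B[i], the function names, and the use of small local arrays as stand-ins for the registers are assumptions for illustration, since the program 1901 itself is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 4 };   /* assumed SIMD parallelism */

/* Sequential form: one array element per iteration. */
void add_arrays_scalar(const uint16_t *a, const uint16_t *b, uint16_t *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = (uint16_t)(a[i] + b[i]);
}

/* SIMD-converted form: LANES elements per iteration.
   The local arrays stand in for the registers of FIG. 20. */
void add_arrays_simd(const uint16_t *a, const uint16_t *b, uint16_t *c, int n)
{
    assert(n % LANES == 0);              /* assumption: no remainder loop */
    for (int i = 0; i < n; i += LANES) {
        uint16_t ra[LANES], rb[LANES], rc[LANES];

        memcpy(ra, &a[i], sizeof ra);    /* load partial array of A */
        memcpy(rb, &b[i], sizeof rb);    /* load partial array of B */
        for (int lane = 0; lane < LANES; lane++)   /* one SIMD add */
            rc[lane] = (uint16_t)(ra[lane] + rb[lane]);
        memcpy(&c[i], rc, sizeof rc);    /* store partial array of C */
    }
}
```

Both functions produce the same array C; the converted form simply visits the arrays in steps of the SIMD parallelism.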
Processing similar to the SIMD computing is also used in a vector supercomputer. In the supercomputer shown in FIG. 21, partial arrays 2101, 2102 placed on the main storage are loaded into vector registers 2111, 2112 in a vector computer 2110. A vector-arithmetic unit 2120 performs computing between the vector register 2111 and the vector register 2112 and stores the computed results into a vector register 2113. Then, the contents of the vector register 2113 are written back into an array area 2103 on the main storage. For the vector supercomputer, an automatic vectorizing compiler is in practical use, which finds the parts of a sequentially described program that are executable by the vector instruction and converts the detected parts into a program using the vector instruction. The automatic vectorizing compiler analyzes the order of array computing in the source program and generates an object program using the vector instruction without changing the meaning of the program.
However, the above-described automatic vectorizing compile technology cannot be applied as it is to a compiler for performing SIMD conversion, mainly because the SIMD instruction sequence is limited in how freely it can access the array elements.
In vector processing, each array element has the word size, and transfer between the partial array 2101 on the main storage and the vector register 2111 is conducted in integral multiples of the word size. A partial array starting from any element of the array placed on the main storage can therefore be transferred to and from the vector register. To run the SIMD instruction sequence, on the other hand, a plurality of array elements are packed into one SIMD data. Data transfer between the main storage and the register is conducted in units of SIMD data, and the main storage data area for the transfer must start from a special address called a word boundary. In other words, it is not easy to transfer a partial array starting from an arbitrary element of the array between the main storage and the register. Conventionally, therefore, a program for a computer having the SIMD instruction sequence was written manually in assembly language, with the address conditions for accessing array elements taken into consideration.
To effectively develop a program for a computer having the SIMD instruction sequence, a compiler (SIMD conversion compiler) which can generate an object program efficiently using the SIMD instruction sequence is required. To build such an SIMD conversion compiler, the following two problems in connection with access to data on the main storage must be solved.
A first problem is that data placed on an area over the word boundary on the main storage cannot be computed by the SIMD instruction sequence alone.
A second problem is that when access is made to main storage data containing data over a word boundary, the number of accesses to the main storage data increases.
These problems will be described below in detail. Access to data arranged along the word boundary (not over the word boundary) is called aligned access, and access to data not arranged along the word boundary (over the word boundary) is called non-aligned access.
The first problem will be described first. When a certain loop structure in a program has a non-aligned access part, the pertinent part cannot be subjected to SIMD conversion, and the generated object program has degraded execution performance.
Differences between the aligned access and the non-aligned access will be described with reference to FIG. 22. In the drawing, one word is 32 bits (4 bytes), transfer between the main storage and the register is made in units of two words (64 bits), the word boundary is at a position where the byte address on the main storage is an integral multiple of 4, the array element has a size of 16 bits (2 bytes), and addresses are written in hexadecimal. When four array elements 220 through 223 starting from address 00 on the main storage are loaded into the register, the data area of these four elements (from address 00 to just before address 08) is arranged along the word boundary. Therefore, this aligned access can be made by an ordinary load instruction. Meanwhile, when four array elements 224 through 227 starting from address 0E are loaded into the register, the data area of these four elements (from address 0E to just before address 16) is not arranged along the word boundary, and non-aligned access is required.
Many computers do not have an instruction sequence for the non-aligned access. For example, processing on the four elements starting from the address 0E cannot be subjected to SIMD conversion as it is. Therefore, the applicable range of automatic SIMD conversion by the compiler is narrow, and improvement of the running performance of the generated object code is hindered. This type of non-aligned access must be replaced with a set of aligned accesses and several accompanying supplemental operations.
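The address arithmetic of the FIG. 22 example can be checked with a minimal C sketch. The function names are hypothetical; the constants follow the conditions stated above (a word of 4 bytes and an array element of 2 bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { WORD_BYTES = 4, ELEMENT_BYTES = 2 };

/* Distance in bytes from the preceding word boundary. */
uint32_t offset_in_word(uint32_t byte_address)
{
    return byte_address % WORD_BYTES;
}

/* An access is aligned when its data area starts on a word boundary. */
bool is_aligned_access(uint32_t byte_address)
{
    return offset_in_word(byte_address) == 0;
}

/* First address just past a run of array elements. */
uint32_t end_address(uint32_t start, uint32_t element_count)
{
    assert(element_count > 0);
    return start + element_count * ELEMENT_BYTES;
}
```

With these definitions, is_aligned_access(0x00) is true and end_address(0x00, 4) is 0x08, whereas is_aligned_access(0x0E) is false and end_address(0x0E, 4) is 0x16, in agreement with the data areas described for the drawing.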
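One possible form of such a replacement is sketched below in C. It is an illustration under stated assumptions only and is not the replacement the invention prescribes: main storage is modeled as a byte array, load_aligned and load_non_aligned are hypothetical names, and the supplemental operations are modeled as byte copies that shift and merge two aligned loads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { WORD_BYTES = 4, TRANSFER_BYTES = 8 };

/* Hypothetical aligned load: two words starting on a word boundary. */
void load_aligned(const uint8_t *memory, uint32_t address,
                  uint8_t reg[TRANSFER_BYTES])
{
    assert(address % WORD_BYTES == 0);
    memcpy(reg, memory + address, TRANSFER_BYTES);
}

/* Emulates a non-aligned load with at most two aligned loads. */
void load_non_aligned(const uint8_t *memory, uint32_t address,
                      uint8_t reg[TRANSFER_BYTES])
{
    uint32_t offset = address % WORD_BYTES;
    uint32_t base = address - offset;    /* word boundary at or below */
    uint8_t low[TRANSFER_BYTES], high[TRANSFER_BYTES];

    load_aligned(memory, base, low);
    if (offset == 0) {                   /* already aligned */
        memcpy(reg, low, TRANSFER_BYTES);
        return;
    }
    load_aligned(memory, base + TRANSFER_BYTES, high);
    /* Supplemental operations: drop the leading bytes of the first
       load, then append the leading bytes of the second load. */
    memcpy(reg, low + offset, TRANSFER_BYTES - offset);
    memcpy(reg + (TRANSFER_BYTES - offset), high, offset);
}
```

For the address 0E of the example, this sketch performs aligned loads at addresses 0C and 14 and keeps three elements from the first load and one element from the second.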
The second problem will now be described. It degrades performance when access to the same region on the main storage is repeated. The number of accesses to an individual array element can be decreased by conventional optimizing techniques. Specifically, a value is transferred from the main storage to a register, computing is performed on the register, and the computed result is written back to the main storage, so that the number of accesses to the same region on the main storage can be reduced. When a program is subjected to SIMD conversion, however, the same area on the main storage might be accessed even if the subjects of computing do not agree completely in the source program.
The above problem will be described with reference to the C program of FIG. 23 and to FIG. 24. When the program 2301 is run using the SIMD instruction sequence as shown in FIG. 24, the first execution of the loop body needs the 0th to 3rd elements of the array A and the 1st to 4th elements of the array A. The former can be obtained simply by loading them into a register 2401 by aligned access. Since the latter requires non-aligned access, however, it is divided into aligned accesses to SIMD data containing the 1st to 3rd elements (data held by a register 2402) and SIMD data containing the 4th element (data held by a register 2403). Of these, the word containing the 0th to 3rd elements has already been accessed, and the overlapped access degrades the execution performance. In other words, access to data regions such as A[0] to A[3] and A[1] to A[4], which do not agree perfectly in the source program, results in overlapped access once SIMD conversion is performed.
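The overlapped access can be counted with a minimal C sketch. The function names are hypothetical, and the sketch assumes that the array A starts on a boundary and that each aligned access covers four consecutive array elements.

```c
#include <assert.h>

enum { LANES = 4 };   /* assumed elements per aligned access */

/* Aligned accesses needed to cover elements first .. first+count-1. */
int aligned_loads_for_range(int first, int count)
{
    assert(first >= 0 && count > 0);
    return (first + count - 1) / LANES - first / LANES + 1;
}

/* Distinct aligned regions touched by two ranges of equal length. */
int distinct_loads(int first_a, int first_b, int count)
{
    int lo_a = first_a / LANES, hi_a = (first_a + count - 1) / LANES;
    int lo_b = first_b / LANES, hi_b = (first_b + count - 1) / LANES;
    int total = (hi_a - lo_a + 1) + (hi_b - lo_b + 1);
    int overlap_lo = lo_a > lo_b ? lo_a : lo_b;
    int overlap_hi = hi_a < hi_b ? hi_a : hi_b;
    int shared = overlap_hi >= overlap_lo ? overlap_hi - overlap_lo + 1 : 0;

    return total - shared;   /* regions counted twice are removed */
}
```

For A[0] to A[3] and A[1] to A[4], the two ranges need one and two aligned accesses respectively, three in total, while only two distinct regions are touched; the difference of one is the overlapped access described above.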