A. Field of the Invention
The present invention relates to a method and apparatus for compiling loops of instructions of computer programs and for executing the compiled loops in a data processing system.
B. Discussion of the Related Art
A compiler of a data processing system (e.g., a microcomputer) typically converts source code of computer programs to object code. Source code is easily understandable by computer programmers and is generally written in a high level computer programming language, e.g., C, PASCAL, or BASIC. Object code, on the other hand, is represented by strings of binary numbers, e.g., 0101 or 1010, and is capable of being understood by the data processing system which executes the computer programs. There is also a computer programming language that is commonly referred to as assembly language, which is considered a low level programming language. Assembly language is the programming language closest to object code that is still understandable by computer programmers.
One line of source code of a computer program (and the corresponding line of object code) instructs the data processing system to perform one or more operations or functions, e.g., read data from memory and store the read data in variable VAR. An ordered set of one or more lines of source code (or object code) that is executed repeatedly may be referred to as a loop of instructions. A data processing system uses loops to execute a set of one or more instructions repeatedly until one or more particular conditions are true. Typically, the condition appears in the line of source code which signifies the beginning or the end of the loop.
The instructions included in a loop are executed n-times (or n iterations) until a predetermined condition is satisfied. For example, in an iterative loop, a loop of instructions will be executed a specified number of times, e.g. 100 times. During the n-times iteration of an iterative loop, e.g., the second iteration, if the condition is not satisfied (e.g., the loop has not been executed 100 times), then the next iteration, or the (n+1)-times iteration of the loop, e.g., the third iteration, is executed.
Other loops, e.g., do-while loops, execute the set of instructions until a condition (e.g., X is not equal to 1) is satisfied. In this type of loop, the data processing system also checks to determine whether the condition to escape (or end) the execution of the loop is satisfied after each iteration.
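The two kinds of loop described above can be sketched in C as follows. This is only an illustrative sketch: the function names sum_first_n and count_halvings, and the work done in each loop body, are hypothetical and do not appear in the figures.

```c
#include <assert.h>

/* Iterative (counted) loop: the body is executed exactly n times. */
int sum_first_n(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Condition-controlled loop: the body is executed at least once, and the
   escape condition is checked after each iteration. */
int count_halvings(int x)
{
    assert(x >= 1);          /* assumed precondition for this sketch */
    int steps = 0;
    do {
        x /= 2;
        steps++;
    } while (x > 1);         /* leave the loop once x is no longer above 1 */
    return steps;
}
```

In the first function the number of iterations is fixed before the loop starts; in the second it depends on a value that changes while the loop runs.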
Recently, vector methods have been applied to attempt to execute loops quickly and efficiently using data processing systems that contain multiple microprocessors. Using one such vector method, one microprocessor executes the n-times iteration of a loop and, before completing the n-times iteration, a second microprocessor begins executing the next iteration, or (n+1)-times iteration, of the loop. In other words, instructions from different iterations (n-times and (n+1)-times) of a loop can be executed at the same time using data processing systems including a plurality of microprocessors.
Using this vector method, however, one or more instructions of an iteration of a loop may not be executed according to the order for the complete loop execution set up by the computer programmer. Accordingly, if the intended order of execution is disturbed, and the loop contains multiple instructions which refer to and alter the same variable (or register or the same memory address), the result of executing the loop will differ from the originally intended result.
For example, FIG. 1 illustrates source code for an iterative loop of a computer program written in a high level programming language using PASCAL-style syntax. The iterative nature of the loop is shown by the first line of source code, the instruction "for i:=1 to 100 do" (1).
According to this instruction, the data processing system will execute the successive instruction(s) a total of 100 times. When the loop of FIG. 1 begins executing in a data processing system, "i" is set to 1 (or i=1). Thereafter, during each successive iteration of the loop of FIG. 1, i is set equal to the previous value for i plus 1 (or i=i+1). For example, during the first iteration of the loop i=1 and during the second iteration i=2.
In FIG. 1, the only successive instruction is "A[p+i]:=A[q+i]+B;" (2).
In general, this instruction sets the value of one entry (or element) in an array to the value of another entry in the array plus the value of the variable B. In this instruction, "A" represents the name of an array. An array is a collection of data that is given one name, in this case "A". Each element can be identified by a number, commonly referred to as a subscript, which indicates the row, and in some cases column, in which the element is located. In the example of FIG. 1, the first element in the array A is referred to by "A[0]" and the second "A[1]" and so forth.
Also included in the instruction (2) of FIG. 1 are the variable "i", which is incremented during execution of the loop, and the variables "p", "q", and "B". Variables "p", "q", and "B" may be initialized or set to be equal to specific numbers prior to the execution of the loop of FIG. 1. Changing the value of i with each iteration of the loop, as discussed above, necessarily affects the results of the execution of each iteration because the value of i determines which element of the array A will be set equal to which other element of array A plus the value of B.
For example, if all entries in array A are set to 10, that is A[0]=10, A[1]=10 and so forth, and if p is initialized to -1 (or p=-1), q is initialized to 0 (or q=0), and B is initialized to 10 (or B=10), then after the first iteration A[0]=20 (or A[0]:=A[1]+B which is the same as A[0]:=10+10). After the second iteration A[1] will also equal 20 (or A[1]:=A[2]+B which is the same as A[1]:=10+10). In each successive iteration of the loop, the next element of array A will be set equal to 20.
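The worked example above can be reproduced with a minimal C sketch of the loop of FIG. 1, executed strictly in order. The function name run_loop_sequential and the parameters len and n are assumptions introduced for illustration; the figure fixes the iteration count at 100.

```c
#include <assert.h>

/* Executes A[p+i] := A[q+i] + B for i = 1..n, one iteration after another.
   len is the number of elements in A and is used only for bounds checks. */
void run_loop_sequential(int A[], int len, int p, int q, int B, int n)
{
    for (int i = 1; i <= n; i++) {
        assert(p + i >= 0 && p + i < len);   /* element being written */
        assert(q + i >= 0 && q + i < len);   /* element being read    */
        A[p + i] = A[q + i] + B;
    }
}
```

With p=-1, q=0, B=10, and n=100, the loop reads A[1] through A[100] and writes A[0] through A[99], so an array of 101 elements is sufficient. Each element is read one iteration before it is overwritten, which is why every written element ends up equal to 20.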
FIG. 2 illustrates an example of the parallel execution of multiple iterations of the loop of instructions illustrated in FIG. 1. In FIG. 2, however, instruction (2) of FIG. 1 is separated into 3 steps, for example: load A[q+1], add B, and store A[p+1]. These three steps assume the existence of a single register, for example, R1, in which to manipulate (load and add) the values of elements in the array A, and the steps are similar to the assembly language level instructions corresponding to instruction (2) of FIG. 1. Therefore, the first step "load A[q+1]" is equivalent to "load the contents of element A[q+1] into the register R1," the second step "add B" is the same as "add the value of variable B to the contents of register R1," and the third step "store A[p+1]" is the same as "store the contents of register R1 in element A[p+1] of array A."
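The three steps can be sketched in C for a general iteration i, with a local variable standing in for register R1. The function name iteration_three_steps is hypothetical, and the sketch models only the order of the steps, not any particular instruction set.

```c
#include <assert.h>

/* One iteration of A[p+i] := A[q+i] + B, split into load, add, and store.
   The local variable r1 plays the role of the single register R1. */
void iteration_three_steps(int A[], int p, int q, int B, int i)
{
    assert(i >= 1);      /* iterations are numbered from 1 in FIG. 1 */
    int r1;
    r1 = A[q + i];       /* load:  copy element A[q+i] into R1       */
    r1 = r1 + B;         /* add:   add the value of B to R1          */
    A[p + i] = r1;       /* store: copy R1 into element A[p+i]       */
}
```

For i=1 the three statements correspond to "load A[q+1]," "add B," and "store A[p+1]."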
In FIG. 2 the set of instructions for each iteration of the loop, e.g., iteration (i-times), iteration ((i+1)-times), and iteration (((i+1)+1)-times), is separated into three columns to represent the steps of each iteration being executed by one of multiple microprocessors in a data processing system. The left column of FIG. 2 represents time periods. In other words, during time period (2), for example, microprocessor 1, which executes the (i-times) iteration (where i=1), performs the add step (or adds the value of B to A[q+1]) simultaneously with microprocessor 2, which executes the (i+1)-times iteration, performing the load A[q+2] operation.
In FIG. 2, the "load" step of the (i+1)-times iteration of the loop is executed in microprocessor 2 during time (2), before the "store" step of the (i-times) iteration is executed in microprocessor 1 during time (3). If the variable "p" is equal to the value of the variable q plus 1 (or p=q+1), the "load" step of the (i+1)-times iteration of the loop must be executed, for example, in time (4), after the "store" step of the (i-times) iteration, because the "load" step of the (i+1)-times iteration depends upon the result of the "store" step of the (i-times) iteration. Otherwise, the intended result of the loop of instructions would be altered by this vector method. For example, if p=q+1, with p initialized to 2 and q initialized to 1 (or p=2 and q=1), then in time interval (3) the "store" step of the (i-times) iteration of the loop (where i=1) will store the contents of register R1 in A[3], while in time interval (2) the "load" step of the (i+1)-times iteration has already requested that the contents of A[3] be loaded into the register R1 for use during the (i+1)-times iteration of the loop. Therefore, if p=q+1, the loop consisting of the three steps of FIG. 2 cannot be executed in parallel by multiple microprocessors.
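The effect of this dependence can be illustrated with a small C simulation that compares in-order execution with an overlapped schedule modeled on FIG. 2. This is a sketch under stated assumptions: the function names are hypothetical, each iteration is given its own register, and a load is assumed to read memory before any store issued in the same time period.

```c
#include <assert.h>
#include <stdlib.h>

/* Reference behavior: iterations 1..n run one after another. */
void run_in_order(int A[], int p, int q, int B, int n)
{
    for (int i = 1; i <= n; i++)
        A[p + i] = A[q + i] + B;
}

/* Overlapped schedule: iteration i loads in time period i, adds in period
   i+1, and stores in period i+2. Returns 0 on success, -1 if the scratch
   registers cannot be allocated. */
int run_overlapped(int A[], int p, int q, int B, int n)
{
    assert(n >= 0);
    int *r = malloc((size_t)(n + 1) * sizeof *r);  /* r[i]: register of iteration i */
    if (r == NULL)
        return -1;
    for (int t = 1; t <= n + 2; t++) {
        if (t <= n)
            r[t] = A[q + t];              /* load step of iteration t    */
        if (t - 1 >= 1 && t - 1 <= n)
            r[t - 1] += B;                /* add step of iteration t-1   */
        if (t - 2 >= 1)
            A[p + t - 2] = r[t - 2];      /* store step of iteration t-2 */
    }
    free(r);
    return 0;
}
```

With p=2, q=1, B=10, three iterations, and every element initialized to 10, in-order execution leaves A[3], A[4], and A[5] equal to 20, 30, and 40, whereas the overlapped schedule leaves all three equal to 20 because each load reads the element before the preceding iteration has stored to it. With p=-1 and q=0 the two schedules give the same result, since no iteration reads an element written by an earlier iteration.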
To solve this problem, those skilled in the art have attempted to determine, before executing the program, whether multiple instructions of a loop refer to and alter the same variable (or register or address in memory) before attempting parallel execution. The compiler of the data processing system has generally been used to make this determination while compiling the source code into object code. Using conventional methods, the compiler decides which steps of a loop can be executed in parallel by attempting to identify steps of the loop which refer to the same register or variable. If the execution order of the instructions of a loop is to be changed to accommodate parallel processing, the compiler of the data processing system generally generates object code instructions indicating which steps can be executed in parallel. If the execution order of the instructions of the loop cannot be changed, the compiler generates loop instructions indicating which steps cannot be executed in parallel.
Using this method, however, it is not always possible to determine during the compiling procedure which resource (register or variable) is referred to or changed by each instruction of a loop. Often, the resources referenced by an instruction of a loop can only be identified during the execution of the loop. For example, in the loop of instructions shown in FIG. 2, the values of variables p and q are determined by source code instructions which precede the loop, and these source code instructions are executed before the execution of the loop. Therefore, the value of each of the variables p and q is not determined during the compiling of the source code of the loop, but is determined during the execution of the program. In this example, using the conventional method discussed above, the compiler will generate a loop of instructions which is not executed in parallel, to assure correct execution order. Therefore, even if the instructions of the loop could be executed in parallel, the loop of instructions is executed sequentially (non-parallel execution). As a result, execution time increases in these situations and the resources of the multiple microprocessors are wasted.