1. Field of the Invention
The present invention relates generally to software compilers, and more particularly to software compilers for compiling vector instructions.
2. Related Art
A compiler is a computer program that translates a source computer program written in a source computer language to a target computer program written in a target computer language.
In translating the source program to the target program, a compiler may be required to satisfy various requirements. For example, a compiler may be required to perform the translation within a specified amount of time.
Also, a compiler may be required to perform the translation such that the target computer program runs effectively and efficiently on particular target computer hardware. For example, suppose the target computer hardware is a computer having multiple pipelined functional units. For such computer hardware, typical requirements imposed on compilers include the following. First, the compiler must minimize the effects of pipeline delay. Second, the compiler must maximize the use of the functional units. These two compiler requirements are described below by way of an example.
Suppose a source program has the instructions shown below in Code Example 1.
______________________________________
1   A=B+C
2   D=E+F
Code Example 1
______________________________________
Conventionally, a compiler's code generator would receive a representation of the source program containing the instructions in Code Example 1 and produce assembly language code (the code generator receives a "representation" since other components of the compiler may have performed initial compilation steps on the source program). For the instructions shown in Code Example 1, the code generator might generate the assembly language pseudocode shown in Code Example 2.
______________________________________
1   XXX
2   LOAD P_B R1
3   LOAD P_C R2
4   ADD R1 R2 R3
5   STA R3 P_A
6   LOAD P_E R4
7   LOAD P_F R5
8   ADD R4 R5 R6
9   STA R6 P_D
10  YYY
Code Example 2
______________________________________
In Code Example 2, note that the XXX at line 1 represents assembly language pseudocode which the code generator generated for instructions which appeared before line 1 in Code Example 1. Similarly, the YYY in line 10 of Code Example 2 represents assembly language pseudocode which the code generator generated for instructions occurring after the instruction at line 2 in Code Example 1.
The instructions at lines 2-5 in Code Example 2 correspond to the instruction at line 1 in Code Example 1. Specifically, the instructions in lines 2 and 3 in Code Example 2 load values from memory locations B and C into Registers R1 and R2, respectively. The instruction at line 4 in Code Example 2 adds the values in Registers R1 and R2 and places the result in Register R3. The instruction at line 5 in Code Example 2 stores the value in R3 to a memory location A.
The instructions at lines 6-9 in Code Example 2 correspond to the instruction at line 2 in Code Example 1. The operation of the instructions at lines 6-9 is analogous to the operation of the instructions at lines 2-5 in Code Example 2.
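The correspondence between the source instructions and the load, add, and store pseudocode can be illustrated with a minimal sketch. The function name `generate`, the restriction to statements of the form X=Y+Z, and the policy of never reusing a register are assumptions made for illustration only; memory operands are written P_B, P_C, and so on, after the subscripted names in Code Example 2.

```python
from itertools import count


def generate(statements):
    """Translate statements of the form 'X=Y+Z' into load/add/store pseudocode."""
    regs = count(1)  # hand out R1, R2, ... in order; registers are never reused
    code = []
    for stmt in statements:
        dest, expr = stmt.split("=")
        left, right = expr.split("+")
        r_left, r_right, r_sum = (f"R{next(regs)}" for _ in range(3))
        code += [
            f"LOAD P_{left} {r_left}",         # fetch first operand from memory
            f"LOAD P_{right} {r_right}",       # fetch second operand from memory
            f"ADD {r_left} {r_right} {r_sum}",  # add the two registers
            f"STA {r_sum} P_{dest}",           # store the sum back to memory
        ]
    return code


code_example_2 = generate(["A=B+C", "D=E+F"])
for line in code_example_2:
    print(line)
```

For the two statements of Code Example 1, this sketch emits the eight instructions that appear at lines 2-9 of Code Example 2, in the same order.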
The first compiler requirement, minimizing the effects of pipeline delay, shall now be described with reference to Code Example 2.
The ADD instruction at line 4 in Code Example 2 uses the contents of R1 and R2. Therefore, for proper operation, R1 and R2 must be stable prior to the execution of the ADD instruction at line 4. The LOAD instructions at lines 2 and 3 load R1 and R2, respectively. These LOAD instructions require a finite amount of time to access and to transfer data from memory to registers. Therefore, it is possible that R1 and R2 will not be stable at the execution of the ADD instruction at line 4. If this occurs, the ADD instruction at line 4 will not produce correct results.
The store instruction at line 5 uses the contents of R3. Therefore, for proper operation, R3 must be stable before the store instruction at line 5 is executed. A pipeline delay is associated with the ADD instruction at line 4. That is, a finite amount of time passes between when the values in R1 and R2 are submitted to an arithmetic pipeline and when the pipeline returns the sum of the values in R1 and R2. Since the ADD instruction requires a finite amount of time to process, the value in R3 may not be stable before the store instruction at line 5 is executed. If this is the case, the store instruction at line 5 will not produce correct results.
As the previous two paragraphs describe, the assembly language pseudocode generated by the code generator may not operate correctly due to pipeline delays.
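The hazards described above can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the latencies in `LATENCY` are invented cycle counts, the machine is assumed to issue one instruction per cycle with no hardware interlocks, and the names `parse` and `find_hazards` are hypothetical.

```python
LATENCY = {"LOAD": 3, "ADD": 2, "STA": 1}  # assumed cycle counts, illustrative only


def parse(line):
    """Return (opcode, source registers, destination register or None)."""
    op, *args = line.split()
    if op == "LOAD":   # LOAD <memory> <dest register>
        return op, [], args[1]
    if op == "STA":    # STA <source register> <memory>
        return op, [args[0]], None
    return op, args[:-1], args[-1]  # ADD <src> <src> <dest>


def find_hazards(program):
    """List (instruction, register) pairs where a register is read too early."""
    ready = {}    # register -> first cycle at which its value is stable
    hazards = []
    for cycle, line in enumerate(program):  # one instruction issued per cycle
        op, sources, dest = parse(line)
        for reg in sources:
            if ready.get(reg, 0) > cycle:
                hazards.append((line, reg))
        if dest:
            ready[dest] = cycle + LATENCY[op]
    return hazards


naive = ["LOAD P_B R1", "LOAD P_C R2", "ADD R1 R2 R3", "STA R3 P_A"]
for line, reg in find_hazards(naive):
    print(f"{line}: {reg} is not yet stable")
```

Under these assumed latencies, the sketch flags the ADD instruction for reading both R1 and R2 too early and the store instruction for reading R3 too early, which are the two failure cases discussed with reference to lines 4 and 5 of Code Example 2.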
The second requirement, maximizing the use of functional units, shall now be described with reference to Code Example 2.
A conventional computer having multiple pipelined functional units may include a memory access functional unit and an arithmetic/logic unit (ALU). Load instructions (such as those at lines 2 and 3) might be performed by the memory access unit. Arithmetic functions (such as the ADD instruction at line 4) might be performed by the ALU. Since multiple functional units exist, multiple instructions in Code Example 2 may be executed at the same time. Executing instructions in parallel in this way maximizes the use of the functional units. Note, however, that the assembly language pseudocode generated by the code generator in Code Example 2 has instructions being executed sequentially. Therefore, at any given time, only one functional unit is working and all other functional units are idle. Consequently, the assembly language pseudocode generated by the code generator does not maximize the use of the functional units.
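The idle time described above can be quantified with a rough sketch. It assumes a machine with exactly two functional units, assumes that store instructions are handled by the memory access unit, and assumes that every instruction occupies its unit for a single cycle; the names `UNIT` and `utilization` are hypothetical.

```python
# Assumed assignment of opcodes to functional units (stores on the memory unit).
UNIT = {"LOAD": "memory", "STA": "memory", "ADD": "alu"}


def utilization(schedule):
    """Fraction of unit-cycles doing useful work.

    schedule is a list of cycles; each cycle is the list of instructions
    issued together in that cycle.
    """
    unit_count = len(set(UNIT.values()))
    busy = sum(len(cycle) for cycle in schedule)
    return busy / (len(schedule) * unit_count)


# Lines 2-9 of Code Example 2, issued strictly one instruction per cycle.
sequential = [
    ["LOAD P_B R1"], ["LOAD P_C R2"], ["ADD R1 R2 R3"], ["STA R3 P_A"],
    ["LOAD P_E R4"], ["LOAD P_F R5"], ["ADD R4 R5 R6"], ["STA R6 P_D"],
]
print(utilization(sequential))  # 0.5
```

With one instruction per cycle on a two-unit machine, half of the available unit-cycles go unused, regardless of which unit each instruction runs on.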
Conventionally, a compiler's scheduler modifies the assembly language code generated by the code generator in order to satisfy the two requirements described above, that is, to minimize the effects of pipeline delay and to maximize the use of functional units.
With regard to minimizing the effects of pipeline delays, the scheduler might modify the assembly language code shown in Code Example 2 to the assembly language code shown in Code Example 3.
______________________________________
1   LOAD P_B R1
2   LOAD P_C R2
3   LOAD P_E R4
4   LOAD P_F R5
5   XXX
6   ADD R1 R2 R3
7   ADD R4 R5 R6
8   YYY
9   STA R3 P_A
10  STA R6 P_D
Code Example 3
______________________________________
In Code Example 3, the scheduler has moved the LOAD instructions above the instructions XXX. This ensures that the values in registers R1, R2, R4 and R5 are stable before the execution of the ADD instructions at lines 6 and 7. Also, the scheduler has moved the store instructions below the instructions YYY. This ensures that the values in R3 and R6 are stable before they are used by the store instructions at lines 9 and 10.
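One way to see what this reordering buys is to measure how many issue slots separate the instruction that writes each register from the first instruction that reads it. The sketch below is illustrative only: XXX and YYY are each counted as a single slot although they stand for an unspecified number of instructions, so the reported gaps are lower bounds, and the names `dest_and_sources` and `issue_gaps` are hypothetical.

```python
def dest_and_sources(line):
    """Return (destination register or None, list of source registers)."""
    op, *args = line.split()
    if op == "LOAD":
        return args[1], []
    if op == "STA":
        return None, [args[0]]
    if op == "ADD":
        return args[2], args[:2]
    return None, []  # XXX / YYY: opaque placeholders


def issue_gaps(program):
    """Map each register to the slot distance from its write to its first read."""
    written_at = {}
    gaps = {}
    for slot, line in enumerate(program):
        dest, sources = dest_and_sources(line)
        for reg in sources:
            if reg in written_at:
                gaps.setdefault(reg, slot - written_at[reg])
        if dest:
            written_at[dest] = slot
    return gaps


example_2 = ["XXX", "LOAD P_B R1", "LOAD P_C R2", "ADD R1 R2 R3", "STA R3 P_A",
             "LOAD P_E R4", "LOAD P_F R5", "ADD R4 R5 R6", "STA R6 P_D", "YYY"]
example_3 = ["LOAD P_B R1", "LOAD P_C R2", "LOAD P_E R4", "LOAD P_F R5", "XXX",
             "ADD R1 R2 R3", "ADD R4 R5 R6", "YYY", "STA R3 P_A", "STA R6 P_D"]

print(min(issue_gaps(example_2).values()))  # 1
print(min(issue_gaps(example_3).values()))  # 3
```

In the ordering of Code Example 2, some results are consumed by the very next instruction, whereas in the ordering of Code Example 3 every result has at least three slots in which to become stable before it is read.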
Alternatively, with regard to maximizing the use of the functional units, the scheduler might modify the assembly language code shown in Code Example 2 to the assembly language code shown in Code Example 4.
______________________________________
1   XXX
2   LOAD P_B R1
3   LOAD P_C R2
4   LOAD P_E R4
5   ADD R1 R2 R3; LOAD P_F R5
6   STA R3 P_A; ADD R4 R5 R6
7   STA R6 P_D
8   YYY
Code Example 4
______________________________________
In line 5 of Code Example 4, the scheduler causes an ADD instruction and a load instruction to execute at the same time. This is possible because the ADD instruction and the load instruction do not depend upon one another and because they are executed by different functional units (that is, the ADD instruction is performed by the ALU and the load instruction is performed by the memory access unit). Similarly, at line 6, the scheduler causes a store instruction and an ADD instruction to execute at the same time.
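The two conditions for issuing a pair of instructions together (no mutual dependence, different functional units) can be expressed as a simple check. This is a sketch, not a description of any particular scheduler: it considers register dependences only and ignores dependences through memory, it assumes that stores run on the memory access unit, and the names `UNIT`, `reads_writes`, and `can_pair` are hypothetical.

```python
# Assumed assignment of opcodes to functional units (stores on the memory unit).
UNIT = {"LOAD": "memory", "STA": "memory", "ADD": "alu"}


def reads_writes(line):
    """Return (set of registers read, set of registers written)."""
    op, *args = line.split()
    if op == "LOAD":
        return set(), {args[1]}
    if op == "STA":
        return {args[0]}, set()
    return set(args[:-1]), {args[-1]}  # ADD <src> <src> <dest>


def can_pair(first, second):
    """True if the two instructions may be issued in the same cycle."""
    reads_1, writes_1 = reads_writes(first)
    reads_2, writes_2 = reads_writes(second)
    different_units = UNIT[first.split()[0]] != UNIT[second.split()[0]]
    independent = not (writes_1 & reads_2 or writes_2 & reads_1 or writes_1 & writes_2)
    return different_units and independent


print(can_pair("ADD R1 R2 R3", "LOAD P_F R5"))   # True  (line 5 of Code Example 4)
print(can_pair("STA R3 P_A", "ADD R4 R5 R6"))    # True  (line 6 of Code Example 4)
print(can_pair("ADD R4 R5 R6", "STA R6 P_D"))    # False (the store reads R6)
print(can_pair("LOAD P_B R1", "LOAD P_C R2"))    # False (both need the memory unit)
```

Note that this check says nothing about whether the operands of a paired instruction are stable when the pair issues; it tests only the two pairing conditions described above.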
As noted above, in conventional compilers, the scheduler is responsible for both minimizing the effects of pipeline delay and maximizing the use of functional units. Note, however, that the assembly language code in Code Example 3 minimizes the effects of pipeline delay but does not maximize the use of functional units. Also, the assembly language code in Code Example 4 maximizes the use of functional units but does not minimize the effects of pipeline delay. Therefore, in order to both minimize the effects of pipeline delay and maximize the use of functional units, the scheduler must produce assembly language code which is a combination of that shown in Code Examples 3 and 4.
However, the task of simultaneously minimizing the effects of pipeline delay and maximizing the use of functional units is very difficult. Generally, achieving optimal solutions is, at best, computationally expensive or, at worst, theoretically impossible. As a result, the requirements of minimizing the effects of pipeline delay and maximizing the use of functional units are not adequately satisfied.
In summary, conventional compilers which require schedulers to both minimize the effects of pipeline delay and maximize the use of functional units are flawed because such compilers do not adequately satisfy either of these two requirements.