The present invention relates to a compiling method for compiling a source program into an object program for CPUs that support predicated execution.
Programs executed on computers generally contain a large number of conditional branch instructions. A conditional branch instruction is one that changes the address of the instruction to be executed next, depending on whether a given condition is true or false. Usually, for conditional branch instructions, the state where the condition is true is referred to as "the branch is taken". In this case, the instruction to be executed after the conditional branch instruction is the instruction at the address specified by the operand of that conditional branch instruction, as opposed to the instruction at the address following that of the conditional branch instruction. On the other hand, the state where the condition is false is referred to as "the branch is not taken". In this case, the instruction to be executed after the conditional branch instruction is the one at the address following that of the conditional branch instruction.
In CPUs that perform pipeline processing, an instruction is fetched from a memory or cache into the CPU several clock cycles before it is executed. Thus, if the instruction to be executed after a conditional branch instruction is fetched only after the decision of whether the branch condition is true has been made, that instruction cannot be executed in the cycle following the conditional branch instruction cycle. In that case, an idle cycle is generated in which nothing can be performed. Such an event is referred to as a pipeline hazard, which is one of the obstacles to high-speed program execution.
As one method of circumventing the pipeline hazard, predicated execution has been proposed (reference 1: "A Comparison of Full and Partial Predicated Execution Support for ILP Processors", by Scott Mahlke, Proc. of ISCA '95, pp. 138-149).
CPUs that support predicated execution differ from usual CPUs (i.e., CPUs that do not support predicated execution) in the following two points.
The CPU maintains a predicate mode, and a predicate mode set instruction is supported.
Each opcode has a predicate field. An instruction is executed only when the mode indicated by the value in the predicate field coincides with the predicate mode maintained by the CPU.
The use of predicated execution allows the conditional branch instructions of usual CPUs, and the instructions executed depending on whether their conditions are met, to be modified as follows:
A conditional branch instruction is changed to an execution mode set instruction, which sets the CPU mode to "a" when the condition is true and to "b" when the condition is false.
For a sequence or group of instructions "A" to be executed when the branch is taken (i.e., when the condition is true), the predicate field value of each instruction is specified to be a. For a sequence or group of instructions "B" to be executed when the branch is not taken (i.e., when the condition is false), the predicate field value of each instruction is specified to be b. These instructions are allocated after the execution mode set instruction. Instructions whose predicate field value is "a" and instructions whose predicate field value is "b" may be intermixed. Here, "a" and "b" each take some numerical value.
The pipeline hazard can be circumvented by the utilization of the predicated execution because conditional branch instructions can be removed from a program.
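The conversion described above can be illustrated with a minimal Python sketch. It is not taken from the cited reference; the tuple layout `(name, predicate, action)`, the helper names `run_predicated` and `set_reg`, and the register names are assumptions made purely for illustration. The sketch shows a straight-line program in which a mode set instruction replaces the conditional branch, and the instructions of groups "A" and "B" are intermixed.

```python
def run_predicated(program, regs):
    """Run a straight-line program on a hypothetical predicated CPU.

    Each entry is (name, predicate, action). A "setmode" entry sets the
    CPU mode to "a" or "b"; any other entry executes only when its
    predicate field equals the current mode.
    """
    mode = None  # predicate mode maintained by the CPU
    trace = []   # names of the instructions that actually executed
    for name, pred, action in program:
        if name == "setmode":
            mode = "a" if action(regs) else "b"
        elif pred == mode:
            action(regs)
            trace.append(name)
    return trace


def set_reg(dest, value_fn):
    """Build an action that stores value_fn(regs) into register dest."""
    def apply(regs):
        regs[dest] = value_fn(regs)
    return apply


# Source-level meaning: if x > 0: y = x + 1; z = y * 2  else: y = x - 1
program = [
    ("setmode", None, lambda r: r["x"] > 0),
    ("A1", "a", set_reg("y", lambda r: r["x"] + 1)),
    ("B1", "b", set_reg("y", lambda r: r["x"] - 1)),  # mixed between A1 and A2
    ("A2", "a", set_reg("z", lambda r: r["y"] * 2)),
]
```

Because the program contains no conditional branch, the fetch order is the same whichever mode is set; only the set of instructions that take effect differs.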
Such predicated execution is expected to provide greater benefits especially in CPUs having a VLIW (Very Long Instruction Word) architecture. Here, VLIW refers to an architecture in which a CPU contains multiple functional units that operate concurrently (reference 2: "Bulldog: A Compiler for VLIW Architectures" by Ellis, J. R., The MIT Press). A VLIW CPU, which has the capability to execute two or more instructions at the same time, permits the speed of execution of a program to be increased, provided that instructions can be allocated simultaneously to many functional units. With predicated execution, it is easy to fill many functional units with instructions because, as described previously, instructions can be allocated from both the sequence or group of instructions "A", which are executed when the branch is taken, and the sequence or group of instructions "B", which are executed when the branch is not taken. For VLIW in particular, therefore, predicated execution is a promising means for speeding up the execution of a program.
However, the degree to which the program execution speed is increased depends on how the sequence of instructions "A" and the sequence of instructions "B" are allocated to the functional units. The establishment of a compiling method that allows for higher program execution speed remains a problem.
As described above, predicated execution does not suffer the pipeline hazard because no conditional branch is performed, and the possibility therefore exists that a program may be executed at high speed. However, no compiling method has heretofore existed that exploits this possibility to allow for program execution at higher speed.
It is therefore an object of the present invention to provide a compiling method which permits a CPU adapted for predicated execution to execute a program at high speed.
The invention is intended for a compiling method for compiling a source program into an object program for a CPU having multiple functional units that allow for concurrent operations and having a function of executing an instruction only when there is a predetermined relationship between an execution mode indicated by a value in a specific field in an instruction code and an execution mode managed within the CPU.
According to a first aspect of the present invention there is provided a compiling method comprising the steps of: analyzing the source program and generating intermediate codes; making an analysis of the intermediate codes; and allocating instructions from the intermediate codes based on the analysis, wherein an execution mode setting instruction to set an execution mode managed within the CPU is allocated; instructions whose execution or non-execution depends on the execution mode set by the execution mode setting instruction are allocated; the instructions whose values in their respective specific fields are identical together form a block for each value in the specific field; an ending part of each block, in which the last instruction of the block is allocated, is found; and when the ending part of a certain block is to be earlier in the object program than the ending part of another block, an unconditional branch instruction identical in specific field value to the instructions in the certain block is allocated so as to be executed either in the ending part of the certain block or as soon as possible after the ending part of the certain block; whereby the object program is generated from the allocated instructions.
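The first aspect can be sketched in Python as follows. This is only an illustration under stated assumptions: the schedule is modeled as a list of VLIW words, each a list of `(name, predicate)` pairs with at most `width` slots; the function names `block_ends` and `insert_early_exit` and the label `JOIN` (the instruction following all blocks) are hypothetical.

```python
def block_ends(schedule):
    """Return, for each predicate value, the last cycle holding one of its instructions."""
    ends = {}
    for cycle, word in enumerate(schedule):
        for _name, pred in word:
            if pred is not None:
                ends[pred] = cycle
    return ends


def insert_early_exit(schedule, width, join_label="JOIN"):
    """Add a predicated unconditional branch for every block that ends early.

    The branch carries the same predicate value as the block that ends
    early and is placed in the block's ending cycle if a slot is free,
    otherwise in the earliest later cycle that has a free slot.
    """
    out = [list(word) for word in schedule]  # leave the input untouched
    ends = block_ends(schedule)
    if not ends:
        return out
    latest = max(ends.values())
    for pred, end in ends.items():
        if end < latest:
            # A branch in the final cycle would save nothing, so stop before it.
            for cycle in range(end, latest):
                if len(out[cycle]) < width:
                    out[cycle].append((f"jmp {join_label}", pred))
                    break
    return out


# Two-slot machine: block "a" ends in cycle 1, block "b" ends in cycle 3.
schedule = [
    [("setmode", None)],
    [("A1", "a"), ("B1", "b")],
    [("B2", "b")],
    [("B3", "b")],
]
result = insert_early_exit(schedule, width=2)
```

In this example cycle 1 has no free slot, so the branch for block "a" lands in cycle 2. When the mode is "a", the final cycle, which would otherwise be idle, is skipped.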
According to a second aspect of the present invention there is provided a compiling method comprising the steps of: analyzing the source program and generating intermediate codes; making an analysis of the intermediate codes; and allocating instructions from the intermediate codes based on the analysis, wherein an execution mode setting instruction to set an execution mode managed within the CPU is allocated; instructions whose execution or non-execution depends on the execution mode set by the execution mode setting instruction are allocated; and, when it is determined that an instruction that is executed only when the execution mode set by the execution mode setting instruction is a certain specific mode can be allocated so as to be executed before the execution mode setting instruction, the instruction is translated into an instruction that is executed regardless of the execution mode and is allocated so as to be executed before the execution mode setting instruction; whereby the object program is generated from the allocated instructions.
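A minimal Python sketch of the second aspect follows. The text does not state here how the decision is made, so the safety test below is an assumption chosen for illustration: an instruction is moved only if its destination is not read by the mode setting instruction, is not live after the region, is not touched by any instruction of another mode, and has no dependence on an instruction that stays behind. The `Instr` record, the function name, and the register names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    name: str
    pred: Optional[str]      # None means "executed regardless of the mode"
    dest: str                # register written
    srcs: Tuple[str, ...]    # registers read


def hoist_before_setmode(setmode, body, live_out):
    """Move safe predicated instructions ahead of the mode setting instruction."""
    hoisted, kept = [], []
    for ins in body:
        other_path = [o for o in body if o.pred != ins.pred]
        # Writing dest unconditionally must not disturb the other mode's code.
        clobbers_other = any(
            ins.dest == o.dest or ins.dest in o.srcs for o in other_path
        )
        # The instruction must not depend on anything left after setmode.
        depends_on_kept = any(
            k.dest in ins.srcs or k.dest == ins.dest or ins.dest in k.srcs
            for k in kept
        )
        safe = (
            ins.dest not in setmode.srcs
            and ins.dest not in live_out
            and not clobbers_other
            and not depends_on_kept
        )
        if safe:
            hoisted.append(ins._replace(pred=None))  # now mode-independent
        else:
            kept.append(ins)
    return hoisted + [setmode] + kept


setmode = Instr("setmode", None, "mode", ("x",))
body = [
    Instr("A1", "a", "t", ("p", "q")),  # t = p + q, a temporary
    Instr("A2", "a", "y", ("t",)),      # y = t
    Instr("B1", "b", "y", ("x",)),      # y = x
]
order = hoist_before_setmode(setmode, body, live_out={"y"})
```

Here only `A1` is moved: it writes a temporary that nothing else uses, so executing it in both modes is harmless, and it no longer has to wait for the mode to be set.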
According to a third aspect of the present invention there is provided a compiling method comprising the steps of: analyzing the source program and generating intermediate codes; making an analysis of the intermediate codes; and allocating instructions from the intermediate codes based on the analysis, wherein an execution mode setting instruction to set an execution mode managed within the CPU is allocated; instructions whose execution or non-execution depends on the execution mode set by the execution mode setting instruction are allocated; the instructions whose values in their respective specific fields are identical together form a block for each value in the specific field; the contents of the blocks are compared with one another; and when it is found that an equivalent instruction is present in all the blocks, the instruction is translated into an instruction that is executed regardless of the execution mode; whereby the object program is generated from the allocated instructions.
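The third aspect can be sketched in Python as below. The representation is an assumption for illustration: each block is a list of instruction tuples keyed by its predicate value, two instructions are treated as equivalent when their tuples are equal, and the common instructions are assumed not to depend on the instructions that remain predicated, so they may be emitted first. The function name `merge_common` is hypothetical.

```python
def merge_common(blocks):
    """Replace instructions present in every block by one unpredicated copy.

    blocks maps a predicate value to its list of instruction tuples.
    Returns a list of (predicate, instruction) pairs, where predicate None
    marks an instruction executed regardless of the execution mode.
    """
    first, *rest = blocks.values()
    common = [ins for ins in first if all(ins in other for other in rest)]
    merged = [(None, ins) for ins in common]
    for pred, block in blocks.items():
        merged += [(pred, ins) for ins in block if ins not in common]
    return merged


blocks = {
    "a": [("add", "t", "p", "q"), ("mov", "y", "t")],
    "b": [("add", "t", "p", "q"), ("neg", "y", "t")],
}
merged = merge_common(blocks)
```

The four predicated instructions of the example become three, because the shared `add` is emitted once without a predicate.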
According to a fourth aspect of the present invention there is provided a compiling method comprising the steps of: analyzing the source program and generating intermediate codes; making an analysis of the intermediate codes; and allocating instructions from the intermediate codes based on the analysis, wherein an execution mode setting instruction to set an execution mode managed within the CPU is allocated; an execution mode that is high in execution count is identified; an instruction corresponding to the execution mode identified to be high in execution count is allocated so that the instruction is executed without branching from the execution mode setting instruction; and an instruction corresponding to the other execution mode is allocated so that the instruction is executed with branching from the execution mode setting instruction; whereby the object program is generated from the allocated instructions.
In this case, it is preferable that the instruction corresponding to the other execution mode be allocated at the branch target location, and that an unconditional branch instruction, which jumps to the instruction following the final instruction corresponding to the execution mode identified to be high in execution count, be allocated at the ending location of the instruction corresponding to the other execution mode.
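The layout of the fourth aspect, including the preferred placement of the unconditional branch, can be sketched in Python. The sketch assumes exactly two execution modes, execution counts supplied as a dictionary (for example from profiling), and a pseudo-assembly notation with hypothetical labels `COLD` and `JOIN`; none of these details come from the text.

```python
def layout_by_frequency(counts, blocks, tail):
    """Lay out code so the frequently executed mode falls through.

    counts maps each mode to its execution count, blocks maps each mode
    to its instruction list, and tail is the code that follows the region.
    """
    hot = max(counts, key=counts.get)
    (cold,) = [m for m in blocks if m != hot]  # two-mode assumption
    return (
        [f"setmode; if mode=={cold} goto COLD"]
        + blocks[hot]                 # executed without branching
        + ["JOIN:"] + tail
        + ["COLD:"] + blocks[cold]    # reached only by branching
        + ["goto JOIN"]               # return to the code after the hot block
    )


layout = layout_by_frequency(
    counts={"a": 10, "b": 90},
    blocks={"a": ["A1"], "b": ["B1", "B2"]},
    tail=["ret"],
)
```

With mode "b" nine times more frequent than mode "a", the instructions of "b" directly follow the mode setting instruction, and only the rare mode pays for two branches.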
According to a fifth aspect of the present invention there is provided a compiling method comprising the steps of: analyzing the source program and generating intermediate codes; making an analysis of the intermediate codes; and allocating instructions from the intermediate codes based on the analysis, wherein the execution speed the CPU would achieve if the instructions were allocated to be executed only in their respective specific execution modes is compared with the execution speed the CPU would achieve if the instructions were allocated to be executed regardless of the execution mode; and when the comparison shows that the execution speed for the instructions to be executed regardless of the execution mode is higher than that for the instructions to be executed only in their respective specific execution modes, the instructions are allocated from the intermediate codes so as to be executed regardless of the execution mode; whereby the object program is generated from the allocated instructions.
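The comparison in the fifth aspect could be sketched in Python as follows. The text does not give a cost model, so everything here is an assumption for illustration: mode-independent allocation is read as conventional branching code, cost is measured in estimated cycles on a machine with `width` slots per cycle, and the parameters `p_taken` and `branch_penalty` as well as all function names are hypothetical.

```python
import math


def predicated_cycles(n_a, n_b, width):
    """Cycles when both blocks are predicated and share the issue slots."""
    return math.ceil((1 + n_a + n_b) / width)  # the 1 is the mode set instruction


def unconditional_cycles(n_a, n_b, width, p_taken, branch_penalty):
    """Expected cycles when only the selected block is fetched and executed."""
    path_a = math.ceil(n_a / width)
    path_b = math.ceil(n_b / width)
    return 1 + branch_penalty + p_taken * path_a + (1 - p_taken) * path_b


def choose_allocation(n_a, n_b, width, p_taken, branch_penalty):
    """Pick whichever allocation the cost model estimates to be faster."""
    pred = predicated_cycles(n_a, n_b, width)
    uncond = unconditional_cycles(n_a, n_b, width, p_taken, branch_penalty)
    return "unconditional" if uncond < pred else "predicated"


balanced = choose_allocation(n_a=2, n_b=2, width=4, p_taken=0.5, branch_penalty=2)
lopsided = choose_allocation(n_a=1, n_b=16, width=2, p_taken=0.9, branch_penalty=1)
```

Under this model, two short balanced blocks favor predication, while a long block that is rarely executed favors the mode-independent allocation, because predication would spend issue slots on it in every execution.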
Each of the methods described above may further comprise the steps of identifying which of the execution modes is high in execution count, and preferentially allocating the instructions in the block corresponding to the execution mode of high execution count so that they are executed earlier in the object program.
Each of the methods described above may be implemented such that, even if execution mode-dependent execution becomes unnecessary, no instruction to reset the execution mode to an initial state is allocated.
In each of the methods, the execution mode setting instruction sets an execution mode according to whether the branch condition is true or not. The predetermined relationship is such that, when the value in a specific field in an instruction code is equal to a predetermined specific value, the corresponding instruction is executed regardless of the execution mode managed within the CPU; otherwise, the instruction is executed only when the execution mode indicated by the value in the specific field coincides with the execution mode managed within the CPU.
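The predetermined relationship just described reduces to a small test, sketched here in Python. The concrete field values are assumptions: `ALWAYS = 0` stands in for the predetermined specific value, and 1 and 2 stand in for the two execution modes; the function names are hypothetical.

```python
ALWAYS = 0  # hypothetical field value meaning "execute regardless of mode"


def mode_after_set(condition, mode_true=1, mode_false=2):
    """Execution mode chosen by the mode setting instruction."""
    return mode_true if condition else mode_false


def should_execute(field_value, cpu_mode):
    """Apply the predetermined relationship between field value and CPU mode."""
    return field_value == ALWAYS or field_value == cpu_mode
```

For example, after `mode_after_set(True)` the CPU mode is 1, so instructions whose field value is 0 or 1 are executed and those whose field value is 2 are not.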
In the method according to the first aspect of the invention, since an unconditional branch instruction is appropriately generated and allocated, it becomes possible to prevent an idle cycle or cycles from being executed. Thus, an object program can be run at high speed on a CPU that performs predicated execution. In other words, a high-speed executable object program can be generated.
In the method according to the second aspect of the invention, since an instruction that can be so allocated is placed prior to the execution mode setting instruction, the number of program cycles can be reduced. Thus, an object program can be run at high speed on a CPU that performs predicated execution. In other words, a high-speed executable object program can be generated.
In the method according to the third aspect of the invention, since instructions common to the blocks or instruction sequences are changed into a single instruction that is executed regardless of the execution mode, the number of instructions that make up an object program and the number of program cycles can be reduced. Thus, the object program can be run at high speed on a CPU that performs predicated execution. In other words, a high-speed executable object program can be generated.
In the method according to the fourth aspect of the invention, an instruction of high execution count can be executed without branching. Thus, an object program can be run at high speed on a CPU that performs predicated execution. In other words, a high-speed executable object program can be generated.
In the method according to the fifth aspect of the present invention, instructions can be allocated by whichever method is expected to give the higher execution speed. Thus, an object program can be run at high speed on a CPU that performs predicated execution. In other words, a high-speed executable object program can be generated.
Further, the invention may also be implemented in the form of a memory storing computer-executable program code. For example, a memory storing computer-executable program code corresponding to the compiling method according to the first aspect of the invention comprises: means for causing a computer to analyze the source program and generate intermediate codes; means for causing a computer to make an analysis of the intermediate codes; and means for causing a computer to allocate instructions from the intermediate codes based on the analysis, wherein an execution mode setting instruction to set an execution mode managed within the CPU is allocated; instructions whose execution or non-execution depends on the execution mode set by the execution mode setting instruction are allocated; the instructions whose values in their respective specific fields are identical together form a block for each value in the specific field; an ending part of each block, in which the last instruction of the block is allocated, is found; and when the ending part of a certain block is to be earlier in the object program than the ending part of another block, an unconditional branch instruction identical in specific field value to the instructions in the certain block is allocated so as to be executed either in the ending part of the certain block or as soon as possible after the ending part of the certain block; whereby the object program is generated from the allocated instructions.
Likewise, the compiling methods according to the second to fifth aspects of the invention may be implemented in the form of a memory storing computer-executable program code.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.