The present invention generally relates to program processing technology applicable to a compiler for translating all the source code of a program written in a high-level language into object code, and more particularly relates to code optimization technology specially designed for a parallel processor.
As the performance and functions of various microprocessor application components have been increasingly enhanced these days, microprocessors with even higher performance are in higher and higher demand. To meet such a demand, parallel processors like very-long-instruction-word (VLIW) processors have been developed to realize still higher performance through parallel processing.
For example, a VLIW processor is supposed to process in parallel a combination of operations that are packed in each and every word of a program. The combination is made by a compiler, which extracts a number of parallelly executable instructions from a source program and then combines them into a single long instruction word, thereby shortening the time taken for the VLIW processor to execute the program.
According to the compilation technology, instructions are rearranged on a xe2x80x9cbasic blockxe2x80x9d basis, which means a set of instructions to be executed consecutively without any branch or halt. That is to say, a basic block is a collection of instructions controllable continuously from the first through last ones.
A VLIW processor, on the other hand, generally executes instructions included in a long, fixed-length word with a fixed parallelism index. Thus, the code conversion efficiency attainable by the VLIW processor is not always good. To eliminate such a problem, a VLIW processor for executing a variable number of instructions included in a variable-length word was developed recently. In the VLIW processor of this type, a set of instructions to be executed in parallel is divided into a plurality of groups on respective parallel execution boundaries, thereby making the number of instructions issued per cycle (i.e., index of parallelism) variable. In addition, the VLIW processor executes an instruction word of a variable length so as to improve the code conversion efficiency. In this specification, a group of instructions included between an adjacent pair of execution boundaries will be called an xe2x80x9cexecution unit (of instructions)xe2x80x9d. Furthermore, the VLIW processor can also execute a plurality of units of instructions concurrently while branching and recombining the processing flow.
As described above, the processor of this type rearranges the instructions on the basic block basis. Thus, if the compilation technology is applied to the processor of this type, even a set of instructions, which could be included in a single execution unit otherwise, might be unintentionally divided into several execution units on the basic block boundaries. As a result, the program execution rate attainable by such a processor cannot be regarded sufficiently high considering the potential performance of the VLIW processor.
An object of the present invention is increasing the program execution rate of a target machine in performing program processing for parallel processing purposes.
Specifically, an inventive program processing method for parallel processing includes the step of a) subdividing each of a plurality of basic blocks, into which program code has been divided, into a multiplicity of execution units. Each of the execution units is made up of parallelly-executable instructions. The method further includes the step of b) combining two of the execution units, which are located just before and after a basic block boundary, into a single execution unit if these execution units are executable within a single cycle.
According to the present invention, if a pair of execution units, which are located just before and after a basic block boundary, are found executable within a single cycle, then these execution units are combined into a single execution unit. Even a group of instructions ranging across a basic block boundary, or covering multiple basic blocks, are executable by a target machine within a single cycle if these instructions are classified as single execution unit. Accordingly, it is more probable that a group of instructions ranging across a basic block boundary are executable in parallel, thus cutting down the number of cycles to be done by the target machine to execute a program. As a result, the program execution rate increases.
In one embodiment of the present invention, the step b) preferably includes analyzing dependence between instructions belonging to the two execution units located just before and after the basic block boundary. And it is determined based on the analyzed dependence between the instructions whether or not these execution units located just before and after the basic block boundary are combinable into the single execution unit.
In another embodiment of the present invention, an execution boundary code indicating the identity as boundary between an associated pair of the execution units is preferably added to each said basic block boundary in the step a). In the step b), if the execution units located just before and after the basic block boundary are executable within the single cycle, then the execution boundary code that has been added to the basic block boundary is preferably removed.
In still another embodiment, instructions, which impose less strict constraints on resources of a target machine for executing the program, are selected preferentially in the step a) as respective instructions belonging to first and last execution units of each said basic block.
In this particular embodiment, when the instructions belonging to the first and last execution units of the basic block are selected in the step a), an instruction that is executable only by itself a cycle by the target machine is preferably given a lower priority.
In still another embodiment, when instructions belonging to first and last execution units of each said basic block are selected in the step a), an instruction with a short word length is preferably given a higher priority.
Another inventive program processing method for parallel processing includes the step of a) subdividing each of a plurality of basic blocks, into which program code has been divided, into a multiplicity of execution units. Each of the execution units is made up of parallelly-executable instructions. In the step a), a particular one of the basic blocks is preferably subdivided into a set of execution units along with an instruction belonging to one of the other execution units that is closest to the particular basic block. The closest execution unit belongs to another set of execution units, into which another one of the basic blocks that is adjacent to, and combinable with, the particular basic block has already been subdivided.
According to the present invention, a particular basic block is subdivided into a set of execution units along with an instruction belonging to one of the other execution units that is closest to the particular basic block. The closest execution unit belongs to another set of execution units, into which another basic block that is adjacent to, and combinable with, the particular basic block has already been subdivided. Thus, a number of instructions covering multiple basic blocks across a basic block boundary are more likely to be combined into a single execution unit. Even a group of instructions ranging across a basic block boundary are executable by a target machine within a single cycle if these instructions are classified as single execution unit. Accordingly, it is more probable that a group of instructions covering several basic blocks across a basic block boundary are executable in parallel, thus cutting down the number of cycles to be done by the target machine to execute a program. As a result, the program execution rate increases.
In one embodiment of the present invention, one of the basic blocks that is located just before the particular basic block may be used as the combinable basic block in the step a). The particular basic block, along with the instruction belonging to the last execution unit of the combinable basic block, may be subdivided into the set of execution units.
In an alternate embodiment, one of the basic blocks that is located just after the particular basic block may be used as the combinable basic block in the step a). The particular basic block, along with the instruction belonging to the first execution unit of the combinable basic block, may be subdivided into the set of execution units.
In still another embodiment, each said basic block may be subdivided in the step a) into the execution units sequentially in a forward direction from the beginning toward the end of the program.
In an alternate embodiment, each said basic block may be subdivided in the step a) into the execution units sequentially in a backward direction from the end toward the beginning of the program.
In still another embodiment, one of the basic blocks that belongs to the innermost loop is preferably subdivided into the execution units preferentially.
In yet another embodiment, the method may further include the step b) of subdividing each said basic block of the program code into another set of execution units independent of adjacent ones of the basic blocks. Results of the steps a) and b) are compared to each other and one of these steps a) and b) that results in the smaller number of execution units is adopted.
An inventive program processor for parallel processing includes an intra basic block parallelizer for subdividing each of a plurality of basic blocks, into which program code has been divided, into a multiplicity of execution units. Each of the execution units is made up of parallelly-executable instructions. The processor further includes a basic block boundary parallelizer for combining two of the execution units, which are located just before and after a basic block boundary, into a single execution unit if these execution units are executable within a single cycle.
Another inventive program processor is adapted to execute compilation for parallel processing purposes. The processor includes: a compiler front end for translating source code into intermediate code by dividing the source code into a plurality of basic blocks; a parallelizer for converting the intermediate code into code in a parallelly-executable form; and an object code generator for translating the intermediate code in the form converted by the parallelizer into object code executable by a target machine. The parallelizer includes: an intra basic block parallelizer for subdividing each of a plurality of basic blocks, into which the intermediate code has been divided, into a multiplicity of execution units, each being made up of parallelly-executable instructions; and a basic block boundary parallelizer for combining two of the execution units, which are located just before and after a basic block boundary, into a single execution unit if these execution units are executable within a single cycle.
Still another inventive program processor for parallel processing includes an expanded basic block parallelizer for subdividing each of a plurality of basic blocks, into which program code has been divided, into a multiplicity of execution units, each being made up of parallelly-executable instructions. The expanded basic block parallelizer subdivides a particular one of the basic blocks into a set of execution units along with an instruction belonging to one of the other execution units that is closest to the particular basic block. The closest execution unit belongs to another set of execution units, into which another one of the basic blocks that is adjacent to, and combinable with, the particular basic block has already been subdivided.
Yet another inventive program processor is also adapted to execute compilation for the parallel processing purposes. The processor includes: a compiler front end for translating source code into intermediate code by dividing the source code into a plurality of basic blocks; a parallelizer for converting the intermediate code into code in a parallelly-executable form; and an object code generator for translating the intermediate code in the form converted by the parallelizer into object code executable by a target machine. The parallelizer includes an expanded basic block parallelizer for subdividing each of a plurality of basic blocks, into which the intermediate code has been divided, into a multiplicity of execution units, each being made up of parallelly-executable instructions. The expanded basic block parallelizer subdivides a particular one of the basic blocks into a set of execution units along with an instruction belonging to one of the other execution units that is closest to the particular basic block. The closest execution unit belongs to another set of execution units, into which another one of the basic blocks that is adjacent to, and combinable with, the particular basic block has already been subdivided.
An inventive storage medium has stored thereon a program getting a program processing procedure executed by a computer for parallel processing purposes. The program processing procedure includes the steps of: a) subdividing each of a plurality of basic blocks, into which program code has been divided, into a multiplicity of execution units, each being made up of parallelly-executable instructions; and b) combining two of the execution units, which are located just before and after a basic block boundary, into a single execution unit if these execution units are executable within a single cycle.
Another inventive storage medium has stored thereon a program getting a program processing procedure executed by a computer through compilation for parallel processing purposes. The program processing procedure includes the steps of: a) translating source code into intermediate code by dividing the source code into a plurality of basic blocks; b) subdividing each said basic block of the intermediate code into a multiplicity of execution units, each being made up of parallelly-executable instructions; c) combining two of the execution units, which are located just before and after a basic block boundary, into a single execution unit if these execution units are executable within a single cycle; and d) translating the intermediate code in the form converted in the steps b) and c) into object code executable by a target machine.
Still another inventive storage medium has stored thereon a program getting a program processing procedure executed by a computer for parallel processing purposes. The program processing procedure includes the step of subdividing each of a plurality of basic blocks, into which program code has been divided, into a multiplicity of execution units, each being made up of parallelly-executable instructions. A particular one of the basic blocks is subdivided into a set of execution units along with an instruction belonging to one of the other execution units that is closest to the particular basic block. The closest execution unit belongs to another set of execution units, into which another one of the basic blocks that is adjacent to, and combinable with, the particular basic block has already been subdivided.
Yet another inventive storage medium has stored thereon a program getting a program processing procedure executed by a computer through compilation for parallel processing purposes. The program processing procedure includes the steps of a) translating source code into intermediate code by dividing the source code into a plurality of basic blocks; b) subdividing each said basic block of the intermediate code into a multiplicity of execution units, each being made up of parallelly-executable instructions; and c) translating the intermediate code in the form processed in the step b) into object code executable by a target machine. In the step b), a particular one of the basic blocks is subdivided into a set of execution units along with an instruction belonging to one of the other execution units that is closest to the particular basic block. The closest execution unit belongs to another set of execution units, into which another one of the basic blocks that is adjacent to, and combinable with, the particular basic block has already been subdivided.
Yet another inventive storage medium has stored thereon a set of instructions to be executed in parallel. The instruction set is grouped into a plurality of execution units, each being made up of parallelly-executable instructions, and at least one of the execution units is located across a boundary between an associated pair of the basic blocks.