The present invention generally relates to a microprocessor with a very long instruction word (VLIW), superscalar or out-of-order completion architecture. More particularly, the present invention relates to a program translator and a processor that realize parallel processing down to the level of individual instructions by making efficient use of execution units.
In recent years, various microprocessors, such as VLIW, superscalar and out-of-order completion types, have been developed one after another to execute multiple instructions at a time more rapidly.
Some compilers that designate a VLIW microprocessor as a target define an instruction set and then parallelize the instructions included in the set in such a manner as to satisfy various constraints concerning the availability of execution units of the microprocessor or instruction slots of a long instruction word.
A program translator of this type is disclosed, for example, in Japanese Laid-Open Publication No. 5-265769.
If a source program shown at the top of FIG. 6 is compiled using a prior art program translator, an instruction set shown in the middle of FIG. 6 is generated from the source program. Next, the instructions included in this instruction set are parallelized to generate a set of long instruction words with a step number of 2 as shown at the bottom of FIG. 6. In the second instruction slot of each long instruction word, a no-operation instruction (NOP) is inserted.
Also, if a program shown in FIG. 25 is executed using a conventional superscalar processor, then the processor executes the instructions in 5 cycles by the pipelining shown in FIG. 34.
Furthermore, if a program shown in FIG. 31 is executed using another conventional processor including a multiplier that can perform multiplication in 3 cycles, then the processor executes the instructions in 7 cycles by the pipelining shown in FIG. 35.
The prior art program translators, however, have various shortcomings. For example, an instruction set generated from a source program is not always executable at a high parallelism level, because constraints are often imposed by a target processor with a limited number of execution units. Accordingly, many NOPs must be inserted to parallelize the instructions, thus constituting a serious obstacle to performance enhancement.
Also, in the prior art superscalar processor, even if multiple instructions are decoded at a time, only some of these instructions are executable because the available execution units are limited. Thus, the resultant performance is not fully satisfactory, either.
Furthermore, in still another prior art processor, if an execution unit should perform a sequence of operations each taking several clock cycles to execute, then succeeding operations cannot be started until these operations are completed. As a result, the performance of such a processor is degraded.
An object of the present invention is to provide a program translator that can obtain a set of instructions that have been parallelized to a high level of parallelism.
Another object of the present invention is to provide a processor that can perform computational processing rapidly by making more efficient use of execution units.
To achieve these objects, according to the present invention, if there are two instructions that designate the same execution unit as their target, then one of the two instructions is replaced with another instruction that designates a different execution unit.
A program translator according to the present invention includes instruction exchanging means for exchanging one of the instructions included in a program for another instruction.
The latter instruction specifies an operation equivalent to that specified by the former instruction and designates, as a target of the operation, an execution unit that is different from an execution unit designated as a target by the former instruction. The program translator further includes instruction parallelizing means for placing the instructions in the program, in which the former instruction has been exchanged for the latter instruction by the exchanging means, at such locations as being parallelly executable by a processor.
In one embodiment of the invention, the exchanging means may include equivalent instruction storage means for storing multiple instructions that specify equivalent operations but designate mutually different execution units as targets of the operations; instruction identifying means for identifying at least one of the instructions included in the program with one of the instructions stored on the storage means; and instruction replacing means for replacing the at least one instruction, which has been identified by the identifying means, with another one of the instructions that is also stored on the storage means but is different from the at least one instruction.
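The storing, identifying and replacing steps of this embodiment can be illustrated by the following minimal sketch. It is not the claimed implementation. The unit names, the tuple encoding of an instruction, and the particular equivalence used (doubling a register by a shift, an addition, or a multiplication) are assumptions introduced here for illustration only.

```python
# Hypothetical mapping from opcode to the execution unit it designates.
UNIT_OF = {"shl": "shifter", "add": "alu", "mul": "multiplier"}

# Hypothetical equivalent-instruction storage: three ways to double a register.
# Each entry is (opcode, second operand); "SELF" stands for the destination
# register itself, so ("add", "SELF") means "add r,r".
EQUIVALENT_GROUP = [("shl", 1), ("add", "SELF"), ("mul", 2)]


def identify(instr):
    """Return the index of instr in the storage, or None if it is not stored."""
    op, reg, arg = instr
    key = (op, "SELF" if arg == reg else arg)
    return EQUIVALENT_GROUP.index(key) if key in EQUIVALENT_GROUP else None


def replace(instr, busy_units):
    """Return an equivalent instruction on a free unit, or instr unchanged."""
    # Leave the instruction alone if it is not stored or its unit is free.
    if identify(instr) is None or UNIT_OF[instr[0]] not in busy_units:
        return instr
    reg = instr[1]
    for op, arg in EQUIVALENT_GROUP:
        if UNIT_OF[op] not in busy_units:
            return (op, reg, reg if arg == "SELF" else arg)
    return instr  # every equivalent unit is busy as well
```

In this sketch, `replace(("add", "r1", "r1"), {"alu"})` yields `("shl", "r1", 1)`, whereas `("add", "r1", "r2")` is returned unchanged because it is not found in the storage.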
In another embodiment of the present invention, the program translator may further include parallelism-level calculating means for calculating a parallelism level of the instructions that have been parallelized by the instruction parallelizing means.
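This summary does not define how the parallelism level is calculated. As one plausible illustration only, the sketch below assumes that the level is the average number of instructions other than no-operation instructions per long instruction word. The list-of-lists encoding and the `"nop"` marker are likewise assumptions.

```python
def parallelism_level(long_words):
    """Average number of non-NOP instructions per long instruction word.

    This metric is an assumed definition, used here for illustration only.
    """
    if not long_words:
        return 0.0
    useful = sum(1 for word in long_words for instr in word if instr != "nop")
    return useful / len(long_words)
```

Under this assumed metric, two words that each hold one instruction and one NOP give a level of 1.0, while a single word holding both instructions gives 2.0.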
In still another embodiment, the instruction exchanging means may include equivalent instruction set storage means for storing multiple instruction sets specifying mutually equivalent operations. If two of the instruction sets each designate the same set of execution units as targets of their operations in the same order, these two instruction sets belong to the same group of instructions. The instruction exchanging means may further include: instruction subset identifying means for identifying a subset of the program with one of the instruction sets stored on the storage means; instruction group selecting means for selecting an instruction group that is different from a group to which the instruction set, identified by the identifying means with the instruction subset, belongs; and instruction set replacing means for replacing the instruction subset, which has been identified by the identifying means, with an instruction set included in the instruction group, which has been selected by the selecting means.
Another program translator according to the present invention includes: instruction parallelizing means for generating a set of parallelized instructions by placing instructions at such locations as being parallelly executable by a processor; equivalent instruction storage means for storing multiple instructions that specify equivalent operations but designate mutually different execution units as targets of the operations; no-operation instruction finding means for finding a no-operation instruction from the parallelized instructions located in a predetermined range of the parallelized instruction set; substitute instruction selecting means for selecting, if one of the parallelized instructions including the no-operation instruction found is the same as one of the instructions stored on the storage means, a substitute one of the instructions, which is also stored on the storage means but is different from the instruction included in the parallelized instructions; and instruction replacing means for replacing the instruction included in the parallelized instructions with the substitute instruction selected by the selecting means.
In one embodiment of the present invention, the program translator may further include: effective range searching means for searching the parallelized instruction set for a subset of instructions, which does not cause register conflict with any of the parallelized instructions; and second no-operation instruction finding means for finding a no-operation instruction from parallelized instructions included in the instruction subset that has been found by the searching means. The replacing means replaces the no-operation instruction, which has been found by the second finding means, with the instruction that has been selected by the selecting means.
A processor according to the present invention includes: a first execution unit; a second execution unit; and instruction parallelizing/executing means for executing two instructions, which both designate the first execution unit as a target, in parallel by allocating one of the two instructions to the second execution unit.
In one embodiment of the present invention, the parallelizing/executing means may include: instruction recognizing means for recognizing the two instructions as instructions both designating the first execution unit as the target; allocation changing means for allocating one of the two instructions that designate the first execution unit as the target to the second execution unit; and parallel executing means for executing the two instructions in parallel.
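The recognizing and allocation changing steps can be sketched as follows. This is an illustrative software model rather than the disclosed circuitry. The opcodes, the unit names `EU1` and `EU2`, and the capability table are hypothetical.

```python
# Hypothetical capability table. The first unit listed is the one the opcode
# designates by default; any further units can perform the same operation.
CAPABLE_UNITS = {"add": ["EU1", "EU2"], "cmp": ["EU1"]}


def allocate(op_a, op_b):
    """Return (unit_a, unit_b) for parallel issue, or None if impossible."""
    unit_a, unit_b = CAPABLE_UNITS[op_a][0], CAPABLE_UNITS[op_b][0]
    if unit_a != unit_b:
        return unit_a, unit_b              # no conflict was recognized
    for alt in CAPABLE_UNITS[op_b][1:]:    # try moving the second instruction
        if alt != unit_a:
            return unit_a, alt
    for alt in CAPABLE_UNITS[op_a][1:]:    # otherwise try moving the first
        if alt != unit_b:
            return alt, unit_b
    return None                            # the pair must be issued serially
```

With this table, two `add` instructions are issued to `EU1` and `EU2` in the same cycle, whereas two `cmp` instructions cannot be paired.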
Another processor according to the present invention includes: a first execution unit, on which an operation will be performed in a first number of cycles; at least one second execution unit, on which an operation will be performed in a second number of cycles, the second number being smaller than the first number; instruction recognizing means for recognizing a predetermined instruction as an instruction designating the first execution unit as a target of the operation; and instruction exchanging means for exchanging the predetermined instruction that has been recognized by the recognizing means with at least one instruction that specifies an operation equivalent to that specified by the instruction and designates the second execution unit as a target.
In one embodiment of the present invention, the exchanging means includes: instruction set searching means for searching for an instruction set that specifies an operation equivalent to that specified by the predetermined instruction; comparing means for comparing a point in time at which execution of the instruction set found by the searching means is completed with a point in time at which execution of the predetermined instruction is completed; and instruction replacing means for replacing, if the comparing means has determined that the execution of the instruction set will be completed earlier than that of the predetermined instruction, the predetermined instruction with the instruction set.
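The searching, comparing and replacing steps can be sketched as follows, taking a multiplication by a constant as the predetermined instruction. The 3-cycle multiplier latency follows the conventional example given above. The 1-cycle latency of the second execution unit, the table of equivalent instruction sets, and the function and parameter names are assumptions made for illustration.

```python
MUL_LATENCY = 3  # first execution unit: multiplier, 3 cycles per operation
ALU_LATENCY = 1  # second execution unit: assumed to take 1 cycle per operation

# Hypothetical table of instruction sets equivalent to "mul r,<constant>".
# Each "add r,r" doubles r and depends on the previous one, so they run serially.
EQUIVALENT_SETS = {
    2: ["add r,r"],
    4: ["add r,r", "add r,r"],
    8: ["add r,r", "add r,r", "add r,r"],
}


def choose(constant, now, mul_free_at, alu_free_at):
    """Return the instructions to issue in place of 'mul r,<constant>'."""
    original = [f"mul r,{constant}"]
    candidate = EQUIVALENT_SETS.get(constant)      # searching step
    if candidate is None:
        return original
    # Comparing step: completion time on each unit, given when it becomes free.
    mul_done = max(now, mul_free_at) + MUL_LATENCY
    alu_done = max(now, alu_free_at) + ALU_LATENCY * len(candidate)
    # Replacing step: substitute only if the set completes strictly earlier.
    return candidate if alu_done < mul_done else original
```

In this sketch a multiplication by 8 is kept on an idle multiplier, since three additions finish no earlier, but it is replaced when the multiplier remains occupied for two more cycles by a preceding operation.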
According to the present invention, the program translator replaces an instruction in question, which is included in an original program, with a substitute instruction that specifies the same operation but designates a different execution unit as a target. Thus, even if the instruction in question and the other instructions included in the same program cannot be executed in parallel because all of these instructions specify the same execution unit as their target, the substitute instruction and the remaining instructions are executable parallelly. This is because the execution unit designated by the substitute instruction is different from that designated by the remaining instructions. As a result, the number of parallelly executable instructions can be increased, while the number of no-operation instructions can be reduced. In this manner, a set of parallelized instructions can be generated to a higher level of parallelism.
In addition, according to the present invention, multiple instructions included in a single set are replaced at a time with corresponding instructions specifying equivalent operations. Accordingly, even if instructions in a set cannot be replaced one by one, the instruction set can be exchanged for a substitute instruction set, which designates an execution unit different from that designated by the other instruction sets, thereby increasing the number of parallelly executable instructions. In this manner, the number of no-operation instructions included in a parallelized instruction set can be reduced and the parallelism level thereof can be increased.
Suppose there are two parallelized instructions, one of which includes a first instruction and a no-operation instruction and the other of which includes a second instruction and a no-operation instruction. In such a situation, the present invention combines these two parallelized instructions into one consisting of the first and second instructions, thereby reducing the number of parallelized instructions and increasing the parallelism level.
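The combination described above can be sketched as follows for a long instruction word with two slots. The sketch assumes, hypothetically, that the first slot issues to an ALU and the second slot to a shifter, that `add r,r` may be substituted by `shl r,1`, and that a register conflict is detected conservatively as any register shared by the two instructions.

```python
# An instruction is (opcode, destination register, second operand).
NOP = ("nop", "", None)


def to_shifter(instr):
    """Hypothetical substitute: 'add r,r' on the ALU equals 'shl r,1'."""
    op, dest, arg = instr
    if op == "add" and arg == dest:
        return ("shl", dest, 1)
    return None


def registers(instr):
    """Set of register names that appear in the instruction."""
    return {x for x in instr[1:] if isinstance(x, str) and x}


def merge(word_a, word_b):
    """Combine [first, NOP] and [second, NOP] into one word, or return None."""
    if word_a[1] != NOP or word_b[1] != NOP:
        return None                      # no free slot to fill
    first, second = word_a[0], word_b[0]
    substitute = to_shifter(second)
    if substitute is None or registers(first) & registers(second):
        return None                      # no equivalent, or register conflict
    return [first, substitute]
```

For example, the words `[add r1,r1 | nop]` and `[add r2,r2 | nop]` are combined into the single word `[add r1,r1 | shl r2,1]`, halving the number of long instruction words in this case.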
Two instructions designating the same execution unit as their target cannot be executed in parallel, generally speaking. In the processor according to the present invention, however, one of these two instructions is replaced with a substitute instruction designating an originally free execution unit. Accordingly, the number of parallelly executable instructions and the program processing speed can be both increased.
In general, while an instruction of the type taking several clock cycles to complete is being executed using a first execution unit, the first execution unit is not available for a next instruction, and therefore the execution of the next instruction should usually be suspended. In contrast, according to the processor of the present invention, the next instruction is replaced with a substitute instruction specifying a second execution unit. In this manner, these two instructions can be executed in parallel, thus speeding up the program processing.