The present invention relates to a method and apparatus that perform instruction level parallel processing. More particularly, the present invention relates to a method and apparatus including a register configuration suited for software pipelining, which is a software technique that makes instruction level parallel processing effective, and to a processor having such a register configuration.
The performance of a computer is determined by the machine cycle and the number of clock cycles per instruction (CPI). Hence, reducing the machine cycle and the CPI is important for improving the performance of a computer. Conventional techniques for reducing the CPI include organizing the execution of instructions into a pipeline, and instruction level parallel processing as represented by superscalar processing and very long instruction word (VLIW) processing.
The pipeline is a parallel processing technique that divides the execution of a single instruction into a plurality of small stages, such as fetch, decode, execute and write-back, and simultaneously executes a plurality of instructions, each occupying a different stage. This pipeline can theoretically make the CPI equal to 1 by processing each stage in one machine cycle.
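The relationship between pipeline depth and CPI described above can be sketched as follows. This is a minimal illustration that assumes an ideal pipeline with no stalls; the function names `pipeline_cycles` and `cpi` are hypothetical and do not appear in the cited art.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Machine cycles needed by an ideal pipeline with no stalls.

    The first instruction needs num_stages cycles to pass through every
    stage; each later instruction completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def cpi(num_instructions: int, num_stages: int) -> float:
    """Average clock cycles per instruction for the ideal pipeline above."""
    return pipeline_cycles(num_instructions, num_stages) / num_instructions


if __name__ == "__main__":
    # With four stages (fetch, decode, execute, write-back) the CPI
    # approaches 1 as the instruction count grows.
    for n in (1, 10, 100, 10000):
        print(n, pipeline_cycles(n, 4), round(cpi(n, 4), 4))
```

Under these assumptions the fill time of the pipeline is amortized over the instruction stream, which is why the CPI tends toward 1 rather than reaching it exactly.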
Instruction level parallel processing can execute a larger number of instructions simultaneously by pipelining a plurality of instructions at once, and aims at realizing a CPI of less than 1. Superscalar and VLIW processing differ in how the instructions to be performed at the same time are chosen. In superscalar processing, executable instructions are selected dynamically from an instruction sequence by hardware and then executed. VLIW processing, by contrast, has an instruction format that allows a plurality of operations to be specified with a single instruction, with the simultaneously executable operations being statically placed in an instruction by the compiler.
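The static grouping performed by a VLIW compiler can be sketched with a simple greedy packer. This is an illustrative assumption rather than the method of any particular compiler: the function `pack_bundles` is hypothetical, every operation is assumed to take one cycle, and each register is assumed to be written only once, so only read-after-write dependencies are considered.

```python
def pack_bundles(ops, width):
    """Pack operations into VLIW-style bundles of at most `width` operations.

    ops is a list of (dest_register, [source_registers]) in program order.
    An operation is placed in the earliest bundle that comes after the
    bundles producing its sources and that still has a free slot.
    Returns a list of bundles, each a list of operation indices.
    """
    bundles = []
    produced_in = {}  # register name -> index of the bundle that writes it
    for index, (dest, sources) in enumerate(ops):
        earliest = 0
        for src in sources:
            if src in produced_in:
                earliest = max(earliest, produced_in[src] + 1)
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(index)
        produced_in[dest] = slot
    return bundles


if __name__ == "__main__":
    ops = [
        ("r1", []),            # 0: r1 = load
        ("r2", []),            # 1: r2 = load
        ("r3", ["r1", "r2"]),  # 2: r3 = r1 + r2
        ("r4", []),            # 3: r4 = load
        ("r5", ["r3", "r4"]),  # 4: r5 = r3 * r4
    ]
    bundles = pack_bundles(ops, width=2)
    print(bundles)                  # [[0, 1], [2, 3], [4]]
    print(len(bundles) / len(ops))  # 0.6 bundles per operation
```

In this example five operations occupy three bundles, so fewer than one issue cycle is spent per operation. A superscalar processor would reach a comparable grouping at run time in hardware instead of at compile time.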
In using a processor that employs these techniques, what is most important is for the compiler to arrange instructions so as to keep the pipeline filled with valid instructions. This is even more important for a VLIW system, which does not have the function of selecting executable instructions dynamically. The software technique by which the compiler arranges instructions in a software pipeline is hereinafter referred to as SWPL. The SWPL divides an instruction sequence of a loop body shown in FIG. 1 into a plurality of groups (hereinafter referred to as stages of SWPL) based on the dependency relationship between instructions and the number of resources, and then puts new iteration stages into the pipeline for each machine cycle called a prologue interval, as shown in FIG. 2. As a result, in an ideal steady state interval, all the stages belonging to different iterations (an iteration is a single pass through the loop) are executed at once, so a high level of parallelism can be obtained, allowing efficient use of a processor that performs instruction pipelining and instruction level parallel processing. Thereafter, a decreasing number of iteration stages are executed until the end of the instruction sequence in an epilogue interval.
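The overlap of iterations described above can be sketched as a schedule table. The sketch assumes that every SWPL stage occupies one time slot, that a new iteration starts in every slot, and that the loop has at least as many iterations as stages; the function `swpl_schedule` and the phase labels are hypothetical names chosen for illustration.

```python
def swpl_schedule(num_stages, num_iterations):
    """Return one (phase, active) entry per time slot.

    active lists the (iteration, stage) pairs executing in that slot.
    Iteration i executes stage s in slot i + s.
    """
    slots = []
    for t in range(num_iterations + num_stages - 1):
        active = [(i, t - i) for i in range(num_iterations)
                  if 0 <= t - i < num_stages]
        if len(active) == num_stages:
            phase = "steady"      # every stage is busy with some iteration
        elif t < num_stages - 1:
            phase = "prologue"    # the pipeline is still filling
        else:
            phase = "epilogue"    # the pipeline is draining
        slots.append((phase, active))
    return slots


if __name__ == "__main__":
    for t, (phase, active) in enumerate(swpl_schedule(3, 5)):
        print(t, phase, active)
```

With three stages and five iterations, two slots fill the pipeline, three slots run all stages at once, and two slots drain it.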
With a processor that has only ordinary registers, however, a read from a register must see the value that was written for it, so the order of the read with respect to the write must be guaranteed. It is therefore necessary that the write into the register in the subsequent iteration be scheduled after the read from the same register in the current iteration. That is, the write and the read for the same register must be sequenced. Thus, the number of instructions that can be executed simultaneously decreases, giving rise to the problem of reduced effectiveness of the software pipeline.
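The ordering constraint can be illustrated with a small simulation of a single ordinary register shared by all iterations. The model is an assumption made for illustration: each iteration writes its value when it starts and reads it back `read_delay` slots later, and a read that falls in the same slot as a write from a later iteration is taken to see the old value. The function name `run_loop` is hypothetical.

```python
def run_loop(values, read_delay, initiation_interval):
    """Simulate iterations that share one ordinary register.

    Iteration i writes values[i] at slot i * initiation_interval and reads
    the register read_delay slots later. Returns the value each iteration
    actually read.
    """
    READ, WRITE = 0, 1  # reads sort before writes within the same slot
    events = []
    for i, value in enumerate(values):
        start = i * initiation_interval
        events.append((start, WRITE, i, value))
        events.append((start + read_delay, READ, i, None))
    events.sort(key=lambda e: (e[0], e[1]))

    register = None
    seen = [None] * len(values)
    for _slot, kind, i, value in events:
        if kind == WRITE:
            register = value
        else:
            seen[i] = register
    return seen


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    # Starting an iteration every slot overwrites values before they are read.
    print(run_loop(data, read_delay=2, initiation_interval=1))  # [20, 30, 40, 40]
    # Delaying each new iteration until the previous read restores correctness.
    print(run_loop(data, read_delay=2, initiation_interval=2))  # [10, 20, 30, 40]
```

In this model the iterations can only be started as often as the distance between the write and its read allows, which is the loss of parallelism referred to above.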
Regarding this problem, "A Register File Architecture For Instruction Level Parallel Processing and Its Evaluation" by H. Fuji, et al., Parallel Processing Symposium JSPP'93, pages 307-314, 1993 (referred to as a first conventional technique), discloses a method whereby physical registers of queue structure and a pointer for read/write of the physical registers are provided for each logical register specified by an instruction. The pointer is incremented under control of a register control field in the logical register specification field of the instruction, according to whether a read or a write is specified. By advancing the pointer when data is written into the register, writes issued by the same instruction in different iterations can be directed to different physical registers, which in turn allows the write to the same logical register in a subsequent iteration to be executed prior to the read.
"Code Generation Schema For Modulo Scheduled Loops" by B. Ramakrishna, et al., IEEE, pp. 158-169, 1992 (referred to as a second conventional technique), proposes a method in which a base register is provided for the physical registers. A physical register is accessed according to the sum of the logical register number specified by the instruction and the value of the base register, and the base register is decremented each time a new software pipeline (SWPL) stage is started. In this method, because the write of a subsequent iteration is issued as a new SWPL stage, the same logical register number is combined with a different base register value and therefore maps to a different physical register number, thus allowing the write of the subsequent iteration to be performed before the read operation.
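The two register organizations can be sketched as simple models. Both classes are illustrative assumptions and not the implementations of the cited references: the class and method names are hypothetical, the `advance` flag stands in for the register control field, the pointers are assumed to advance after the access, and the rotating file is assumed to wrap around modulo the number of physical registers.

```python
class QueueRegister:
    """First technique: one logical register backed by a queue of
    physical registers with separate read and write pointers."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.read_ptr = 0
        self.write_ptr = 0

    def write(self, value, advance=True):
        self.slots[self.write_ptr] = value
        if advance:  # the next write goes to a different physical register
            self.write_ptr = (self.write_ptr + 1) % len(self.slots)

    def read(self, advance=True):
        value = self.slots[self.read_ptr]
        if advance:  # only the last read of a value advances the pointer
            self.read_ptr = (self.read_ptr + 1) % len(self.slots)
        return value


class RotatingRegisterFile:
    """Second technique: physical number = logical number + base register."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0

    def _index(self, logical):
        return (logical + self.base) % len(self.phys)

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def read(self, logical):
        return self.phys[self._index(logical)]

    def new_stage(self):
        """Decrement the base register when a new SWPL stage starts."""
        self.base = (self.base - 1) % len(self.phys)


if __name__ == "__main__":
    q = QueueRegister(depth=2)
    q.write(10)            # iteration 0 writes
    q.write(20)            # iteration 1 writes before iteration 0 has read
    print(q.read(), q.read())            # 10 20

    rf = RotatingRegisterFile(size=8)
    rf.write(2, 10)        # iteration 0 writes logical register 2
    rf.new_stage()
    rf.write(2, 20)        # iteration 1 writes the same logical number
    print(rf.read(3), rf.read(2))        # 10 20
```

In the rotating model the value of the earlier iteration remains reachable, but under a logical number that is one higher after each decrement of the base register.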
In the first conventional technique, when ordinary use of registers during the execution of instructions is considered, the same logical register is sometimes read several times across different SWPL stages in order to reuse register values as much as possible. In that case, this technique updates the pointer only on the last register read and uses the same physical register for the remaining accesses. Thus, because the write of the subsequent iteration cannot be executed before the last read, the prologue interval increases. In the event that a register is reused beyond the current SWPL stage, it is necessary to insert a copy instruction to make the interval from register definition to register use less than the initiation interval. This gives rise to the problem of software overhead due to the copy instruction and of consumption of logical registers for the copy operation.
Further, because each logical register has its own independent physical registers, a value defined through a certain logical register number cannot be read out through another logical register number. Hence, even if there are a sufficient number of physical registers as a whole, only the physical registers associated with the logical registers used by the instructions are used, which lowers the utilization of the physical registers.
In the second conventional technique, because the field representing a logical register in an instruction must be able to specify all the physical registers, the register field grows with the number of physical registers. Another problem is that, because the base register value and the logical register number are added when making a register access, the register access time may increase. The former problem becomes particularly severe when a VLIW is used, because a number of register specifying fields are contained in one instruction. Further, an instruction for updating the base register at each repetition of the loop must be added, and this results in software overhead.
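The growth of the register fields can be illustrated with simple arithmetic. The figures used below (32 or 128 physical registers, four operations per VLIW instruction, three register fields per operation) are hypothetical values chosen for illustration, as are the function names.

```python
def register_field_bits(num_physical_registers: int) -> int:
    """Bits needed for one field that can name every physical register."""
    if num_physical_registers < 1:
        raise ValueError("at least one register is required")
    return (num_physical_registers - 1).bit_length()


def instruction_register_bits(num_physical_registers: int,
                              operations_per_instruction: int,
                              fields_per_operation: int = 3) -> int:
    """Total bits spent on register fields in one instruction."""
    fields = operations_per_instruction * fields_per_operation
    return fields * register_field_bits(num_physical_registers)


if __name__ == "__main__":
    print(instruction_register_bits(32, 4))   # 60
    print(instruction_register_bits(128, 4))  # 84
```

Under these assumptions, quadrupling the number of physical registers adds two bits to every field, and a VLIW instruction with twelve register fields grows by 24 bits.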