1. Field of the Invention
The present invention relates to a program execution method and a device using the same. In particular, the present invention relates to a program execution method which uses data holding unit based on a general-purpose register and to a program execution device using said method. The present invention can, for example, be applied in a microprocessor which uses a pipeline method.
2. Description of the Related Arts
A RISC (Reduced Instruction Set Computer) single-chip microprocessor is a device simultaneously offering high processing performance in all types of usage, low power consumption and small system area. This type of microprocessor often uses a pipeline method of internal data processing, in which processing is divided into multiple stages and multiple instructions are simultaneously processed in different stages, thereby increasing processing speed. Therefore, the consideration of branching instructions which disrupt the flow of pipeline processing is recognized as an important aspect of microprocessor design.
FIG. 3 shows pipeline processing in a generally used conventional microprocessor. In FIG. 3, processing is at a different stage at each clock and, in the example depicted, 1 instruction is processed completely in 5 stages. These 5 stages are expressed as IF, ID, EX, MEM, WB, and respectively denote: Instruction Fetch, Instruction Decode (and read-out from register), Computation, Memory Access, and Write-Back of necessary data to the register.
Now let us assume that instruction 1 is a branching instruction which generates branching when data in a general-purpose register referred to in the instruction is zero. Ordinarily, the EX stage determines if the data in the general-purpose register is zero. Consequently, when branching is actually generated, the branching destination address is fetched from the instruction memory by an instruction whose IF stage (shaded in the diagram) commences simultaneously with the end of the EX stage of instruction 1 (shaded in the diagram), namely instruction 4. Therefore, in this configuration, branching delay is 2 cycles. "Branching delay" denotes the number of cycles (in this example, cycles 2 and 3) between a cycle which processes a branching instruction and a cycle which actually commences processing the instruction at the branching destination. When there are many such cycles, the penalty for branching increases, hindering high-speed processing.
A technique for reducing branching delay was proposed in "Computer Architecture--A Quantitative Approach to Design, Realization and Evaluation" (David A. Patterson and John L. Hennessy, Nikkei BP Publications), pages 262.about.264. FIG. 4 shows a virtual microprocessor DLX branching determining circuit as discussed in the above publication. In DLX, a zero determining portion 6 is provided for determining whether or not register data between a group of registers 2 and a computing unit 4 is zero; a branching determining portion 8 is provided for determining if branching has occurred based on the result of the zero determining portion. The determination result is sent to a circuit comprising an IF stage and the address of the next instruction to be fetched is specified.
In this configuration, data is read out from the register during the ID stage and the computation unit 4 computes the data during the EX stage. So far, this is identical to the conventional method. However, it is the zero determining portion 6, and not the computation unit 4, which determines if the register data is zero. The zero determining portion 6 is provided specifically for this function, and determining is thus completed during the ID stage. As a result, fetching of the instruction at a branching destination address can commence simultaneously with the end of the ID stage, without waiting for the end of the EX stage. FIG. 5 is a diagram showing the pipeline processing in a DLX microprocessor when a program branches. Here, a branching destination address is fetched in the IF stage (shaded) of instruction 3 which commences when the ID stage (shaded) of instruction 1 ends. Branching delay can thereby be reduced by 1 cycle.
DLX has a branching delay of 1 cycle, which is regarded as the minimum delay possible with an ordinary pipeline-system microprocessor. However, even with the same branching delay, how high the clock frequency can be raised with the given circuit configuration is another matter. Even when branching delay is reduced to 1 cycle, the overall performance will lower if clock frequency has to be lowered by 10% due to the critical path resultantly created. In the case of DLX, a zero determining portion 6 is provided to the ID stage as shown in FIG. 4, but since this determining circuit determines the zero states of 32 bits of data input thereto, this naturally causes delay. Attempting to conclude the determining of zero and branching in the ID stage inevitably lowers the maximum clock frequency.