1. Field of the Invention
The present invention generally relates to program counter control methods and processors, and more particularly to a program counter control method for simultaneously updating and controlling a program counter and a next program counter which are designed so that a plurality of instructions including branch instructions are completed simultaneously in an instruction control which makes a branch prediction and uses a delay instruction for branching, and to a processor which employs such a program counter control method.
2. Description of the Related Art
Recently, various instruction processing methods have been employed in order to improve the performance of processors. An out-of-order processing method is one such instruction processing method. In a processor which employs the out-of-order processing method, subsequent instructions are successively inserted into a plurality of pipelines and executed without waiting for the completion of the execution of a preceding instruction, so as to improve the performance of the processor.
However, in a case where execution of a preceding instruction affects execution of a subsequent instruction, the subsequent instruction cannot be executed unless the execution of the preceding instruction is completed. If the processing of the preceding instruction which affects the execution of the subsequent instruction is slow, the subsequent instruction cannot be executed during the processing of the preceding instruction, and the subsequent instruction must wait for the completion of the execution of the preceding instruction. As a result, the pipeline is disturbed, and the performance of the processor deteriorates. Such a disturbance in the pipeline is particularly notable in the case of a branch instruction.
The branch instructions include conditional branch instructions. In the case of a conditional branch instruction, if an instruction which changes the branch condition (normally, a condition code) exists immediately prior to the conditional branch instruction, the branch does not become definite until this instruction is completed and the branch condition becomes definite. Accordingly, because the instruction sequence subsequent to the branch instruction is unknown, the subsequent instructions cannot be executed, and the processing stalls, thereby deteriorating the processing capability. This phenomenon is not limited to the processor employing the out-of-order processing method, and a similar phenomenon occurs in processors employing other processing methods such as a lock step pipeline processing method. However, the performance deterioration is particularly notable in the case of the processor employing the out-of-order processing method. Hence, in order to suppress the performance deterioration caused by the branch instruction, a branch prediction mechanism is normally provided in an instruction control unit within the processor. The branch prediction mechanism predicts the branching, so that the branch instruction can be executed at a high speed.
In the case of a processor employing the out-of-order processing method and provided with the branch prediction mechanism, a plurality of branch instructions are inserted into an executing pipeline based on a result of the branch prediction. When a branch instruction branches, a branching destination address needs to be set in an instruction address register. In a processor employing a SPARC architecture, the instruction address registers are called a program counter and a next program counter. If a plurality of branch instructions exist in the executing pipeline, the branching destination address of each branch instruction needs to be held until the branch instruction is completed, so that it can be set in the instruction address register. However, the timing at which the branching becomes definite differs for each branch instruction. For this reason, conventionally, it was necessary to also hold the branching destination address of a branch instruction which actually does not branch.
The throughput of the executing pipeline is determined by the throughput of a branch instruction controller and the number of branching destination address registers which hold the branching destination addresses. However, when a branching destination address register is occupied by the branching destination address of a branch instruction which actually does not branch, the throughput of the branch instruction is suppressed as a result. For this reason, it becomes necessary to further increase the number of branching destination address registers to improve the throughput of the branch instruction, but the increase in the number of branching destination address registers consequently suppresses the throughput of the branch instruction, thereby generating a vicious circle.
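The effect of occupying branching destination address registers with branch instructions that do not branch can be illustrated with a minimal sketch. The function name `branches_admitted`, its parameters, and the alternative case `hold_untaken=False` are assumptions introduced purely for contrast and are not taken from the description above; the sketch only counts how many consecutive branch instructions can be placed in the executing pipeline before the registers run out.

```python
def branches_admitted(in_flight_taken, num_entries, hold_untaken=True):
    """Count how many branches, in program order, fit in the pipeline.

    in_flight_taken: list of booleans, True if the branch actually branches.
    num_entries:     number of branching destination address registers.
    hold_untaken:    True models the conventional case described above, in
                     which a register is also held for a branch that does not
                     branch. False is a hypothetical case shown for contrast.
    """
    used = 0
    admitted = 0
    for taken in in_flight_taken:
        if taken or hold_untaken:
            if used == num_entries:
                break  # no free register: this branch must wait
            used += 1
        admitted += 1
    return admitted


# Six branches in flight, three of which do not actually branch.
pattern = [False, True, False, False, True, True]
conventional = branches_admitted(pattern, num_entries=3, hold_untaken=True)
contrast = branches_admitted(pattern, num_entries=3, hold_untaken=False)
```

Under these assumptions, with three registers the conventional case admits only the first three branches, whereas the contrast case admits all six, which is the suppression of throughput that the preceding paragraph describes.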
In an instruction control unit, the number of instructions that may be processed in one cycle is one of the factors determining the execution speed of the instruction control unit. In the instruction control unit employing the out-of-order processing method, it is possible to complete a plurality of instructions simultaneously. Normally, the completion of an instruction indicates a point in time when updating of the resources that are used, such as registers, is completed. When a plurality of instructions are completed simultaneously, it is necessary to simultaneously complete the updating of the resources that are used. Hence, the instruction address register also needs to be updated by an amount corresponding to the plurality of instructions. When controlling an architecture which uses delay instructions for branching, typified by the SPARC architecture, the execution of the delay instruction is determined by whether or not the branch instruction branches, and it is necessary to update two registers, namely, the program counter and the next program counter. For this reason, it was conventionally possible to complete (commit) the branch instruction only by itself or only from a predetermined position (a relative position with respect to another instruction which is completed simultaneously). Normally, in a case where the branch instruction is completed (committed) in packet form by placing the branch instruction at the last position of the packet, the position where the branch instruction is completed (committed) is also determined in the decode cycle. In this case, the decode cycle and the instruction complete (commit) cycle are restricted by the branch instruction.
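The updating of the program counter and the next program counter for a group of instructions completed in the same cycle can be sketched as follows. This is a simplified model under stated assumptions and not the method of any particular processor: the names `Committed`, `step` and `commit_group` are hypothetical, only conditional branches are modeled, and the annul flag follows the SPARC convention in which the delay instruction of an annulling conditional branch is skipped when the branch does not branch.

```python
from dataclasses import dataclass
from typing import Optional

INSTR_BYTES = 4  # SPARC instructions are fixed-length 4-byte words


@dataclass
class Committed:
    """One completed instruction (hypothetical record for illustration)."""
    is_branch: bool = False
    taken: bool = False            # the branch actually branches
    annul: bool = False            # annul bit of a conditional branch
    target: Optional[int] = None   # branching destination address


def step(pc, npc, ins):
    """Return the (PC, nPC) pair after completing one instruction."""
    if ins.is_branch and ins.taken:
        # The delay instruction at nPC executes next, then the target.
        return npc, ins.target
    if ins.is_branch and ins.annul:
        # Branch does not branch and annuls: the delay instruction is skipped.
        return npc + INSTR_BYTES, npc + 2 * INSTR_BYTES
    # Ordinary instruction, or a branch that does not branch and does not annul.
    return npc, npc + INSTR_BYTES


def commit_group(pc, npc, group):
    """Apply the updates of every instruction completed in one cycle."""
    for ins in group:
        pc, npc = step(pc, npc, ins)
    return pc, npc


# An add, a branch that branches to 0x200, its delay instruction, and the
# first instruction at the branching destination, all completed together.
group = [Committed(), Committed(is_branch=True, taken=True, target=0x200),
         Committed(), Committed()]
final_pc, final_npc = commit_group(0x100, 0x104, group)
```

The sketch applies the updates sequentially in software; it shows why both registers depend on whether, and at which position in the group, a branch instruction branches, which is the dependency that restricted the conventional completion of branch instructions.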
Recently, it has become possible to use memories having extremely large memory capacities, due to improvements in the LSI production techniques and the like, and thus, it has become possible to use 64-bit structures for operating systems and applications. Hence, the 64-bit structure is also required of the instruction control unit. However, when the 64-bit structure is used, the scale of the required circuits such as registers becomes large. In addition, registers related to the control of the branch instruction also need to have the 64-bit structure, and the scale of the branching destination address register and the like also becomes large.
When the circuits are simply modified from the 32-bit structure to the 64-bit structure, the scale of the required circuits doubles even though the number of entries remains unchanged. As a result, there was a problem in that the circuit scale (assembling area) greatly increases when the 64-bit structure is used.
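The doubling can be checked with simple arithmetic. The function `address_storage_bits` and the entry count of 10 are assumptions chosen only for illustration; the sketch counts storage bits and ignores any surrounding control logic.

```python
def address_storage_bits(num_entries, address_width_bits):
    """Storage bits needed for num_entries address registers of a given width."""
    return num_entries * address_width_bits


# Hypothetical example: 10 branching destination address registers.
bits_32 = address_storage_bits(10, 32)
bits_64 = address_storage_bits(10, 64)
ratio = bits_64 / bits_32
```

With the number of entries fixed, the ratio is 2 regardless of the entry count, which corresponds to the doubling of the required circuits stated above.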