1. Field of the Invention
The present invention relates to an apparatus for supplying instructions in an information processing device, and in particular for supplying instructions to an instruction execution unit.
2. Description of the Related Art
In information processing devices that employ pipeline processing and the more advanced instruction processing systems that followed it, performance has been improved by speculatively processing subsequent instructions without waiting for the execution of a preceding instruction to complete. Performance has likewise been improved by supplying instructions (instruction fetch) speculatively, ahead of instruction execution.
For example, by closely associating instruction fetch with a branch prediction system and configuring the instruction fetch unit completely separately from the instruction execution unit, a necessary and sufficient instruction supply capability has been secured for the execution unit, and high instruction fetch performance has been achieved.
FIGS. 1A, 1B and 2A show conventional problems.
A branch prediction normally becomes available only some time after the instruction fetch request is issued. Therefore, when a branch is predicted to be taken, the fetch request for the predicted branch target instruction is delayed relative to the original request (for example, by 3τ, where τ indicates a machine cycle). This process is shown in FIG. 1A.
That is, with the configuration of the conventional instruction buffer, the time lost in fetching the predicted branch target instruction becomes apparent when branches are frequently predicted to be taken. In particular, in a busy short loop in which several instructions form a looping instruction sequence as shown in FIG. 1B, the loss is incurred on each pass through the loop as shown in FIG. 2, thereby having a large impact on performance.
For example, in the case of the instruction sequence (filling memory with constants) shown in FIG. 1B, writing 1000 bytes takes 250 loops. Assume that the instruction execution unit uses the super-scalar system and can process the instruction sequence in 1τ (four instructions simultaneously). If the penalty of referring to the branch prediction is 3τ, then 2τ is completely lost per loop while waiting for the target instruction fetch. Therefore, the 1000-byte write takes 750τ using the short loop, whereas a high performance instruction execution unit could perform the same write in 250τ without the short loop.
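The cycle counts in the above example can be reproduced with the following minimal sketch. It is an illustrative model only and does not describe any particular hardware. The function name `loop_cycles` and its parameter names are hypothetical, and the figure of 4 bytes written per loop is an assumption inferred from 1000 bytes requiring 250 loops. The model further assumes that each loop takes the longer of the execution time and the branch prediction penalty.

```python
def loop_cycles(total_bytes, bytes_per_loop, exec_cycles_per_loop, prediction_penalty):
    """Return (loops, cycles lost per loop, total cycles) for a short loop.

    All arguments are in bytes or machine cycles (tau). This is a simplified
    model: the next pass through the loop cannot start until both the current
    pass has executed and the predicted branch target has been fetched.
    """
    loops = total_bytes // bytes_per_loop
    # Each pass is bounded by the slower of execution and target fetch arrival.
    cycles_per_loop = max(exec_cycles_per_loop, prediction_penalty)
    lost_per_loop = cycles_per_loop - exec_cycles_per_loop
    return loops, lost_per_loop, loops * cycles_per_loop


# Values taken from the example: 1000 bytes, 1 tau per loop, 3 tau penalty.
loops, lost, total = loop_cycles(1000, 4, 1, 3)
ideal = loops * 1  # cycles needed if no target fetch wait occurred
print(loops, lost, total, ideal)
```

With the values of the example, the model gives 250 loops, 2τ lost per loop, and 750τ in total, against 250τ when no target instruction fetch wait occurs.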
To solve the above-mentioned problem, the conventional technology has the software (compiler) prepare in advance instruction code in which the short loop is expanded (unrolled), so that the branch prediction loss is concealed even though a repeated process is present. However, this method inevitably increases the size of the instruction code, and it requires the laborious work of rebuilding software that has already been prepared.
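The trade-off of the software approach can be illustrated by the following sketch, which is not the instruction sequence of FIG. 1B but a hypothetical high-level analogue. The function names, the expansion factor of four, and the counting of one loop-closing branch per pass are all assumptions made for illustration. The expanded version executes fewer branches, and therefore incurs fewer branch prediction losses, at the cost of a longer code body.

```python
def fill_rolled(memory, value, word_count):
    """Fill memory with a constant, one word per pass; returns branches taken."""
    branches = 0
    i = 0
    while i < word_count:
        memory[i] = value
        i += 1
        branches += 1  # one loop-closing branch per pass
    return branches


def fill_unrolled(memory, value, word_count):
    """Same fill with the loop body expanded four times (assumed factor)."""
    branches = 0
    i = 0
    while i + 4 <= word_count:
        # Four copies of the store written out inline: larger code, fewer branches.
        memory[i] = value
        memory[i + 1] = value
        memory[i + 2] = value
        memory[i + 3] = value
        i += 4
        branches += 1
    while i < word_count:  # remainder when word_count is not a multiple of 4
        memory[i] = value
        i += 1
        branches += 1
    return branches


rolled_mem = [0] * 250
unrolled_mem = [0] * 250
rolled_branches = fill_rolled(rolled_mem, 7, 250)
unrolled_branches = fill_unrolled(unrolled_mem, 7, 250)
```

Both functions produce the same memory contents; only the number of loop-closing branches and the length of the code differ, which corresponds to the increase in instruction code noted above.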