1. Field of the Invention:
The present invention relates to an information processing apparatus capable of allowing instructions to be executed flexibly in an appropriate order different from a predetermined order defined in the program.
2. Description of the Prior Art:
Conventionally, in the field of information processing, pipelined processing, in which a plurality of instructions are executed simultaneously or in an overlapped manner, is frequently used to improve the performance of information processing apparatuses.
However, pipelined processing is disadvantageous when one instruction involves many pipeline steps, for example, in an operation of reading data out of memory or in a floating-point arithmetic operation. This is because, in the case where data obtained by one instruction is required by a succeeding instruction, execution of the succeeding instruction is suspended until the required data is obtained, which adversely affects the overall performance of the information processing apparatus. Such an interrelation, in which a later instruction requires the data obtained as a result of completion of an earlier instruction, is normally called "data dependency."
On the other hand, a method that can avoid the deterioration of performance caused by data dependency has already been proposed, as disclosed in a technical paper, R. M. Tomasulo, An Efficient Algorithm for Exploiting Multiple Arithmetic Units: IBM Journal, Vol. 11, Jan. 1967, pp. 25-33.
In accordance with the method proposed by R. M. Tomasulo, in the case where a certain instruction is suspended due to data dependency, if any of its succeeding instructions can be executed immediately, that succeeding instruction is executed first.
A method for executing instructions in an order different from that predetermined in the program is called "out-of-order execution".
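The difference between in-order and out-of-order execution described above can be illustrated with a minimal sketch. It is not taken from the cited paper: the `Instr` record, the register names, the latencies, and the rule that at most one instruction starts per cycle are all assumptions made for illustration, and hazards other than the data dependency discussed above are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction
    latency: int   # cycles until the result is available (assumed values)


def latest_writer(program, index, reg):
    """Index of the nearest earlier instruction that writes reg, or None."""
    for j in range(index - 1, -1, -1):
        if program[j].dest == reg:
            return j
    return None


def start_order(program, out_of_order):
    """Return instruction names in the order in which they start executing."""
    done_at = {}                        # instruction index -> cycle its result is ready
    waiting = list(range(len(program)))
    started = []
    cycle = 0
    while waiting:
        # In-order: only the oldest waiting instruction may start.
        # Out-of-order: any waiting instruction whose data is ready may start.
        window = list(waiting) if out_of_order else waiting[:1]
        for i in window:
            producers = [latest_writer(program, i, s) for s in program[i].srcs]
            ready = all(p is None or done_at.get(p, float("inf")) <= cycle
                        for p in producers)
            if ready:
                done_at[i] = cycle + program[i].latency
                started.append(program[i].name)
                waiting.remove(i)
                break                   # at most one instruction starts per cycle
        cycle += 1
    return started


program = [
    Instr("load", "r1", (), 3),            # long-latency memory read
    Instr("add", "r2", ("r1", "r0"), 1),   # needs r1 from the load
    Instr("mul", "r3", ("r4", "r5"), 1),   # independent of both
]
print(start_order(program, out_of_order=False))  # ['load', 'add', 'mul']
print(start_order(program, out_of_order=True))   # ['load', 'mul', 'add']
```

In the in-order case the independent multiplication waits behind the suspended addition; in the out-of-order case it starts while the addition is still waiting for the loaded data.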
According to the disclosure in the above-introduced technical paper, every instruction to be executed is accompanied by an identification tag, and this tag is outputted together with the result data after the execution of the instruction is completed in an execution (arithmetic) unit.
Furthermore, reservation stations are provided at the respective input terminals of a plurality of execution units. After an instruction is decoded, the data required for execution of this instruction are read out from a register file and are stored, together with the decoded instruction, in the reservation station of the corresponding arithmetic unit.
In the case where the required data cannot be obtained because the instruction has a data dependency on a previous instruction, a tag of the previous instruction supplying the required data is stored in the reservation station instead. In order to take in the required data, the reservation station compares the tag newly outputted from each execution unit with the tags already stored in the reservation station. Then, among a plurality of instructions, the instruction that first obtains all of its required data is executed first.
FIG. 10 is a schematic block diagram showing a conventional information processing apparatus utilizing the reservation station.
In FIG. 10, a register file 101 contains a plurality of registers for storing data. Arithmetic units 102 serve as execution units. Each arithmetic unit 102 outputs operation result data on a corresponding result bus 103 and also outputs a tag on a corresponding tag bus 104, so that the outputted result data can be discriminated by the tag; in other words, the tag indicates which instruction the result data relates to.
In this conventional processing apparatus, three arithmetic units 102 are provided so that three arithmetic operations can be carried out at the same time. The reservation station comprises a plurality of entry blocks 105. Each arithmetic unit 102 is associated with two entry blocks 105.
Each entry block 105 includes a control field 111 that stores control information obtained by decoding an instruction, a #1 tag field 112, a #1 source field 113, a #2 tag field 114, and a #2 source field 115. The #1 source field 113 and the #2 source field 115 store the two source data to be used in the arithmetic unit 102. The #1 tag field 112 and the #2 tag field 114 are associated with the #1 source field 113 and the #2 source field 115, respectively, and store tags that identify the data to be supplied from any one of the three arithmetic units 102 in the case where source data is not supplied from the register file 101 due to a data dependency on a previous instruction.
Each entry block 105 has six comparators 116 for comparing the tags outputted from the three arithmetic units 102 with the tags stored in the #1 tag field 112 and the #2 tag field 114. If any tag from the arithmetic units 102 coincides with a tag stored in the #1 or #2 tag field 112, 114, the result data of the corresponding arithmetic unit 102 is stored in the corresponding #1 source field 113 or #2 source field 115. Then, once both source data in the #1 and #2 source fields 113, 115 have been obtained, the associated arithmetic unit 102 executes its arithmetic operation in accordance with the suspended instruction.
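The behavior of one entry block 105 described above may be sketched as follows. This is only an illustrative software model, not the circuit of FIG. 10: the class name `EntryBlock`, the method names, and the use of integers for tags and data are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntryBlock:
    control: str                  # control field 111 (decoded instruction)
    tag1: Optional[int] = None    # #1 tag field 112
    src1: Optional[int] = None    # #1 source field 113
    tag2: Optional[int] = None    # #2 tag field 114
    src2: Optional[int] = None    # #2 source field 115

    def snoop(self, broadcasts):
        """Compare every broadcast (tag, result) pair with both stored tags.

        Returns the number of tag comparisons performed, which corresponds
        to the number of comparators 116 needed in hardware.
        """
        comparisons = 0
        for tag, result in broadcasts:
            comparisons += 2      # one comparison per tag field
            if self.src1 is None and self.tag1 == tag:
                self.src1 = result
            if self.src2 is None and self.tag2 == tag:
                self.src2 = result
        return comparisons

    def ready(self):
        """True once both source data have been obtained."""
        return self.src1 is not None and self.src2 is not None


# Source #2 came from the register file; source #1 waits for tag 7.
entry = EntryBlock("add", tag1=7, src2=10)
print(entry.ready())                              # False
print(entry.snoop([(3, 99), (7, 42), (5, 1)]))    # 6
print(entry.src1, entry.ready())                  # 42 True
```

With three arithmetic units broadcasting at once, the model performs six comparisons for a single entry block, matching the six comparators 116 mentioned above.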
As can be understood from the foregoing description, if the number of arithmetic units that output result data simultaneously is three and the number of data required for each arithmetic operation is two, at least 6 (= 3 × 2) comparators are necessary per entry block of the reservation station in the above-described conventional information processing apparatus. If each arithmetic unit is associated with a reservation station containing two entry blocks, the total number of comparators to be provided in the information processing apparatus rises to 36 (= 6 comparators × 2 entry blocks × 3 arithmetic units).
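The comparator count above follows from a simple product, which the following sketch reproduces. The function and parameter names are hypothetical, and the second call uses a made-up larger configuration solely to show how quickly the count grows.

```python
def comparator_count(num_units, sources_per_op, entries_per_unit):
    """Return (comparators per entry block, total comparators)."""
    # Each entry block compares every unit's tag against every stored tag.
    per_entry = num_units * sources_per_op
    # Every arithmetic unit owns entries_per_unit entry blocks.
    total = per_entry * entries_per_unit * num_units
    return per_entry, total


print(comparator_count(3, 2, 2))  # (6, 36), the configuration of FIG. 10
print(comparator_count(4, 2, 4))  # (8, 128), a hypothetical larger machine
```

Because the number of arithmetic units enters the product twice, the total grows quadratically as units are added.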
Furthermore, in order to detect that a data dependency has been eliminated and to find the instruction to be executed next, tags normally have to be compared further, which slows down the operation speed.
Moreover, with the constitution of the above-described conventional information processing apparatus, an instruction is forced to be executed only in its corresponding arithmetic unit, even when another arithmetic unit is ready to execute a new arithmetic operation. Hence, even if both source data have been obtained, the instruction cannot be executed until the corresponding arithmetic unit becomes available, irrespective of the availability of other arithmetic units.
If an instruction is carried out by out-of-order execution and the data obtained are then written directly into a register file, a problem arises when an exceptional condition is raised. In the case where the exceptional condition is raised, the addresses of all instructions that cannot be completed due to this exceptional condition are saved. After the processing of this exceptional condition has been finished, execution is resumed from the saved addresses.
However, out-of-order execution involves the following problem. If instructions succeeding the saved addresses have already completed their execution by the time the exceptional condition is raised, and the register file has already been updated, these succeeding instructions have to be executed again after the processing of the exceptional condition is finished.
In order to solve this problem, an advanced device called a reorder buffer has been proposed, for example, as disclosed in the technical paper, James E. Smith et al., Implementing Precise Interrupts in Pipelined Processors: IEEE Transactions on Computers, Vol. 37, No. 5, May 1988, pp. 562-573.
FIG. 11 is a block diagram showing a conventional information processing unit adopting this reorder buffer. In FIG. 11, a reorder buffer 121 includes a plurality of entry blocks 122, each containing a data field 56 storing data and a destination register field 55 storing a register number, i.e. a destination register number, that indicates the register into which the data is to be stored.
When an arithmetic unit 52 outputs result data obtained by out-of-order execution, the reorder buffer 121 temporarily stores this result data and, in turn, transfers it into a register file 51 in compliance with the order of instructions written in the program.
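The role of the reorder buffer described above can be sketched in software as follows. This is a simplified model under stated assumptions, not the circuit of FIG. 11: the class and method names are hypothetical, entries are allocated in program order, and the register file is modeled as a dictionary.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()    # oldest instruction at the left

    def allocate(self, dest_reg):
        """Reserve an entry block in program order; returns the entry."""
        entry = {"dest": dest_reg, "data": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, data):
        """Store result data, possibly out of program order."""
        entry["data"] = data
        entry["done"] = True

    def retire(self, register_file):
        """Transfer finished results to the register file in program order.

        Stops at the first entry whose result is not yet available, so a
        younger result never reaches the register file before an older one.
        """
        retired = 0
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            register_file[entry["dest"]] = entry["data"]
            retired += 1
        return retired


rob = ReorderBuffer()
register_file = {}
older = rob.allocate("r1")
younger = rob.allocate("r2")

rob.complete(younger, 20)            # the younger instruction finishes first
print(rob.retire(register_file))     # 0
rob.complete(older, 10)
print(rob.retire(register_file))     # 2
print(register_file)                 # {'r1': 10, 'r2': 20}
```

Until the older instruction completes, the register file is left untouched, so the state visible at an exceptional condition corresponds to a point in the program order.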
If any succeeding instruction requires the data held in the reorder buffer 121 before that data has been transferred into the register file 51, it becomes necessary to inhibit this data from being transferred into the register file 51 so that the data can be supplied from the reorder buffer 121 when the corresponding instruction is executed later.
To this end, each entry block 122 of the reorder buffer 121 is equipped with a dependency detector 57 that compares the destination register numbers stored in the destination register field 55 of the entry block 122 with source register numbers 60 and 61 of succeeding instructions to detect the data dependency.
If a data dependency is recognized, the dependency detector 57 generates data dependency information 64, 65. Then, on the basis of the data dependency information 64, 65, the output from the data field 56 of the entry block 122 is controlled.
However, in the case where a plurality of entry blocks 122 in the reorder buffer 121 have the same destination register number, the latest data has to be selected. For this reason, a circuit is normally provided that prioritizes a plurality of data having the same destination register number and selects the most recent one.
FIG. 12 is one example of such a selecting circuit with a prioritization function. The selecting circuit 130 basically acts to cancel a coincidence signal of lower priority if the existence of another coincidence signal of higher priority is recognized. Accordingly, the larger the number of entry blocks becomes, the more the number of gate stages increases, resulting in a slower overall operation speed.
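The prioritized selection described above can be modeled as follows. The sketch does not reproduce the gates of FIG. 12; the function name, the oldest-to-newest ordering of the list, and the rule that the newest entry has the highest priority are assumptions made for illustration.

```python
def select_latest(dest_regs, source_reg):
    """Return one select signal per entry block, oldest entry first.

    dest_regs holds the destination register number of each entry block.
    Only the newest entry whose number coincides with source_reg is selected.
    """
    # Coincidence signals from the dependency detectors.
    matches = [dest == source_reg for dest in dest_regs]

    selected = [False] * len(matches)
    newer_match_seen = False
    # Walk from the newest entry to the oldest; each step depends on the
    # previous one, like a chain of gates that lengthens with every entry.
    for i in range(len(matches) - 1, -1, -1):
        selected[i] = matches[i] and not newer_match_seen
        newer_match_seen = newer_match_seen or matches[i]
    return selected


print(select_latest([3, 5, 3, 7], 3))  # [False, False, True, False]
print(select_latest([1, 2], 9))        # [False, False]
```

The serial loop mirrors why the delay grows with the number of entry blocks: the select signal of the oldest entry cannot settle until every newer coincidence signal has been examined.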
In a normal application, 15% to 25% of all the instructions to be executed are branch instructions. Furthermore, most of these branch instructions are conditional branch instructions, whose branching operation depends on a judgement of whether or not a condition code has a value satisfying the condition designated by the instruction.
In the case where an instruction placed immediately before a conditional branch instruction updates the condition code, the branch judgement is delayed. If the branch target instruction is fetched only after the judgement has been completed, useless cycles are spent until the fetch of that instruction has finished.
In order to reduce these useless cycles, one promising method is to fetch the branch target instruction in advance and initiate its execution as soon as the branch judgement is completed. However, even with this method, it was not possible to execute the instructions succeeding the branch instruction beforehand until the branch judgement was finished. Thus, the application of out-of-order execution is undesirably limited to the small number of instructions executed between two branch instructions. Accordingly, if branch instructions are executed frequently, the merit of out-of-order execution is greatly diminished.
As is apparent from the foregoing description, the conventional information processing apparatus requires many circuits for storing a large amount of information, for example, a field for storing the source data needed to hold instructions that have not yet been executed, and a field for storing the tags needed to acquire later the source data that could not be obtained immediately. The conventional information processing apparatus further requires a great number of comparators for detecting the elimination of a data dependency by comparing the tag transferred from the arithmetic unit with the tags already stored. Hence, the circuit scale is inherently enlarged and the processing speed is correspondingly slowed down.
Moreover, in the case where the detection of data dependency is carried out in the buffer that transfers the result data obtained by out-of-order execution into the register file in compliance with the order of instructions defined in the program, prioritization among data is additionally required if the same register number is registered in a plurality of entry blocks. Therefore, as the number of entry blocks in the buffer increases, the delay of detection increases further.
Still further, until the branch condition has been judged, it is impossible to execute the instructions succeeding that branch instruction. This narrows the applicability of out-of-order execution and, therefore, there was a problem that the effect of out-of-order execution could not be sufficiently obtained.