In recent years, many processors have adopted a method in which a buffer called a reservation station is provided between an instruction decoder and the arithmetic pipelines. Among the instructions temporarily stored in the reservation station, an instruction whose execution conditions are satisfied is selected regardless of the instruction order in the program (an out-of-order method), and the selected instruction is issued to one of the multiplexed arithmetic pipelines. In addition, a multi-thread processing method, which is a technique for effectively utilizing the arithmetic units, has also begun to be adopted in commercially available processors. However, these techniques have the following problems.
(1) Preparing a single reservation station and issuing instructions from it to plural pipelines leads to the most efficient utilization of the arithmetic pipelines. However, when the number of entries in the reservation station is increased in order to widen the range of instructions that can be chosen for issue, and thereby to improve the degree of parallelization, the logic that selects plural instructions to be issued from the many entries becomes complicated. Possible countermeasures for this complexity are to increase the number of stages in the arithmetic pipeline or to restrain the increase in clock speed. However, these countermeasures run counter to the original purpose, which is to improve performance.
(2) When the reservation station is divided and the number of entries in each reservation station is limited to such an extent that the instruction to be issued can be selected, problem (1) can be resolved. However, the range of instruction choices becomes narrow, and accordingly the improvement in the degree of parallelization is limited.
(3) In a conventional technique that adopts a configuration in which the reservation station is divided, the arithmetic pipeline in which an arithmetic operation is to be executed is fixed at the stage when the instruction is output from the decoder to the reservation station. In such a case, the occurrence of bubbles in the arithmetic pipelines varies depending on the quality of the algorithm by which the decoder determines the destination reservation station. Because the effective algorithm differs for each workload, dynamic optimization is required on each occasion. Moreover, because the logic of the decoder is inherently complicated, a further increase in its complexity lowers performance.
(4) In a processing method called the multi-thread method (a method in which plural jobs that have no dependency on each other share the arithmetic pipelines), the potential degree of instruction parallelization differs between jobs because of differences in their properties. As a result, the frequency at which instructions can be issued from the reservation station to the arithmetic pipeline differs. When there are plural reservation stations and each of them is connected to a specific arithmetic pipeline, it is especially necessary in the multi-thread method to store instructions into the reservation stations appropriately. However, the processing of the decoder in a processor using this method is more complicated than in a conventional method (i.e. a single-thread method), and it is difficult to optimize the instruction output to the reservation stations in consideration of the properties of the jobs without changing the number of stages in the arithmetic pipeline. Therefore, a means for optimizing the instruction output to the arithmetic pipelines is required on the reservation station side.
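The difference between a single shared reservation station and divided reservation stations bound to fixed pipelines can be illustrated by the following minimal sketch. It is not taken from any of the techniques discussed here. The record `Instr`, its field `ready_cycle` (standing for the cycle at which the execution conditions become satisfied), the function names, and the round-robin rule used by the decoder are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    ready_cycle: int  # hypothetical: first cycle at which the instruction may issue


def run_shared(instrs, num_pipes):
    """One shared station: any ready entry may go to any free pipeline."""
    entries = list(instrs)
    cycle = bubbles = 0
    while entries:
        ready = [i for i in entries if i.ready_cycle <= cycle]
        chosen = ready[:num_pipes]          # oldest ready entries first
        for i in chosen:
            entries.remove(i)
        bubbles += num_pipes - len(chosen)  # pipelines left idle this cycle
        cycle += 1
    return cycle, bubbles


def run_split(instrs, num_pipes):
    """Divided stations: the pipeline is fixed when the decoder stores the entry."""
    stations = [[] for _ in range(num_pipes)]
    for k, ins in enumerate(instrs):
        stations[k % num_pipes].append(ins)  # assumed round-robin decoder rule
    cycle = bubbles = 0
    while any(stations):
        for entries in stations:
            ready = [i for i in entries if i.ready_cycle <= cycle]
            if ready:
                entries.remove(ready[0])     # each station feeds only its own pipeline
            else:
                bubbles += 1                 # this pipeline has nothing to execute
        cycle += 1
    return cycle, bubbles


# Two instructions ready at once, two more ready only at cycle 2.
demo = [Instr("a", 0), Instr("b", 2), Instr("c", 0), Instr("d", 2)]
shared_result = run_shared(demo, 2)
split_result = run_split(demo, 2)
```

For this input the shared station finishes in 3 cycles with 2 bubbles, whereas the divided stations need 4 cycles with 4 bubbles, because the round-robin rule happens to place both early-ready instructions in the same station. A different decoder rule would give a different result, which corresponds to the workload dependence noted in (3).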
Incidentally, for example, US-2003/0014613-A1 discloses a technique to improve parallelism in data processing, reduce the waiting time for instruction execution, and increase the processing speed. Specifically, a data processing system having decentralized reservation stations is provided, and each decentralized reservation station stores a basic block of code in the form of microprocessor instructions. The basic blocks of code can therefore be distributed among several decentralized reservation stations. Thus, the number of entries in each decentralized reservation station is decreased, the waiting time to execute an instruction is reduced, and the processing speed is increased. However, in this publication, the plural reservation stations are associated with plural arithmetic units, and the algorithm that determines from which reservation station an instruction is output to which arithmetic unit becomes complicated.
In addition, JP-A-2000-181707 discloses a technique to reduce the amount of hardware of an instruction processing device that enables out-of-order instruction execution, in order to execute instruction processing in an information processing apparatus at high speed and to enable high-speed operation. Specifically, an instruction control device of the information processing apparatus is provided with a storage means for temporarily storing plural instructions that have been decoded but have not yet been issued to any execution unit. The storage means is configured so that the order of the entries indicates the order in which the stored instructions were decoded, an entry from which an instruction has been issued is deleted, and the stored information moves between entries so that the entries holding unissued instructions remain consecutive. The maximum movement amount between entries is equal to the number of instructions that can be decoded simultaneously. In this publication, an instruction can be output from each entry in the reservation station to any execution unit, and there is a problem that the logic that determines to which execution unit an instruction should be output from each entry becomes complicated.
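The entry management described for JP-A-2000-181707 can be pictured with the following rough sketch. It is an illustration under stated assumptions and not the circuit of that publication: the function `compact`, its parameters, and the rule that an error is raised when an entry would have to move further than `max_shift` slots are hypothetical. In the sketch, an entry moves toward the head by as many slots as there are issued entries ahead of it.

```python
def compact(entries, issued, max_shift):
    """Delete issued slots and close the gaps while keeping decode order.

    entries   -- instruction names, oldest (first decoded) first
    issued    -- set of slot indices whose instruction was issued this cycle
    max_shift -- assumed upper bound on how far one entry may move per cycle
    """
    survivors = []
    removed_ahead = 0
    for slot, instr in enumerate(entries):
        if slot in issued:
            removed_ahead += 1      # this slot is freed
            continue
        if removed_ahead > max_shift:
            raise ValueError("entry would move more than max_shift slots")
        survivors.append(instr)     # new position is slot - removed_ahead
    return survivors


queue = ["i0", "i1", "i2", "i3", "i4"]
queue = compact(queue, issued={1, 3}, max_shift=2)
# queue is now ["i0", "i2", "i4"]; the unissued entries are consecutive again
```

Bounding the shift keeps the movement logic of each entry small, since an entry only ever needs to receive data from a limited number of neighboring entries.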
Moreover, U.S. Pat. No. 6,938,150 discloses a technique to efficiently utilize a reorder buffer in a processor that carries out out-of-order execution by using the reorder buffer and the like. Specifically, each functional unit, such as an arithmetic unit, a store unit, or a load unit, uses the entry number (WRB number) of the reorder buffer to notify the reorder buffer that the unit has finished processing the instruction stored in that entry. However, the load unit manages the latest speculation state of each issued load instruction based on a branch prediction success/failure signal output from a branch unit, and for a load instruction that follows a branch instruction whose branch prediction has failed, the notification to the reorder buffer by the WRB number is not carried out even when the processing is completed. Thus, it is said that the reorder buffer can immediately reuse an entry storing a load instruction that follows a branch instruction whose branch prediction has failed. This publication shows an example in which plural functional units are provided for one reservation station. However, it also states that a different reservation station may be provided for each functional unit, or that one common reservation station may be provided for each group of several functional units.
US-2002/0019927-A1 discloses an example in which each entry is associated with a specific arithmetic unit.
As described above, when the number of entries in the reservation station increases, the logic that selects, from those entries, plural instructions satisfying the execution conditions becomes complicated, and its implementation involves a trade-off with performance improvement. In addition, in order to issue instructions efficiently from plural reservation stations, sophisticated dynamic optimization is required at the stage of storing an instruction into a reservation station. This further complicates the decoder, whose implementation is already complicated.