Field of the Invention
The present invention relates to a technique for reducing the occurrence of pipeline stalls, which may be observed in a processor performing out-of-order execution. More specifically, the present invention relates to a technique for reducing the occurrence of stalls during simultaneous execution of a plurality of threads in a simultaneous multithreading (SMT) technique.
Description of the Related Art
Many high-performance processors use out-of-order execution to improve instruction execution efficiency. In out-of-order execution, the instructions are processed in data order, that is, the order in which their data (operands) become ready in the registers of the processor, rather than in program order. An out-of-order processor reorders the execution results afterwards, so that the same results are obtained as when the instructions are executed in order.
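The difference between the order in which instructions are issued and the order in which their results are made visible can be illustrated with a minimal sketch. It is not taken from any of the cited publications. The `Instr` type, the `simulate` function, the register names, and the latencies are all hypothetical, and the model assumes that every instruction writes a distinct register and that one instruction is issued per cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str        # register written (assumed unique per instruction)
    srcs: tuple      # registers read
    latency: int     # cycles until the result is available


def simulate(program, initial_ready):
    """Toy model: each cycle, issue the oldest instruction whose operands are ready.

    Returns (issue_order, commit_order). Results are committed in program
    order, so the visible outcome matches in-order execution.
    """
    ready_at = {reg: 0 for reg in initial_ready}  # register -> cycle its value is ready
    pending = list(program)
    issue_order = []
    cycle = 0
    # Guard against programs whose operands never become ready.
    limit = sum(i.latency for i in program) + len(program) + 1
    while pending:
        if cycle > limit:
            raise RuntimeError("operands never become ready")
        for instr in pending:
            if all(ready_at.get(s, float("inf")) <= cycle for s in instr.srcs):
                ready_at[instr.dest] = cycle + instr.latency
                issue_order.append(instr.name)
                pending.remove(instr)
                break  # at most one issue per cycle
        cycle += 1
    commit_order = [i.name for i in program]
    return issue_order, commit_order


# Hypothetical three-instruction program: "add" waits for the slow "load",
# while the independent "mul" can be issued in the meantime.
program = [
    Instr("load", dest="r1", srcs=("r0",), latency=3),
    Instr("add", dest="r2", srcs=("r1", "r1"), latency=1),
    Instr("mul", dest="r3", srcs=("r0", "r0"), latency=1),
]
issue_order, commit_order = simulate(program, initial_ready=["r0"])
print(issue_order)   # issue follows operand readiness
print(commit_order)  # commit follows program order
```

In this sketch the independent multiply is issued ahead of the add that waits on the load, while the commit order remains the program order.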
However, even an out-of-order processor sometimes executes instructions in an order which violates data dependencies, and the processing stalls. The cause is hardware constraints on checking data dependencies: there is an upper limit on the size of the instruction window, and there is an upper limit on the complexity of the dependencies that can be processed at high speed. These problems can be reduced by using software to optimize the code, but data dependencies are inherently difficult to analyze across functions and across components when the software is built from a large number of combined components.
There are many prior art techniques for solving the problem of stalls in a pipeline. U.S. Patent Application Publication No. 2010/0017582 discloses a technique in which a simultaneous multithreading processor synchronizes thread selection priorities for selecting thread instructions between a plurality of determination points in a plurality of pipelines inside a processor, thereby improving the performance of the overall system and reducing power consumption.
U.S. Patent Application Publication No. 2008/0263325 discloses a technique in which a long-latency instruction is identified in a first thread analysis as an instruction which may cause a pipeline stall, and the long latency is hidden by inserting a thread-switching instruction after the identified instruction has been executed.
U.S. Patent Application Publication No. 2006/0179280 discloses a technique in which a simultaneous multithreading processor calculates the data dependencies for the instructions from each thread, determines an execution priority for each instruction, and selects the instructions to be dispatched based on the determined execution priorities in order to perform stall-free execution of instructions.
Japanese Laid-open Patent Publication No. 8-147165 discloses a technique in which a processor supporting multiple contexts simultaneously executes a plurality of contexts by executing the instructions in the context of the pipeline and switches to another context during execution when an empty pipeline has been detected. More specifically, Japanese Laid-open Patent Publication No. 8-147165 discloses a technique in which attribute information calling for an instruction fetch from another context during execution of each instruction is provided in an attribute information field of a preceding instruction code having a latency interval with the respective instructions as information required to execute loading instructions and branching instructions during an opportunity for context switching.
The technique disclosed in U.S. Patent Application Publication No. 2010/0017582 is able to suppress the execution of threads likely to stall, and can improve CPU execution efficiency by executing instructions in other threads. However, this technique cannot suppress stalls caused by the execution of instructions in an order which violates data dependencies. The techniques disclosed in U.S. Patent Application Publication No. 2008/0263325 and Japanese Laid-open Patent Publication No. 8-147165 are triggered by a thread and introduce an instruction from another thread in order to hide latencies in executed instructions. However, these techniques hide the latency and cannot actually suppress the stalls themselves.
The technique disclosed in U.S. Patent Application Publication No. 2006/0179280 can prevent stalls by calculating the data dependencies of an instruction and not introducing the instruction before its input value has been calculated. However, in U.S. Patent Application Publication No. 2006/0179280, whether an instruction inserted into the pipeline cannot yet be executed is determined based on the register dependencies of the instruction. As a result, stalls caused by data dependencies other than register dependencies cannot be suppressed.
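The limitation of a check based only on register dependencies can be illustrated with a minimal sketch. It assumes, purely for illustration, that a dependency through memory (a store followed by a load from the same address) is one instance of a data dependency other than a register dependency; the text above does not name a specific kind. The `Instr` type, both helper functions, and the register values are hypothetical and do not reproduce the mechanism of any cited publication.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    name: str
    writes: frozenset                      # registers written
    reads: frozenset                       # registers read
    store_addr_reg: Optional[str] = None   # register holding a store address
    load_addr_reg: Optional[str] = None    # register holding a load address


def has_register_dependency(earlier, later):
    """True if `later` reads a register that `earlier` writes."""
    return bool(earlier.writes & later.reads)


def has_memory_dependency(earlier, later, reg_values):
    """True if `later` loads from the address that `earlier` stores to.

    The answer depends on run-time register contents, which a check on
    register names alone cannot see.
    """
    if earlier.store_addr_reg is None or later.load_addr_reg is None:
        return False
    return reg_values[earlier.store_addr_reg] == reg_values[later.load_addr_reg]


# Hypothetical pair: the store and the load use different registers,
# but both registers happen to hold the same address.
store = Instr("store [r1] <- r3", writes=frozenset(),
              reads=frozenset({"r1", "r3"}), store_addr_reg="r1")
load = Instr("load r4 <- [r2]", writes=frozenset({"r4"}),
             reads=frozenset({"r2"}), load_addr_reg="r2")
reg_values = {"r1": 0x1000, "r2": 0x1000, "r3": 7}

reg_dep = has_register_dependency(store, load)
mem_dep = has_memory_dependency(store, load, reg_values)
print(reg_dep, mem_dep)
```

Here the register-based check reports no dependency between the two instructions, although the load must wait for the store because both access the same address.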