1. Field of the Invention
The present invention generally relates to computer processing techniques and, in particular, to a superscalar processing system and method that executes instructions in an in-order fashion and that independently stalls processing of the instructions, when necessary.
2. Related Art
In most early computer processors, a pending instruction of a computer program was fully executed, and the results of the execution were written to a register or a location in memory before commencing execution of the next instruction of the program. The execution of the instructions occurred one at a time, and, therefore, errors from data dependency hazards could be easily prevented.
As used herein, a “data dependency” refers to a situation where a first instruction during execution generates or retrieves data that is needed for the execution of a second instruction. In such a situation, a data dependency is said to exist between the two instructions, and the timing of the execution of the instructions must be controlled such that the needed data produced by execution of the first instruction is available during execution of the second instruction. If steps are not taken to ensure that the first instruction will always execute before the second instruction, then a data dependency hazard exists. A “data dependency hazard” refers to a situation in which an error is possible because an instruction dependent on data from another instruction may execute before the other instruction and, therefore, may utilize incorrect data during execution.
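By way of illustration only, the data dependency defined above may be sketched as follows. The `Instr` record, its `dest` and `srcs` fields, and the `depends_on` helper are hypothetical names introduced for this example; the sketch assumes that each instruction writes a single register and that a dependency arises when a later instruction reads that register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (an assumption for illustration)."""
    name: str
    dest: str          # register written by the instruction
    srcs: tuple = ()   # registers read by the instruction


def depends_on(second: Instr, first: Instr) -> bool:
    """Return True if `second` reads the register that `first` writes."""
    return first.dest in second.srcs


load = Instr("load", dest="r1", srcs=("r7",))
add = Instr("add", dest="r2", srcs=("r1", "r3"))

# `add` needs r1, which `load` produces, so a data dependency exists.
print(depends_on(add, load))   # prints True
print(depends_on(load, add))   # prints False
```

If `add` were allowed to read `r1` before `load` had produced it, the situation would be a data dependency hazard in the sense defined above.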
When a processor executes instructions of a computer program one at a time, as described above, preventing errors from data dependency hazards is relatively simple. In this regard, as long as each instruction dependent on data from another instruction is input into the processor after the other instruction, errors caused by data dependency hazards should not occur.
However, to increase the performance of many processors, pipeline processing was developed. In pipeline processing, a single pipeline simultaneously processes multiple instructions. Therefore, execution of one instruction in the pipeline may be commenced before the results of execution of a preceding instruction in the pipeline are available. Consequently, with pipeline processing, errors from data dependency hazards are possible.
Most pipeline processors utilize a control mechanism to prevent errors from data dependency hazards. The control mechanism detects data dependencies that exist between instructions input into the processor. During the execution of each instruction, the control mechanism determines whether the instruction being executed (referred to hereafter as the “pending instruction”) requires data produced by the execution of another instruction. If so, the control mechanism then determines whether the other instruction has been executed, at least to the point where the needed data is available. If this data is not yet available, the control mechanism stalls (i.e., temporarily stops) execution of the pending instruction until the necessary data becomes available.
Stalling of the pending instruction is usually accomplished by asserting a stall signal transmitted to the pipeline executing the pending instruction. In response to the stall signal, the pipeline is designed to stop execution of the pending instruction until the stall signal is deasserted by the control mechanism. Once the necessary data becomes available, the control mechanism deasserts the stall signal, and in response, the pipeline resumes execution of the pending instruction.
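As a minimal sketch of the stall behavior described above, the following example models the stall signal as a Boolean that remains asserted while any operand of the pending instruction is unavailable. The function names and the representation of available data as a set of register names per clock cycle are assumptions made for illustration and are not drawn from any particular processor.

```python
def stall_signal(pending_srcs, ready_regs):
    """Asserted (True) while any operand of the pending instruction is missing."""
    return any(reg not in ready_regs for reg in pending_srcs)


def cycles_until_resumed(pending_srcs, ready_per_cycle):
    """Count the cycles for which the stall signal stays asserted.

    `ready_per_cycle` lists, for each successive cycle, the set of
    registers whose data is available during that cycle.
    """
    stalled_cycles = 0
    for ready_regs in ready_per_cycle:
        if not stall_signal(pending_srcs, ready_regs):
            break  # signal deasserted: the pipeline resumes execution
        stalled_cycles += 1
    return stalled_cycles


# r1 becomes available only in the third cycle, so the pending
# instruction is stalled for the first two cycles.
timeline = [{"r3"}, {"r3"}, {"r1", "r3"}]
print(cycles_until_resumed(("r1", "r3"), timeline))  # prints 2
```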
To further reduce the amount of time required to process instructions, parallel processing, sometimes known as superscalar processing, was developed. In parallel processing, a plurality of pipelines are defined that simultaneously execute instructions. One type of parallel processing is out-of-order processing. In out-of-order processing, each pipeline of a processor simultaneously executes different instructions independently of the other pipeline(s).
It typically takes different amounts of time for different instructions to execute, and it is, therefore, possible for an instruction of one pipeline to be fully executed before another instruction of another pipeline, even though the other instruction was input into its respective pipeline first. Accordingly, instructions are not necessarily executed in the same order that they were input into the pipelines, and as a result, the control mechanism required to avoid errors from data dependency hazards is relatively complex in out-of-order processors. Furthermore, as the number of pipelines is increased, the complexity of the control mechanism increases dramatically. Consequently, many conventional parallel processors, particularly processors having a large number of pipelines, employ an in-order type of processing in lieu of the out-of-order type of processing described above.
During in-order processing, the instructions being processed by the different pipelines are stepped through the stages of the pipelines on certain edges of a system clock signal. In this regard, the processing of instructions in a pipeline is usually divided into stages, and each stage of the pipeline simultaneously processes a different instruction.
As an example, the processing performed by each pipeline may be divided into a register stage, an execution stage, a detect exceptions stage, and a write stage. During the register stage, any operands necessary for the execution of an instruction are obtained. Once the operands have been obtained, the processing of the instruction enters into the execution stage in which the instruction is executed. After the instruction has been executed, the processing of the instruction enters into a detect exceptions stage in which conditions, such as overruns during execution, for example, that may indicate data unreliability are checked. After the detect exceptions stage is completed, a write stage is entered in which the results of the execution stage are written to a register or a location in memory.
A key feature of in-order processing is that each instruction of an issue group steps through each stage at the same time. An “issue group,” as defined herein, is a set of instructions simultaneously (i.e., during the same clock cycle) processed by the same stage of different pipelines within a single processor. As an example, assume that each stage of each pipeline processes one instruction at a time, as is typically done in the art. The instructions in the detect exceptions stage of the pipelines form a first issue group, and the instructions in the execution stage of the pipelines form a second issue group. Furthermore, the instructions in the register stage of the pipelines form a third issue group. Each of the issue groups advances into the next respective stage in response to an active edge of the system clock signal. In other words, the first issue group steps into the write stage, the second issue group steps into the detect exceptions stage, and the third issue group steps into the execution stage in response to an active edge of the system clock signal.
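The three-issue-group example above may be sketched as follows, assuming the four stages named earlier and no stalls. The `STAGES` list, the `step` function, and the instruction labels are hypothetical and serve only to show each issue group moving one stage forward on an active edge.

```python
STAGES = ["register", "execute", "detect_exceptions", "write"]


def step(stage_contents, incoming=None):
    """Advance every issue group by one stage on an active edge.

    `stage_contents` holds one entry per stage in STAGES order; each entry
    is an issue group (a list of instruction labels) or None if the stage
    is empty. Returns the new stage contents and the group leaving "write".
    """
    retired = stage_contents[-1]
    return [incoming] + stage_contents[:-1], retired


third = ["i5", "i6"]    # in the register stage
second = ["i3", "i4"]   # in the execution stage
first = ["i1", "i2"]    # in the detect exceptions stage
before = [third, second, first, None]

after, retired = step(before)
for stage, group in zip(STAGES, after):
    print(stage, group)
```

After the call, the first issue group occupies the write stage, the second occupies the detect exceptions stage, and the third occupies the execution stage, matching the example in the text.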
As used herein, an “active edge” is any edge of the system clock signal, the occurrence of which induces each unstalled instruction in a pipeline to advance to the next stage of processing in the pipeline. For example, assume that a processor is designed to step each unstalled instruction into the next stage of processing every three clock cycles. In this example, the active edges could be defined as every third rising edge of the clock signal. It should be noted that which edges of the clock signal are designated as “active edges” is based on design parameters and may vary from processor to processor.
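The "every third rising edge" example may be expressed as a short sketch. The `active_edges` function and its `period` parameter are hypothetical names, and the numbering of rising edges from 1 is an assumption made for illustration.

```python
def active_edges(num_rising_edges, period=3):
    """Return the 1-based indices of the rising edges treated as active.

    With period=3, every third rising edge is an active edge, as in the
    example in the text; other designs may choose a different period.
    """
    return [n for n in range(1, num_rising_edges + 1) if n % period == 0]


print(active_edges(10))            # prints [3, 6, 9]
print(active_edges(4, period=1))   # prints [1, 2, 3, 4]
```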
During in-order processing, each instruction in one issue group is prevented from passing another instruction in another issue group. In other words, instructions of one issue group input into the pipelines after the instructions of another issue group are prevented from entering into the same stage processing the instructions of the other issue group. Therefore, at any point in time, each stage of the pipelines is respectively processing instructions from only one issue group. Since instructions from different issue groups are prevented from passing each other, the control mechanism for controlling the pipelines and for preventing errors from data dependency hazards is greatly simplified relative to out-of-order processing.
However, the reduction in the complexity of the control mechanism comes at a cost. Specifically, in-order processing prevents some instructions from traversing their pipelines at the fastest possible rate. To ensure that an instruction of one issue group does not pass an instruction of another issue group, an instruction is not allowed to proceed to the next stage until all of the instructions in the issue group are ready to proceed to the next stage. In other words, if one instruction of an issue group is stalled, all of the instructions of the issue group are stalled, even if some of the instructions of the issue group have sufficient data available to complete the current stage and to proceed to the next stage.
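The cost described above may be illustrated with a brief sketch of conventional lock-step stalling. The function names and the per-pipeline list of Booleans are assumptions introduced for this example; the sketch shows only that a single stalled instruction holds back every other instruction in its issue group.

```python
def lockstep_stall_signals(needs_stall):
    """Conventional in-order behavior: one stall in the group stalls all.

    `needs_stall` has one Boolean per pipeline, True if the instruction in
    that pipeline lacks data needed to complete its current stage.
    """
    group_stalled = any(needs_stall)
    return [group_stalled for _ in needs_stall]


def held_back(needs_stall):
    """Indices of pipelines that are ready to proceed but are stalled anyway."""
    signals = lockstep_stall_signals(needs_stall)
    return [
        i
        for i, (need, signal) in enumerate(zip(needs_stall, signals))
        if signal and not need
    ]


# Only pipeline 1 lacks data, yet pipelines 0 and 2 are stalled as well.
print(lockstep_stall_signals([False, True, False]))  # prints [True, True, True]
print(held_back([False, True, False]))               # prints [0, 2]
```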
Thus, a heretofore unaddressed need exists in the industry for providing a system and method of increasing the efficiency of parallel processors that employ in-order processing.
The present invention overcomes the inadequacies and deficiencies of the prior art as discussed hereinbefore. Generally, the present invention provides a system and method for efficiently processing instructions from a computer program by allowing some instructions within an issue group to continue processing while other instructions in the issue group are independently stalled.
In architecture, the processing system of the present invention utilizes a plurality of pipelines, an instruction dispersal unit, and a control mechanism. The instruction dispersal unit receives instructions of a computer program and defines issue groups based on the received instructions. Each of the issue groups is sequentially transmitted to the pipelines and includes instructions that may be simultaneously processed by the pipelines.
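One possible way of defining issue groups is sketched below. The text does not state the criteria used by the instruction dispersal unit, so the grouping rule shown here is purely an assumption: a new issue group is started when the current group already fills every pipeline or when an instruction reads or writes a register written by an instruction already in the group. The `form_issue_groups` function and the `(dest, srcs)` instruction representation are hypothetical.

```python
def form_issue_groups(instrs, num_pipelines):
    """Greedily split a program-ordered instruction list into issue groups.

    Each instruction is a (dest, srcs) pair: the register it writes and a
    tuple of registers it reads. The grouping rule is an assumption made
    for illustration, not a rule stated in the text.
    """
    groups, current, written = [], [], set()
    for dest, srcs in instrs:
        conflict = dest in written or any(src in written for src in srcs)
        if current and (len(current) == num_pipelines or conflict):
            groups.append(current)        # close the current issue group
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r7",)),
    ("r2", ("r8",)),
    ("r3", ("r1",)),   # reads r1, which the first instruction writes
    ("r4", ("r9",)),
]
for group in form_issue_groups(program, num_pipelines=4):
    print(group)
```

With four pipelines available, the third instruction still begins a second issue group because it depends on the first, so the program is split into two groups of two instructions each.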
The control mechanism analyzes the instructions in the issue groups as the instructions are being processed by the pipelines. The control mechanism determines whether any instructions in one of the issue groups should be stalled. The control mechanism then asserts stall signals across connections respectively coupled to each of the pipelines processing an instruction within the one issue group that should be stalled, and the control mechanism deasserts stall signals across connections respectively coupled to each of the pipelines processing the other instructions within the one issue group. Each of the pipelines receiving one of the asserted stall signals stalls an instruction in the one issue group, and each of the pipelines receiving one of the deasserted stall signals allows processing of an instruction in the one issue group to continue.
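The per-pipeline stall signals described above may be sketched as follows. The stall criterion used here, namely that an instruction is stalled when one of its operand registers is not yet available, is an assumption for illustration, as are the `stall_signals` function and the representation of an issue group as one tuple of source registers per pipeline.

```python
def stall_signals(issue_group, ready_regs):
    """Return one stall signal per pipeline for a single issue group.

    `issue_group` holds, for each pipeline, the tuple of registers read by
    the instruction in that pipeline. A signal is asserted (True) only for
    a pipeline whose instruction is missing an operand; the signals for
    the remaining pipelines are deasserted (False).
    """
    return [
        any(src not in ready_regs for src in srcs)
        for srcs in issue_group
    ]


group = [("r1", "r2"), ("r3",), ("r4", "r5")]
signals = stall_signals(group, ready_regs={"r1", "r2", "r4", "r5"})
advancing = [i for i, stalled in enumerate(signals) if not stalled]

print(signals)    # prints [False, True, False]
print(advancing)  # prints [0, 2]
```

In this sketch only the pipeline awaiting `r3` is stalled, while the instructions in the other two pipelines continue processing.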
The present invention can also be viewed as providing a method for efficiently processing instructions of computer programs. The method can be broadly conceptualized by the following steps: receiving instructions from a computer program; defining issue groups based on the instructions, each of the issue groups including instructions that may be simultaneously processed; sequentially transmitting the issue groups to a plurality of pipelines; simultaneously processing each instruction in one of the issue groups; stalling an instruction in the one issue group; and enabling processing of other instructions in the one issue group while the instruction is stalled in the stalling step.
Other features and advantages of the present invention will become apparent to one skilled in the art upon examination of the following detailed description, when read in conjunction with the accompanying drawings. It is intended that all such features and advantages be included herein within the scope of the present invention and protected by the claims.