With the development of electronic technologies, how to further increase the data processing speed of a processor has become one of the hottest issues in computer architecture research.
The average IPC (Instructions Committed per Cycle), i.e. the average number of instructions committed per cycle, is an important measure for assessing the data processing speed of a processor. An ordinary pipelined scalar processor can reach a maximum processing speed of one instruction per cycle (i.e. IPC=1). In most cases, however, the IPC of a pipelined scalar processor is less than 1.
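As a small worked example of the measure, IPC is the number of committed instructions divided by the number of elapsed cycles. The instruction and cycle counts below are hypothetical figures chosen for illustration, not measurements of any processor.

```python
def ipc(committed_instructions: int, cycles: int) -> float:
    """Average number of instructions committed per cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return committed_instructions / cycles


# Hypothetical scalar pipeline that stalls on some cycles: IPC below 1.
print(ipc(800, 1000))   # 0.8
# Hypothetical superscalar pipeline committing several instructions per cycle.
print(ipc(2500, 1000))  # 2.5
```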
A pipelined superscalar processor is now used to improve processor performance. The pipelined superscalar processor is provided with multiple instruction-fetching units, multiple instruction-decoding units, multiple functional units for performing the corresponding operations, and multiple result-writing units. Using dynamic scheduling, multiple instructions can be moved from one pipeline stage to the next in a single cycle, so the pipelined superscalar processor can execute multiple instructions per cycle (IPC>1).
FIG. 1 schematically illustrates the execution of an instruction n in a seven-stage pipeline. The seven stages are, respectively, the IF (Instruction Fetch) stage, ID (Instruction Decode & Rename) stage, IS (Instruction Issue) stage, RR (Read Register) stage, EX (Instruction Execution) stage, WB (Write Back) stage and RET (Instruction Retirement) stage.
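The seven stages can be modeled as an ordered enumeration. The sketch below assumes an ideal scalar pipeline in which one instruction enters IF per cycle, every stage takes exactly one cycle and nothing stalls, and shows which stage a given instruction occupies in a given cycle. `Stage` and `stage_at` are hypothetical names used only for illustration.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    IF = 0   # Instruction Fetch
    ID = 1   # Instruction Decode & Rename
    IS = 2   # Instruction Issue
    RR = 3   # Read Register
    EX = 4   # Instruction Execution
    WB = 5   # Write Back
    RET = 6  # Instruction Retirement


def stage_at(instr_index: int, cycle: int) -> Optional[Stage]:
    """Stage occupied by instruction `instr_index` in `cycle`, assuming an
    ideal pipeline: one new instruction per cycle, one cycle per stage."""
    offset = cycle - instr_index
    if 0 <= offset < len(Stage):
        return Stage(offset)
    return None  # not yet fetched, or already retired


# In cycle 4, instructions 0, 1 and 2 occupy these stages:
print([stage_at(i, 4).name for i in range(3)])  # ['EX', 'RR', 'IS']
```

In a real superscalar pipeline several instructions share each stage in the same cycle and EX may take several cycles, so this only captures the stage ordering.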
The operating principle of the superscalar processor is briefly described below, taking the seven-stage pipeline as an example.
First, at IF stage, multiple instruction-fetching units simultaneously fetch multiple instructions from the instruction cache. Second, at ID stage, the corresponding instruction-decoding units decode and rename the fetched instructions respectively, and place the decoded instructions into an issue queue. Then, at IS stage, a determination procedure (the issue logic determination procedure) is performed on the decoded instructions to determine whether each decoded instruction in the issue queue can be issued to enter RR stage.
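The IF and ID steps can be sketched as follows. This is a minimal model that assumes a fetch width of four, instructions given as `(op, dest, srcs)` tuples of architectural register names, and renaming through a simple map table plus a free list of physical registers. The text does not specify how renaming is done; `FETCH_WIDTH`, `fetch`, `decode_and_rename` and `Decoded` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass

FETCH_WIDTH = 4  # hypothetical number of instructions fetched per cycle


@dataclass
class Decoded:
    op: str
    dest: int        # physical destination register
    srcs: list[int]  # physical source registers


def fetch(icache: list, pc: int) -> list:
    """IF stage: fetch up to FETCH_WIDTH consecutive instructions."""
    return icache[pc:pc + FETCH_WIDTH]


def decode_and_rename(instrs, rename_map: dict, free_list: deque, issue_queue: list) -> None:
    """ID stage: rename registers and place decoded instructions in the issue queue."""
    for op, dest, srcs in instrs:
        # Read the current source mappings before remapping the destination,
        # so an instruction such as "add r1, r1, r2" still sees the old r1.
        phys_srcs = [rename_map[r] for r in srcs]
        phys_dest = free_list.popleft()  # allocate a free physical register
        rename_map[dest] = phys_dest
        issue_queue.append(Decoded(op, phys_dest, phys_srcs))


icache = [("add", "r1", ["r2", "r3"]), ("mul", "r4", ["r1", "r1"])]
rename_map = {"r1": 1, "r2": 2, "r3": 3, "r4": 4}
free_list = deque([10, 11, 12])
issue_queue = []
decode_and_rename(fetch(icache, 0), rename_map, free_list, issue_queue)
print(issue_queue)
```

After renaming, the `mul` reads physical register 10, the register just allocated to the `add`, so the dependence between the two instructions is carried into the issue queue as a register tag.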
In order to improve the IPC performance of a pipelined superscalar processor, as many decoded instructions in the issue queue as possible should be issued simultaneously, so that more instructions can be executed at the subsequent EX stage.
Whether a decoded instruction can be issued mainly depends on two factors: whether the source operands required to execute the decoded instruction are ready, and whether a suitable functional unit (e.g., a floating-point adder or a multiplier/divider) is available. Corresponding to these two factors, the issue logic determination procedure executed at IS stage consists of a wakeup logic determination procedure and a select logic determination procedure.
1. Wakeup Logic Determination Procedure
The set of instructions in the issue queue that are examined for simultaneous issue is usually called a window (or issue window). On the pipelined superscalar processor, the source operands of the instructions to be detected in the issue window are the target operands generated by instructions executed on the pipeline. Accordingly, when the results of the instructions executed at EX stage (i.e., their target operands) are broadcast over a data bus in every cycle, each instruction to be detected in the issue window compares its own source operands with each of the broadcast target operands, to judge whether any of them is a source operand it requires. This comparison procedure is the wakeup logic determination procedure.
When one of the source operands required by an instruction to be detected matches a broadcast target operand, the corresponding tag, which indicates whether that source operand is ready, is set to “acquirable”. When all source operands required by the instruction are ready, i.e. the tags of all its source operands are “acquirable”, the instruction enters to-be-issued status.
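The tag comparison performed during wakeup can be sketched as below. It assumes each window entry keeps a list of source tags and one ready flag per tag, with `True` standing for “acquirable”; `WindowEntry` and `wakeup` are hypothetical names. Hardware performs all these comparisons in parallel, one comparator per pair, rather than in a loop.

```python
from dataclasses import dataclass, field


@dataclass
class WindowEntry:
    """One instruction waiting in the issue window (hypothetical model)."""
    name: str
    src_tags: list[str]                              # tags of required source operands
    ready: list[bool] = field(default_factory=list)  # True means "acquirable"

    def __post_init__(self) -> None:
        # By default no source operand is "acquirable" yet.
        if not self.ready:
            self.ready = [False] * len(self.src_tags)

    @property
    def to_be_issued(self) -> bool:
        # The instruction enters to-be-issued status once all tags are "acquirable".
        return all(self.ready)


def wakeup(window: list[WindowEntry], broadcast_tags: list[str]) -> None:
    """Compare each not-yet-ready source tag against each broadcast tag."""
    for entry in window:
        for i, tag in enumerate(entry.src_tags):
            if not entry.ready[i] and any(tag == b for b in broadcast_tags):
                entry.ready[i] = True  # mark this source operand "acquirable"


window = [
    WindowEntry("i3", ["p10", "p2"], [False, True]),  # p2 was already ready
    WindowEntry("i4", ["p11", "p12"]),
]
wakeup(window, ["p10"])
print([(e.name, e.to_be_issued) for e in window])  # [('i3', True), ('i4', False)]
```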
During the wakeup logic determination procedure, the larger the issue window, the more instructions there are to be detected, and the more likely it is that instructions will enter to-be-issued status, which helps increase the number of decoded instructions issued from the issue queue. However, if the issue window is enlarged blindly, the data bus for broadcasting target operands must become correspondingly longer, which lengthens the latency of broadcasting the target operands. Furthermore, as the number of instructions to be detected in the issue window increases, the number of comparison operations to be executed also increases, which further prolongs the time for executing the wakeup logic determination procedure. Thus, the size of the issue window should be chosen carefully in order to increase the number of instructions issued.
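A rough way to see why a larger window slows wakeup is to count the tag comparisons needed per cycle. Under the simplifying assumption that every window entry compares each of its source tags with every broadcast tag, the count is the product of window size, sources per entry and broadcasts per cycle. The parameter values below (two sources per entry, four broadcasts per cycle) are hypothetical.

```python
def wakeup_comparisons(window_size: int, srcs_per_entry: int, broadcasts_per_cycle: int) -> int:
    """Tag comparisons per cycle if every source tag is checked against every broadcast tag."""
    return window_size * srcs_per_entry * broadcasts_per_cycle


for w in (16, 32, 64):
    print(w, wakeup_comparisons(w, 2, 4))
# 16 128
# 32 256
# 64 512
```

Doubling the window doubles the comparisons, and the broadcast bus must also reach twice as many entries, which is the trade-off described above.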
2. Select Logic Determination Procedure
Each instruction that enters to-be-issued status through the wakeup logic determination procedure must then pass the select logic determination procedure before entering the next stage, RR. The reasons are: (1) the number of instructions entering to-be-issued status may exceed the number of functional units in the processor; and (2) some instructions can only be executed by a subset of the functional units; for example, if there is only one multiplier in the processor, all multiplication operations have to be executed by this multiplier. Consequently, by selectively issuing instructions from those in to-be-issued status through the select logic determination procedure, resource conflicts at the subsequent EX stage can be avoided effectively.
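The select step can be sketched as a function that, given the instructions in to-be-issued status and the number of free functional units of each class, decides which of them issue in this cycle. The oldest-first priority used here is an assumption for illustration; the text does not prescribe a selection policy, and `ReadyInstr` and `select` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ReadyInstr:
    age: int      # position in program order; smaller means older
    name: str
    fu_type: str  # class of functional unit the instruction needs


def select(ready: list[ReadyInstr], free_units: dict[str, int]) -> list[ReadyInstr]:
    """Issue oldest-first, never more per unit class than there are free units."""
    remaining = dict(free_units)
    issued = []
    for instr in sorted(ready, key=lambda r: r.age):
        if remaining.get(instr.fu_type, 0) > 0:
            remaining[instr.fu_type] -= 1
            issued.append(instr)
    return issued


ready = [
    ReadyInstr(0, "mul_a", "mul"),
    ReadyInstr(1, "mul_b", "mul"),
    ReadyInstr(2, "add_c", "alu"),
    ReadyInstr(3, "add_d", "alu"),
    ReadyInstr(4, "add_e", "alu"),
]
print([i.name for i in select(ready, {"mul": 1, "alu": 2})])  # ['mul_a', 'add_c', 'add_d']
```

With only one multiplier, `mul_b` stays in to-be-issued status for a later cycle, and `add_e` waits because both ALUs are already taken.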
Further, a subsequent instruction that depends on an instruction already in to-be-issued status (for example, one whose required source operands are the target operands generated by executing that instruction) can begin its wakeup logic determination procedure only after that instruction has been issued through the select logic determination procedure. On the pipelined superscalar processor, this scheme, known as “Critical Loops” and composed of the wakeup logic determination procedure and the select logic determination procedure, effectively avoids data conflicts between dependent instructions.
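The effect of this loop on a chain of dependent instructions can be simulated cycle by cycle. The sketch assumes single-cycle operations, so that an issued instruction's target tag becomes available for wakeup in the next cycle, and uses program order as the select priority; `simulate` and the tag names are hypothetical.

```python
def simulate(instrs: dict[str, tuple[str, set[str]]], width: int,
             initially_ready: set[str]) -> list[list[str]]:
    """instrs maps name -> (target_tag, source_tags); returns names issued per cycle."""
    available = set(initially_ready)
    pending = dict(instrs)
    schedule = []
    while pending:
        # Wakeup: instructions whose source tags are all available are to-be-issued.
        candidates = [n for n, (_, srcs) in pending.items() if srcs <= available]
        # Select: issue at most `width` of them, in program order.
        chosen = candidates[:width]
        if not chosen:
            raise RuntimeError("no instruction can ever wake up")
        for n in chosen:
            del pending[n]
        # Only now are the issued instructions' target tags visible to wakeup,
        # so a dependent instruction can issue in the next cycle at the earliest.
        available |= {instrs[n][0] for n in chosen}
        schedule.append(chosen)
    return schedule


chain = {
    "i1": ("p1", {"p0"}),
    "i2": ("p2", {"p1"}),  # depends on i1
    "i3": ("p3", {"p2"}),  # depends on i2
    "i4": ("p4", {"p0"}),  # independent
}
schedule = simulate(chain, width=4, initially_ready={"p0"})
print(schedule)  # [['i1', 'i4'], ['i2'], ['i3']]
```

The dependent chain i1, i2, i3 issues one instruction per cycle, while the independent i4 issues alongside i1.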
The instructions issued simultaneously at IS stage through the above wakeup and select logic determination procedures read their respective source operands from the corresponding physical registers in the register file at RR stage, and then, at the following EX stage, perform the corresponding operations in the functional units selected during the select logic determination procedure. Different operations may require different numbers of cycles; for example, an integer addition usually takes fewer cycles than a floating-point multiplication. Therefore, instructions that move from RR stage to EX stage simultaneously may take different numbers of cycles to produce their results at EX stage.
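The varying execution latencies can be illustrated with a small lookup table. The latency values below (one cycle for an integer add, four for a floating-point multiply) are hypothetical placeholders, and `EX_LATENCY` and `ex_finish_cycle` are names introduced only for this sketch.

```python
# Hypothetical EX-stage latencies in cycles; real values depend on the design.
EX_LATENCY = {"int_add": 1, "fp_mul": 4}


def ex_finish_cycle(op: str, ex_start: int) -> int:
    """Last EX cycle of an operation that enters EX in cycle `ex_start`."""
    return ex_start + EX_LATENCY[op] - 1


# Two instructions that move from RR to EX together in cycle 5:
for op in ("int_add", "fp_mul"):
    print(op, ex_finish_cycle(op, 5))
# int_add 5
# fp_mul 8
```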
After obtaining its operation result at EX stage, each executed instruction stores the result (i.e. its target operands) into the corresponding physical registers in the register file at WB stage, and broadcasts the target operands over the above data bus, so that subsequent instructions in the issue window at IS stage can perform the wakeup logic determination procedure.
After WB stage, an executed instruction finally completes all its operations on the pipeline at RET stage.
It can easily be seen from the above description that increasing the number of decoded instructions to be detected in the issue window is an effective way to improve IPC. However, as stated above, as the number of decoded instructions to be detected increases, the time for executing the wakeup logic determination procedure also increases remarkably, which in turn degrades IPC performance.
It is therefore necessary to put forward a novel method for issuing instructions that resolves this contradiction, which is the aim of the present invention.