FIG. 1 is an illustration of a conventional execution unit 100 of the CPU (central processing unit) of a general purpose computer. The execution unit 100 includes a pipeline 102 to execute certain instructions of a computer program. The pipeline 102 has successive pipeline stages S1 to S9 for executing each instruction in the pipeline 102. The pipeline stages S1 to S9 include an operand selection stage S1, an operand processing (i.e., execute) stage S2, other pipeline stages S3 to S6, a validity determination stage S7, another pipeline stage S8, and an operand write stage S9. Each of the pipeline stages S1 and S3 to S9 occurs in one machine cycle and the operand processing stage S2 occurs in a variable number of machine cycles, as will be described later.
Each instruction in the pipeline 102 is first issued by the CPU to the dispatch controller 104 of the execution unit 100. The dispatch controller 104 dispatches the issued instruction to the pipeline 102 during the operand selection stage S1. The dispatch controller 104 also pre-decodes the instruction and in response generates control signals during the pipeline stages S1 to S9 for the instruction to control the operation of the ARF 106 and the pipeline 102 in the manner described hereafter.
The operand selection stage S1 of the pipeline 102 includes MUXs 128. During the operand selection stage S1 for each instruction in the pipeline 102, the MUXs 128 select one or more source operands S1 SSOP1 and/or S1 SSOP2 for processing by the operand processing stage S2 of the pipeline 102. As described next, this selection is made from among the source operands S1 SOP1 and S1 SOP2 received from the ARF 106, the local destination operands S2 LDOP to S8 LDOP received respectively from the operand bypasses 114 to 120, the external destination operands S2 XDOP to S8 XDOP received respectively from the operand bypasses 121 to 127, and an immediate source operand IMMD SOP received from the control logic 110 of the pipeline 102.
The ARF 106 comprises the architectural registers of the computer. During the operand selection stage S1 for each instruction in the pipeline 102, the ARF 106 selectively provides source operands S1 SOP1 and S1 SOP2 from selected architectural registers of the ARF 106 to the operand selection stage S1 of the pipeline 102. The source operand S1 SOP1 or S1 SOP2 provided by the ARF 106 will be selected by one of the MUXs 128 if the dispatch controller 104 determines that the source operand S1 SOP1 or S1 SOP2 is currently available in one of the architectural registers of the ARF 106. This architectural register is specified by the instruction as a source.
However, for each instruction in the pipeline 102, the dispatch controller 104 may determine that the instruction requires an immediate source operand IMMD SOP from the control logic 110 instead of a source operand S1 SOP1 or S1 SOP2. In this case, one of the MUXs 128 selects the immediate source operand IMMD SOP.
The dispatch controller 104 may also determine during the operand selection stage S1 for each instruction in the pipeline 102 that the source operand S1 SOP1 or S1 SOP2 is not yet available in an architectural register of the ARF 106 but is in flight and available elsewhere. In this case, it may be available as one of the local destination (or result) operands S2 LDOP to S8 LDOP or one of the external destination operands S2 XDOP to S8 XDOP and is then selected by one of the MUXs 128. The local destination operands S2 LDOP to S8 LDOP are generated by the pipeline 102 respectively during the pipeline stages S2 to S8 for other instructions in the pipeline 102. The external destination operands S2 XDOP to S8 XDOP are respectively generated during the pipeline stages S2 to S8 for instructions in another pipeline (designated by X, but not shown) of the execution unit 100. They are provided by respective external operand bypass sources of that other pipeline.
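The selection among the immediate operand, the in-flight destination operands, and the ARF described above can be sketched roughly as follows. This is only an illustrative model, not the circuit of FIG. 1: the names `InFlight` and `select_source_operand` are hypothetical, local and external bypasses are treated alike, and the rule that the earliest stage (the most recently produced result) wins when several bypasses match is an assumption that the text does not state.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InFlight:
    """Hypothetical record for a destination operand carried on a bypass."""
    dest_reg: str  # architectural register the earlier instruction will write
    value: int     # destination operand (LDOP or XDOP) on the bypass


def select_source_operand(
    src_reg: str,
    arf: Dict[str, int],
    bypasses: Dict[int, InFlight],
    immediate: Optional[int] = None,
) -> int:
    """Model one of the MUXs 128 choosing a selected source operand.

    `bypasses` maps a producer stage number (2..8) to the operand in flight
    at that stage. Assumption: the lowest stage number has priority.
    """
    # Case 1: the instruction uses an immediate source operand (IMMD SOP).
    if immediate is not None:
        return immediate
    # Case 2: the operand is in flight on a bypass (S2..S8).
    for stage in sorted(bypasses):
        entry = bypasses[stage]
        if entry.dest_reg == src_reg:
            return entry.value
    # Case 3: the operand is available in the architectural register.
    return arf[src_reg]


arf = {"r0": 1, "r1": 2, "r2": 0}
in_flight = {2: InFlight("r2", 3)}  # an earlier instruction producing r2 is in S2
selected = select_source_operand("r2", arf, in_flight)  # taken from the bypass
```

In this sketch a read of `r2` returns the bypassed value 3 rather than the stale value 0 held in the register file.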
In the operand processing stage S2 for each instruction in the pipeline 102, the one or more selected source operands S1 SSOP1 and/or S1 SSOP2 are first latched by the registers 134 of the operand processing stage S2 as the one or more selected source operands S2 SSOP1 and/or S2 SSOP2. Furthermore, in the operand processing stage S2 for the instruction, the control logic 110 of the pipeline 102 generates control signals that cause the arithmetic logic 132 of the operand processing stage S2 to process the one or more selected source operands S2 SSOP1 and/or S2 SSOP2 and generate in response a destination operand S2 LDOP for the instruction. These control signals are generated in response to decoding the instruction.
The pipeline stages S3 to S8 respectively include registers 138 to 143. Thus, in the pipeline stage S3 for each instruction in the pipeline 102, the register 138 latches the local destination operand S2 LDOP generated in the operand processing stage S2 for the instruction as the local destination operand S3 LDOP. Similarly, in the pipeline stages S4 to S8 for each instruction in the pipeline, the registers 139 to 143 respectively latch the local destination operands S3 LDOP to S7 LDOP that were respectively latched in the previous pipeline stages S3 to S7 as respectively the destination operands S4 LDOP to S8 LDOP. Thus, the destination operands S3 LDOP to S8 LDOP are all delayed versions of the destination operand S2 LDOP.
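The chain of registers 138 to 143 behaves like a six-entry shift register, which can be illustrated with the minimal sketch below. The function name `advance` and the list representation are hypothetical, and the sketch assumes that a new S2 LDOP value is produced every machine cycle (i.e., it ignores the variable length of stage S2).

```python
from typing import List, Optional


def advance(latches: List[Optional[int]], s2_ldop: Optional[int]) -> List[Optional[int]]:
    """Advance one machine cycle.

    latches[0] models register 138 (S3 LDOP) and latches[5] models
    register 143 (S8 LDOP). Each register latches the operand held by
    the previous stage; the oldest value falls out toward stage S9.
    """
    return [s2_ldop] + latches[:-1]


latches: List[Optional[int]] = [None] * 6
for value in (10, 20, 30):
    latches = advance(latches, value)
# latches now holds [30, 20, 10, None, None, None]
```

Each entry is a delayed copy of an earlier S2 LDOP, so the value 10 produced three cycles ago now appears as S5 LDOP.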
The pipeline stages S3 to S6 and S8 are needed since other processing is occurring in the execution unit 100. Moreover, the dispatch controller 104 makes the determination of whether an instruction is valid or invalid in the validity determination stage S7.
For each instruction in the pipeline 102 that is determined to be valid by the dispatch controller 104, the architectural register in the ARF 106 that is specified by the instruction as the destination stores the destination operand S8 LDOP during the operand write stage S9 for the instruction. Thus, the destination operand S8 LDOP for this particular instruction will now be available in the ARF 106 as a source operand S1 SOP1 or S1 SOP2 in the operand selection stage S1 for a later instruction in the pipeline 102 or another pipeline of the execution unit 100.
However, an instruction in the pipeline 102 may be invalid due to a branch mispredict, a trap, or an instruction recirculate. A branch mispredict will be indicated by a BMP (branch mispredict) signal received by the dispatch controller 104 from another pipeline of the execution unit 100. A trap may be detected locally by the dispatch controller 104 or from TRP (trap) signals received by the dispatch controller 104 from other pipelines in the execution unit 100. Moreover, an instruction recirculate will be indicated by RCL (instruction recirculate) signals received by the dispatch controller 104 from the data cache (not shown) of the CPU when a data cache miss has occurred.
If the dispatch controller 104 determines that an instruction in the pipeline 102 is invalid, then the ARF 106 does not store the destination operand S8 LDOP for the instruction. In this way, the ARF 106 cannot be corrupted since the destination operand S8 LDOP for the instruction will not be stored in the ARF 106 until the dispatch controller 104 has determined that the instruction is valid.
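The gating of the operand write stage S9 by the validity determination can be sketched as follows. This is a simplified model under stated assumptions: the functions `is_valid` and `operand_write_stage` are hypothetical names, and the BMP, TRP, and RCL conditions are reduced to single boolean flags.

```python
from typing import Dict


def is_valid(bmp: bool, trp: bool, rcl: bool) -> bool:
    """An instruction is invalid on a branch mispredict, a trap, or a recirculate."""
    return not (bmp or trp or rcl)


def operand_write_stage(arf: Dict[str, int], dest_reg: str, s8_ldop: int, valid: bool) -> None:
    """Store S8 LDOP in the destination register only for a valid instruction."""
    if valid:
        arf[dest_reg] = s8_ldop
    # For an invalid instruction nothing is written, so the ARF is unchanged.


arf = {"r2": 0}
operand_write_stage(arf, "r2", 5, is_valid(bmp=False, trp=False, rcl=False))
operand_write_stage(arf, "r2", 99, is_valid(bmp=True, trp=False, rcl=False))
# arf["r2"] is still 5: the mispredicted instruction never reached the ARF
```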
However, later instructions in the pipeline 102 may depend on the local destination operands S2 LDOP to S8 LDOP of earlier instructions in the pipeline 102 and/or external destination operands S2 XDOP to S8 XDOP of earlier instructions in another pipeline which are in flight and have not yet been stored in the ARF 106. Similarly, later instructions in the other pipeline may depend on the local destination operands S2 LDOP to S8 LDOP of earlier instructions in the pipeline 102 which are in flight and have not yet been stored in the ARF 106. Thus, these local and external destination operands S2 LDOP to S8 LDOP and S2 XDOP to S8 XDOP must be made available with minimum latency to preserve the performance of the CPU. In order to do this, the execution unit 100 includes the operand bypasses 114 to 120 from the pipeline 102 and the operand bypasses 121 to 127 from the other pipeline in the execution unit 100.
More specifically, the arithmetic logic 132 is coupled to the MUXs 128 by the operand bypass 114 for the operand processing stage S2. Similarly, the registers 138 to 143 are respectively coupled by the operand bypasses 115 to 120 for the intermediate stages S3 to S8 to the MUXs 128. In this way, the arithmetic logic 132 and the registers 138 to 143 are local operand bypass sources of respectively the local destination operands S2 LDOP to S8 LDOP. And, as alluded to earlier, the external operand bypass sources in the other pipeline of the execution unit 100 are coupled to the MUXs 128 by the operand bypasses 121 to 127 for the pipeline stages S2 to S8 to provide the external destination operands S2 XDOP to S8 XDOP.
Thus, in the operand selection stage S1 for each instruction in the pipeline 102, this particular instruction may specify as a source the same selected register in the ARF 106 that an earlier instruction in the pipeline 102 or another pipeline in the execution unit 100 specifies as a destination. This earlier instruction may be in the pipeline stage S2, . . . , S7, or S8 of the pipeline 102 or the other pipeline. In this case, the local or external destination operand S8 LDOP or S8 XDOP generated for the earlier instruction will not yet be available from the selected register but will be available as the local or external destination operand S2 LDOP, . . . , S7 XDOP, or S8 XDOP on the corresponding operand bypass 114, . . . , 126, or 127. As a result, the MUXs 128 will select this local or external destination operand S2 LDOP, . . . , S7 XDOP, or S8 XDOP for processing by the arithmetic logic 132.
FIG. 2 illustrates this more precisely for the pipeline 102. As shown, the initial instruction ADD in the pipeline 102 obtains its source operands S1 SOP1 and S1 SOP2 from the registers r0 and r1 of the ARF 106 that are specified as sources during the operand selection stage S1 for the ADD instruction. And, during the operand processing stage S2 for the instruction ADD, the destination operand S2 LDOP is generated. However, the destination operand S8 LDOP is written to the register r2 of the ARF 106 that is specified as the destination only during the operand write stage S9 for the instruction ADD. Thus, any instruction SUB, . . . , or AND that has its operand selection stage S1 during the pipeline stage S2, . . . , S7, or S8 of the instruction ADD and is dependent on the instruction ADD by specifying the register r2 as a source, must use the corresponding operand bypass 114, . . . , 119, or 120 to obtain the destination operand S2 LDOP, . . . , S7 LDOP, or S8 LDOP as the selected source operand S1 SOP1 or S1 SOP2. And, only for the instructions XNOR, etc . . . , that have their operand selection stages S1 after the pipeline stages S2 to S8 of the instruction ADD, will the selected source operand S1 SOP1 or S1 SOP2 be directly available from the register r2.
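The correspondence in FIG. 2 between the position of a dependent instruction and the operand bypass it must use can be worked through with the short sketch below. The function `operand_source` and its parameter `cycles_behind` are hypothetical, and the sketch assumes that every stage of the producing instruction, including S2, takes exactly one machine cycle.

```python
from typing import Optional, Tuple

FIRST_LOCAL_BYPASS = 114  # operand bypass carrying S2 LDOP
FIRST_BYPASS_STAGE = 2    # stage S2
LAST_BYPASS_STAGE = 8     # stage S8, served by operand bypass 120


def operand_source(cycles_behind: int) -> Tuple[str, Optional[int], Optional[str]]:
    """Return where a dependent instruction finds the producer's result.

    cycles_behind is the number of machine cycles by which the consumer's
    stage S1 follows the producer's stage S1 (1 = back to back).
    """
    if cycles_behind < 1:
        raise ValueError("the consumer must follow the producer")
    producer_stage = 1 + cycles_behind  # stage the producer occupies
    if producer_stage <= LAST_BYPASS_STAGE:
        bypass = FIRST_LOCAL_BYPASS + (producer_stage - FIRST_BYPASS_STAGE)
        return ("bypass", bypass, f"S{producer_stage} LDOP")
    # The producer has passed stage S8, so the register file is used.
    return ("arf", None, None)


back_to_back = operand_source(1)   # ("bypass", 114, "S2 LDOP")
last_bypassed = operand_source(7)  # ("bypass", 120, "S8 LDOP")
from_register = operand_source(8)  # ("arf", None, None)
```

Under these assumptions, seven consecutive dependent instructions behind the producer each need a distinct local bypass, and only the eighth can read the register r2 directly.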
Therefore, since the ARF 106 is only written to in the operand write stage S9 for each instruction, the pipeline 102 must have operand bypasses 114 to 120 for the pipeline stages S2 to S8 in the pipeline 102 and must also be coupled to the operand bypasses 121 to 127 from the other pipeline. Unfortunately, these numerous operand bypasses 114 to 127 occupy much space and introduce complex and intractable timing and routing problems in the CPU.
In view of the foregoing, it would be desirable to reduce the number of operand bypasses to and from pipelines in an execution unit to reduce the complexity of the pipelines. Furthermore, it would be desirable to do so without increasing the latency with which local and external destination operands of earlier instructions are made available for selection as source operands for later instructions.
Referring back to FIG. 1, in many CPUs, the arithmetic logic 132 is configured to process (i.e., perform arithmetic computations on) the one or more selected source operands S1 SSOP1 and/or S1 SSOP2 for all instructions of a predefined arithmetic instruction type. These may include performance critical arithmetic instructions which are critical to the performance of the CPU since they are commonly used. For each of the performance critical arithmetic instructions, the operand processing stage S2 occurs in one machine cycle. The instructions of the predefined arithmetic instruction type may also include non-performance critical arithmetic instructions which are not as frequently used and therefore not as critical to the performance of the CPU. For each of these non-performance critical arithmetic instructions, the operand processing stage S2 has substages and occurs in multiple machine cycles with the number of machine cycles varying depending on the instruction.
The temptation to configure the arithmetic logic 132 to perform processing operations for both performance critical and non-performance critical arithmetic instructions of a certain arithmetic instruction type stems from the fact that many of the performance critical arithmetic instructions are similar to the non-performance critical arithmetic instructions. Although configuring the arithmetic logic 132 to perform processing operations for both performance critical and non-performance critical arithmetic instructions results in potential savings in area and power consumption, the complicated design of the CPU can slow down its performance with respect to the performance critical instructions.
Thus, it would be desirable to have a CPU with a performance critical pipeline that processes only the performance critical arithmetic instructions and a separate non-performance critical pipeline that processes only the non-performance critical arithmetic instructions. Moreover, it would be further desirable to locate at least the arithmetic logic of the non-performance critical pipeline away from the core of the execution unit. This enables the dispatch controller, the performance critical pipeline, and the ARF of the core of the execution unit to operate over shorter distances with less complexity so that the performance of the performance critical pipeline is maximized.