1. Field of the Invention
The present invention relates to a technique of providing a bypass path so that the content of a source register used for instruction execution can be obtained at high speed while an instruction on an instruction bus is being executed, and particularly to a bypass control circuit for use inside a processor.
2. Related Background Art
In recent processors, in order to enhance processing efficiency, an instruction is in many cases subdivided into a plurality of stages that are executed in parallel; that is, so-called pipeline processing is performed. FIG. 1 is a flowchart showing an outline of the pipeline processing.
First, the instruction to be executed is fetched from an instruction cache in which instructions are stored (step S1). Subsequently, the instruction is decoded, and a source operand is read from a source register (step S2).
Here, the instruction executed by the processor is, as shown in FIG. 2, constituted of an operation code Op indicating an instruction type, a destination operand Rd as a storage destination of an instruction execution result, and source operands Rs, Rt for use in executing the instruction.
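As an illustration only, the instruction format of FIG. 2 can be modelled as a small record. The sketch below is in Python; the class name, the use of a mnemonic string for Op, and the example register numbers are assumptions, since field widths and encodings are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """Illustrative model of the FIG. 2 format (encoding is assumed)."""
    op: str  # operation code Op: the instruction type
    rd: int  # register number of the destination operand Rd
    rs: int  # register number of the first source operand Rs
    rt: int  # register number of the second source operand Rt

    def sources(self):
        """Return the source register numbers that must be read in step S2."""
        return (self.rs, self.rt)


# Hypothetical example: an addition writing register 3 from registers 1 and 2.
add = Instruction(op="add", rd=3, rs=1, rt=2)
```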
In the following, a register storing the destination operand is called a destination register, and a register storing the source operand is called a source register. Both the destination register and the source register reside in a register file 33 in the processor.
After the content of the source register is read from the register file 33 in the step S2, the decoded instruction is executed (step S3). Subsequently, the operation result is written back to the destination register (step S4).
Since the number of cycles required for instruction execution differs in accordance with the instruction type, time adjustment is performed in the step S4 by transferring the instruction execution result through a plurality of flip-flops.
In the step S2, the content of the corresponding source register is normally read from the register file. Suppose, however, that the destination register number of a preceding instruction is the same as the source register number, and that the operation of the preceding instruction has ended and its result has already been obtained but has not yet been written to the register file, that is, writing has not yet finished because of the time adjustment. In that case, the content destined for the destination register is bypassed to the source operand, and the instruction is executed with the bypassed value.
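The bypass condition described above can be sketched for the case of a single preceding instruction. This is a minimal behavioural model in Python, not circuitry from the figures; the function name, the parameter names, and the use of a dictionary as the register file are assumptions.

```python
def read_source(src, register_file, pending_dest, pending_value):
    """Return the operand value for source register number `src`.

    `pending_dest` / `pending_value` describe a preceding instruction whose
    result is already computed but not yet written to the register file.
    `pending_dest` is None when no such result is in flight.
    """
    if pending_dest is not None and pending_dest == src:
        # Destination number matches the source number: take the bypass.
        return pending_value
    # No match: the register file already holds the up-to-date value.
    return register_file[src]


# Hypothetical register contents; register 1 has a newer value (99) in flight.
register_file = {1: 10, 2: 20}
bypassed = read_source(1, register_file, pending_dest=1, pending_value=99)
from_file = read_source(2, register_file, pending_dest=1, pending_value=99)
```

Here `bypassed` takes the in-flight value, while `from_file` comes from the register file because register 2 has no pending write.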
FIG. 3 is a schematic block diagram of a conventional bypass control circuit for controlling such bypass. The bypass control circuit of FIG. 3 shows an example in which the instruction outputted from an instruction cache is executed through the subdivided four stages A to D, and the final execution result is written back to the destination register in the register file 33 shown in FIG. 4.
Moreover, the stage at which the final result is obtained differs with the instruction type. With simple instructions such as addition and subtraction, the operation result is obtained at the end of the A stage. For a complicated shift instruction, the operation result is determined at the end of the B stage, and the result of a load/store instruction is obtained at the end of the C stage. For instructions requiring a long calculation time, such as a 32-bit multiplication instruction, the result cannot be obtained until the end of the D stage. In this manner, the stage at which the final result is obtained differs with the instruction, but the timing of returning data to the register file is set to be the same. Therefore, for an instruction whose result is obtained in a particularly short time, there is a time window in which the final operation result has been obtained but has not yet been written to the register file. When a subsequent instruction refers to the final operation result within this time window, the data is transferred by a bypass.
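The relation between instruction type and the length of this interval can be written out as a small table. The sketch below assumes that write-back takes place after the D stage for every instruction; the mnemonic keys and the function name are hypothetical.

```python
# Pipeline stages in execution order.
STAGES = ["A", "B", "C", "D"]

# Stage at whose end the result becomes available, per the description above.
# The mnemonic strings are illustrative labels, not real opcodes.
RESULT_STAGE = {
    "add": "A",
    "sub": "A",
    "shift": "B",
    "load": "C",
    "mul32": "D",
}


def waiting_stages(op):
    """Stages a finished result still passes through before write-back.

    Assumes write-back happens after the D stage for every instruction.
    While the result is in one of these stages, a later instruction that
    needs it must obtain it through a bypass.
    """
    done = STAGES.index(RESULT_STAGE[op])
    return STAGES[done + 1:]
```

Under this assumption an addition result waits through three further stages, while a 32-bit multiplication result has no waiting stage at all.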
In the bypass control circuit of FIG. 3, each of the A to D stages is provided with flip-flops 41a to 41d and comparators 42 to 44. Each of the flip-flops 41a to 41d successively transfers the register number of the destination register Rd outputted from an instruction cache 11 in synchronization with a system clock of the processor.
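The cascade of flip-flops can be pictured as a shift register that advances once per clock. The following Python sketch is a behavioural illustration only; the class name, the method name, and the use of None for an empty stage are assumptions.

```python
class DestinationPipeline:
    """Shift register of destination register numbers (behavioural sketch)."""

    def __init__(self, depth=4):
        # Index 0 models the A-stage flip-flop (41a), index 3 the D stage (41d).
        self.stages = [None] * depth

    def clock(self, rd):
        """Model one system-clock edge.

        Every stored number moves one stage further from the instruction
        cache, the oldest one drops out, and the new Rd enters the A stage.
        """
        self.stages = [rd] + self.stages[:-1]
        return list(self.stages)


pipe = DestinationPipeline()
pipe.clock(3)            # first instruction writes register 3
state = pipe.clock(7)    # second instruction writes register 7
```

After the two clock edges, `state` holds 7 in the A stage and 3 in the B stage, with the C and D stages still empty.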
The comparator 42 compares an output of the flip-flop 41a of the A stage with an output of the register number of the source register outputted from the instruction cache 11, and outputs a comparison result. The comparator 43 compares an output of the flip-flop 41b of the B stage with the output of the register number of the source register outputted from the instruction cache 11, and outputs the comparison result. The comparator 44 compares an output of the flip-flop 41c of the C stage with an output of the register number of the source register outputted from the instruction cache 11, and outputs the comparison result.
By inputting the comparison results of the comparators 42 to 44 to inverters IV1 to IV6 and AND gates G1 to G3 and performing a logical operation, the final bypass path is determined.
Moreover, when two or more of the comparators 42 to 44 detect a match, prioritization is performed, and the output of the flip-flop corresponding to the stage closest to the instruction cache 11 is preferentially utilized as the source operand of the instruction to be executed next.
This corresponds to a case in which the destination registers of a plurality of preceding instructions are the same. In this case, the operation result of the latest instruction has to be utilized as the source operand.
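The priority rule described above can be sketched as follows. This is a behavioural model in Python rather than the gate wiring of FIG. 3: the function name, the string labels for the selected path, and the way the AND/NOT terms are grouped are assumptions.

```python
def conventional_select(src, ff_a, ff_b, ff_c):
    """Choose the bypass source for register number `src`.

    ff_a, ff_b, ff_c are the destination numbers held in the A-, B- and
    C-stage flip-flops. Returns "A", "B" or "C" for a bypass, or "RF" when
    the value should come from the register file.
    """
    # The three comparisons all depend on `src` from the instruction cache.
    match_a = src == ff_a  # comparator 42
    match_b = src == ff_b  # comparator 43
    match_c = src == ff_c  # comparator 44

    # Prioritizing: a stage is selected only if no newer stage also matches.
    sel_a = match_a
    sel_b = match_b and not match_a
    sel_c = match_c and not match_a and not match_b

    if sel_a:
        return "A"
    if sel_b:
        return "B"
    if sel_c:
        return "C"
    return "RF"
```

Note that every select signal is computed only after `src` is known, so the prioritizing logic sits in series with the comparison.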
In a superscalar processor or a processor having many pipeline stages, the number of flip-flops subject to bypassing is large, so the scale of the gate circuit for performing the prioritizing is enlarged. Specifically, since the number of gate stages increases, much time is required for instruction execution processing.
In an ordinary processor, since it takes a relatively long time to fetch the instruction from the instruction cache, the dashed-line path of FIG. 3, that is, the path for comparing the register number from the instruction bus and performing the prioritizing, easily becomes a critical path in terms of timing. Moreover, the presence of such a critical path may limit the operating frequency of the processor.
The present invention has been developed in consideration of this respect, and an object thereof is to provide a bypass control circuit in which data can be supplied in a short time to a source register of an instruction to be executed on an instruction bus.
To attain the aforementioned object, there is provided a bypass control circuit comprising:
a plurality of flip-flops, cascade-connected on an instruction bus, for successively transferring a register number of a destination register indicating a storage destination of an instruction execution result in synchronization with a system clock;
first comparison means for comparing the outputs of at least two flip-flops among the plurality of flip-flops with each other;
second comparison means for comparing the register number of the source register of the instruction to be executed on the instruction bus with respective outputs of at least part of the plurality of flip-flops; and
bypass path setting means for setting a bypass path of data inputted to the source register of the instruction to be executed on the instruction bus on the basis of the comparison results of the first and second comparison means.
According to the present invention, since the first comparison means is disposed to compare the outputs of two arbitrary flip-flops with each other among the plurality of flip-flops for successively transferring the register number of the destination register, the bypass path of the data inputted to the source register of the instruction to be executed can be set in a short time by utilizing the comparison result.
Moreover, when the first comparison means detects a plurality of matches, the bypass path is set on the basis of the output of the flip-flop on the first-stage side, so that it is possible to avoid the disadvantage that old data is inputted to the source register by mistake.
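One way to picture how the two comparison results might be combined is sketched below in Python. This is an assumption-laden illustration, not the claimed circuit: the function names, the "valid" flags, and the exact combination of the two comparison results are hypothetical, and the sketch is limited to three flip-flop stages.

```python
def stage_valid_flags(ff_a, ff_b, ff_c):
    """First comparison: compare the flip-flop outputs with each other.

    This step uses no input from the instruction bus, so in hardware it
    could be evaluated before the source register number arrives
    (assumption). A stage is flagged invalid when a stage nearer the first
    stage holds the same destination number, because its data is then old.
    """
    eq_ab = ff_a == ff_b
    eq_ac = ff_a == ff_c
    eq_bc = ff_b == ff_c
    valid_a = True
    valid_b = not eq_ab
    valid_c = not eq_ac and not eq_bc
    return valid_a, valid_b, valid_c


def select_bypass(src, ff_a, ff_b, ff_c):
    """Second comparison plus path setting for source register `src`."""
    valid_a, valid_b, valid_c = stage_valid_flags(ff_a, ff_b, ff_c)
    # Each select term needs only its own source comparison and a flag
    # prepared beforehand; no term waits on another stage's source match.
    select = {
        "A": src == ff_a and valid_a,
        "B": src == ff_b and valid_b,
        "C": src == ff_c and valid_c,
    }
    chosen = [stage for stage, on in select.items() if on]
    # The flags guarantee that at most one stage can be selected.
    assert len(chosen) <= 1
    return chosen[0] if chosen else "RF"
```

In this sketch the stage nearest the first stage still wins whenever several flip-flops hold the same destination number, but the shadowing decision no longer depends on the source register number read from the instruction bus.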