1. Field of the Invention
The present invention relates to a microprocessor having conditional execution instructions that is capable of controlling the execution of the instructions.
2. Description of the Prior Art
Recently, methods such as conditional execution and speculative instructions have been discussed in order to reduce the branch penalty. A conventional ARM microprocessor capable of conditional execution uses dedicated flags such as a negative flag (N), a zero flag (Z), a carry flag (C), and an overflow flag (V) for the conditional execution decision.
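For illustration only, the conditional execution decision based on the N, Z, C, and V flags can be sketched as follows. The mnemonics and flag tests follow the commonly documented ARM condition codes; the function name `condition_passes` and its argument layout are assumptions made for this sketch and do not appear in the prior art described here.

```python
def condition_passes(cond, n, z, c, v):
    """Return True if condition mnemonic `cond` holds for flags N, Z, C, V.

    Hypothetical helper for illustration; flags are passed as booleans.
    """
    table = {
        "EQ": z,                      # equal: Z set
        "NE": not z,                  # not equal: Z clear
        "CS": c,                      # carry set
        "CC": not c,                  # carry clear
        "MI": n,                      # negative
        "PL": not n,                  # positive or zero
        "VS": v,                      # overflow
        "VC": not v,                  # no overflow
        "HI": c and not z,            # unsigned higher
        "LS": (not c) or z,           # unsigned lower or same
        "GE": n == v,                 # signed greater than or equal
        "LT": n != v,                 # signed less than
        "GT": (not z) and (n == v),   # signed greater than
        "LE": z or (n != v),          # signed less than or equal
        "AL": True,                   # always
    }
    return bool(table[cond])
```

An instruction whose condition does not pass is treated as inactive, that is, it leaves the destination register unchanged.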
Furthermore, there has been a method in which one instruction is divided into a plurality of stages and the stages are executed in a pipeline in order to increase the performance of a microprocessor.
FIG. 1 is a block diagram showing the configuration of a core of a conventional multimedia (MMA) microprocessor. In FIG. 1, the reference number 1900 designates a processor core of the conventional MMA microprocessor, 1901 denotes one execution unit incorporated in the MMA microprocessor, 1902 indicates another execution unit for executing instructions of the MMA microprocessor, and 1904 designates a data random access memory (a data RAM). Thus, the conventional MMA microprocessor is capable of simultaneously executing two sub-instructions included in a single instruction by using the execution units 1901 and 1902.
The execution unit 1901 comprises a multiplier 1910, an accumulator (ACC) 1911, a shifter 1912, and an arithmetic logic unit (ALU) 1913. The execution unit 1902 comprises an ALU 1914 and a load store unit 1915. The reference numbers 1920 and 1921 denote source data buses used for operation units such as the multiplier 1910, the accumulator 1911, the shifter 1912 and the ALU 1913 included in the execution unit 1901; data items to be used for the operation are read from the general purpose register file 1903 and transferred through these buses. The reference numbers 1930 and 1931 designate source buses used for the ALU 1914 and the load store unit 1915 included in the execution unit 1902, through which data items to be used for operation are read from the general purpose register file 1903.
The reference number 1925 designates a write-back bus through which the operation results of the multiplier 1910 and the ALU 1913 and so on incorporated in the execution unit 1901 are written into the general purpose register file 1903. The reference numbers 1932 and 1933 denote write-back buses through which the operation result in the execution unit 1902 is written into the general purpose register file 1903. The reference numbers 1922 and 1933 designate internal buses through which the operation result of the multiplier 1910 is transferred to the accumulator 1911 in order to accumulate it without penalty. The reference number 1940 designates bi-directional buses, through which the load/store unit 1915 and the data RAM 1904 are connected to each other, used for controlling load/store operation of operand data items.
FIG. 2 is a block diagram showing a pipeline of a part of the circuit of the execution unit 1901 incorporated in the conventional MMA microprocessor shown in FIG. 1. In FIG. 2, the reference number 1903 designates the general purpose register file, 1913 denotes the ALU, and 1910a and 1910b indicate the parts used for the multiplication operation, namely, a Wallace tree circuit and a carry propagate adder (CPA), respectively.
In the execution unit 1901, it is possible to execute the multiplication operation within two stages in the pipeline. The reference number 1921 designates a source data bus through which source data items are read from the general purpose register file 1903. The reference number 1925 denotes a write-back bus through which the operation result is written into the general purpose register file 1903. The reference numbers 1970, 1971 and 1972 denote tri-state buffers for driving data to the source data bus 1921. The reference number 1980 designates a bypass for outputting the operation result of the ALU 1913 to the source data bus 1921, and 1981 indicates a bypass for outputting the multiplication result to the source data bus 1921.
As shown in FIG. 2, the execution of one instruction in the execution unit 1901 requires six pipeline stages: a Fetch (F) stage, a Decode (D) stage, a data Read (R) stage, an Execution (E) stage, a Memory access (M) stage, and a Write back (W) stage. Each of these six stages is executed in the pipeline. Thus, in order to execute one instruction in the six pipeline stages, data pass registers (DR) 1950, 1951, 1952, 1953, 1954, 1955, and 1956 shown in FIG. 2 are incorporated in the conventional MMA microprocessor core 1900. The reference number 1960 designates an instruction decoder. Control signals are also generated in a pipeline, and control pass registers (CR) 1961, 1962, 1963, and 1964 used for the control passes are incorporated in the conventional MMA microprocessor core 1900.
The output signal from the control pass register 1962 is a write enable signal used for the data pass register 1950. The output signal from the control pass register 1963 is a write enable signal used for the tri-state buffers 1971 and 1972. The output signal from the control pass register 1964 is a write enable signal used for the tri-state buffer 1970.
FIG. 3 is a timing chart of the pipeline of the instruction to be executed in the conventional MMA microprocessor shown in FIG. 1. As shown in FIG. 3, one instruction is executed in six stages of the pipeline. The white sections designate the pipelines of sub-instructions executed by the execution unit 1901 and the black sections denote the pipelines of sub-instructions executed by the execution unit 1902. Specifically, the reference number 1000 designates the pipeline of the sub-instruction executed by the execution unit 1901. The reference number 1001 denotes the pipeline of the sub-instruction executed by the execution unit 1902. These pipelines 1000 and 1001 are executed simultaneously. The reference number 1002 denotes the pipeline of the sub-instruction of a following instruction executed by the execution unit 1901 only when no data hazard occurs between this pipeline 1002 and the pipelines 1000 and 1001. The reference number 1003 indicates the pipeline of the sub-instruction of a following instruction executed by the execution unit 1902 only when no data hazard is caused between this pipeline 1003 and the pipelines 1000 and 1001.
Because the conventional MMA microprocessor has the configuration described above, it is possible to execute a following instruction with a delay of one clock, without disturbing the pipeline flow, only when there is no hazard between the preceding instruction and the following instruction.
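The hazard-free timing described above can be sketched as a small text timing chart. This is only an illustrative model under the assumption that each of the six stages takes exactly one clock; the names `STAGES`, `stage_cycles`, and `timing_chart` are hypothetical and are not taken from the figures.

```python
# Six pipeline stages in order: Fetch, Decode, Read, Execute, Memory, Write back.
STAGES = ["F", "D", "R", "E", "M", "W"]


def stage_cycles(issue_cycle):
    """Map each stage to the clock cycle in which it runs, assuming no stalls."""
    return {stage: issue_cycle + i for i, stage in enumerate(STAGES)}


def timing_chart(issue_cycles):
    """Return one text row per instruction and one column per clock cycle."""
    total = max(issue_cycles) + len(STAGES)
    rows = []
    for issue in issue_cycles:
        cells = ["."] * total          # "." marks a cycle with no stage
        for stage, cycle in stage_cycles(issue).items():
            cells[cycle] = stage
        rows.append(" ".join(cells))
    return rows
```

Calling `timing_chart([0, 1])` yields the rows `F D R E M W .` and `. F D R E M W`, that is, a following instruction that trails the preceding instruction by one clock.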
The reference number 1005 designates the pipeline of the sub-instruction executed by the execution unit 1902. The pipeline 1005 illustrates a data hazard between this sub-instruction and the sub-instruction of the pipeline 1000. The reference number 1004 denotes the pipeline of the sub-instruction executed by the execution unit 1901. Both the pipelines 1004 and 1005 are executed simultaneously.
As described above, when a data hazard occurs between the preceding instruction and the following instruction executed by the different execution units 1901 and 1902, and there is no bypass connected directly between the execution units 1901 and 1902 in the microprocessor, the execution of the following instruction must be delayed until the preceding instruction writes the operation result into the general purpose register file 1903.
In the conventional case described above, compared with the pipelines 1002 and 1003, the pipelines 1004 and 1005 incur a three-clock penalty. Thus, conventional microprocessors such as the MMA microprocessor, which execute a plurality of pipeline stages, have the drawback that the execution of the pipeline is halted temporarily and frequently in order to avoid the occurrence of a data hazard.
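The three-clock penalty can be reproduced with a simple stall calculation. The sketch below assumes, for illustration only, that each stage takes one clock and that a value written in the W stage becomes readable by the R stage of another instruction in the following clock; this assumption is chosen to be consistent with the three-clock figure above and is not stated in the description of FIG. 3. The function name `stall_clocks` is hypothetical.

```python
# Six pipeline stages in order: Fetch, Decode, Read, Execute, Memory, Write back.
STAGES = ["F", "D", "R", "E", "M", "W"]


def stall_clocks(producer_issue, consumer_issue):
    """Clocks the consumer's R stage must wait when no bypass exists.

    Assumption (labeled): the register file value written in the producer's
    W stage can first be read in the clock after that W stage.
    """
    producer_w = producer_issue + STAGES.index("W")      # clock of the write
    earliest_read = producer_w + 1                        # first clock the value is readable
    planned_read = consumer_issue + STAGES.index("R")    # R stage if there were no hazard
    return max(0, earliest_read - planned_read)
```

With the preceding instruction issued in clock 0 and the dependent instruction issued in clock 1, `stall_clocks(0, 1)` evaluates to 3. A dependent instruction issued four or more clocks later needs no stall under this model.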
In addition, in order to increase the performance of a microprocessor, namely to increase the operating frequency of the microprocessor, there has been a method in which operation results are transferred through a bypass when the microprocessor executes a plurality of pipelines. In this bypass method, for example, when the operation result of one instruction is to be written into the general purpose register file and a following instruction then reads that data from the general purpose register file, the following instruction can receive the operation result of the previous instruction through a dedicated bypass before the previous instruction completes its write to the general purpose register file. This method increases the performance of the microprocessor.
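As a minimal sketch of the bypass method, an operand read may be modeled as follows. The register names, the dictionary representation of the general purpose register file, and the function name `read_operand` are assumptions made for illustration only.

```python
def read_operand(reg, register_file, in_flight):
    """Return the operand for `reg`, preferring a bypassed in-flight result.

    register_file: dict mapping register name -> value already written back.
    in_flight: dict mapping destination register -> result that a previous
               instruction has computed but not yet written back.
    """
    if reg in in_flight:
        return in_flight[reg]      # forwarded over the dedicated bypass
    return register_file[reg]      # ordinary read from the register file
```

For example, with `register_file = {"r1": 10}` and `in_flight = {"r1": 99}`, the following instruction receives 99 without waiting for the write to the register file to complete.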
However, when the dedicated bypass is used, the previous instruction is a conditional execution instruction, and this conditional execution instruction becomes inactive based on the result of the conditional execution decision process, a wrong data item is transferred to the following instruction through the dedicated bypass. This causes an error in the operation of the microprocessor.
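The problem can be illustrated with a small model that compares what a bypass forwards when it ignores the condition with the value the following instruction should see. This is a sketch of the problem only, not of any particular circuit; the class `InFlightResult`, the two functions, and the register name `r1` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InFlightResult:
    dest: str             # destination register of the previous instruction
    value: int            # result computed but not yet written back
    condition_true: bool  # outcome of the conditional execution decision


def naive_bypass_read(reg, register_file, in_flight: Optional[InFlightResult]):
    """Forward any in-flight result for `reg`, ignoring its condition."""
    if in_flight is not None and in_flight.dest == reg:
        return in_flight.value
    return register_file[reg]


def expected_read(reg, register_file, in_flight: Optional[InFlightResult]):
    """Value the following instruction should see: an inactive conditional
    instruction must leave the register unchanged."""
    if in_flight is not None and in_flight.dest == reg and in_flight.condition_true:
        return in_flight.value
    return register_file[reg]
```

With `register_file = {"r1": 10}` and an inactive previous instruction `InFlightResult("r1", 99, False)`, `naive_bypass_read` returns 99 whereas `expected_read` returns 10, which corresponds to the wrong data item described above.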