The present invention relates to a data processor capable of reducing to zero the execution clock count of a branch instruction, of a compare instruction, or of a compare instruction together with the branch instruction that follows it, when a series of instructions that has already been executed once is executed again during pipeline processing.
A conventional data processor that executes branch instructions at high speed is known; it has the arrangement shown in FIG. 13. In FIG. 13, reference numeral 11 designates an instruction pre-fetch unit for pre-fetching instruction codes; 12 an instruction decoding unit for decoding the instruction codes pre-fetched by the instruction pre-fetch unit 11; 13 an instruction execution unit for executing instructions in accordance with the control information obtained from the instruction decoding unit 12; 14 a history storage unit for storing the branch target instruction code and its address when a branch instruction is executed; 15 a branch prediction unit for obtaining the branch target instruction code and its address from the history storage unit 14 and outputting them to the instruction decoding unit 12 and the instruction pre-fetch unit 11, respectively, before the branch instruction is executed; 16 a branch target comparison unit for comparing the branch target address output from the branch prediction unit 15 with the address of the instruction to be executed next after the branch instruction; 17 an input-output bus for connection to a memory, an I/O device and the like; and 18 a bus control unit for controlling the input-output bus 17 by arbitrating among the data input-output requests from the instruction pre-fetch unit 11 and the instruction execution unit 13.
FIG. 14 shows the structure of the history storage unit 14 of FIG. 13, wherein reference numeral 141 designates an address tag field for storing the address of the instruction executed immediately before a branch instruction; 142 a branch target instruction field for storing an instruction code of a predetermined word length located at the branch target address of a branch instruction; 143 a branch target address field for storing the branch target address of a branch instruction; and 144 a valid bit indicating that the data in the corresponding entry are valid.
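The entry layout of FIG. 14 can be modeled as a small lookup table. The following Python sketch is illustrative only: the class names, the table size, and the fully associative search with round-robin replacement are assumptions, since the description does not specify how entries are indexed or replaced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryEntry:
    """One entry of the history storage unit 14, with the fields of FIG. 14."""
    address_tag: int = 0             # field 141: address of the instruction just before the branch
    target_instruction: bytes = b""  # field 142: code at the branch target (fixed word length)
    target_address: int = 0          # field 143: branch target address
    valid: bool = False              # bit 144: entry contents are valid


class HistoryStorage:
    """Hypothetical model: fully associative table, round-robin replacement (assumed)."""

    def __init__(self, size: int = 4) -> None:
        self.entries: List[HistoryEntry] = [HistoryEntry() for _ in range(size)]
        self._victim = 0  # next entry to overwrite when no matching tag exists

    def lookup(self, address: int) -> Optional[HistoryEntry]:
        # A hit needs both a matching address tag and a set valid bit.
        for entry in self.entries:
            if entry.valid and entry.address_tag == address:
                return entry
        return None

    def store(self, tag: int, code: bytes, target: int) -> None:
        # Reuse a valid entry with the same tag, otherwise take the round-robin victim.
        entry = self.lookup(tag)
        if entry is None:
            entry = self.entries[self._victim]
            self._victim = (self._victim + 1) % len(self.entries)
        # All three fields are written into the same entry and the valid bit is set.
        entry.address_tag = tag
        entry.target_instruction = code
        entry.target_address = target
        entry.valid = True
```

For example, after `store(0x1004, b"\x12\x34", 0x2000)`, `lookup(0x1004)` returns that entry while `lookup(0x1008)` returns `None`.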
In the conventional data processor constructed as above, when a branch instruction is executed for the first time, it is decoded by the instruction decoding unit 12 and then executed by the instruction execution unit 13. If the branch is taken, the branch target address calculated by the instruction execution unit 13 is given to the instruction pre-fetch unit 11, which reads the new branch target instruction code. At this time, the history storage unit 14 stores, all in the same entry, the address of the instruction executed immediately before the branch instruction (held in the instruction execution unit 13) in the address tag field 141, the branch target address calculated by the instruction execution unit 13 in the branch target address field 143, and the branch target instruction code that was read out in the branch target instruction field 142. The valid bit 144 is set to "1" to indicate that the data in the entry are valid. Assume that the same branch instruction is subsequently executed again. While the instruction decoding unit 12 decodes the instruction immediately preceding this branch instruction, the branch prediction unit 15 searches the history storage unit 14 using the address of the instruction being decoded. This address now hits the address tag field 141, so the branch prediction unit 15 predicts that the branch will be taken without waiting for the branch instruction to be executed, reads the branch target instruction code from the branch target instruction field 142 and the branch target address from the branch target address field 143, and sends them to the instruction decoding unit 12 and the instruction pre-fetch unit 11, respectively. The instruction decoding unit 12 starts decoding the branch target instruction code received from the branch prediction unit 15 immediately after it finishes decoding the branch instruction.
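The two phases just described, registration on the first taken execution and prediction on a later hit, can be sketched as follows. This is a hypothetical simplification: a Python dictionary keyed by the address tag stands in for the history storage unit 14, the presence of a key stands in for the valid bit 144 being "1", and the function names and addresses are illustrative.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    target_instruction: bytes  # field 142
    target_address: int        # field 143


# Hypothetical stand-in for history storage unit 14: address tag (field 141) -> entry.
# A key being present plays the role of valid bit 144 being "1".
history: Dict[int, Entry] = {}


def record_taken_branch(prev_addr: int, target_addr: int, target_code: bytes) -> None:
    """First execution: the branch was taken, so register it in one entry."""
    history[prev_addr] = Entry(target_code, target_addr)


def predict(decoding_addr: int) -> Optional[Tuple[bytes, int]]:
    """Search using the address of the instruction now being decoded.

    On a hit, the branch is predicted taken before it executes, and the result
    is (code for decoding unit 12, address for pre-fetch unit 11).
    """
    entry = history.get(decoding_addr)
    if entry is None:
        return None  # miss: no prediction, the branch resolves in the execution unit
    return entry.target_instruction, entry.target_address


# Example: instruction b at 0x100 precedes branch c, which jumps to 0x400.
first = predict(0x100)                          # first pass: miss
record_taken_branch(0x100, 0x400, b"\x60\x00")  # branch taken, history stored
second = predict(0x100)                         # second pass: hit
```

Keying on the preceding instruction's address, rather than the branch's own, is what lets the prediction be issued one clock before the branch is decoded.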
The instruction pre-fetch unit 11 cancels the instructions it has already pre-fetched and pre-fetches instructions starting at the address obtained by advancing the branch target address received from the branch prediction unit 15 by the word length of the branch target instruction field 142. After the instruction execution unit 13 executes the branch instruction and thereby determines the address of the instruction to be executed next, the branch target comparison unit 16 compares that address with the branch target address output from the branch prediction unit 15 to verify whether the branch prediction was correct. If the two addresses differ, the valid bit 144 is reset to "0", the instruction decoding unit 12 cancels the instruction being decoded, and the instruction pre-fetch unit 11 cancels the pre-fetched instructions and pre-fetches new instructions starting at the next-instruction address determined by the instruction execution unit 13. A branch instruction that was taken in the past is highly likely to be taken again when it is executed again. In that case, the instruction execution unit 13 can execute the branch target instruction immediately after executing the branch instruction, without any wait time.
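The pre-fetch address arithmetic and the verification step can be sketched as below. The 4-byte word length of field 142, the names, and the convention that `verify` returns a redirect address (or `None` when no redirect is needed) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed word length, in bytes, of branch target instruction field 142.
TARGET_FIELD_BYTES = 4


@dataclass
class Entry:
    target_address: int  # field 143
    valid: bool = True   # bit 144


def prefetch_address(predicted_target: int) -> int:
    """Pre-fetch resumes just past the code already supplied from field 142."""
    return predicted_target + TARGET_FIELD_BYTES


def verify(entry: Entry, actual_next: int) -> Optional[int]:
    """Model of branch target comparison unit 16.

    Returns None when the prediction was correct (no redirect). Otherwise
    resets the valid bit and returns the address from which pre-fetching
    must restart, as determined by the execution unit; the speculative work
    in the decoding and pre-fetch units is discarded at that point.
    """
    if actual_next == entry.target_address:
        return None
    entry.valid = False  # valid bit 144 reset to "0"
    return actual_next
```

With a predicted target of 0x400, pre-fetching resumes at 0x404; if the execution unit instead resolves the next address to 0x108, the entry is invalidated and fetching restarts at 0x108.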
FIG. 15 is a timing chart illustrating the operation of the data processor described above. It shows, clock by clock, the instructions processed in the instruction pre-fetch unit 11, the branch prediction unit 15, the instruction decoding unit 12, the instruction execution unit 13 and the branch target comparison unit 16; each unit is assumed to require one clock. At clock t2, the branch prediction unit 15 searches the history storage unit 14 using the address of the instruction (instruction b) immediately preceding the branch instruction. The address hits the address tag field 141, so the branch prediction unit 15 sends the branch target instruction code and its address to the instruction decoding unit 12 and the instruction pre-fetch unit 11, respectively. The instruction decoding unit 12, however, does not start decoding the branch target instruction code at that time. At clock t3, the instruction decoding unit 12 decodes the branch instruction (instruction c) and thereby detects its presence. The instruction decoding unit 12 then starts decoding the branch target instruction code (instruction m), and the instruction pre-fetch unit 11 pre-fetches the instruction (instruction n) at the address obtained by advancing the branch target address by the word length of the branch target instruction field 142. Pre-fetch timings other than that of instruction n are not shown in FIG. 15. At clock t4, the instruction execution unit 13 executes the branch instruction (instruction c) and determines the actual branch target address. At clock t5, the branch target comparison unit 16 compares this address with the branch target address output from the branch prediction unit 15 to verify whether the branch prediction was correct. The verification shows that the prediction was correct, so processing continues.
The branch target instruction (instruction m) is executed at clock t5.
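The schedule of FIG. 15 can be reconstructed programmatically to show where the branch spends its clock. This Python sketch is a hypothetical reconstruction, not the figure itself: it assumes one clock per unit, decoding of the k-th instruction at clock k + 1 with execution one clock later, and the stage names and function names are illustrative.

```python
from typing import Dict, List

# Rows of the FIG. 15 chart, in the order the description lists the units.
STAGES = ["prefetch", "predict", "decode", "execute", "compare"]


def fig15_chart(stream: List[str], branch: str) -> Dict[int, Dict[str, str]]:
    """Hypothetical reconstruction of the FIG. 15 timing chart.

    Assumptions: one clock per unit; stream[k] is decoded at clock k + 1 and
    executed one clock later; the history search uses the instruction decoded
    just before the branch; the instruction after the branch target (n) is
    pre-fetched while the branch is decoded; comparison follows execution.
    """
    chart: Dict[int, Dict[str, str]] = {}

    def put(clock: int, stage: str, insn: str) -> None:
        chart.setdefault(clock, {})[stage] = insn

    b = stream.index(branch)
    for k, insn in enumerate(stream):
        put(k + 1, "decode", insn)
        put(k + 2, "execute", insn)
    put(b, "predict", stream[b - 1])       # search with the preceding instruction's address
    put(b + 1, "prefetch", stream[b + 2])  # n is fetched in the clock the branch is decoded
    put(b + 3, "compare", branch)          # one clock after the branch executes
    return chart


def show(chart: Dict[int, Dict[str, str]]) -> None:
    for clock in sorted(chart):
        cells = ", ".join(f"{s}={chart[clock][s]}" for s in STAGES if s in chart[clock])
        print(f"t{clock}: {cells}")


chart = fig15_chart(["a", "b", "c", "m", "n"], branch="c")
show(chart)
```

Under these assumptions the execute row holds c at t4 and m at t5, so the branch still occupies one execution clock even though the prediction is correct.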
In the conventional data processor, however, the instruction decoding unit 12 must first decode the branch instruction, and the instruction execution unit 13 must then determine the address of the instruction to be executed next. Consequently, the execution clock count for the branch instruction does not become zero even when the prediction by the branch prediction unit 15 is correct. Further, an unconditional branch instruction is always taken, so its prediction need not be verified; however, because the same arrangement is applied to unconditional branch instructions as well, their execution clock count does not become zero either. Furthermore, in the conventional data processor a compare instruction is executed by the instruction execution unit 13, so the execution clock count of a compare instruction, which is highly likely to be located before a branch instruction, does not become zero even when the prediction by the branch prediction unit 15 is correct; it remains a predetermined number of clocks equal to or greater than one.
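The cost described above can be made concrete with simple arithmetic. The sketch below assumes, for illustration only, that every other instruction in a loop takes one execution clock and that the compare and branch instructions each take the minimum of one clock that the conventional processor spends on them even when the prediction is correct; the zero-clock case corresponds to the goal stated at the outset. The function names are hypothetical.

```python
def clocks_per_pass(body: int, compare: int = 1, branch: int = 1) -> int:
    """Execution-unit clocks for one pass through a loop.

    body: clocks for the loop's other instructions (assumed one clock each).
    compare, branch: clocks charged to the compare and the branch instruction.
    """
    if min(body, compare, branch) < 0:
        raise ValueError("clock counts must be non-negative")
    return body + compare + branch


def overhead_fraction(body: int, compare: int = 1, branch: int = 1) -> float:
    """Share of each pass spent on the compare and branch instructions."""
    total = clocks_per_pass(body, compare, branch)
    return (compare + branch) / total if total else 0.0


# Hypothetical loop with four other instructions.
conventional = clocks_per_pass(4)     # compare and branch cost one clock each
zero_cost = clocks_per_pass(4, 0, 0)  # compare and branch cost nothing
```

For a loop with four other instructions, the conventional processor spends 6 clocks per pass, one third of them on the compare and branch, versus 4 clocks if both took zero clocks.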