The present invention relates to an instruction control system, and more particularly to a data processing apparatus that can shorten the time required to execute branch instructions in a microprocessor employing pipeline control.
As shown in FIG. 1, a data processing apparatus 1 generally comprises an interface circuit 4, which exchanges data with a main storage 5; an instruction control unit 2, which controls the instructions to be executed; and an instruction execute unit 3, which executes them. When an instruction fetched from the main storage 5 is transmitted to the instruction control unit 2 via the interface circuit 4, the instruction control unit 2 decodes it and sends the decoded result to the instruction execute unit 3. From the decoded result, the instruction execute unit 3 generates control signals that enable or disable gates within the unit so as to carry out arithmetic operations, storage, shifting, and so on. The instruction control unit 2 also commands the main storage 5, through the interface circuit 4, to fetch the next instruction. By repeating this sequence of operations, the data processing apparatus 1 runs a program stored in the main storage 5.
In this data processing apparatus, the interface circuit 4, the instruction control unit 2, and the instruction execute unit 3 can operate in parallel with one another, so that pipeline control is performed.
Referring to FIG. 2, an instruction ① is fetched from the main storage 5 by the interface circuit 4 in two cycles, T₁ and T₂, and is decoded by the instruction control unit 2 in cycle T₃. The next instruction ② is fetched from the main storage 5 by the interface circuit 4 in cycles T₃ and T₄.
Meanwhile, the instruction ① decoded in cycle T₃ is executed by the instruction execute unit 3 in cycles T₄ and T₅. Because instruction ② is decoded in cycle T₅, while the fetch of instruction ③ begins, the instruction execute unit 3 executes instruction ② in cycles T₆ and T₇ without any idle time. In this manner, pipeline control shortens the effective execution time per instruction.
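The timing of FIG. 2 can be reproduced with a small cycle-counting sketch. It assumes fixed stage lengths read off the figure (two fetch cycles, one decode cycle, two execute cycles), that each unit handles one instruction at a time, and that the interface circuit is never blocked waiting for the decoder. The function name `schedule` and its tuple layout are illustrative only and are not part of the described apparatus.

```python
# Cycle counts per stage, as read off FIG. 2 (assumed constant for every instruction).
FETCH_CYCLES = 2    # interface circuit 4 fetching from main storage 5
DECODE_CYCLES = 1   # instruction control unit 2
EXECUTE_CYCLES = 2  # instruction execute unit 3


def schedule(n_instructions):
    """Return (number, fetch, decode, execute) cycle ranges for straight-line code.

    Hypothetical model: each unit handles one instruction at a time, and an
    instruction enters a stage as soon as it has left the previous stage and
    that unit is free. Ranges are inclusive (start, end) cycle numbers.
    """
    rows = []
    fetch_free = decode_free = execute_free = 1  # first cycle each unit is idle
    for number in range(1, n_instructions + 1):
        f_start = fetch_free
        f_end = f_start + FETCH_CYCLES - 1
        fetch_free = f_end + 1

        d_start = max(f_end + 1, decode_free)
        d_end = d_start + DECODE_CYCLES - 1
        decode_free = d_end + 1

        e_start = max(d_end + 1, execute_free)
        e_end = e_start + EXECUTE_CYCLES - 1
        execute_free = e_end + 1

        rows.append((number, (f_start, f_end), (d_start, d_end), (e_start, e_end)))
    return rows


if __name__ == "__main__":
    for number, fetch, decode, execute in schedule(3):
        print(f"instruction {number}: fetch T{fetch[0]}-T{fetch[1]}, "
              f"decode T{decode[0]}, execute T{execute[0]}-T{execute[1]}")
```

Under these assumptions the execute ranges come out as T₄–T₅, T₆–T₇, T₈–T₉ with no gaps, matching the statement that the instruction execute unit 3 runs without idle time; throughput is set by the two-cycle fetch.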
However, when a branch instruction is executed partway through this sequence, the branch target address is not determined until the branch instruction itself has been executed. Therefore, as shown in FIG. 3, an idle time of four cycles arises between the execution of a branch instruction ① and the execution of the branch target instruction ⑩.
Thus, in the prior-art data processing apparatus of the pipeline control type, executing a branch instruction makes parallel processing impossible for the corresponding period, the benefit of pipeline control is lost, and performance degrades.
To mitigate this drawback, a method has often been employed in which copies of some of the instructions stored in the main storage 5 are held in a small, high-speed cache memory within the data processing apparatus 1; when the branch target instruction is present in the cache memory, it is fetched from there, shortening the time otherwise spent fetching from the main storage 5.
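The cache method can be illustrated with a minimal sketch. It assumes a hit costs one cycle and a miss costs the two-cycle main-storage fetch of FIG. 2, and it uses least-recently-used replacement; the class name `InstructionCache`, the capacity, and the replacement policy are all hypothetical choices for illustration, not details given in the text.

```python
from collections import OrderedDict

MAIN_STORAGE_CYCLES = 2  # fetch time from main storage 5, as in FIG. 2
CACHE_CYCLES = 1         # assumed: a cache hit takes one cycle


class InstructionCache:
    """Hypothetical small, fast cache holding copies of some main-storage instructions.

    LRU replacement is an assumption made for this sketch.
    """

    def __init__(self, main_storage, capacity=4):
        self.main_storage = main_storage  # dict: address -> instruction
        self.capacity = capacity
        self.lines = OrderedDict()        # cached copies, oldest first

    def fetch(self, address):
        """Return (instruction, cycles_spent) for the instruction at `address`."""
        if address in self.lines:
            self.lines.move_to_end(address)   # mark as most recently used
            return self.lines[address], CACHE_CYCLES
        instruction = self.main_storage[address]  # miss: go to main storage
        self.lines[address] = instruction
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict least recently used copy
        return instruction, MAIN_STORAGE_CYCLES
```

A repeated branch to the same target pays the main-storage time only on the first fetch; later fetches of that target cost the assumed single cache cycle as long as the copy has not been evicted.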
Another commonly employed method places an address computing circuit in the instruction control unit 2; this circuit computes the branch target address before the instruction execute unit 3 has finished executing the branch instruction, and the address is used to fetch the branch target instruction, thereby shortening the branch processing time. With this method, as shown in FIG. 4, once a branch instruction ① has been decoded, the address computing circuit computes the branch target address in parallel with the execution of instruction ①. The branch target instruction ⑩ can therefore be fetched in cycles T₅ and T₆, and, as FIG. 4 shows, the processing delay caused by the branch decreases to two cycles.
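The two-cycle delay of FIG. 4 follows from simple cycle arithmetic. The sketch below assumes the stage lengths of FIG. 2 and that the target fetch starts in the cycle after the target address becomes available; the function names are hypothetical.

```python
FETCH_CYCLES = 2   # assumed from FIG. 2
DECODE_CYCLES = 1  # assumed from FIG. 2


def target_execute_start(address_ready_cycle):
    """First execute cycle of the branch target instruction.

    Assumes the target address is known at the end of `address_ready_cycle`
    and that its fetch begins in the very next cycle.
    """
    fetch_start = address_ready_cycle + 1
    decode_start = fetch_start + FETCH_CYCLES
    return decode_start + DECODE_CYCLES


def branch_delay(address_ready_cycle, sequential_execute_start):
    """Cycles lost relative to executing the next sequential instruction."""
    return target_execute_start(address_ready_cycle) - sequential_execute_start
```

With the address computed in T₄ (in parallel with execution of ①), the target ⑩ is fetched in T₅ and T₆, decoded in T₇ and executed from T₈; the sequential instruction ② would have executed from T₆, so `branch_delay(4, 6)` gives the two-cycle delay. In this model every cycle by which the address arrives later adds one cycle of delay.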
However, because this method assumes that the branch is taken, a needless instruction is fetched, as shown in FIG. 5, when the condition of a conditional branch instruction is not met. Moreover, to perform the address computation within the instruction control unit 2, the necessary information must be obtained from the instruction execute unit 3. This complicates the arrangement and increases the number of wiring lines and the like, which lowers the integration density when the circuits are implemented in an LSI. The method is therefore unsuitable for microprocessors and other devices that require LSI implementation.
In a general-purpose computer or the like, a large number of signal lines can be laid between the main storage 5 and the interface circuit 4, so a large amount of data can be transmitted simultaneously and one instruction is fetched in a single cycle, as shown in FIG. 6. Consequently, even when the condition of a conditional branch instruction is not met, the instruction then required has very likely already been fetched before the fetch of the branch target instruction begins, and the needless fetch seldom degrades performance. In a microprocessor or the like, however, the data transfer throughput cannot be increased because the number of input/output pins is limited; one instruction cannot be fetched in a single cycle, and performance is therefore degraded.
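The effect of pin count on the cost of a needless fetch can be put as a ceiling division. This sketch assumes one bus transfer per cycle and ignores any other memory latency; the instruction and bus widths in the checks are hypothetical values, not figures from the text.

```python
def fetch_cycles(instruction_bits, bus_bits):
    """Cycles needed to move one instruction across the interface.

    Assumes one bus transfer per cycle; integer ceiling division.
    """
    return (instruction_bits + bus_bits - 1) // bus_bits


def wasted_interface_cycles(instruction_bits, bus_bits, needless_fetches=1):
    """Interface cycles spent on target fetches made useless by a not-taken branch."""
    return needless_fetches * fetch_cycles(instruction_bits, bus_bits)
```

For a hypothetical 32-bit instruction, a 64-bit path (many signal lines) needs one cycle per fetch, while a 16-bit path (few pins) needs two, so each needless target fetch ties up the interface for twice as long.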
Heretofore, to fetch the target instructions of branch instructions at high speed, an apparatus has also been proposed in which the target instructions of branch instructions are stored in advance in an associative memory. When an instruction is decoded, the associative memory is accessed with the instruction address to check predictively whether the instruction is a branch instruction whose target is held there, and if so, the corresponding target instruction is output (see Japanese Patent Application Publication No. 54-9456, corresponding to U.S. Pat. No. 3,940,741). This apparatus, however, shortens only the fetch time of the branch target instruction; it has the disadvantage that the subsequent decoding time of that instruction is not shortened.
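The associative lookup can be sketched as follows. A Python dict stands in for the content-addressable hardware, keyed by the address of the branch instruction; the class and method names are hypothetical and the instructions are plain strings for illustration.

```python
from typing import Dict, Optional


class BranchTargetMemory:
    """Hypothetical model of an associative memory holding branch target instructions.

    Keys are branch-instruction addresses; values are the target instructions
    themselves (not target addresses). A dict stands in for the hardware.
    """

    def __init__(self) -> None:
        self._entries: Dict[int, str] = {}

    def store(self, branch_address: int, target_instruction: str) -> None:
        """Record the target instruction for the branch at `branch_address`."""
        self._entries[branch_address] = target_instruction

    def lookup(self, instruction_address: int) -> Optional[str]:
        """Return the held target instruction, or None if none is held for this address."""
        return self._entries.get(instruction_address)
```

What `lookup` returns is still an undecoded instruction, so in this model the decode step for the target remains on the critical path, which is the limitation noted above.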