Today, most computers employ a von Neumann architecture, which stores programs as well as data in a storage unit and reads a stored program in order to execute it. Generally, a von Neumann computer is implemented by a computer system consisting of a main frame computer which includes a processor such as a CPU for controlling program execution and a main memory for temporarily storing programs and data, an input-output (I/O) unit for performing input and output between the computer and users or external units, an auxiliary storage unit (secondary storage unit) for storing programs and data for an extended period of time, and other units.
In the execution of a program under the processor's control, instructions of the program stored in the main memory are read, and based on the contents of the instructions, instruction processing is performed in which a control signal is sent to each unit of the computer. For this purpose, the processor comprises a program counter for designating the location in the main memory of an instruction to be executed, and registers such as an instruction register for holding instructions within the processor.
FIG. 6(a) explains the control operation in the instruction processing under the processor's control. First of all, in an instruction fetch process, the program counter is referred to in order to read the instruction to be executed next. Then, in the instruction decoding process, the type of the read instruction is identified to interpret the instruction. Next, in the instruction executing process, data reading and operation processing are performed according to the instruction. Then, in the result storing process, the result of the execution of the instruction is stored. As shown in the figure, each process is performed within a time corresponding to a machine cycle (hereinafter referred to as a timing), and all the processes are performed over a period ranging from a timing CL.sub.n to a timing CL.sub.n+3.
Normally, program execution is implemented by executing a plurality of instructions sequentially. FIG. 6(b) illustrates the case in which instruction A and instruction B are successively executed. In the case shown in FIG. 6(b), the program counter is referred to in order to read instruction A in the instruction fetch process, and in the following instruction decoding process, the content of the instruction A is interpreted; in the instruction execution process, the processing according to the instructed content is performed, followed by the result storing process in which the result of the execution of the instruction is stored into a designated location. At the next timing, the instruction fetch process for instruction B is performed to read the instruction B, and the instruction B is then processed in the same manner, whereby the sequential processing of the instructions is completed. If the processing of the instruction A is performed from a timing CL.sub.n to a timing CL.sub.n+3 as shown in FIG. 6(a), the processing of the instruction B starts at a timing CL.sub.n+4.
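The sequential timing of FIG. 6(b) can be sketched as follows. This is only an illustrative model, not part of the described apparatus: the function name `sequential_schedule` and the stage labels are assumptions, and timings are expressed as offsets from CL.sub.n (offset 0 stands for CL.sub.n).

```python
# Illustrative model of non-overlapped (sequential) instruction processing.
# Stage names and the function name are hypothetical labels for this sketch.
STAGES = ["fetch", "decode", "execute", "store"]


def sequential_schedule(instructions, start=0):
    """Assign one timing per stage; an instruction starts only after the
    previous instruction has finished its result storing process."""
    schedule = {}
    timing = start
    for name in instructions:
        schedule[name] = {}
        for stage in STAGES:
            schedule[name][stage] = timing
            timing += 1
    return schedule


schedule = sequential_schedule(["A", "B"])
for name, stages in schedule.items():
    print(name, stages)
```

In this model instruction A occupies offsets 0 to 3 and the fetch of instruction B falls on offset 4, which corresponds to the timing CL.sub.n+4 mentioned above.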
FIG. 6(c) illustrates pipeline processing, which realizes efficient, speedy processing. In this pipeline processing, the four processes, "instruction fetching", "instruction decoding", "instruction executing", and "result storing", can be executed such that the respective processes overlap each other. More specifically, this enables the four control operations to be executed in parallel. At a timing CL.sub.1, the instruction fetch process for instruction A is performed, and at the next timing CL.sub.2, the instruction A is subjected to decoding processing while instruction B is fetched. At a timing CL.sub.3, the execution process for the instruction A, the decoding process for the instruction B, and the fetch process for instruction C are performed. Thus, pipeline processing enables as many instructions as there are pipeline stages to be processed in parallel, thereby speeding up the whole processing.
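The overlap shown in FIG. 6(c) can be modeled in a few lines. The sketch below is an idealized illustration that assumes one new instruction enters the pipeline at every timing and that no branches occur; the names `pipelined_schedule` and `active_at` are hypothetical.

```python
# Idealized four-stage pipeline: instruction i enters the fetch stage one
# timing after instruction i-1. All names here are illustrative assumptions.
STAGES = ["fetch", "decode", "execute", "store"]


def pipelined_schedule(instructions, first_timing=1):
    """Return {instruction: {stage: timing}} for fully overlapped execution."""
    return {
        name: {stage: first_timing + i + s for s, stage in enumerate(STAGES)}
        for i, name in enumerate(instructions)
    }


def active_at(schedule, timing):
    """Return {instruction: stage} for every instruction active at a timing."""
    return {
        name: stage
        for name, stages in schedule.items()
        for stage, t in stages.items()
        if t == timing
    }


schedule = pipelined_schedule(["A", "B", "C", "D"])
print(active_at(schedule, 3))
```

At timing 3 the model reports instruction A in the execute stage, B in the decode stage, and C in the fetch stage, matching the description of CL.sub.3 above.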
If instructions A to D are instructions which allow simple, sequential execution, ideal pipeline processing as illustrated can be implemented by processing them in the order in which the instructions are described in a program. However, some programs heavily use repetition, branching, and selection, and many programs cannot be executed simply in the order in which the instructions are described. Moreover, the result of an instruction execution process might decide whether the following process should be performed, or whether branching or selection should be performed. Therefore, pipeline processing that follows the flow of the program is required for enhancing efficiency.
A description is given of a prior art instruction prefetching method associated with a prediction process. In pipeline processing, instruction fetch processing is performed by predicting an instruction to be executed next before the address of that instruction is finally determined. Such an instruction fetch process by prediction is called instruction prefetch processing. Now, a description is given of a conventional instruction prefetching apparatus for executing instruction prefetch processing associated with a prediction process in a processor.
FIG. 7 is a block diagram of a prior art instruction prefetching apparatus, FIG. 8 is a diagram showing a part of a program including an instruction as an object of prefetch processing, and FIG. 9 is a timing chart illustrating a schematic operation in the prior art prefetching apparatus.
As shown in FIG. 7, the prior art instruction prefetching apparatus comprises a prefetch address generating unit 300, a first decoding unit 303, a branch predicting unit 304, an instruction buffer 305, a second decoding unit 306, an operation performing unit 307, and a condition code storage unit 308.
The prefetch address generating unit 300 generates an address of a prefetch target (hereinafter also referred to as a prefetch address) based on a prediction of the branch predicting unit 304 and a condition code, and outputs a prefetch address signal S301. The first decoding unit 303 performs decoding processing on a prefetched instruction S302 to obtain the instruction and output it to the instruction buffer 305, which is described later. The first decoding unit 303 also identifies the type of the prefetched instruction S302 from the result of the decoding processing, and when the prefetched instruction is a conditional branch instruction, the first decoding unit 303 outputs a signal indicating this type to both the branch predicting unit 304 and the prefetch address generating unit 300. The branch predicting unit 304 predicts how the process steps will branch in order to prefetch an instruction subsequent to the conditional branch instruction, and outputs the prediction result to the prefetch address generating unit 300. Prediction methods include a method in which the branch direction is predetermined based on branching probability and a method in which the branch direction is decided based on a history of the branch directions taken by conditional branches executed before. The instruction buffer 305 temporarily stores the instruction obtained by the decoding processing of the first decoding unit 303 for subsequent processing. The second decoding unit 306 takes the temporarily stored instruction from the instruction buffer 305 and decodes it sequentially to control, according to the obtained result, the operation process performed by the operation performing unit 307, which is described later.
The operation performing unit 307 performs an operation under control of the second decoding unit 306, and outputs a signal indicating the result of the operation to the condition code storage unit 308 when the result of the operation can affect a condition code stored in the condition code storage unit 308. The condition code storage unit 308 stores a condition code obtained based on the result of the operation processing input from the operation performing unit 307.
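The two kinds of prediction method mentioned for the branch predicting unit 304 can be illustrated with a minimal sketch. The classes below are assumptions made for illustration only: `StaticPredictor` stands in for a predetermined direction, and `HistoryPredictor` simply repeats the most recent direction of the same branch, which is one simple history-based scheme and not necessarily the one used in any actual apparatus.

```python
# Hypothetical sketches of the two families of prediction method.
class StaticPredictor:
    """Always predicts the same, predetermined branch direction."""

    def __init__(self, predict_taken):
        self.predict_taken = predict_taken

    def predict(self, branch_address):
        return self.predict_taken

    def update(self, branch_address, taken):
        pass  # a predetermined direction ignores past outcomes


class HistoryPredictor:
    """Predicts the direction this branch took the last time it executed."""

    def __init__(self, default_taken=False):
        self.default_taken = default_taken
        self.last_outcome = {}

    def predict(self, branch_address):
        return self.last_outcome.get(branch_address, self.default_taken)

    def update(self, branch_address, taken):
        self.last_outcome[branch_address] = taken


def count_hits(predictor, branch_address, outcomes):
    """Count correct predictions over a sequence of actual branch directions."""
    hits = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            hits += 1
        predictor.update(branch_address, taken)
    return hits


# Assumed example: a loop-closing branch taken three times, then not taken.
outcomes = [True, True, True, False]
print(count_hits(StaticPredictor(False), 401, outcomes))
print(count_hits(HistoryPredictor(), 401, outcomes))
```

Either predictor only supplies a guess; in both cases a wrong guess is discovered only after the branch condition has been evaluated.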
The schematic operation of the prior art instruction prefetching apparatus thus configured is as follows.
At a given timing, the prefetch address generating unit 300 generates an address of an instruction to be fetched next and outputs it as a prefetch address signal S301. In a processor using the instruction prefetching apparatus, an instruction is fetched using this prefetch address signal S301 and input to the instruction prefetching apparatus. At the next timing, the prefetched instruction is input to the first decoding unit 303 to be subjected to decoding processing.
The first decoding unit 303 judges whether the processed instruction is a conditional branch instruction or not. When the instruction is a conditional branch instruction, the first decoding unit 303 outputs a signal indicating this fact to both the prefetch address generating unit 300 and the branch predicting unit 304. The processed instruction is output to the instruction buffer 305 whether or not it is a conditional branch instruction.
The instruction temporarily stored in the instruction buffer 305 is then taken by the second decoding unit 306 where the content of the instruction is interpreted through decoding processing. The second decoding unit 306 outputs a signal indicating the content of the obtained instruction to the operation performing unit 307 where the operation processing corresponding to the content of the instruction is performed based on the signal input. The result of the operation processing is processed by a processor including the instruction prefetching apparatus, and when that result can affect a condition code, it is output to the condition code storage unit 308 for storage.
When the signal indicating that the processed instruction is a conditional branch instruction is input from the first decoding unit 303, the branch predicting unit 304 performs prediction processing according to a predetermined method and outputs a signal indicating the result of the prediction to the prefetch address generating unit 300.
The prefetch address generating unit 300 generates an address of an instruction to be prefetched next. If the signal indicating a conditional branch instruction is not input from the first decoding unit 303, the prefetch address generating unit 300 generates an address of an instruction to be prefetched next with reference to the condition code stored in the condition code storage unit 308 and outputs it as a prefetch address signal S301. On the other hand, if such a signal is input from the first decoding unit 303, the prefetch address generating unit 300 generates an address of an instruction to be prefetched next based on the signal indicating the prediction input from the branch predicting unit 304, and outputs it as a prefetch address signal S301.
Here a description is given of the case of executing the program shown in FIG. 8, with reference to the timing chart of FIG. 9. The program specifies the following procedure. In the figure, instruction 400 is an add instruction which adds the data stored in registers D0 and D1 and stores the result in a register D2. In this program, the result of the operation according to the instruction 400 decides the branch direction of instruction 401. More specifically, the execution result of the instruction 400 decides a condition code which the instruction 401, a conditional branch instruction, uses to decide the target of the branch. Here it is assumed that a zero flag of the condition code is set when the result is 0. Since the instruction 401 is a conditional branch instruction, the process steps branch by selecting the instruction to be executed next according to the zero flag of the condition code. If the zero flag is not set, the instruction to be executed next is the instruction 403, whereas the instruction 402 is selected as the instruction to be executed next if the zero flag is set.
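The behavior of the program of FIG. 8 as described above can be expressed as a short sketch. The function name `run_fig8` and the use of plain integers for the register contents are assumptions for illustration; only the add, the zero flag, and the choice between instructions 402 and 403 are taken from the description.

```python
def run_fig8(d0, d1):
    """Model instructions 400 and 401 of FIG. 8 (illustrative only)."""
    # Instruction 400: add the contents of D0 and D1, store the result in D2.
    d2 = d0 + d1
    # The zero flag of the condition code is set when the result is 0.
    zero_flag = d2 == 0
    # Instruction 401: conditional branch on the zero flag.
    next_instruction = 402 if zero_flag else 403
    return d2, zero_flag, next_instruction


print(run_fig8(1, 2))   # non-zero result: zero flag not set
print(run_fig8(5, -5))  # zero result: zero flag set
```

With a non-zero sum the zero flag stays clear and instruction 403 follows; with a zero sum the flag is set and instruction 402 follows.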
Next, a description is given of the operation for executing the program shown in FIG. 8 by a processor including the prior art instruction prefetching apparatus, with reference to the timing chart of FIG. 9.
At a timing t.sub.1 shown in the timing chart of FIG. 9, an address of instruction 400 is generated in the prefetch address generating unit 300 and output as a prefetch address signal S301. The processor using this instruction prefetching apparatus fetches the instruction 400 using the prefetch address signal S301 and inputs it to the instruction prefetching apparatus. At the next timing t.sub.2, the prefetched instruction 400 is input to the first decoding unit 303 to be subjected to decoding processing.
Since the instruction 400 is not a conditional branch instruction, it is not necessary to output a signal indicating that the processed instruction is a conditional branch instruction, and the instruction 400 is output to the instruction buffer 305. The second decoding unit 306 decodes the instruction 400 taken from the instruction buffer 305 to obtain an interpretation that the instruction 400 is an add instruction, and outputs the result of the decoding processing to the operation performing unit 307 by means of a signal. The operation performing unit 307 performs the addition instructed by the instruction 400, and since the result of the addition processing according to the instruction 400 can affect a condition code, the operation performing unit 307 outputs the result of the addition to the condition code storage unit 308, where the change in the condition code is stored. As described above, when the result of the addition is 0, a zero flag of the condition code is set, but here it is assumed that the result of the addition is not 0 and thus the zero flag is not set. At a timing t.sub.3 of FIG. 9, a conditional flag which decides the condition of the conditional branch of the instruction 401 is determined.
Thereafter, at the timing t.sub.3, an address of the instruction 401 is generated to fetch the instruction 401 as a conditional branch instruction. At a timing t.sub.4, the fetched instruction is input to the prefetching apparatus to be decoded by the first decoding unit 303. Since the instruction 401 is a conditional branch instruction, a signal indicating this fact is output to both the prefetch address generating unit 300 and the branch predicting unit 304.
Receiving the signal, the branch predicting unit 304 performs prediction and outputs a signal indicating the result to the prefetch address generating unit 300. Here it is assumed that the branch predicting unit 304 is set to predict "an instruction described next in the program". Therefore, a signal indicating "next instruction" is output from the branch predicting unit 304 to the prefetch address generating unit 300. At a timing t.sub.5, the prefetch address generating unit 300 generates an address of instruction 402, which is "the next instruction" for the conditional branch instruction 401, and outputs a prefetch address signal S301 indicating "instruction 402". In the processor, the instruction 402 is thus prefetched and input to the first decoding unit 303 at a timing t.sub.6 of FIG. 9.
On the other hand, the condition of the conditional branch of the instruction 401 is judged using the conditional flag determined at the timing t.sub.3. Since the zero flag is not set according to the above assumption, instruction 403 is the instruction to be executed next. This means that an address of a branch target is determined at a timing t.sub.6. At this stage, an address of an instruction to be fetched next has to be determined. However, in this case, the address of the instruction to be fetched next (the address of the instruction 403) does not match the address generated by the prefetch address generating unit 300 (the address of the instruction 402). This means that a branch prediction error has occurred, and it is necessary to prefetch the next instruction again at the next timing t.sub.7.
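The sequence from the timing t.sub.5 to the timing t.sub.7 can be summarized in a small sketch. The function name `resolve_branch` and the event strings are illustrative assumptions; the timings and the instruction numbers follow the description above.

```python
def resolve_branch(predicted_next, zero_flag):
    """Compare the predicted prefetch target with the actual branch target.

    Returns (mispredicted, timeline), where timeline maps a timing index
    (5 for t5, 6 for t6, 7 for t7) to a description of the event.
    """
    # Per FIG. 8: instruction 402 follows if the zero flag is set, else 403.
    actual_next = 402 if zero_flag else 403
    timeline = {
        5: f"prefetch address generated for instruction {predicted_next} (prediction)",
        6: f"branch target determined: instruction {actual_next}",
    }
    mispredicted = predicted_next != actual_next
    if mispredicted:
        # The predicted prefetch is discarded and the prefetch is repeated.
        timeline[7] = f"prefetch address generated again for instruction {actual_next}"
    return mispredicted, timeline


mispredicted, timeline = resolve_branch(predicted_next=402, zero_flag=False)
for timing in sorted(timeline):
    print(f"t{timing}: {timeline[timing]}")
```

With the prediction fixed to "next instruction" (instruction 402) and the zero flag not set, the sketch reports a misprediction and a repeated prefetch at t7, which is the delay discussed in this example.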
In the prior art instruction prefetching apparatus described above, an instruction following a conditional branch instruction is prefetched based only on the prediction by the branch predicting unit 304. Consequently, as shown in FIG. 9, even if the execution of the instruction 400 has been completed when the conditional branch instruction 401 is prefetched and the condition code for this branch instruction is already determined (at the timing t.sub.3), it is impossible to decide the instruction to be prefetched next using this result. Smooth pipeline processing can therefore be performed only as long as the prediction proves correct, in which case the efficiency of the processing is improved. However, when the prediction proves incorrect, the already fetched instruction has to be canceled in order to fetch the correct instruction, causing an undesired delay. Further, prefetching the instruction 402 is then a wasted operation, which means that the device resources of the whole processor including the prefetching apparatus are not utilized effectively.