1. Field of the Invention
The present invention relates generally to an arrangement for predicting a branch target address using a branch history table (BHT) in a digital data processing system, and more specifically to such an arrangement by which a branch target address can be predicted irrespective of the time duration required for the BHT to be renewed or updated.
2. Description of the Prior Art
In most pipeline processors, branch instructions are resolved in an execution unit. Accordingly, there are several cycles of delay between the decoding of a branch instruction and its execution/resolution. In an attempt to overcome the potential loss of these cycles, it is known in the art to guess, using a BHT, as to which instruction specified by a branch target address is to be applied to the execution unit.
Before turning to the present invention it is deemed advantageous to briefly discuss a known technique with reference to FIGS. 1-4.
FIG. 1 is a block diagram showing schematically a known arrangement of the type to which the present invention is applicable. As shown in FIG. 1, a system controller 10 is operatively coupled with a processing unit 12, a main memory 14 and an input/output (I/O) controller 16.
A task controller 18, provided in the processing unit 12, applies an initial instruction address of a given job to an instruction prefetch address register 20 through a line (1). It is assumed that a cache memory 22 has already stored the whole or part of the instruction array (sequence) of the above-mentioned given job which is applied from the task controller 18 via a line (2). The cache memory 22 is supplied with the prefetched address from the address register 20 and searches for an instruction specified by the prefetched address.
Upon a cache hit, the cache memory 22 applies the corresponding instruction to the task controller 18 and also to a microprogram memory 30, both of which form part of an execution unit 32. The microprogram memory 30 has previously stored a plurality of microprograms for executing an instruction applied thereto of the given job. The execution unit 32 further comprises a buffer memory 34 and an execution circuit 36. The buffer memory 34 stores operand data in this case, while the execution circuit 36 runs or carries out the microprograms using the operand data within the buffer memory 34 under the control of the task controller 18.
In the case of the cache hit, the cache memory 22 issues a selector control signal SC-1 which is applied to a selector 26 via a line 24. An adder 28 is provided to increment the instruction address relating to the cache hit by a predetermined number of bytes in order to derive the next instruction address. In more specific terms, upon occurrence of the cache hit, the control signal SC-1 allows the selector 26 to apply the content thereof to the register 20, and thus the next instruction address data is stored in the instruction prefetch address register 20.
Contrarily, in the case of a cache miss, the contents of the cache memory 22 are renewed in a manner well known in the art.
As indicated above, the reduction of branch penalty (viz., loss of cycles) is attempted through the use of history focussed on instruction prefetching. A BHT utilizes the address of the instruction array (viz., stream) being prefetched for accessing the table. If a taken branch were previously encountered at that address, the BHT indicates so and, in addition, provides the target address of the branch on its previous execution. This target address is used to redirect instruction prefetching because of the likelihood that the branch will repeat its past behaviour. The advantage of such an approach is that it has the potential of eliminating all delays associated with branches.
As shown in FIG. 1, a BHT section 40 includes a BHT which is established within a branch instruction address (BIA) array memory 42 and a branch target address (BTA) array memory 44. The BHT section 40 further includes a comparator 46. The BIA array memory 42 stores a plurality of branch instruction addresses, while the BTA array memory 44 stores a plurality of branch target addresses which correspond, on a one-to-one basis, to the counterparts stored in the memory 42. The instruction prefetch address register 20 supplies the two memories 42, 44 with a prefetched instruction address via a line 43. The comparator 46 is provided to compare the prefetched instruction address from the register 20 with the output (viz., branch instruction address) of the memory 42. If the comparator 46 detects coincidence of the two instruction addresses applied thereto (viz., a hit), it outputs a selector control signal SC-2 indicative of the hit over a line 45 and thus allows the selector 26 to supply the register 20 with the corresponding branch target address which is derived from the memory 44 through a line 47. Following this, the instruction specified by the branch target address is searched for in the cache memory 22. Contrarily, in the event that the comparator 46 detects a miss, it issues the control signal SC-2 representing same and thus inhibits the application of the output of the memory 44 to the register 20.
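By way of illustration only, the lookup path described above may be sketched in software as follows. The sketch is a minimal model under stated assumptions and is not the circuit of FIG. 1: the direct-mapped indexing, the table size NUM_ENTRIES, the 4-byte index granularity and all identifiers are hypothetical, since the organization of the memories 42, 44 is not specified. Only the 8-byte increment of the adder 28 and the compare-then-select behaviour are taken from the description.

```python
WORD_BYTES = 8      # increment applied by the adder 28 (8-byte word)
NUM_ENTRIES = 16    # assumption: table size is not given in the description


class BranchHistoryTable:
    """Paired BIA/BTA arrays with a comparator (assumed direct-mapped)."""

    def __init__(self):
        self.bia = [None] * NUM_ENTRIES  # branch instruction addresses (memory 42)
        self.bta = [None] * NUM_ENTRIES  # branch target addresses (memory 44)

    @staticmethod
    def _index(addr):
        # Assumption: index by low-order address bits above a 4-byte granule.
        return (addr >> 2) % NUM_ENTRIES

    def update(self, branch_addr, target_addr):
        """Write one (branch address, target address) pair, as over lines (3), (4)."""
        i = self._index(branch_addr)
        self.bia[i] = branch_addr
        self.bta[i] = target_addr

    def lookup(self, prefetch_addr):
        """Return the stored target on a hit, or None on a miss (comparator 46)."""
        i = self._index(prefetch_addr)
        if self.bia[i] == prefetch_addr:
            return self.bta[i]
        return None


def next_prefetch_address(bht, prefetch_addr):
    """Model of the selector 26: BTA on a BHT hit, adder output otherwise."""
    target = bht.lookup(prefetch_addr)
    if target is not None:
        return target
    return prefetch_addr + WORD_BYTES


bht = BranchHistoryTable()
bht.update(0x1004, 0x1000)   # branch at "a+4" with target "a", where a = 0x1000
hit_next = next_prefetch_address(bht, 0x1004)
miss_next = next_prefetch_address(bht, 0x1008)
```

In this sketch a hit redirects prefetching to the stored target, while a miss leaves the sequential path through the adder in effect.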
Writing a new piece of branch information into the BHT (viz., updating of BHT) is implemented under the control of the task controller 18. In more specific terms, when the task controller 18 detects that the execution circuit 36 fails to execute the branched instruction due to the failure of the branch target address prediction, the task controller 18 updates the BHT by writing a more likely pair of branch address and the corresponding branch target address into the memories 42, 44 via lines (3), (4).
The operations of the BHT section 40 will be further discussed with reference to FIGS. 2A, 2B, 3 and 4.
FIG. 2A is a diagram schematically showing an instruction sequence A0→BR(Branch)→A1→A2→A3→A4→ stored in the cache memory 22. It is assumed that these six instructions are derived from the cache memory 22 in groups, each of which is one word (8 bytes) in length and includes two instructions as illustrated. Accordingly, the instruction addresses at the left side are depicted by "a", "a+8", "a+16", wherein the address "a" is the initial address of the instruction sequence in question. It is understood that the adder 28 increments the address applied thereto by 8 bytes in this particular case. The instruction group(s) derived from the cache memory 22 is stored in a suitable buffer (not shown in the accompanying drawings) and then the instructions are sequentially applied to the execution unit 32.
FIG. 2B is a flow-chart depicting a routine which executes the above-mentioned instructions A0-A4 at steps 50A-50F. For the sake of a better understanding, FIG. 2B shows the addresses of the instructions A0-A4 in the cache memory 22. As shown, a very small branch loop 52 is established between steps 50A and 50B.
FIG. 3 is a diagram showing pipelined operations including five stages denoted by IF, DC, AD, OF and EX. A line extending from the stage AD to the stage IF corresponds to the line (1) via which the initial instruction address (viz., "a") is applied to the prefetch address register 20. The stages IF, DC, AD, OF and EX implement the following operations:
(a) IF: Instruction prefetch at block 20;
(b) DC: Instruction decode at block 30;
(c) AD: Address generation at block 18;
(d) OF: Operand fetch at blocks 18, 34; and
(e) EX: Instruction execution at blocks 18, 36.
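The progression of an instruction through the five stages listed above may be sketched as follows. This is an idealized model only: it assumes that each stage occupies exactly one time clock and that no stalls occur, neither of which is stated in the description, and the function names are hypothetical.

```python
STAGES = ("IF", "DC", "AD", "OF", "EX")  # stage order as listed above


def stage_at(entry_clock, clock):
    """Stage occupied at `clock` by an instruction that entered IF at
    `entry_clock`, or None if it is not in the pipeline at that clock.
    Assumption: one clock per stage, no stalls."""
    offset = clock - entry_clock
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None


def occupancy(entry_clocks, clock):
    """Map each instruction name to its stage at the given clock."""
    return {name: stage_at(t, clock) for name, t in entry_clocks.items()}


# Two instructions entering the pipeline on consecutive clocks (illustrative).
snapshot = occupancy({"A0": 0, "BR": 1}, 2)
```

Under these assumptions an instruction entering the stage IF at a given time clock reaches the stage EX four time clocks later.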
In the case where a miss of the branch instruction address at the comparator 46 of the BHT section 40 is found at the pipeline stage EX, the task controller 18 updates the BHT by writing thereinto the most likely branch instruction address and the branch target address thereof.
FIG. 4 is a timing chart which characterizes the prior art operations at the pipeline stages IF, DC, AD, OF and EX shown in FIG. 3. It is assumed that the BHT section 40 fails to find the branch instruction address "a+4" (viz., a miss) at time clock T0. Accordingly, the execution circuit 36 is unable to determine a branch target at time clock T5. In this instance, the operations implemented at all the stages IF-EX are canceled or rendered invalid. Thus, the task controller 18 carries out, at time clock T6, updating by writing the branch instruction address "a+4" into the memory 42 and also writing the branch target address "a" into the memory 44 via the lines (3), (4). Because the updating is implemented during the same time clock T6, a miss again occurs at the BHT section 40 at time clock T6.
The operations during time clocks T6-T11 are exactly identical with those during time clocks T0-T5. That is, the execution circuit 36 is again unable to determine the branch target address "a" at time clock T11. At the next time clock T12, the comparator 46 detects the hit and thus the target address "a" is applied to the instruction prefetch address register 20. Therefore, the execution circuit 36 is now able to execute the branch operation at time clock T17. In the above-mentioned case, there exists a 4-cycle loss during T12-T15 (LOSS B) in addition to a 4-cycle loss during T6-T9 (LOSS A). This kind of problem is frequently encountered when the execution unit 32 executes a program sequence including short loops as indicated in FIG. 2B. This arises from the fact that the next branch instruction has to be executed before the updating of the BHT is completed.
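The sequence of misses and the eventual hit described above may be reproduced by the following simplified model. It is a sketch under stated assumptions and not a cycle-accurate rendering of FIG. 4: it assumes that an entry written during a given time clock becomes visible to lookups only from the next time clock onward, and that the branch address is looked up every six time clocks (T0, T6, T12). All identifiers are hypothetical.

```python
RELOOKUP_CLOCKS = 6  # T0 -> T6 -> T12, per the timing described above


class DelayedBHT:
    """BHT whose writes are not visible to a lookup in the same clock."""

    def __init__(self):
        self.visible = {}   # entries a lookup can currently see
        self.pending = []   # (write_clock, branch_addr, target_addr)

    def write(self, clock, branch_addr, target_addr):
        self.pending.append((clock, branch_addr, target_addr))

    def lookup(self, clock, addr):
        # Commit only the writes issued in clocks strictly before this one.
        remaining = []
        for write_clock, bia, bta in self.pending:
            if write_clock < clock:
                self.visible[bia] = bta
            else:
                remaining.append((write_clock, bia, bta))
        self.pending = remaining
        return self.visible.get(addr)


def simulate_short_loop(a=0x1000, lookups=3):
    """Return (clock, hit) for each lookup of the branch at address a+4."""
    bht = DelayedBHT()
    branch_addr, target_addr = a + 4, a
    results = []
    clock = 0
    for _ in range(lookups):
        hit = bht.lookup(clock, branch_addr) is not None
        results.append((clock, hit))
        if not hit:
            # The failed prediction is found at EX; the update is written
            # in the same clock as the next lookup (e.g. T6 for the T0 miss).
            bht.write(clock + RELOOKUP_CLOCKS, branch_addr, target_addr)
        clock += RELOOKUP_CLOCKS
    return results


trace = simulate_short_loop()
```

In this model the lookups at T0 and T6 both miss and the lookup at T12 hits, the second miss arising solely because the write and the lookup fall in the same time clock T6.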
Summing up, the above-mentioned prior art has encountered the problem that a cycle loss such as that indicated by LOSS B is inevitably present in the case where the program sequence to be executed includes the aforesaid short type of branch loop.