Instruction flow in a digital data processor typically requires that instructions be fetched and decoded from sequential locations in a memory. A branch instruction is an instruction that causes a disruption in this flow, e.g., a taken branch causes decoding to be discontinued along the sequential path and resumed starting at a new location in memory. The new location in memory may be referred to as a target address of the branch. Such an interruption in pipelined instruction flow results in a substantial degradation in pipeline performance.
One type of branch instruction is known as an unconditional branch in that it unconditionally transfers control from the branch instruction (BR) to the target instruction (TARG). That is, at the time that the branch instruction is decoded, it is known that the transfer of control to TARG will take place. A more costly, in terms of performance, branch instruction is known as a conditional branch (BC). This instruction specifies that control is to be transferred to TARG only if some condition, as determined by the outcome of a previous instruction, is met.
If it can be determined at instruction decode time that a conditional branch instruction will not be taken then there is no penalty associated with the execution of the conditional branch instruction. That is, the next sequential instruction may be decoded immediately following the decode of the branch instruction. If it is determined that the branch will be taken, a multi-cycle penalty associated with the branch is still incurred in that the target address must be generated and the target instruction must be fetched.
Several conditional branch prediction mechanisms are known in the art. Mechanisms that attempt to predict the outcomes of conditional branches at instruction decode time are known as decode-time prediction mechanisms. One particular type of decode-time predictor is referred to as the "Decode History Table" (DHT), as described in U.S. Pat. No. 4,477,872 and in U.S. Pat. No. 4,430,706.
The DHT is a table of entries where an entry is accessed based on a transformation, such as a hash or truncation transformation, on the bits that define the address of a branch instruction. The entry itself comprises a single bit and is set if the corresponding branch instruction was taken the last time that it was executed, otherwise the bit is not set. If the DHT entry is set for a particular branch then the target address is generated and the target instruction fetched and decoded. If the DHT entry is not set the next-sequential instruction is decoded on a cycle following the decode of the branch instruction.
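The DHT behavior described above can be illustrated with a short sketch. It is not taken from the cited patents; the class name, the method names, the table size, and the choice of truncation as the address transformation are all assumptions made for illustration.

```python
class DecodeHistoryTable:
    """Illustrative one-bit-per-entry decode-time predictor (hypothetical names)."""

    def __init__(self, index_bits=10):
        # Table size is an assumption; the text does not specify one.
        self.mask = (1 << index_bits) - 1
        self.bits = [False] * (1 << index_bits)

    def _index(self, branch_address):
        # Truncation transformation: keep only the low-order address bits.
        return branch_address & self.mask

    def predict_taken(self, branch_address):
        # Entry set: generate the target address and fetch the target.
        # Entry not set: decode the next-sequential instruction.
        return self.bits[self._index(branch_address)]

    def update(self, branch_address, taken):
        # Record whether the branch was taken the last time it executed.
        self.bits[self._index(branch_address)] = taken
```

Because the index is a truncation of the address, two branches whose addresses share the same low-order bits map to the same entry and overwrite each other's history.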
Another type of mechanism, known as a prefetch-time prediction mechanism, attempts to anticipate taken branches and to fetch target instructions prior to the time that the branch instructions are decoded. The prefetch-time prediction mechanism is incorporated into an instruction prefetch engine and redirects instruction prefetching down a branch-target path immediately following the prefetch of a predicted taken branch. By so doing, the prefetch-time mechanism ensures that an instruction buffer contains the branch target instruction at the time that the branch instruction is decoded, thereby allowing the branch target instruction to be decoded immediately following the decode of the branch instruction. As a result, a prefetch-time mechanism eliminates all branch instruction related time penalties when it predicts correctly.
Prefetch-time prediction mechanisms typically are variations on the Branch History Table (BHT), as first described in U.S. Pat. No. 3,559,183. The BHT is the prefetch-time analog of the Decode History Table. That is, the BHT is a table of entries that is accessed based on a transformation, such as a hash or truncation transformation, on the bits that define the address of the block of instructions that is being prefetched. The entry itself is more complex than a DHT entry in that the BHT operates "blindly" at prefetch time. That is, the BHT fetches blocks of instructions without the benefit of examining the content of the blocks. Thus, a BHT entry must be able to identify that an associated block of instructions contains a taken branch, based on a taken branch having been previously encountered within the block of instructions. Furthermore, it must be able to identify where, within the block, the taken branch instruction resides, since the particular branch instruction may not be relevant to current instruction fetching, depending on where the block is entered. Finally, the entry must specify the branch target address, so that prefetching can be immediately redirected down the target path should the particular branch be relevant to the current prefetch activity.
When the processor encounters a branch instruction that is found to be taken, it creates a BHT entry based on the address of the branch, the entry itself containing the branch target address. If the particular section of instructions containing the branch is ever reencountered, the BHT entry causes prefetching to be redirected at the time the branch instruction is prefetched. When the BHT redirects prefetching, it also enqueues information regarding this action, such as the address at which it "believes" there is a taken branch and the target address of the branch. In the case where the BHT correctly anticipated the branch, there is no penalty associated with the branch.
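A minimal sketch of this BHT behavior follows. It rests on several simplifying assumptions that are not in the cited art: a fixed 16-byte prefetch block, at most one entry per block, and a dictionary in place of a hashed fixed-size table. The names BLOCK_SIZE, BranchHistoryTable, record_taken, and next_prefetch are hypothetical, and the enqueueing of redirect information is omitted.

```python
BLOCK_SIZE = 16  # bytes per prefetch block (assumed value)


class BranchHistoryTable:
    """Illustrative prefetch-time predictor keyed by instruction block."""

    def __init__(self):
        # block number -> (offset of taken branch within block, target address)
        self.entries = {}

    def record_taken(self, branch_address, target_address):
        # Called when a branch is found to be taken: create or replace the entry.
        block, offset = divmod(branch_address, BLOCK_SIZE)
        self.entries[block] = (offset, target_address)

    def next_prefetch(self, fetch_address):
        # Decide where to prefetch next without examining the block's contents.
        block, entry_offset = divmod(fetch_address, BLOCK_SIZE)
        entry = self.entries.get(block)
        if entry is not None:
            branch_offset, target = entry
            # The branch is relevant only if the block is entered at or
            # before the location of the branch.
            if branch_offset >= entry_offset:
                return target
        # No relevant taken branch: continue with the next sequential block.
        return (block + 1) * BLOCK_SIZE
```

The offset comparison reflects the point made above that a recorded branch may be irrelevant to the current prefetch when the block is entered past the branch.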
Branch instructions within a program that cause control to be transferred to a subroutine are referred to as subroutine call instructions. The branch instruction within the subroutine that transfers control back to the calling procedure is referred to as a subroutine return instruction. A subroutine may call other subroutines, resulting in what is known as nested subroutine calling. In some instruction set architectures subroutine call and return instructions are explicit. That is, all subroutine calls are implemented with a CALL instruction and all subroutine returns are implemented with a RETURN instruction.
When CALL and RETURN are explicit instructions, subroutine returns are readily handled with a stack. A stack is employed to handle subroutine returns in U.S. Pat. No. 4,586,127 and in U.S. Pat. No. 4,348,721. The general technique taught by this prior art is as follows: for each call instruction, push the return address onto the stack, and for each return instruction, pop the stack and use the contents as the return address. However, there is no branch prediction involved. In fact, in many processor architectures in which CALL and RETURN are explicit, the instructions are defined to operate through a stack.
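The general push-and-pop technique can be sketched as follows. This is an illustration of the technique as summarized above, not the design of either cited patent; the class and method names are hypothetical, and the return address is assumed to be the address of the instruction immediately following the CALL.

```python
class ReturnAddressStack:
    """Illustrative stack for explicit CALL and RETURN instructions."""

    def __init__(self):
        self._stack = []

    def call(self, call_address, call_length, target):
        # Push the return address (assumed: the instruction after the CALL),
        # then transfer control to the subroutine.
        self._stack.append(call_address + call_length)
        return target

    def ret(self):
        # Pop the stack and use the contents as the return address.
        return self._stack.pop()


# Nested calls unwind in last-in, first-out order.
ras = ReturnAddressStack()
ras.call(0x100, 4, 0x200)   # outer call
ras.call(0x210, 4, 0x300)   # nested call from inside the first subroutine
inner_return = ras.ret()    # 0x214
outer_return = ras.ret()    # 0x104
```

The last-in, first-out order is what allows a single stack to handle nested subroutine calling.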
In other instruction set architectures subroutine call and return instructions are not explicit, but are instead implemented with general branch instructions. In this case there is significant difficulty in determining which of the branch instructions are calls, and which are returns. The following articles propose methods to infer which of the branch instructions may be calls and returns based on the types of instructions that surround the branch instructions.
J. Losq in an article entitled "Subroutine Return Address Stack", IBM Technical Disclosure Bulletin, Vol. 24, No. 7a, December 1981 teaches a single stack that operates in conjunction with a Branch History Table. The return points of all potential calling instructions are pushed onto the stack, and the stack is popped in the event of every potential returning instruction. Losq recognizes and states that not every potential calling instruction is a subroutine call, and not every potential returning instruction is a subroutine return, resulting in irrelevant information being pushed onto the stack.
P. G. Emma et al. in an article entitled "Highly Accurate Subroutine Stack Prediction Mechanism" IBM Technical Disclosure Bulletin, Vol. 28, No. 10, March 1986 teach an increased accuracy of prediction achieved through a greater hesitancy to predict. This is accomplished by inhibiting the prediction based on intervening sequences of Load Multiple (LM) and Store Multiple (SM) instructions, and purging the stack in the event of a Load Program Status Word (LPSW) instruction.
In both of the foregoing articles a stack is maintained for all potential call instructions and is employed to predict a possible return instruction under a restrictive set of circumstances.
C. F. Webb, in an article entitled "Subroutine Call/Return Stack", IBM Technical Disclosure Bulletin, Vol. 30, No. 11, April 1988 also discusses the use of stacks in conjunction with a Branch History Table.
It is thus an object of the invention to provide a Branch History Table that does not require external stacks, wherein linkage information is managed directly by the Branch History Table.