1. Field of the Invention
This invention relates in general to the field of pipelined microprocessor architecture, and more particularly to branch instruction target address prediction.
2. Description of the Related Art
Computer instructions are typically stored in successive addressable locations within a memory. When processed by a Central Processing Unit (CPU), the instructions are fetched from consecutive memory locations and executed. Each time an instruction is fetched from memory, a program counter, or instruction pointer, within the CPU is incremented so that it contains the address of the next instruction in the sequence. This incremented value is referred to as the next sequential instruction pointer. Fetching of an instruction, incrementing of the program counter, and execution of the instruction continue linearly through memory until a program control instruction is encountered.
A branch instruction, or program control instruction, when executed, changes the address in the program counter to some value other than the next sequential instruction address and thereby causes the flow of control to be altered. In other words, program control instructions specify conditions for altering the contents of the program counter. The change in the value of the program counter as a result of the execution of a branch instruction causes a break in the sequence of instruction execution. This is an important feature in digital computers, as it provides control over the flow of program execution and a capability for branching to different portions of a program. Examples of branch instructions include Jump, Test and Jump conditionally, Call, Return and Loop.
A Jump instruction causes the CPU to unconditionally change the contents of the program counter to a specific value, i.e., to the target address for the instruction where the program is to continue execution. A Test and Jump conditionally instruction causes the CPU to test the contents of a status register, or possibly compare two values, and either continue sequential execution or jump to a new address, called the target address, based on the outcome of the test or comparison. A Call instruction causes the CPU to unconditionally jump to a new target address, but also saves the value of the program counter to allow the CPU to return to the program location it is leaving. A Return instruction causes the CPU to retrieve the value of the program counter that was saved by the last Call instruction, and return program flow back to the retrieved instruction address. A Loop instruction causes the CPU to decrement an iteration count in a register and conditionally change the contents of the program counter to a target address specified in the instruction if the iteration count has not reached zero.
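The program counter behavior of these instruction types can be sketched as a small Python model. This is an illustration only: the fixed 4-byte instruction length, the single zero flag standing in for the status register, and the names `CPU`, `step`, `flags_zero`, and `loop_count` are hypothetical and are not part of the invention.

```python
from dataclasses import dataclass, field

INSN_SIZE = 4  # hypothetical fixed instruction length in bytes


@dataclass
class CPU:
    pc: int = 0
    flags_zero: bool = False      # stand-in for one status-register bit
    loop_count: int = 0           # stand-in for the iteration-count register
    call_stack: list = field(default_factory=list)  # saved return addresses

    def step(self, op, target=None):
        """Update the program counter for one instruction and return it."""
        next_seq = self.pc + INSN_SIZE          # next sequential instruction pointer
        if op == "jump":                        # unconditional jump
            self.pc = target
        elif op == "jz":                        # test and jump conditionally
            self.pc = target if self.flags_zero else next_seq
        elif op == "call":                      # save return point, then jump
            self.call_stack.append(next_seq)
            self.pc = target
        elif op == "ret":                       # resume after the most recent call
            self.pc = self.call_stack.pop()
        elif op == "loop":                      # decrement, branch while nonzero
            self.loop_count -= 1
            self.pc = target if self.loop_count != 0 else next_seq
        else:                                   # ordinary, non-branch instruction
            self.pc = next_seq
        return self.pc
```

For example, starting at `0x100`, a `call` to `0x200` saves `0x108` (the address after the call) and a later `ret` returns to it.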
In early microprocessors, execution of program control instructions did not impose significant processing delays because such microprocessors were designed to execute only one instruction at a time. If the instruction being executed was a program control instruction, by the end of execution the microprocessor would know whether it should branch, and if it was supposed to branch, it would know the target address of the branch. Thus, whether the next instruction was sequential, or the result of a branch, it would be fetched and executed.
Modern microprocessors are not so simple. Rather, it is common for modern microprocessors to operate on several instructions at the same time, within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution." Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, Calif., 1996. The authors go on to provide the following excellent illustration of pipelining:
A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.
Thus, as instructions are fetched, they are introduced into one end of the pipeline. They proceed through pipeline stages within a microprocessor until they complete execution. In such pipelined microprocessors it is often not known whether a branch instruction will alter program flow until it reaches a late stage in the pipeline. However, by this time, the microprocessor has already fetched other instructions and is executing them in earlier stages of the pipeline. Furthermore, even if the branch instruction is an unconditional branch instruction, its target address may not be available until a later stage in the pipeline, or may have to be fetched from memory. If a branch causes a change in program flow, all of the instructions in the pipeline that followed the branch must be thrown out. In addition, the instruction specified by the target address of the branch instruction must be fetched. Throwing out the intermediate instructions and fetching the instruction at the target address create processing delays in such microprocessors.
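The cost of discarding in-flight instructions can be estimated with a simple stall-cycle model. The sketch below assumes a scalar pipeline that fetches one instruction per cycle in stage 1 and learns a branch's outcome and target only in stage `resolve_stage`, so each redirect discards `resolve_stage - 1` younger instructions. The function names and the branch frequencies in the example are illustrative assumptions, not measurements.

```python
def lost_cycles(redirects: int, resolve_stage: int) -> int:
    """Cycles spent refilling the pipeline after `redirects` changes of flow,
    assuming each redirect discards the resolve_stage - 1 younger
    instructions fetched behind the branch (one per cycle)."""
    return redirects * (resolve_stage - 1)


def effective_cpi(branch_frac: float, redirect_frac: float, resolve_stage: int) -> float:
    """Cycles per instruction for an otherwise ideal CPI-1 pipeline.

    branch_frac   -- fraction of instructions that are branches
    redirect_frac -- fraction of those branches that force a refetch
    """
    return 1.0 + branch_frac * redirect_frac * (resolve_stage - 1)


shallow = effective_cpi(0.2, 0.5, resolve_stage=6)   # about 1.5
deep = effective_cpi(0.2, 0.5, resolve_stage=12)     # about 2.1
```

Under these assumed numbers, moving branch resolution from stage 6 to stage 12 raises the cost of each redirect from 5 to 11 cycles, which is why resolving, or predicting, branches early matters more as pipelines grow deeper.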
To alleviate this delay problem, many pipelined microprocessors use branch outcome prediction and branch target address prediction mechanisms in an early stage of the pipeline, and then fetch subsequent instructions according to the branch outcome and branch target address predictions.
A popular branch prediction scheme uses a branch target buffer (BTB) to make predictions about conditional branch instruction outcomes and to predict branch target addresses. A typical BTB is similar to a cache, where a given BTB entry is indexed by the address of a branch instruction that is being predicted. The data in the selected BTB entry includes the branch target address of the previous execution of the associated branch instruction and its outcome, i.e., whether the branch was taken or not taken. There is a high probability that the target address of the previous execution of the branch will also be the target address for the next execution of the branch. The next time the branch instruction is decoded, its address is used to index the BTB. The BTB generates a target address and outcome prediction for the branch instruction that can then be used to fetch subsequent instructions, in hopes that the target address was correctly predicted.
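A direct-mapped BTB of the kind described above can be sketched as follows. The sketch assumes the low `index_bits` bits of the branch address select the entry and the remaining high bits are kept as a tag, as in a cache; the class and method names are hypothetical, and a hardware design might also drop low-order alignment bits before indexing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BTBEntry:
    valid: bool = False
    tag: int = 0          # high-order bits of the branch address
    target: int = 0       # target address from the previous execution
    taken: bool = False   # outcome of the previous execution


class DirectMappedBTB:
    def __init__(self, index_bits: int = 4):
        self.index_bits = index_bits
        self.entries: List[BTBEntry] = [BTBEntry() for _ in range(1 << index_bits)]

    def _split(self, pc: int) -> Tuple[int, int]:
        """Split a branch address into (index, tag)."""
        index = pc & ((1 << self.index_bits) - 1)
        return index, pc >> self.index_bits

    def predict(self, pc: int) -> Optional[Tuple[int, bool]]:
        """Return (predicted target, predicted taken) or None on a miss."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry.valid and entry.tag == tag:
            return entry.target, entry.taken
        return None

    def update(self, pc: int, target: int, taken: bool) -> None:
        """Record the resolved target and outcome, replacing any prior entry."""
        index, tag = self._split(pc)
        self.entries[index] = BTBEntry(True, tag, target, taken)
```

Because `update` unconditionally overwrites the indexed entry, any two branches that share the same low-order bits compete for one slot.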
Like a cache, the BTB has many fewer entries than the memory address space it serves. That is, the entire address of the branch instruction is not used to index the BTB, but rather, only the lower bits are used. Therefore, a BTB suffers the same aliasing, or mapping, problems that a cache suffers. That is, two distinct branch instructions may index to the same BTB entry.
For example, assume two branch instructions A and B have the same lower address bits used to index the BTB. Assume branch A executes and its target address is updated in the BTB, then branch B executes and its target address is updated in the same BTB entry, and then branch A executes again. The BTB will not contain the correct predicted target for branch A because it was replaced by the target address of branch B's last execution. This is so even though the target address of branch A's last execution, i.e., probably the correct target address prediction, had previously been available in the BTB but was replaced due to aliasing.
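The aliasing scenario for branches A and B can be replayed against a tagless, direct-mapped BTB. The 16-entry size and the addresses and targets chosen for A and B are hypothetical; the only property that matters is that both addresses share the same low-order index bits.

```python
INDEX_BITS = 4  # hypothetical 16-entry BTB


def index(pc: int) -> int:
    """Only the low-order address bits select a BTB entry."""
    return pc & ((1 << INDEX_BITS) - 1)


btb = {}  # index -> most recently stored target (tagless)


def update(pc: int, target: int) -> None:
    btb[index(pc)] = target


def predict(pc: int):
    return btb.get(index(pc))


A, A_TARGET = 0x1003, 0x5000
B, B_TARGET = 0x2003, 0x6000   # same low 4 bits as A, so same entry

update(A, A_TARGET)            # branch A executes
update(B, B_TARGET)            # branch B executes, overwriting A's entry
prediction_for_A = predict(A)  # branch A executes again
```

The prediction for A is B's target, `0x6000`, even though A's own target, `0x5000`, was stored in the BTB before B overwrote it.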
One means of minimizing this problem is to employ a set associative BTB, similar to set associative caches. However, set associative BTBs do not completely solve the problem. As microprocessor pipeline depths continue to increase, resulting in more severe performance degradation when branches are mispredicted, a demand for even greater branch prediction accuracy is apparent.
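A two-way set-associative BTB with least-recently-used (LRU) replacement lets A and B coexist in the same set, but a third branch C that maps to that set evicts one of them again. The sketch below is hypothetical (16 sets, LRU replacement, and names chosen for illustration) and shows why associativity reduces, rather than eliminates, aliasing.

```python
from collections import OrderedDict


class SetAssocBTB:
    """N-way set-associative BTB with LRU replacement (illustrative sketch)."""

    def __init__(self, index_bits: int = 4, ways: int = 2):
        self.index_bits = index_bits
        self.ways = ways
        # Each set maps tag -> target; insertion order tracks recency (LRU first).
        self.sets = [OrderedDict() for _ in range(1 << index_bits)]

    def _split(self, pc: int):
        return pc & ((1 << self.index_bits) - 1), pc >> self.index_bits

    def predict(self, pc: int):
        idx, tag = self._split(pc)
        s = self.sets[idx]
        if tag in s:
            s.move_to_end(tag)          # mark as most recently used
            return s[tag]
        return None

    def update(self, pc: int, target: int) -> None:
        idx, tag = self._split(pc)
        s = self.sets[idx]
        if tag in s:
            s.move_to_end(tag)
        elif len(s) == self.ways:
            s.popitem(last=False)       # evict the least recently used way
        s[tag] = target


btb = SetAssocBTB(index_bits=4, ways=2)
A, B, C = 0x1003, 0x2003, 0x3003        # hypothetical branches sharing set 3
btb.update(A, 0x5000)
btb.update(B, 0x6000)
a_after_b = btb.predict(A)              # A survives B's update
btb.update(C, 0x7000)                   # set is full: evicts the LRU way (B)
b_after_c = btb.predict(B)              # B has been evicted
```

Here A is still predicted after B is installed, but once C arrives the set overflows and B's target is lost, which is the residual aliasing the paragraph refers to.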
Therefore, what is needed is a BTB in a branch prediction mechanism that more accurately predicts branch target addresses.
To address the above-detailed deficiencies, it is an object of the present invention to provide a more accurate branch target address predictor. Accordingly, in attainment of the aforementioned object, it is a feature of the present invention to provide a branch instruction target address predictor. The branch instruction target address predictor includes an instruction pointer register that stores an address at which instructions are fetched, a branch target buffer that stores target addresses related only to indirect branch instructions, and decode logic, coupled to the branch target buffer and the instruction pointer register, for decoding an instruction and providing one of the target addresses from the branch target buffer to the instruction pointer register if the instruction is an indirect branch instruction.
An advantage of the present invention is that it provides improved branch target address prediction by not populating the BTB with return addresses and direct branch instruction addresses, thereby reducing the probability of branch instruction aliasing in the branch target buffer. Another advantage of the present invention is that it enables a smaller branch target buffer to be employed since the branch target buffer is only predicting target addresses for one type of branch instruction and is not predicting whether conditional branches will be taken or not taken.
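The benefit of keeping only indirect branches in the BTB can be illustrated by replaying one branch trace through two tagless BTBs: a unified one that every branch type updates, and a filtered one that only indirect branches update, on the assumption that returns and direct branches obtain their targets elsewhere. The trace, addresses, targets, and 16-entry size are hypothetical, chosen so that an indirect branch A, a direct branch B, and a return R all share one BTB index.

```python
INDEX_BITS = 4  # hypothetical 16-entry BTB


def idx(pc: int) -> int:
    return pc & ((1 << INDEX_BITS) - 1)


def run(trace, store_kinds) -> int:
    """Replay (pc, kind, actual_target) records through a tagless,
    direct-mapped BTB. Only branches whose kind is in store_kinds allocate
    entries. Returns how many indirect branches were predicted correctly."""
    btb = {}
    hits = 0
    for pc, kind, target in trace:
        if kind == "indirect" and btb.get(idx(pc)) == target:
            hits += 1
        if kind in store_kinds:
            btb[idx(pc)] = target
    return hits


trace = [
    (0x1003, "indirect", 0x5000),   # A
    (0x2003, "direct", 0x6000),     # B, same index as A
    (0x3003, "return", 0x7000),     # R, same index as A
] * 3

unified_hits = run(trace, {"indirect", "direct", "return"})
filtered_hits = run(trace, {"indirect"})
```

In the unified BTB, R overwrites the shared entry just before each execution of A, so A is never predicted correctly; in the filtered BTB, A misses once and then hits on both later executions.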
In another aspect, it is a feature of the present invention to provide an apparatus for predicting target addresses of branch instructions in a pipelined microprocessor. The apparatus includes a call/return stack that stores target addresses related to return instructions, an adder that calculates target addresses related to direct branch instructions, and a branch target buffer that stores target addresses related to indirect branch instructions. The predictor selects a target address provided by the stack, the adder or the buffer for use by the microprocessor in fetching program instructions in response to determining if a branch instruction is a return, direct branch or indirect branch instruction.
In yet another aspect, it is a feature of the present invention to provide a method for predicting branch target addresses. The method includes fetching a branch instruction, determining if the branch instruction is a return instruction, a direct branch instruction or an indirect branch instruction, selecting a predicted target address of the branch instruction provided by a call/return stack, an adder or a branch target buffer, respectively, in response to determining the type of the branch instruction, and fetching a next instruction using the predicted target address in response to selecting the predicted target address.
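The selection among the call/return stack, the adder, and the branch target buffer can be sketched as a small model of the fetch-address logic. It assumes decode has already classified each instruction and supplied its length and, for direct branches, a signed displacement relative to the next sequential address. The `Decoded` and `TargetPredictor` names and the fall-through behavior on a stack underflow or BTB miss are illustrative assumptions. Taken/not-taken prediction for conditional branches is outside this sketch, so every branch is treated as taken.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Decoded:
    """Hypothetical output of the decode logic for one instruction."""
    pc: int
    length: int
    kind: str                # "return", "direct", "indirect", or "other"
    is_call: bool = False
    displacement: int = 0    # signed offset encoded in a direct branch


@dataclass
class TargetPredictor:
    index_bits: int = 4
    stack: List[int] = field(default_factory=list)     # call/return stack
    btb: Dict[int, int] = field(default_factory=dict)  # indirect branches only

    def _index(self, pc: int) -> int:
        return pc & ((1 << self.index_bits) - 1)

    def next_fetch(self, d: Decoded) -> int:
        """Select the address to load into the instruction pointer."""
        fall_through = d.pc + d.length
        if d.kind == "return":
            target = self.stack.pop() if self.stack else fall_through
        elif d.kind == "direct":
            target = fall_through + d.displacement      # the adder
        elif d.kind == "indirect":
            target = self.btb.get(self._index(d.pc), fall_through)
        else:
            return fall_through                         # not a branch
        if d.is_call:
            self.stack.append(fall_through)             # return point for the callee
        return target

    def resolve_indirect(self, pc: int, actual_target: int) -> None:
        """Update the BTB once an indirect branch's real target is known."""
        self.btb[self._index(pc)] = actual_target
```

Only `resolve_indirect` writes to the BTB, so returns and direct branches never occupy BTB entries in this sketch.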