1. Field of the Invention
This invention relates to the field of data processing systems. More particularly, this invention relates to the prediction of branch instructions in data processing systems.
2. Description of the Prior Art
A data processing apparatus will typically include a processor core for executing instructions. Typically, a prefetch unit will be provided for prefetching instructions from memory that are required by the processor core, with the aim of ensuring that the processor core has a steady stream of instructions supplied to it, thereby aiming to improve the performance of the processor core.
To assist the prefetch unit in its task of retrieving instructions for the processor core, prediction logic is often provided for predicting which instruction should be prefetched by the prefetch unit. The prediction logic is useful since instruction sequences are often not stored in memory one after another, since software execution often involves changes in instruction flow that cause the processor core to move between different sections of code depending on the task being executed.
When executing software, a change in instruction flow typically occurs as a result of a “branch”, which results in the instruction flow jumping to a particular section of code as specified by a target address for the branch. The branch can optionally specify a return address to be used after the section of code following the branch has been processed.
Accordingly, the prediction logic can take the form of a branch prediction unit provided to predict whether a branch will be taken. If the branch prediction unit predicts that a branch will be taken, it instructs the prefetch unit to retrieve the instruction specified by the target address of the branch. If the prediction is accurate, this serves to increase the performance of the processor core, since the core does not need to stall its execution flow whilst that instruction is retrieved from memory. Typically, a record will be kept of the address of the instruction that would be required if the prediction were wrong, so that if the processor core subsequently determines that the prediction was wrong, the prefetch unit can then retrieve the required instruction.
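The redirect-and-record behaviour described above can be sketched as follows. This is a minimal illustration, not the specific hardware described; the function names and the `pc + 1` fall-through convention are assumptions for the sketch.

```python
# Hedged sketch of predicted-taken redirection with a recovery record:
# when a branch is predicted taken, fetch is redirected to the target,
# but the fall-through address is recorded so that fetch can be
# restarted from it if the prediction turns out to be wrong.
# (Names and the pc + 1 fall-through step are illustrative assumptions.)

def next_fetch(pc, predicted_taken, target):
    """Choose the next fetch address and record the alternative address."""
    if predicted_taken:
        return target, pc + 1    # fetch the target, remember the fall-through
    return pc + 1, target        # fetch the fall-through, remember the target

def resolve(actual_taken, predicted_taken, recovery_address, fetched):
    """On a misprediction, fetch restarts at the recorded address."""
    if actual_taken != predicted_taken:
        return recovery_address
    return fetched
```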
Branch prediction logic has been used in conjunction with branch target address caches (BTACs). In order to improve branch prediction success rates, dynamic branch prediction can be performed which uses historical information about what happened on previous branch instructions to predict what may happen. This historical information is typically stored in a BTAC, the BTAC being accessed by the prediction logic to determine if a branch should be taken or not.
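Dynamic prediction using historical information, as described above, can be sketched with a small model in which the BTAC maps an instruction address to a predicted target and a 2-bit saturating counter. The class names, the direct dictionary lookup, and the 2-bit counter scheme are illustrative assumptions, not the specific prior-art hardware.

```python
# Minimal sketch of BTAC-based dynamic branch prediction, assuming one
# entry per branch address holding a predicted target and a 2-bit
# saturating counter (0-1 predict not-taken, 2-3 predict taken).
# Structure and names are illustrative, not the hardware described here.

class BTACEntry:
    def __init__(self, target):
        self.target = target   # predicted branch target address
        self.counter = 2       # 2-bit counter, starts at "weakly taken"

class BTAC:
    def __init__(self):
        self.entries = {}      # instruction address -> BTACEntry

    def predict(self, pc):
        """Return the predicted target on a taken-predicting hit, else None."""
        entry = self.entries.get(pc)
        if entry is not None and entry.counter >= 2:
            return entry.target
        return None

    def update(self, pc, taken, target):
        """Record the actual outcome of a resolved branch."""
        entry = self.entries.setdefault(pc, BTACEntry(target))
        entry.target = target
        if taken:
            entry.counter = min(entry.counter + 1, 3)
        else:
            entry.counter = max(entry.counter - 1, 0)
```

A miss (no entry) behaves as a not-taken prediction, so sequential fetch simply continues.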
Typically in such systems the prefetch unit (PFU) uses the program counter to access the instruction within the I-cache and at the same time accesses the BTAC to see whether there is an entry corresponding to that instruction. If the fetched instruction is a branch instruction, the processor awaits the result of the BTAC lookup to predict whether to branch or not. Such systems have some latency, since data accesses take a finite amount of time. Typical systems have a two-cycle latency: two cycles are required before the information from the BTAC is available and branch prediction for the retrieved instruction can be performed. In some systems, buffers have been used to store fetched instructions and their branch predictions so that this wait does not manifest as bubbles in the pipeline. In this way the bubbles can be hidden and a continuous flow of instructions can be provided to the pipeline.
FIG. 1 schematically shows a system and timing diagram of the prior art with a two-cycle latency. Program counter 10 provides a value indicating the next instruction to be fetched to both the I-cache 20 and the branch target address cache (BTAC) 30 in parallel. As is shown, two bubbles are introduced into the system due to the latency; these can be removed using buffers to store instructions before sending them to the pipeline. A further disadvantage of such a system is that instructions 2 and 3 are accessed when they are not required, which increases power consumption. In systems with a latency of more than two cycles, further unnecessary data accesses will be made.
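The wasted fetches shown in FIG. 1 can be illustrated with a simple cycle-by-cycle trace. This is a timing sketch under the assumption that the branch is fetched at cycle 0 and its BTAC result becomes usable `latency` cycles later; the function name and parameters are illustrative.

```python
# Sketch of the two-cycle BTAC latency of FIG. 1: the PC accesses the
# I-cache and BTAC in parallel, but a taken prediction for the branch
# fetched at cycle 0 only redirects fetch after `latency` cycles, so the
# sequential instructions in between (2 and 3) are fetched needlessly.
# (Illustrative timing model, not the specific prior-art hardware.)

def fetch_trace(branch_pc, target, latency=2, cycles=6):
    """Return the address fetched on each cycle, branch fetched at cycle 0."""
    trace = []
    pc = branch_pc
    for cycle in range(cycles):
        trace.append(pc)
        if cycle == latency:
            # BTAC result for the cycle-0 branch is now known: redirect,
            # discarding the `latency` wasted sequential fetches.
            pc = target
        else:
            pc += 1
    return trace
```

For a branch at address 1 targeting address 10, the trace fetches addresses 2 and 3 before redirecting, matching the two unnecessary accesses in the figure.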
FIG. 2 shows an alternative embodiment in which the problem of unnecessary additional cache accesses is reduced by accessing the BTAC in advance of the instruction cache. With a two-cycle latency this can be done two cycles in advance, so that the prediction for a branch is available when the access to the instruction cache for the subsequent instruction is to be initiated. This allows the predicted instruction to be fetched rather than the next one in the instruction stream, and avoids the need to fetch the two additional instructions (2 and 3) that are not needed. However, the core then has to wait for two cycles until this instruction (10) is returned, and so bubbles are introduced into the instruction stream. Although these can be removed with an intermediate buffer, this leads to an increase in instruction fetch latency. A further potential problem with such a system arises where two branches occur next to each other.
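The trade-off of the FIG. 2 arrangement can be sketched as the stream of instructions delivered to the core: the wasted fetches of addresses 2 and 3 disappear, but bubbles appear while the target instruction is returned. This is a schematic illustration under assumed names; `None` marks a pipeline bubble.

```python
# Sketch of the delivered instruction stream when the BTAC is accessed
# `latency` cycles ahead of the I-cache, as in FIG. 2: addresses 2 and 3
# are never fetched, but the core waits `latency` cycles (bubbles) for
# the target instruction to come back from the I-cache.
# (Illustrative model, not the specific hardware described.)

def delivered_stream(branch_pc, target, latency=2, n=4):
    """Instructions delivered to the core per cycle; None marks a bubble."""
    stream = [branch_pc]           # the branch itself is delivered
    stream += [None] * latency     # bubbles while the target is fetched
    for i in range(n - 1):
        stream.append(target + i)  # target, then sequential successors
    return stream
```

An intermediate buffer can hide the `None` entries, as the text notes, at the cost of increased instruction fetch latency.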