1. Field of the Invention
The present invention relates to a data processing apparatus and method for providing target address information for branch instructions.
2. Description of the Prior Art
A data processing apparatus typically includes a processor core for executing instructions, together with a prefetch unit that prefetches from memory the instructions required by the processor core. The aim is to give the processor core a steady stream of instructions to execute, and thereby to maximise its performance.
To assist the prefetch unit in retrieving instructions for the processor core, prediction circuitry is often provided to predict which instructions the prefetch unit should prefetch. Such prediction is needed because the instructions executed in sequence are often not stored one after another in memory: software execution frequently involves changes in instruction flow that move the processor core between different sections of code, depending on the task being executed.
When executing software, a change in instruction flow typically occurs as a result of a “branch”, which causes the instruction flow to jump to a particular section of code specified by the target address of the branch. A branch can optionally specify a return address, to be used once the section of code branched to has finished executing.
Accordingly, the prediction circuitry can take the form of a branch prediction unit that predicts whether a branch will be taken. If the branch prediction unit predicts that a branch will be taken, it instructs the prefetch unit to retrieve the instruction specified by the target address of the branch. If the prediction is accurate, this increases the performance of the processor core, since the core does not subsequently have to stall its execution flow while that instruction is retrieved from memory. Typically, a record is kept of the address of the instruction that would be required if the prediction made by the branch prediction circuitry was wrong, so that if the processor core subsequently determines that the prediction was wrong, the prefetch unit can then retrieve the required instruction.
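The following Python sketch illustrates, under simplifying assumptions, how a predicted fetch address and the recorded alternative ("recovery") address might be kept for a conditional branch. It assumes fixed-width 4-byte instructions and a target address that is already known; the names `Prediction`, `predict_next_fetch` and `on_resolve` are hypothetical and are not taken from any particular design.

```python
from dataclasses import dataclass
from typing import Optional

INSTR_SIZE = 4  # assumption: fixed-width 4-byte instructions


@dataclass
class Prediction:
    fetch_addr: int     # address the prefetch unit fetches next
    recovery_addr: int  # recorded address to use if the prediction is wrong


def predict_next_fetch(pc: int, predicted_taken: bool, target: int) -> Prediction:
    """Pick the next fetch address for a branch at `pc` and record the other path."""
    fallthrough = pc + INSTR_SIZE
    if predicted_taken:
        return Prediction(fetch_addr=target, recovery_addr=fallthrough)
    return Prediction(fetch_addr=fallthrough, recovery_addr=target)


def on_resolve(pred: Prediction, actual_next: int) -> Optional[int]:
    """Called once the core resolves the branch.

    Returns None if the prediction was correct, otherwise the recorded
    address from which the prefetch unit must now fetch.
    """
    if actual_next == pred.fetch_addr:
        return None
    return pred.recovery_addr
```

For example, a branch at `0x1000` predicted taken to `0x2000` yields a fetch of `0x2000` with `0x1004` recorded; if the core later finds the branch was not taken, `on_resolve` returns `0x1004` as the address to fetch.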
Often, such branches in instruction flow occur as a result of executing branch instructions. Branch instructions are often conditional, such that if they are executed the instruction flow will jump to an instruction specified by a target address of the branch instruction, whereas if they are not executed the next instruction will typically be the immediately following instruction in the address space.
There are various known mechanisms by which the branch prediction unit can predict whether a branch instruction will be executed, and hence whether the branch will be taken. Whenever the branch is predicted to be taken, the target address from which the next instruction should be fetched must also be calculated. For direct branch instructions, an immediate value (for example, an offset) is specified directly within the branch instruction, so the target address can be calculated from this immediate value and the address of the currently prefetched instruction. For indirect branch instructions, however, no immediate value is specified; instead, the instruction typically specifies a working register of the processor core from which the information used to determine the target address is obtained. Since the contents of that register are generally only known when the processor core executes the branch instruction, this information is not available to the prefetch unit at the time the prediction of the target address is required.
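The distinction between direct and indirect branches can be sketched in Python as follows. This is an illustrative model only: the classes `DirectBranch` and `IndirectBranch` are hypothetical, and the offset is assumed to be relative to the branch instruction's own address (real instruction sets differ; some, for instance, add a pipeline-dependent constant to the program counter).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class DirectBranch:
    offset: int  # immediate value encoded in the instruction


@dataclass
class IndirectBranch:
    reg: int  # index of the working register that holds the target


Branch = Union[DirectBranch, IndirectBranch]


def target_at_prefetch(pc: int, insn: Branch) -> Optional[int]:
    """Target as seen by the prefetch unit, which has no access to register values."""
    if isinstance(insn, DirectBranch):
        # assumption: offset is relative to the branch's own address
        return pc + insn.offset
    return None  # indirect: register contents are not yet known


def target_at_execute(pc: int, insn: Branch, regs: Sequence[int]) -> int:
    """Target as resolved by the processor core when the branch executes."""
    if isinstance(insn, DirectBranch):
        return pc + insn.offset
    return regs[insn.reg]
```

The direct target can be produced at prefetch time, whereas `target_at_prefetch` can only return `None` for an indirect branch; this gap is what a branch target cache is intended to fill.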
Accordingly, it is known to provide a branch target cache structure within the prefetch unit having a plurality of entries, where each entry can store branch instruction information for a branch instruction that has been executed by the processor, with that branch instruction information including an address identifier for the branch instruction and target address information.
For a currently prefetched instruction, the prefetch unit can then initiate a lookup operation within the branch target cache structure to determine whether the address of that instruction matches an address identifier in one of the entries of the branch target cache structure. If so, a hit is detected and the associated target address information is returned. If the currently prefetched instruction is an indirect branch instruction that is predicted as taken by the prediction circuitry, the returned target address information is then used to determine the target address.
If no hit is detected within the branch target cache structure, it is typically not possible to predict the target address of an indirect branch instruction. The prefetch unit is therefore unable to prefetch the instruction that will be required if the indirect branch instruction does in fact result in the branch being taken when it is executed by the processor. Performance suffers in this scenario, since only once the indirect branch instruction has been executed by the processor, and the actual target address has been determined, can the prefetch unit prefetch the required instruction.
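A minimal Python model of the branch target cache structure and its lookup operation is sketched below, assuming a small fully associative cache in which each entry holds a valid flag, the branch address identifier and the target address information. The class and method names are hypothetical, and the choice of which entry to write is deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTCEntry:
    valid: bool = False
    branch_addr: int = 0  # address identifier of the branch instruction
    target: int = 0       # target address information


class BranchTargetCache:
    """Small fully associative branch target cache (hypothetical model)."""

    def __init__(self, num_entries: int) -> None:
        self.entries: List[BTCEntry] = [BTCEntry() for _ in range(num_entries)]

    def lookup(self, fetch_addr: int) -> Optional[int]:
        """Return the stored target on a hit, or None on a miss."""
        for entry in self.entries:
            if entry.valid and entry.branch_addr == fetch_addr:
                return entry.target
        return None

    def write(self, index: int, branch_addr: int, target: int) -> None:
        """Install branch information in entry `index` (replacement policy not modelled)."""
        self.entries[index] = BTCEntry(True, branch_addr, target)
```

In hardware, the comparison against every entry would be performed in parallel rather than in a loop; the sequential loop here only models the hit/miss result.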
One way to improve processor performance is therefore to increase the size of the branch target cache structure, so that more information is retained within it and the probability of a hit being detected is increased. However, larger branch target cache structures have an adverse impact on area and power consumption. In addition, as the size increases, there comes a point at which the lookup operation takes multiple clock cycles to determine whether a hit condition is present.
There is currently a desire to produce area- and power-efficient processors, in which it is not practical to provide large branch target cache structures. A problem that then arises is how to make the best use of the relatively small number of branch target cache entries that can be provided within such processors. In particular, with only a small number of entries, the information in those entries is more likely to be evicted whenever branch target information must be allocated for a branch instruction that has been executed by the processor but does not currently have a corresponding entry in the branch target cache structure. This is particularly problematic when standard replacement mechanisms, such as round-robin or pseudo-random replacement, are used to decide which entry to allocate new branch target information to, since such mechanisms make no distinction between entries containing useful branch target information and entries whose stored branch target information has proven less useful.
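The weakness of round-robin replacement can be illustrated with the following Python sketch, assuming a hypothetical two-entry branch target cache (`RoundRobinBTC` is an illustrative name). A frequently executed branch is evicted simply because the round-robin pointer reaches its entry, regardless of how many hits it has produced.

```python
from typing import List, Optional


class RoundRobinBTC:
    """Branch target cache with round-robin replacement (hypothetical model)."""

    def __init__(self, num_entries: int) -> None:
        self.addrs: List[Optional[int]] = [None] * num_entries  # None = invalid entry
        self.targets: List[int] = [0] * num_entries
        self.next_victim = 0

    def lookup(self, addr: int) -> Optional[int]:
        for i, a in enumerate(self.addrs):
            if a == addr:
                return self.targets[i]
        return None

    def access(self, addr: int, actual_target: int) -> bool:
        """Record an executed branch; return True on a hit, False on a miss."""
        for i, a in enumerate(self.addrs):
            if a == addr:
                self.targets[i] = actual_target  # refresh (indirect targets may change)
                return True
        # Miss: overwrite the next entry in turn, however useful its contents were.
        i = self.next_victim
        self.addrs[i] = addr
        self.targets[i] = actual_target
        self.next_victim = (i + 1) % len(self.addrs)
        return False


# A frequently used branch at 0xA0 is evicted by two one-off branches.
btc = RoundRobinBTC(2)
hits = [btc.access(0xA0, 0x1000) for _ in range(3)]  # miss, hit, hit
btc.access(0xB0, 0x2000)  # fills entry 1
btc.access(0xC0, 0x3000)  # wraps round to entry 0 and evicts 0xA0
```

After this sequence, a lookup for `0xA0` misses even though that branch was the only one to have produced hits.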
One approach to retaining within the branch target cache the branch target information that has proven more useful than other information would be to maintain a “weighting” value for each entry, so that entries whose contents have proven more useful are less likely to be replaced. However, this introduces complexity into the replacement mechanism, which can create performance issues because of the time then taken to process the cache entries and their weighting values before deciding which entry to allocate the new branch instruction information to.
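A weighted replacement scheme of this kind might be sketched in Python as below, assuming a hypothetical 2-bit saturating counter per entry that is incremented on each hit, with the lowest-weight entry chosen as the victim. The class name and counter width are assumptions for illustration. The victim selection must compare the weights of all entries, which is the additional replacement-time work referred to above.

```python
from typing import List, Optional


class WeightedBTC:
    """Branch target cache whose entries carry a usefulness weight (hypothetical model)."""

    MAX_WEIGHT = 3  # assumption: 2-bit saturating counter per entry

    def __init__(self, num_entries: int) -> None:
        self.addrs: List[Optional[int]] = [None] * num_entries
        self.targets: List[int] = [0] * num_entries
        self.weights: List[int] = [0] * num_entries

    def lookup(self, addr: int) -> Optional[int]:
        for i, a in enumerate(self.addrs):
            if a == addr:
                return self.targets[i]
        return None

    def _choose_victim(self) -> int:
        # Prefer an invalid entry; otherwise compare every weight. This
        # all-entries comparison is the extra work done before allocation.
        for i, a in enumerate(self.addrs):
            if a is None:
                return i
        return min(range(len(self.weights)), key=lambda j: self.weights[j])

    def access(self, addr: int, actual_target: int) -> bool:
        """Record an executed branch; return True on a hit, False on a miss."""
        for i, a in enumerate(self.addrs):
            if a == addr:
                self.targets[i] = actual_target
                self.weights[i] = min(self.weights[i] + 1, self.MAX_WEIGHT)
                return True
        v = self._choose_victim()
        self.addrs[v], self.targets[v], self.weights[v] = addr, actual_target, 0
        return False


btc = WeightedBTC(2)
for _ in range(3):
    btc.access(0xA0, 0x1000)  # miss, then two hits: weight 2
btc.access(0xB0, 0x2000)  # fills the invalid entry with weight 0
btc.access(0xC0, 0x3000)  # evicts 0xB0 (lowest weight), keeping 0xA0
```

With the same access sequence that defeated round-robin replacement, the frequently used branch is retained. Note that this simple sketch never decays the weights, so a once-popular entry could persist indefinitely; a practical scheme would likely need some form of ageing, adding still more complexity.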
Another approach is to provide a two-level branch target cache arrangement, in which the first level branch target cache is kept small, and data evicted from it to free up space for newly allocated branch instruction information is demoted to the second level branch target cache. Typically, the second level branch target cache is slower to access than the first level branch target cache, and the structure of the two caches, and the information maintained in their entries, often differ. Examples of known two-level branch target cache arrangements are discussed in U.S. Pat. No. 5,163,140 and U.S. Pat. No. 7,783,870.
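The demotion behaviour of such a two-level arrangement can be sketched in Python as follows. This is a simplified, hypothetical model (`TwoLevelBTC` is an illustrative name): both levels store the same (branch address, target) pairs, the first level uses round-robin replacement, and the second level discards its oldest entry when full. As noted above, real designs often use different entry formats for each level.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class TwoLevelBTC:
    """Small first-level BTC whose victims are demoted to a larger second level."""

    def __init__(self, l1_entries: int, l2_entries: int) -> None:
        self.l1: List[Optional[Tuple[int, int]]] = [None] * l1_entries
        self.l1_next = 0  # round-robin victim pointer for the first level
        self.l2: "OrderedDict[int, int]" = OrderedDict()  # oldest entry first
        self.l2_capacity = l2_entries

    def lookup(self, addr: int) -> Optional[Tuple[int, int]]:
        """Return (target, level) on a hit, or None on a miss in both levels."""
        for slot in self.l1:
            if slot is not None and slot[0] == addr:
                return slot[1], 1
        if addr in self.l2:
            return self.l2[addr], 2  # hit, but the second level is slower to access
        return None

    def allocate(self, addr: int, target: int) -> None:
        """Allocate into the first level, demoting the entry it evicts."""
        self.l2.pop(addr, None)  # keep the two levels exclusive
        victim = self.l1[self.l1_next]
        if victim is not None:
            self._demote(*victim)
        self.l1[self.l1_next] = (addr, target)
        self.l1_next = (self.l1_next + 1) % len(self.l1)

    def _demote(self, addr: int, target: int) -> None:
        self.l2[addr] = target
        if len(self.l2) > self.l2_capacity:
            self.l2.popitem(last=False)  # drop the oldest second-level entry


btc = TwoLevelBTC(l1_entries=2, l2_entries=4)
btc.allocate(0xA0, 0x1000)
btc.allocate(0xB0, 0x2000)
btc.allocate(0xC0, 0x3000)  # evicts 0xA0 from the first level into the second
```

After these allocations, `0xA0` still hits, but only in the slower second level, while `0xC0` hits in the first level.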
Accordingly, it would be desirable to provide an improved mechanism for providing target address information for branch instructions which alleviates the area, power and/or timing issues associated with known prior art techniques, whilst improving the retention of branch instruction information that has proven to be useful.