1. Field of the Invention
The present invention relates to a data processing apparatus and method for handling retrieval of instructions from an instruction cache.
2. Background of the Invention
Caches are widely used in data processing systems to temporarily store instructions and/or data values for access by a processing unit. If the instruction or data value required by the processing unit is already stored in such a cache, this can significantly improve the performance of the data processing system by avoiding the need to obtain that instruction or data value from main memory at the time it is required by the processing unit. However, performing a cache lookup operation to determine whether the requested instruction or data value is within the cache consumes significant power, and in many applications it is highly desirable to reduce power consumption within the data processing system.
Often caches are arranged in an n-way set associative structure, and typically a cache lookup operation will involve performing a lookup in each of the ways of the cache. With the aim of reducing power consumption within a cache, a number of cache way prediction or cache way tracking techniques have been developed which reduce the power consumed by a cache lookup by excluding from the lookup any of the ways that the prediction or tracking technique indicates will not store the instruction or data value being requested. Limiting the cache lookup operation to a subset of the total number of ways reduces power consumption.
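By way of illustration only, the effect of restricting a lookup to a subset of the ways can be sketched as follows. The class name, the geometry (64 sets, 4 ways, 32-byte lines) and the use of a probe counter as a stand-in for lookup power are all assumptions made for this sketch and are not taken from any particular cache design.

```python
class SetAssociativeCache:
    """Toy n-way set associative tag store (hypothetical model, tags only)."""

    def __init__(self, num_sets, num_ways, line_bytes=32):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        # tags[set][way] holds the stored tag, or None for an invalid line.
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # Number of individual ways probed so far; a crude proxy for power.
        self.way_probes = 0

    def _split(self, address):
        """Split an address into (set index, tag)."""
        line = address // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def fill(self, address, way):
        """Record that the line containing `address` is stored in `way`."""
        index, tag = self._split(address)
        self.tags[index][way] = tag

    def lookup(self, address, candidate_ways=None):
        """Return the hitting way, or None on a miss.

        With candidate_ways=None every way is probed. Otherwise only the
        listed ways are probed, as a way prediction or way tracking
        technique might arrange.
        """
        index, tag = self._split(address)
        ways = range(self.num_ways) if candidate_ways is None else candidate_ways
        hit_way = None
        for way in ways:
            self.way_probes += 1  # every probed way costs power, hit or not
            if self.tags[index][way] == tag:
                hit_way = way
        return hit_way


cache = SetAssociativeCache(num_sets=64, num_ways=4)
cache.fill(0x1000, way=2)

full = cache.lookup(0x1000)                             # probes all 4 ways
restricted = cache.lookup(0x1000, candidate_ways=[2])   # probes 1 way
```

In this sketch both lookups hit in way 2, but the unrestricted lookup probes four ways whereas the restricted lookup probes only one.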
Irrespective of whether such cache way prediction or tracking techniques are used, there is still a further power consumption issue that can arise in association with accesses made to an instruction cache, due to the way in which that instruction cache is used when fetching instructions for execution within the execution circuitry of the processing unit. In particular, the execution circuitry will execute a sequence of instructions, and that sequence of instructions will include various branch instructions. When a branch instruction is executed by the execution circuitry, this will either result in a not taken branch, as a result of which the next instruction to be executed will be the instruction at the address following the branch instruction, or will result in a taken branch, as a result of which a target address will be determined identifying the next instruction to be executed. Accordingly, when execution of a branch instruction results in a taken branch, there will be a change in instruction flow.
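The two possible outcomes of executing a branch instruction can be expressed as a small sketch. The function name and the fixed 4-byte instruction size are assumptions made purely for illustration.

```python
def next_fetch_address(branch_address, taken, target_address=None, instr_bytes=4):
    """Return the address of the next instruction after a branch.

    Assumes fixed-size instructions of `instr_bytes` bytes (an assumption
    of this sketch, not a property stated in the text).
    """
    if taken:
        # Taken branch: instruction flow changes to the target address.
        if target_address is None:
            raise ValueError("a taken branch requires a target address")
        return target_address
    # Not taken: continue with the instruction following the branch.
    return branch_address + instr_bytes


sequential = next_fetch_address(0x100, taken=False)                     # 0x104
redirected = next_fetch_address(0x100, taken=True, target_address=0x40)  # 0x40
```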
Fetch circuitry is typically used to request instructions from the instruction cache with the aim of obtaining a near continuous stream of instructions for issuance to the execution circuitry, in order to keep the execution circuitry fed with instructions that can be executed. Branch prediction circuitry is often used to predict, for any identified branch instruction fetched by the fetch circuitry, whether that branch instruction will or will not result in a taken branch when it is subsequently executed by the execution circuitry. In very high performance processors where power consumption is not a significant concern, it is known to provide branch prediction circuitry very close to the instruction cache fetching mechanism, so that there is no, or only a minimal, “branch shadow”, i.e. the prediction as to whether a branch instruction will result in a taken branch or not can be made within the same clock cycle that the instruction is fetched from the instruction cache. The fetch circuitry can then determine the next instruction to fetch dependent on the result of the branch prediction circuitry. However, such an arrangement consumes a large amount of power.
Accordingly, in many high performance pipelined processors, where power consumption is a significant issue, the branch prediction circuitry typically operates several pipeline stages (for example 1 to 4 pipeline stages) after a fetch from the instruction cache has been performed. During the period between the time a branch instruction is fetched from the instruction cache and the time that same branch instruction is analysed by the branch prediction circuitry, the fetch circuitry will typically continue to speculatively fetch instructions from sequential addresses following the branch instruction. However, if the branch prediction circuitry subsequently predicts that execution of that branch instruction will result in a taken branch, those speculative instructions that the fetch circuitry has fetched in the interim will need to be discarded. This wastes instruction cache lookup power, since a lookup must be performed in the instruction cache for each of the speculative instructions, only for them to be discarded subsequently.
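The cost of the branch shadow can be estimated with a simple model, given here as a rough sketch only. It assumes one instruction cache lookup per cycle, fixed 4-byte instructions, a hypothetical `branches` table mapping each predicted-taken branch address to its target, and that no further branch is acted upon among the instructions fetched in the shadow. None of these details is taken from a specific processor.

```python
def simulate_fetch(branches, start, total_cycles, shadow_stages, instr_bytes=4):
    """Return (useful_lookups, wasted_lookups) over `total_cycles` cycles.

    branches      -- dict: address of a predicted-taken branch -> target address
    shadow_stages -- pipeline stages between the fetch of a branch and its
                     analysis by the branch prediction circuitry
    """
    pc = start
    useful = 0
    wasted = 0
    cycle = 0
    while cycle < total_cycles:
        # One instruction cache lookup for the instruction at pc.
        useful += 1
        cycle += 1
        if pc in branches:
            # Until the prediction is available, fetching continues from
            # sequential addresses; those lookups are later discarded.
            speculative = min(shadow_stages, total_cycles - cycle)
            wasted += speculative
            cycle += speculative
            pc = branches[pc]
        else:
            pc += instr_bytes
    return useful, wasted


# A five-instruction loop: the branch at 0x10 is predicted taken back to 0x0.
loop = {0x10: 0x0}
with_shadow = simulate_fetch(loop, start=0x0, total_cycles=14, shadow_stages=2)
no_shadow = simulate_fetch(loop, start=0x0, total_cycles=14, shadow_stages=0)
```

Under these assumptions, a two-stage shadow yields 10 useful and 4 wasted lookups over 14 cycles, whereas a zero-stage shadow yields 14 useful lookups and none wasted.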