1. Field of the Invention
The present invention relates to data processing, and more particularly to prefetching of branch and other Meta-information between hierarchical storage levels.
2. Description of the Related Art
In most high performance processors, pipelining is used as a means to improve performance. Pipelining allows a processor to be divided into separate components where each component is responsible for completing a portion of an instruction's execution.
Referring to FIG. 1A, an illustration of the major components that make up a typical processor's pipeline 10 is shown. The components include instruction fetch (stage I), instruction decode (stage II), address generation (stage III), operand fetch (stage IV), instruction execution (stage V), and store results (stage VI). Each instruction enters the pipeline and ideally spends one cycle at each pipeline stage. Assuming that each stage of the pipeline takes one cycle to complete, each instruction needs six cycles to pass through the pipeline. However, if the pipeline can be kept full, then every pipeline stage can be kept active, with each stage working on a different instruction.
Hence, one instruction can be completed every cycle. Unfortunately, keeping the pipeline full and processing an instruction in one cycle for every stage of the pipeline is not easy. Pipeline stalls occur due to control flow dependencies, data dependencies, or instructions requiring multiple cycles to pass through a single pipeline stage. These stalls result in a performance loss.
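The cycle counts described above can be illustrated with a short sketch. The function names and the single `stall_cycles` parameter are hypothetical and are introduced here only for illustration; the model assumes an in-order pipeline in which every stage takes one cycle and all stalls are lumped into one total.

```python
def pipeline_cycles(num_instructions: int, num_stages: int = 6, stall_cycles: int = 0) -> int:
    """Total cycles for a simple in-order pipeline (illustrative model only).

    The first instruction needs num_stages cycles to pass through the pipeline.
    When the pipeline stays full, each later instruction completes one cycle
    after its predecessor. stall_cycles is an assumed lump sum of extra cycles
    lost to control flow dependencies, data dependencies, or multi-cycle stages.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


def cycles_per_instruction(num_instructions: int, num_stages: int = 6, stall_cycles: int = 0) -> float:
    """Average cycles per completed instruction under the same assumptions."""
    return pipeline_cycles(num_instructions, num_stages, stall_cycles) / num_instructions
```

Under these assumptions, the average cost per instruction approaches one cycle as the number of instructions grows, and any stall cycles add directly to the total.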
To limit such performance losses, processors rely on two techniques, namely, caching and speculation.
By virtue of locality, a small structure can be used to retain information stored in a large, high-latency structure if the information is used frequently. This “cached” information can then be accessed at the cost of accessing the smaller structure most of the time. The figure of merit of such structures, namely the hit rate, is the probability with which the information can be accessed in the smaller structure. If the hit rate of a locality-based structure is high, then the average time spent accessing the information is almost the same as the time spent accessing the small structure.
By virtue of speculation, a hardware structure can be used to guess an unpredictable outcome of the program flow and can allow the processing of the instruction to continue. In case the guess is wrong, the instruction is processed again. The prediction rate determines the success of these structures and is the probability with which the structure can predict the outcome correctly. If the prediction rate is high, then the performance lost due to uncertainty is minimized.
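The effect of the hit rate and the prediction rate can be sketched numerically. The following is a minimal model, not a description of any particular processor: the function names are hypothetical, a miss is assumed to cost only the access time of the large structure, and a wrong guess is assumed to cost a fixed number of cycles.

```python
def average_access_time(hit_rate: float, t_small: float, t_large: float) -> float:
    """Expected access time of a locality-based structure.

    Assumption: a hit costs t_small and a miss costs t_large (some designs
    would charge t_small + t_large on a miss instead).
    """
    return hit_rate * t_small + (1.0 - hit_rate) * t_large


def expected_speculation_penalty(prediction_rate: float, wrong_guess_penalty: float) -> float:
    """Expected cycles lost per guess, assuming a fixed penalty per wrong guess."""
    return (1.0 - prediction_rate) * wrong_guess_penalty
```

For instance, with an assumed hit rate of 0.98, a 1-cycle small structure, and a 20-cycle large structure, this model gives an average of 1.38 cycles, which is close to the access time of the small structure.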
Such structures, which store information that helps in processing instructions faster, are called Meta-structures, and the stored information is called Meta-information.
Referring to FIG. 1B, a functional block diagram of a pipeline 10 with Meta-structures is shown. These Meta-structures, for example, a translation lookaside buffer (TLB), branch predictor, branch target buffer (branch history table (BHT)), cluster predictor, value predictor, address generation interlock (AGI) predictor, and operand store compare (OSC) predictor, reduce the number of cycles spent in pipeline stalls.
The hit rate or prediction rate of a Meta-structure depends on its size. A larger structure (table) provides a better hit rate, which increases performance. However, increasing the table size also increases the access time of the structure, which degrades performance.
The branch prediction mechanism used by a processor exhibits these properties. For example, a larger branch history table has a higher branch prediction rate, which increases performance. However, a larger branch history table requires more time for each access. Increasing the access time of a BHT increases branch error penalties, and performance is lost. Thus, it is desirable for the BHT to have a fast access time (characteristic of a small table) and still have a very high branch prediction rate (characteristic of a large table).
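The dependence of the prediction rate on table size can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration: the table is modeled as direct-mapped, each entry is a 2-bit saturating counter, the index is the branch address modulo the number of entries, and the trace is synthetic. Access time is not modeled.

```python
def prediction_rate(trace, num_entries):
    """Fraction of branches predicted correctly by a toy branch history table.

    trace is a list of (branch_address, taken) pairs. Each table entry is a
    2-bit saturating counter (0..3); values of 2 or 3 predict "taken".
    """
    counters = [1] * num_entries  # every entry starts as weakly not-taken
    correct = 0
    for address, taken in trace:
        index = address % num_entries  # assumed direct-mapped indexing
        predicted_taken = counters[index] >= 2
        if predicted_taken == taken:
            correct += 1
        # Train the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            counters[index] = min(3, counters[index] + 1)
        else:
            counters[index] = max(0, counters[index] - 1)
    return correct / len(trace)


# Synthetic trace: one always-taken branch and one never-taken branch whose
# addresses share an entry in a 4-entry table but not in a 16-entry table.
trace = [(0x10, True), (0x14, False)] * 50

small_table_rate = prediction_rate(trace, 4)
large_table_rate = prediction_rate(trace, 16)
```

On this trace, the two branches keep overwriting each other's history in the 4-entry table, so every prediction is wrong, whereas the 16-entry table mispredicts only the first occurrence of the taken branch. The sketch shows only the prediction-rate side of the trade-off; the longer access time of the larger table would have to be accounted for separately.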
One aspect of the present invention includes providing both of these features necessary for increased performance, for example, a branch prediction mechanism that delivers the fast access time characteristic of a small table while still achieving the very high branch prediction rate characteristic of a large table.