1. Field of the Invention
This invention relates to storing branch predictions generated within a microprocessor.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance through the use of pipelining, parallel execution, and high clock rates. Pipelining is an implementation technique whereby multiple instructions are overlapped during the execution process. Parallel execution refers to the simultaneous execution of multiple instructions in a clock cycle. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. At the end of a clock cycle, the resulting values are moved to the next pipeline stage.
Pipelining has several hazards associated with it. One particular hazard is stalling the pipeline due to branch instructions. When a branch instruction propagates through the pipeline, it is difficult to determine which instructions after the branch should be processed until the results of the branch instruction are known. For example, if the branch instruction is "taken", then the next instruction to be executed after the branch may be located at a particular address that is offset from the branch instruction's address. In contrast, if the branch instruction is "not taken", then the next instruction to be executed may be located at the address immediately following the branch instruction. As a result, the initial stages of the pipeline may be unable to determine which instructions should begin execution in the pipeline following the branch instruction. Thus, the pipeline may stall awaiting the results of the branch instruction.
In order to prevent the instruction pipeline from stalling, microprocessor designers may implement branch prediction schemes to provide the initial pipeline stages with a predicted result for each branch instruction. The initial stages of the pipeline speculatively execute instructions along the predicted path until the branch instruction executes and one of the following occurs: (1) the prediction is found to be correct, in which case the instructions continue to execute and are no longer speculative, or (2) the prediction is found to be incorrect, in which case all pipeline stages executing instructions after the branch are flushed and the pipeline starts anew using the correct path.
Many branch prediction schemes involve storing a prediction bit indicating whether the branch instruction is taken or not taken, and a predicted target address for when the branch instruction is taken. If the prediction is determined to be incorrect upon execution of the branch instruction, then the prediction bit is updated to reflect the actual results of the branch instruction. Some microprocessors use more complex schemes for branch prediction rather than a simple taken/not taken prediction. For example, a two-bit prediction scheme may be used to increase prediction accuracy when branch instructions are either taken a high percentage of the time or not taken a high percentage of the time (e.g., in a loop). In two-bit prediction schemes, a prediction must miss twice before it is changed.
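The two-bit scheme described above may be sketched as a saturating counter. The following is an illustrative model only (class and method names are hypothetical, not taken from any particular implementation): states 0-1 predict "not taken", states 2-3 predict "taken", so a single mispredict does not flip an established prediction.

```python
class TwoBitPredictor:
    """Illustrative two-bit saturating-counter branch predictor."""

    def __init__(self):
        self.state = 1  # arbitrary initial state: weakly not-taken

    def predict(self):
        # States 2 and 3 predict "taken"; states 0 and 1 predict "not taken".
        return self.state >= 2

    def update(self, taken):
        # Saturating counter: count up toward 3 on taken outcomes,
        # down toward 0 on not-taken outcomes.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

For a loop branch that is taken on every iteration and not taken only on exit, the single not-taken outcome at loop exit does not change the prediction, so the branch is still predicted taken on the next entry to the loop.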
While the particular algorithms for each type of branch prediction scheme may vary, all tend to store some form of historical information that is developed as each branch instruction is executed. In some configurations, separate branch prediction information is stored for each branch instruction according to its address. This type of branch prediction scheme is illustrated in FIG. 1. The hardware used to store the prediction information is typically referred to as a "branch target buffer". One potential drawback of the branch target buffer illustrated in FIG. 1 is that the number of branch predictions is limited by the size of the branch target buffer. For example, assuming the branch target buffer has storage locations sufficient to store 64 branch predictions, then upon detecting a sixty-fifth branch instruction, the buffer must begin discarding previously generated branch prediction information to make room for new branch prediction information. The size of this type of branch target buffer may be further limited by a number of factors including the desired access speed.
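The capacity limitation described above can be sketched as follows. This is a minimal model, not the structure of FIG. 1: the 64-entry capacity matches the example in the text, while the FIFO replacement policy and all names are assumptions (real buffers may use LRU or other policies).

```python
from collections import OrderedDict

class BranchTargetBuffer:
    """Illustrative fixed-capacity branch target buffer keyed by branch address."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # addr -> (taken_bit, predicted_target)

    def insert(self, addr, taken, target):
        # On detecting a new branch when full, an existing prediction
        # must be discarded to make room (FIFO shown here for simplicity).
        if addr not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[addr] = (taken, target)

    def lookup(self, addr):
        # Returns None when no prediction is stored for this branch.
        return self.entries.get(addr)
```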
Other schemes that may be capable of storing more prediction information and/or having faster access times may use branch target buffers that have structures mirroring the microprocessor's instruction cache. Instruction caches are high speed memory arrays that typically reside within the microprocessor. Instruction caches are characterized as having fast access times and high data output rates when compared with the access times and output rates of other memories that are further away from the microprocessor, e.g., main system memory. Instruction caches are typically organized into a plurality of blocks or "cache lines". A cache line typically refers to the smallest amount of storage that may be allocated within the instruction cache. For example, an instruction cache may be 32 kilobytes in size and may have cache lines that are 16 bytes long.
When instruction bytes are read from main system memory into the instruction cache, they are read in fixed byte-length sequences (e.g., 16 byte sequences) that typically match the cache line length. Each instruction sequence (referred to herein as a "prefetch line") is typically stored in its own cache line along with an address "tag". The address tag is a predetermined portion of the instruction sequence's address that serves to identify which instruction bytes are stored within a particular cache line.
Some cache configurations put limits on where a prefetch line having a particular address may be stored. A "fully associative" cache allows a prefetch line to be stored in any cache line within the cache. Conversely, a "direct mapped" cache forces a prefetch line to be stored in a particular location within the cache according to its address. "Set associative" caches define a set of storage locations within which a prefetch line may be stored. Which set the cache line is assigned to is a function of the prefetch line's address. These set associative caches may be visualized as two dimensional arrays with each row defining a set. The number of columns (or "ways") defines the level of associativity of the cache. For example, a cache having two columns is referred to as a two-way set-associative cache.
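The address-to-set mapping described above can be illustrated with a short sketch. The parameters below are assumptions chosen to match the earlier example of a 32-kilobyte cache with 16-byte lines, organized two-way set-associative (2 ways x 1024 sets x 16 bytes = 32 KB); the function names are hypothetical.

```python
LINE_SIZE = 16   # bytes per cache line (per the example in the text)
NUM_SETS = 1024  # 2 ways * 1024 sets * 16-byte lines = 32 KB

def set_index(addr):
    # Discard the byte-offset bits within the line, then select a set.
    # A prefetch line may reside in any way of this one set.
    return (addr // LINE_SIZE) % NUM_SETS

def tag(addr):
    # The remaining upper address bits form the tag that identifies
    # which prefetch line actually occupies a given way of the set.
    return addr // (LINE_SIZE * NUM_SETS)
```

Two addresses that differ by exactly `LINE_SIZE * NUM_SETS` map to the same set but carry different tags, which is why the tag must be stored alongside the instruction bytes.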
The overall size of an instruction cache is limited by a number of factors, including the process used to manufacture the microprocessor and the die space allocated to the instruction cache. Typically, only a small portion of the total instructions for a particular program may reside in the instruction cache at any one time. Thus, various cache management schemes are utilized to load and replace the contents of the instruction cache. The goal of these management schemes is to ensure that the instructions stored in the instruction cache at any given time are the ones most likely to be needed by the microprocessor. Thus, cache lines are continually being loaded and overwritten with new instructions.
As previously noted, some branch prediction schemes use branch target buffers that mirror the microprocessor's instruction cache structure. For example, if the instruction cache is 4-way set associative with 512 sets (i.e., a 4 by 512 array), the branch target buffer may be configured into an array having the same dimensions (4 by 512) and will store one set of branch prediction information for each cache line within the instruction cache. By mirroring the instruction cache's configuration, the branch target array may be easily accessed in parallel with the instruction cache using the same portion of the requested instruction address. Thus, the branch target information corresponding to a particular cache line may be available at the same time or sooner than the instruction bytes stored within the corresponding cache line.
However, as previously noted, the cache lines within the instruction cache are continually being loaded and overwritten with new instructions. Thus, under the current scheme, each time a cache line is overwritten the corresponding storage locations within the branch target buffer are also cleared or overwritten to make room for new branch prediction information corresponding to the new instructions within the cache line. If the instructions originally stored in the cache line are subsequently reloaded into the instruction cache, all of their previously generated branch prediction information is lost (i.e., "victimized") and new prediction information must once again be generated from scratch. This may be particularly disadvantageous when more elaborate branch prediction schemes are used that develop more accurate predictions each time the branch instruction executes.
Thus, a method and apparatus for preventing the loss or victimization of stored branch prediction information is desired.
The problems outlined above are in large part solved by a microprocessor capable of caching victimized branch prediction information in accordance with the present invention. Instead of discarding branch prediction information corresponding to instructions that are replaced or discarded from the instruction cache, the branch prediction information is stored in a victim branch prediction cache.
Broadly speaking, one embodiment of a microprocessor capable of caching victimized branch prediction information comprises an instruction cache, a branch target array, and a victim branch prediction cache. The instruction cache is configured to receive and store instruction bytes. The branch target array is coupled to the instruction cache and is configured to store branch target information corresponding to the stored instruction bytes. The branch target array is further configured to output the stored branch target information to the victim branch prediction cache when the corresponding instruction bytes are no longer stored in the instruction cache. The victim branch prediction cache is coupled to the branch target array and is configured to receive and store the branch target information.
In one embodiment, when the original instructions are restored to the instruction cache, their corresponding branch prediction information is restored to the branch target array from the victim branch prediction cache. The branch prediction information stored may vary from one implementation to another depending upon the particular branch prediction scheme being used. In one embodiment, address information may also be stored to identify which instructions the stored prediction information corresponds to.
In another embodiment, a microprocessor capable of caching victimized branch prediction information may comprise: an instruction cache configured to receive and store instruction bytes; a branch target array coupled to the instruction cache and configured to store branch target information corresponding to the stored instruction bytes; and a victim branch prediction cache interface. The interface may be coupled to the branch target array. The branch target array is configured to output the stored branch target information to the interface when the corresponding instruction bytes are no longer stored in the instruction cache. The interface is configured to convey the branch target information received from the branch target array to a victim branch prediction cache that is external to the microprocessor.
A method for storing victimized branch prediction information is also contemplated. Broadly speaking, one embodiment of the method comprises storing a plurality of instruction bytes into an instruction cache. A set of branch target information corresponding to the stored instruction bytes is generated. The branch target information is then stored in a branch target array. When the instruction bytes are overwritten in the instruction cache by a second plurality of instruction bytes, the first set of branch target information is written to a victim branch prediction cache instead of being discarded. The branch information stored within the victim branch cache may be restored to the branch target array from the victim branch prediction cache when the original set of instructions is restored to the instruction cache.
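The method described above can be illustrated with a short sketch. This is a hedged model of the save-and-restore flow only, not the contemplated hardware: the function name, the dictionary-based victim cache keyed by (set, tag), and the single-way structures in the usage below are all illustrative assumptions.

```python
def evict_and_fill(s, w, new_tag, new_bytes, icache, btarray, victim):
    """Overwrite way w of set s with a new prefetch line, saving the
    victimized prediction and restoring any previously saved one."""
    old = icache[s][w]
    if old is not None and btarray[s][w] is not None:
        # Instead of discarding the prediction, write it to the
        # victim branch prediction cache, keyed by the evicted line.
        victim[(s, old[0])] = btarray[s][w]
    icache[s][w] = (new_tag, new_bytes)
    # If these instructions were cached before, restore their previously
    # generated prediction rather than rebuilding it from scratch.
    btarray[s][w] = victim.pop((s, new_tag), None)
```

When the original set of instructions returns to the instruction cache, its branch prediction information returns with it, so prediction accuracy developed earlier is not lost.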
A computer system capable of storing victimized branch prediction information is also contemplated. Broadly speaking, in one embodiment the computer system may comprise a main system memory, a branch victim cache, and a microprocessor. The microprocessor is coupled to the system memory and the branch victim cache. The microprocessor may comprise an instruction cache configured to store instruction bytes and a branch prediction array configured to store predicted branch target information corresponding to said stored instruction bytes. The branch prediction array may be configured to output the stored branch prediction target information to the branch victim cache when the corresponding instruction bytes are no longer stored in the instruction cache.