To achieve higher performance most microprocessors are designed as superscalar processors having multiple execution units. The idea behind this concept is to increase instruction level parallelism further referred to herein as ILP. Because most instructions show dependencies which would lead to stalls in the processor's pipeline(s) until the dependency is resolved register renaming in combination with out-of-order execution allows improvements of ILP. Nonetheless a lot of dependencies still remain and prevent multiple instructions from being executed in parallel which leads to bubbles in the pipeline.
To increase efficiency and to overcome the bubbles in the pipeline load address or value prediction can help to avoid pipeline stalls, because even dependent instructions can be executed using speculatively calculated data. If it turns out that the predicted value was wrong the corresponding instructions must be re-executed which represents a performance reducing penalty.
To reduce the penalty for mispredicted values it is a) necessary to provide best possible load address/value prediction and b) necessary to determine the instructions whose operands can be predicted with high confidence and which cause low penalty even if the predicted address/value was wrong.
In particular, prior art value prediction can be separated into three categories: Load address prediction, prediction of source register values and prediction of target register values.
The simplest algorithm used in prior art value predictors is based on the assumption that the contents of memory locations and registers remains mostly unchanged. So, an appropriate prediction scheme is simply to predict the last value. The so-called last value predictor, further referred to herein as LVP, as depicted preferably in FIG. 1, comprises a table 10 which is addressed by hashing 12 of the instruction address with each entry consisting of a tag field 14 and a last value field 16.
The table is most likely organized as n-way set associative (e.g. n=4). If a match is found by determining that the tag field matches the instruction address, then the corresponding last value from this table entry is used for prediction. If there is no match, a new entry is made, replacing the Least Recently Used (LRU) table entry as determined by an LRU algorithm.
Regardless, the predictor is updated each time with the correct value, if it is confirmed.
Another prior art scheme is a simple extension of the LVP, as it is depicted in FIG. 2.
Two additional fields, the stride field 20 and a status field 22 are added to each table entry. The idea behind this predictor is that often memory contents are changed by a certain delta value, i.e. a stride. Thus, the next predicted value can be calculated by simply adding the stride to the last value. The status field is used to determine whether the predictor should predict the last value or the last value increased by a certain stride. So the stride predictor further referred to herein as SP is involved only if a certain stride could be found and confirmed as indicated by the status field.
If the stride predictor fails after some successful predictions it will switch back to last value prediction (switching the status field back to LVP) unless a new stride is found and confirmed.
The stride predictor, further abbreviated herein as SP is updated every time with the most current value. If the stride changes it is used only if the new stride is confirmed, i.e. when the same stride is found the next time again.
It should be noted that such a confirmation is advantageously done when the same stride reoccurs at least twice subsequent to each other.
Although the LVP and SP methods can achieve correct prediction rates of up to more than 50%, for certain cases there are still some instructions which alter the contents of memory locations according to a particular pattern which is repeated several times. Therefore, values can be predicted out of such a context and a context predictor, further referred to herein as CP has been proposed as well.
Whereas the SP is an extension of the LVP, the context predictor (CP) is based on a two-table lookup and thus consists of two tables as is illustrated in FIG. 3.
The entries in the first table 30, which is organized as n-way set associative, each comprise a tag field 14, several (e.g. four) last value fields 31a–31d, a LRU field 32 and a value history pattern field 33. An entry is selected via hashing 12 of an instruction address. If no match is found, a new entry is added to the table replacing the least recently used table entry according to the LRU field. Specifically, the step of adding a new entry comprises: writing the tag information—e.g. the instruction address—in the tag field; writing the current result produced by the instruction in one of the value fields 31a–31d, and initializing the value history pattern stored in fields 33.
The value history pattern describes the history of the last several (e.g. six) values of the selected memory location used in a series whereby each of the value fields 31a–31d is identified by a two bit pattern. ‘00’ refers to the value stored in the value field 0, ‘01’ refers to the value stored in the value field 1, etc. For example, if the six most recently used values of a certain instruction were placed in value fields 0,1,2,0,3,2 the corresponding value history pattern (VHP) is ‘00 01 10 00 11 10’. The LRU field stored in each table entry determines which value field is overwritten if a new value is detected for that instruction.
The two-table lookup is executed by using the VHP (e.g. a 12-bit pattern) as an address to select an entry in the second table, the pattern history table 34, further referred to herein as PHT. Preferably, the second PHT table may have a number of 4K entries in conjunction with the 12-bit pattern used to address this table.
An entry in the PHT table comprises four saturating 4-bit counters 35a to 35d. These counters represent each value field 31a to 31d in the first table 30. The counter with the highest value and with a count higher than a threshold value selects the appropriate last value stored in the first table. The counters in the PHT are updated according to the current value, i.e. the corresponding counter is increased by a certain number (e.g., 3) whereas the other counters are decreased by a certain number (i.e. 1). The counters saturate (e.g. by 0 resp. 12), and the threshold value (e.g., 6) is chosen to determine whether a prediction can be made or not.
The second update procedure comprises updating the VHP 33. Specifically, the VHP 33 is shifted left two bits and the vacant two bits on the right are filled with the bit pattern corresponding to the current value. If the value was not already stored in one of the ‘last value fields’, the current value replaces the least recently used last value stored in one of the four value slots and the corresponding two-bit pattern is placed into the VHP 33.
Whereas such a context predictor predicts certain repeating patterns of values—here patterns consisting of up to four different values—it is not effectively predicting strides or last values. Therefore, the best value prediction can be achieved by combining the CP with the LVP/SP. This ‘combined’ predictor is often called a hybrid predictor (HP). It uses a switching scheme to select the predictor of choice in order to achieve the best reliability.
An advantage of the hybrid predictor is that is saves latch counts for using the SP for last value and stride predictions. The major drawback, however, is the complex underlying switching scheme which is necessary in prior art to decide whether to use the LVP or the SP or the CP. According to prior art it is preferred to start the prediction with the LVP. If the LVP is not successful, but a stride could be found and confirmed, then the SP is invoked. If no stride could be determined, then the CP is initialized and starts collecting and confirming the pattern—assuming that there is a certain pattern of values.
If the pattern stabilizes, i.e., the counters in PHT 33 reach the threshold value, predictions can be made out of context. If the predictor fails, the switching scheme re-enables the LVP. The disadvantage is, however, that the context predictor invocation is rather inefficient because it takes rather long until the CP is really used because prior to issuing a context prediction the data must be collected which are the basis for a reliable CP.