1. Field of the Invention
The present invention relates to a data processing apparatus and method for generating prediction data used by processing circuitry when performing processing operations, and in particular to techniques for generating such prediction data when the processing operations performed by the processing circuitry include both high priority operations and low priority operations.
2. Description of the Prior Art
Modern data processing systems rely on prediction mechanisms that generate prediction data used by processing circuitry when performing processing operations, and such prediction mechanisms often keep a historical record of previous behaviour for use when generating that prediction data. In particular, many such prediction mechanisms maintain a history storage having a plurality of counter entries for storing count values. On occurrence of an event causing a prediction to be made, one or more of the counter entries are accessed dependent on the event, and prediction data is then derived from the contents of one or more of those accessed counter entries. Such prediction mechanisms can be used in a variety of situations. For example, when branch instructions are executed by the processing circuitry, branch prediction circuitry is typically used to predict whether each branch will be taken or not taken, and the branch prediction circuitry will typically use a history storage to keep a summary of previous branch outcomes for reference when deciding whether to predict a particular branch instruction as taken or not taken. Similarly, prediction circuitry may be provided in association with a cache to predict whether a cache access is likely to result in a cache miss (i.e. a situation where the data being accessed is not present within the cache). Furthermore, in some situations, prediction circuitry may be used to predict the result of a processing operation before that processing operation is performed, with the aim of increasing processing speed in the event that the prediction is accurate.
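The counter-based history storage described above can be illustrated with a minimal sketch of one common arrangement: a table of 2-bit saturating counters, as used in a simple bimodal branch predictor. The class name `HistoryStorage`, the table size and the index function (branch address modulo the number of entries) are hypothetical choices made for illustration only, not details taken from any particular implementation.

```python
class HistoryStorage:
    """History storage made of 2-bit saturating counters (hypothetical sizes)."""

    def __init__(self, num_entries=16, counter_bits=2):
        self.max_count = (1 << counter_bits) - 1   # 3 for 2-bit counters
        self.threshold = self.max_count // 2       # 1: "weakly not taken"
        # Every counter entry starts in the weakly "not taken" state.
        self.counters = [self.threshold] * num_entries

    def index(self, pc):
        # Hypothetical index function: the event (branch address) selects an entry.
        return pc % len(self.counters)

    def predict(self, pc):
        # Prediction data is derived from the accessed counter entry:
        # values above the threshold mean "predict taken".
        return self.counters[self.index(pc)] > self.threshold

    def update(self, pc, taken):
        # Move the counter towards the actual outcome, saturating at 0 and max_count.
        i = self.index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


hs = HistoryStorage()
for _ in range(3):
    hs.update(0x40, taken=True)   # counter for entry 0: 1 -> 2 -> 3 -> 3
print(hs.predict(0x40))  # True
print(hs.predict(0x44))  # False: entry 4 is still in its initial state
print(hs.predict(0x50))  # True: 0x50 also maps to entry 0
```

Because the hypothetical index function uses only the low-order address bits, unrelated events such as the branches at 0x40 and 0x50 can share one counter entry; this is how collisions in a shared history storage arise.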
Such prediction circuitry can also be used in other areas, for example as part of an eviction mechanism that determines which storage element of a storage structure should have its contents evicted. Examples of such storage structures include a cache, a translation lookaside buffer (TLB) for storing access control information for different memory regions, a branch target buffer (BTB) for storing target addresses of branch instructions, etc.
Whilst correct predictions generally improve performance, and hence power efficiency, there is an overhead in maintaining the summaries of past behaviour used by such prediction mechanisms. This overhead increases further when the processing operations performed by the processing circuitry fall into multiple different categories, each of which requires its own predictions. For example, some operations may be considered high priority operations having a higher priority than other operations that are considered low priority operations. In a multi-threaded processor core, at least one of the program threads may be considered to be a high priority program thread, whilst at least one other program thread may be considered to be a low priority program thread. Similarly, certain types of processing operation, whether in a multi-threaded processor core or in a single-threaded core, may be considered to be higher priority than other operations. Taking branch prediction circuitry as an example, direct branch instructions (i.e. instructions where the target address for the branch is specified directly within the instruction) may be considered to be higher priority than indirect branch instructions (i.e. branch instructions where the target address is not specified directly in the instruction itself but elsewhere, for example by the contents of a register identified by the indirect branch instruction). Accurate prediction may be more important for direct branch instructions than for indirect branch instructions, since direct branch instructions often occur more frequently.
To maintain high prediction accuracy for both high priority operations and low priority operations, separate history storage structures could be kept for the different priority operations. However, this would give rise to a significant hardware cost, which in many situations will be considered unacceptable.
As an alternative, the history storage structure maintained by the prediction circuitry may be shared and used to produce prediction data for both high priority operations and low priority operations. However, sharing the history storage in this way can give rise to collisions, which in many situations are destructive and significantly reduce prediction accuracy. For example, if a particular counter entry is generally updated in one direction by one type of operation and generally updated in the opposite direction by another type of operation, the prediction data derived from that entry may be corrupted, leading to inaccurate predictions for one or both of those types of operation.
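The destructive collision described above can be reproduced with a small, deliberately worst-case simulation. This is a minimal sketch assuming a hypothetical 16-entry table of 2-bit saturating counters indexed by branch address modulo the table size; the addresses, table size and helper names are illustrative assumptions. A high priority branch that is always taken and a low priority branch that is never taken are chosen so that both map to the same counter entry, and prediction accuracy with one shared table is compared against accuracy with a separate table per priority. The sketch illustrates the problem only, not any technique for solving it.

```python
NUM_ENTRIES = 16  # hypothetical history storage size

def make_table():
    # 2-bit saturating counters, all starting at 1 ("weakly not taken").
    return [1] * NUM_ENTRIES

def predict(table, pc):
    return table[pc % NUM_ENTRIES] >= 2  # 2 or 3 means "predict taken"

def update(table, pc, taken):
    i = pc % NUM_ENTRIES
    table[i] = min(table[i] + 1, 3) if taken else max(table[i] - 1, 0)

def accuracy(stream, shared):
    """Fraction of correct predictions for a stream of (priority, pc, taken).

    shared=True: one table serves both priorities.
    shared=False: each priority gets its own table.
    """
    tables = {}
    correct = 0
    for priority, pc, taken in stream:
        key = "shared" if shared else priority
        table = tables.setdefault(key, make_table())
        correct += predict(table, pc) == taken
        update(table, pc, taken)
    return correct / len(stream)

# High priority branch at 0x100 is always taken; low priority branch at 0x210
# is never taken. Both addresses map to entry 0 (0x100 % 16 == 0x210 % 16 == 0).
stream = [("high", 0x100, True), ("low", 0x210, False)] * 10
print(accuracy(stream, shared=True))   # 0.0
print(accuracy(stream, shared=False))  # 0.95
```

With the shared table, the two branches push the counter back and forth between the values 1 and 2, so every prediction is wrong. With separate tables, only the first prediction for the high priority branch is wrong, but that accuracy is bought with duplicated storage.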
Indeed, this problem is discussed in some detail in the article “Branch Prediction and Simultaneous Multithreading” by Sébastien Hily et al., Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques. That paper analyses the performance of different branch predictors, including the collisions in the prediction tables that can occur when the predictors are used in a simultaneous multithreaded (SMT) processor core, and concludes that if the sizes of the tables (for example a history storage table) are kept small, there is a significant increase in mispredictions.
Accordingly, it would be desirable to provide an improved technique for generating prediction data used by processing circuitry when performing processing operations, in situations where the processing circuitry performs both high priority operations and low priority operations and the history storage is shared for both types of operation.