1. Field of the Invention
The present invention relates to a data processing apparatus and method for implementing a replacement scheme for entries of a storage unit, and in particular to a technique for selecting a victim entry of a storage unit whose stored information is to be replaced.
2. Description of the Prior Art
Within a data processing apparatus, there will typically be various storage units that comprise multiple entries for storing information referenced by processing circuitry of the data processing apparatus when that processing circuitry is executing sequences of instructions. One example of such a storage unit would be a cache for storing instructions to be executed by the processing circuitry and/or data used by the processing circuitry when executing such instructions. The cache may be a unified cache storing both instructions and data, or may take the form of a separate instruction cache for storing instructions and a separate data cache for storing data. When an instruction needs to be fetched for subsequent execution by the processing circuitry, or data needs to be accessed by the processing circuitry (for either a write or a read operation), then a lookup operation will be performed in the relevant cache to seek to determine whether that instruction or data is present in the cache, and if so the fetch or access operation can proceed with respect to the contents of the cache. However, if the instruction or data is not in the cache, then a linefill operation is typically performed to retrieve from memory a cache line's worth of instructions or data for storing in the relevant cache, and as part of this linefill operation, the contents of an existing cache line within the cache will typically be evicted. To determine which cache line to evict, a replacement scheme will typically be employed in order to identify a victim cache line to be evicted.
As another example of a storage unit, a translation lookaside buffer (TLB) may be provided for reference by the processing circuitry when performing instruction fetch or data access operations. For example, if the load/store unit of the processing circuitry needs to access data at a specified address, it will typically reference a data TLB in order to obtain data access control information associated with that address. This access control information will be retrieved from a page table in memory, the page table containing descriptors for particular memory regions. Each descriptor contains a variety of access control information, for example access permission rights identifying whether an address in the associated memory region can be accessed by the processing circuitry in its current mode of operation, and region attributes specifying, for example, whether the address being accessed is cacheable or bufferable. Further, if virtual addresses are issued by the processing circuitry, such access control information may specify a virtual to physical address translation.
Similarly, when the fetch unit of the processing circuitry is seeking to fetch an instruction from a specified address, it may look in an instruction TLB in order to determine instruction access control information pertaining to that address, again that access control information being obtained from descriptors in memory.
For both of the above types of TLB, if a lookup in the TLB does not produce a hit, i.e. the TLB does not contain access control information for the specified address, then the access control information will be obtained from the relevant descriptor in the appropriate page table held in memory, and that access control information will be written into the TLB. As part of this process, a victim entry in the TLB will need to be identified which will have the information stored therein overwritten by the new access control information retrieved from memory, and again a replacement scheme will typically be employed to identify the victim entry.
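By way of illustration only, the TLB miss handling described above can be sketched as follows. This is a minimal software model, not a description of any particular hardware: the class name TinyTLB, the 4096-byte page size, the descriptor fields and the use of a random victim choice are all assumptions made for the example.

```python
import random

PAGE_SIZE = 4096  # assumed page size, chosen only for this sketch


class TinyTLB:
    """Hypothetical fully associative TLB model with a fixed number of entries."""

    def __init__(self, num_entries, page_table, seed=0):
        # Each entry is None (invalid) or a (virtual_page, descriptor) pair.
        self.entries = [None] * num_entries
        # page_table stands in for the descriptors held in memory:
        # a dict mapping virtual page number -> access control information.
        self.page_table = page_table
        self.rng = random.Random(seed)
        self.misses = 0

    def lookup(self, address):
        vpage = address // PAGE_SIZE
        for entry in self.entries:
            if entry is not None and entry[0] == vpage:
                return entry[1]  # hit: access control information is in the TLB
        # Miss: obtain the information from the descriptor in memory,
        # then overwrite a victim entry with it.
        self.misses += 1
        descriptor = self.page_table[vpage]
        victim = self._select_victim()
        self.entries[victim] = (vpage, descriptor)
        return descriptor

    def _select_victim(self):
        # Prefer an invalid entry; otherwise fall back to a random choice
        # (any replacement scheme could be substituted here).
        for index, entry in enumerate(self.entries):
            if entry is None:
                return index
        return self.rng.randrange(len(self.entries))
```

In this model a second lookup to an address in the same page hits without consulting the page table, whereas a lookup to a page not held in a full TLB overwrites whichever entry the replacement scheme selects.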
As yet a further example of a storage unit, a branch target buffer (BTB) may be provided for access by the fetch unit of the processing circuitry when determining a next instruction to fetch from memory. In particular, if it is determined that a currently fetched instruction is a branch instruction, and that branch instruction is predicted to be taken, then the fetch unit may access the BTB in order to seek to determine the target address for that branch, so as to determine the next instruction to fetch. If the target address is not stored in the BTB for the branch instruction in question, then when the branch instruction is subsequently executed and the target address is hence determined, a victim entry in the BTB may be identified and that target address information is then stored in the victim entry of the BTB (overwriting the previous content of that victim entry). Again, a replacement scheme will typically be employed to identify the victim entry.
Some of the storage units provided in the data processing apparatus for reference by the processing circuitry when executing sequences of instructions may be fully associative (for example, a micro-TLB is typically fully associative), whilst other storage units may be set associative (for example, a cache or BTB will typically be set associative). Irrespective of whether the storage unit is fully associative or set associative, a mechanism needs to be provided for selecting a victim entry whose information stored therein is to be replaced following the occurrence of a predetermined event, for example a cache miss in a cache, a TLB miss in a TLB, a BTB miss in a BTB, etc. A number of schemes exist for selecting victim entries in such situations. For example, one simple scheme is to employ a random replacement algorithm which selects an entry of the storage unit at random to be the victim entry. An alternative mechanism is to use a round-robin scheme which steps through the entries in turn when selecting victim entries.
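The two simple schemes mentioned above can be sketched as follows. The class and method names are hypothetical and introduced only for illustration; for a set associative storage unit, num_entries would correspond to the number of ways in the set being considered.

```python
import random


class RandomReplacement:
    """Selects a victim entry at random (hypothetical sketch)."""

    def __init__(self, num_entries, seed=None):
        self.num_entries = num_entries
        # A seeded generator stands in for a hardware pseudo-random source.
        self.rng = random.Random(seed)

    def select_victim(self):
        return self.rng.randrange(self.num_entries)


class RoundRobinReplacement:
    """Steps through the entries in turn when selecting victims."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.next_victim = 0  # pointer to the entry to replace next

    def select_victim(self):
        victim = self.next_victim
        # Advance the pointer, wrapping back to entry 0 after the last entry.
        self.next_victim = (victim + 1) % self.num_entries
        return victim
```

Neither scheme consults any record of how the entries have been used, which is what keeps them inexpensive: the round-robin scheme needs only a single pointer, and the random scheme needs only a source of pseudo-random numbers.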
Whilst such schemes can give satisfactory results, alternative, more complex, replacement policies have been developed which seek to make a more considered decision as to the choice of victim entry. For example, one known scheme is the “least recently used” (LRU) replacement policy, which keeps a record of the usage of individual entries, and then when it is required to choose a victim entry, chooses the entry that has been least recently used by the processing circuitry. Whilst such an LRU replacement policy may provide better performance than a round-robin or random replacement policy for set associative or fully associative storage units, it is significantly more costly to implement, since the usage record must be stored for the entries and updated as those entries are accessed.
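An LRU policy of the kind described above can be modelled in software as shown below. This is a sketch under stated assumptions: the class name LRUStore and the fetch callback are hypothetical, and an ordered dictionary stands in for whatever usage record a hardware implementation would keep.

```python
from collections import OrderedDict


class LRUStore:
    """Hypothetical storage unit model with least recently used replacement."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        # Ordered from least recently used (first) to most recently used (last).
        self.entries = OrderedDict()

    def access(self, tag, fetch):
        """Return (value, evicted_tag); evicted_tag is None if nothing was evicted."""
        if tag in self.entries:
            # Hit: the usage record must be updated on every access.
            self.entries.move_to_end(tag)
            return self.entries[tag], None
        evicted = None
        if len(self.entries) >= self.num_entries:
            # Miss with the storage unit full: the victim is the least
            # recently used entry, which sits at the front of the ordering.
            evicted, _ = self.entries.popitem(last=False)
        self.entries[tag] = fetch(tag)
        return self.entries[tag], evicted
```

The update performed on every hit is the source of the additional cost relative to the random and round-robin schemes, which maintain no per-entry usage state.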
If the processing circuitry executes multiple program threads, then additional issues arise. The multiple program threads may comprise separate applications, or may instead comprise different processes within an individual application that are allowed to execute in parallel. For example, in a chip multi-processor (CMP) system, multiple processor cores may each execute a different program thread, and the various processor cores may share access to one or more storage units, for example a level two cache. As another example, in a multi-threaded processor such as a simultaneous multi-threaded (SMT) processor, a single processor core may be arranged to execute multiple program threads, and there may be various storage units shared between the multiple program threads, for example a TLB, a BTB, one or more caches, etc.
Whilst a particular program thread is executing, it may be necessary to select a victim entry to be evicted from a particular shared storage unit, and according to the replacement scheme used, this may cause an entry belonging to a different thread to be evicted. For example, if the replacement policy is an LRU policy, then the least recently used entry, no matter which thread it belongs to, will be chosen as the victim entry. Similarly, if the policy is random, then the entry identified by a pseudo-random number generator will be chosen as the victim entry, and again this is irrespective of which thread that entry belongs to.
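The thread-agnostic behaviour described above can be demonstrated with a small model of a shared storage unit using an LRU policy. The names SharedLRU, "high" and "low", and the tag values, are assumptions made purely for this example.

```python
from collections import OrderedDict


class SharedLRU:
    """Hypothetical storage unit shared between threads, with plain LRU replacement."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        # Keys are (owning_thread, tag); ordered least recently used first.
        self.entries = OrderedDict()

    def access(self, thread_id, tag):
        """Return the evicted (owning_thread, tag) pair, or None if nothing was evicted."""
        key = (thread_id, tag)
        if key in self.entries:
            self.entries.move_to_end(key)
            return None
        evicted = None
        if len(self.entries) >= self.num_entries:
            # The victim is chosen purely on recency; the owning thread
            # plays no part in the decision.
            evicted, _ = self.entries.popitem(last=False)
        self.entries[key] = True
        return evicted


shared = SharedLRU(2)
shared.access("high", 0x100)           # entry allocated for one thread
shared.access("low", 0x200)            # entry allocated for another thread
victim = shared.access("low", 0x300)   # second miss by the "low" thread
```

Here the second miss by the "low" thread evicts the entry allocated by the "high" thread, because that entry is the least recently used one and the policy does not take account of which thread owns it.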
Often, processing circuitry that executes multiple program threads does not prioritise amongst the threads, and in such environments the above types of replacement policy may be acceptable. However, it is becoming more commonplace for one or more of the program threads to be considered to be of a higher priority than other program threads. As an example, this will typically be the case in a real-time system where a high priority, real-time, thread will be given preferential access to various resources, whilst other lower priority program threads are opportunistic in the sense that they are allowed access to the resources only when the high priority thread stalls for some reason. As an example, in a car, processing circuitry may be provided to control a variety of processes through execution of multiple program threads, and a program thread associated with the management of an anti-lock braking system (ABS) may be considered to be a real-time, high priority, program thread.
In such multi-threaded systems, if a lower priority program thread can cause the information stored in an entry belonging to a high priority program thread to be evicted, this can be detrimental to the performance of the high priority program thread.
Accordingly, it would be desirable to provide an alternative replacement scheme for entries of a storage unit shared between multiple program threads including at least one high priority program thread and at least one lower priority program thread.