1. Field of the Invention
The invention disclosed and claimed herein generally pertains to a method associated with least recently used (LRU) logic for selecting a data set for replacement in a locking cache. More particularly, the invention pertains to a method of the above type wherein the cache may be a 4-way or a 4N-way locking cache, where N is an integer greater than 1. Even more particularly, the invention pertains to a method of the above type that provides the capability of a full 3-way (LRU) algorithm in selecting a data set for replacement, whereby the method keeps track of the strict order in which three sets have been accessed.
2. Description of the Related Art
As is generally well known in the art, caches are small, fast storage buffers employable to store information, such as instruction code or data, in order for a processing device to more quickly and efficiently have access to the information. Typically, it is faster for the processing device to read the smaller memory of the cache than to read a main memory. However, a cache is a limited resource. Accordingly, it has become common to employ LRU logic to select a data set in the cache for replacement, when space must be made in the cache to accommodate new data. The selected data set is then replaced with the new data. The LRU logic is generally used to keep track of the order in which respective data sets, or lines, within a cache congruence class are accessed. When a cache miss occurs, indicating that new data must be entered, the LRU logic provides information identifying the least recently accessed set. The cache is then operated to replace such least recently accessed set with the new data. This procedure is based on the assumption that the least recently accessed data set is also least likely to be needed in the future.
The term “cache congruence class”, as used herein, refers to a group of data sets in a cache that are collectively represented by a particular LRU logic configuration. For each cache congruence class, there is LRU data stored either in the cache directory or in a separate array. A full LRU keeps track of the strict order in which all sets have been accessed. For example, in a 4-way set associative cache, that is, a cache congruence class having four data sets, a full LRU might encode the following order:
->Order of Use-> LRU-> set D, set B, set C, set A <-MRU
This shows set D as being the least recently used and set A as being the most recently used (MRU). If set B were then accessed (due to a load or store), the LRU data would be updated to:
->Order of Use-> LRU-> set D, set C, set A, set B <-MRU
If a cache miss then occurs, set D, being the least recently used, will be replaced. The LRU data will then be updated to:
->Order of Use-> LRU-> set C, set A, set B, set D <-MRU
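The update sequence above can be reproduced with a short Python sketch. This is only an illustration of the full LRU bookkeeping described in the text; the class name `FullLRU` and its methods are hypothetical and not part of any disclosed implementation.

```python
class FullLRU:
    """Tracks the strict access order of one congruence class (LRU first, MRU last)."""

    def __init__(self, order):
        self.order = list(order)

    def access(self, s):
        """Record a load or store to set s: s becomes the MRU set."""
        self.order.remove(s)
        self.order.append(s)

    def miss(self):
        """Replace the LRU set with new data; the refilled set becomes MRU."""
        victim = self.order[0]
        self.access(victim)
        return victim


# Starting order from the text: LRU -> D, B, C, A <- MRU
lru = FullLRU(["D", "B", "C", "A"])
lru.access("B")
after_hit = list(lru.order)    # D, C, A, B
victim = lru.miss()            # set D is replaced
after_miss = list(lru.order)   # C, A, B, D
```

Each step matches the corresponding order of use listed above.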
It is to be understood that for a full LRU procedure, after a given set (e.g., set D) becomes the MRU set, there must be at least three consecutive accesses to other sets (e.g., sets A, B, and C) before set D becomes the LRU set and is subject to replacement. However, notwithstanding its advantages, a full LRU is expensive to implement. For example, at least five LRU bits are required to keep track of the least recently used set in a congruence class for a 4-way associative cache. Accordingly, a commonly used alternative is a binary tree LRU, which uses only N-1 bits for a congruence class of N sets, or three bits for a 4-way set associative cache. Each bit is located at a decision node of a binary tree whose leaves represent the N sets. The operation of a 4-way binary tree is described hereinafter in further detail.
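The bit counts quoted above can be checked with a small calculation. The sketch below assumes that a full LRU needs a distinct code for every possible ordering of the sets (N! orderings for N sets), which gives the minimum number of bits for a dense encoding; practical full LRU encodings may use more. The function names are hypothetical.

```python
import math


def full_lru_bits(ways):
    """Minimum bits needed to distinguish all ways! strict orderings."""
    orderings = math.factorial(ways)
    return (orderings - 1).bit_length()


def tree_lru_bits(ways):
    """A binary tree LRU keeps one bit per decision node: ways - 1 bits."""
    return ways - 1


full_4 = full_lru_bits(4)   # 4! = 24 orderings
tree_4 = tree_lru_bits(4)
```

For a 4-way class this yields five bits for the full LRU and three bits for the binary tree LRU, in agreement with the figures given in the text.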
In comparing the performance of full LRU and binary tree LRU algorithms, it is useful to consider the metrics MMBR (minimum misses before replaceable) and MHBR (minimum hits before replaceable). Following an access to any given set X, MMBR is the minimum number of consecutive misses which must occur before a subsequent miss will result in X being chosen for replacement. Following an access to any given set X, MHBR is the minimum number of consecutive hits to other sets which must occur before a subsequent miss will result in X being chosen for replacement. Each of these metrics provides a “worst case performance”, that is, a measure of how quickly a data set, once accessed, can be forced out of the cache due to subsequent accesses.
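As an illustration of the two metrics, the following Python sketch evaluates MMBR and MHBR by exhaustive search for a full LRU over four sets. It is a minimal model under stated assumptions: the order of use is held as a tuple (LRU first, MRU last), a miss refills the LRU set and makes it the MRU set, and all helper names are hypothetical.

```python
from itertools import permutations

SETS = ("A", "B", "C", "D")


def hit(order, s):
    """Order of use (LRU first, MRU last) after an access to set s."""
    return tuple(x for x in order if x != s) + (s,)


def victim(order):
    """The set a miss would replace: the LRU set."""
    return order[0]


def miss(order):
    """A miss refills the LRU set, which then becomes the MRU set."""
    return hit(order, victim(order))


def mmbr(x):
    """Fewest consecutive misses after an access to x before x is the victim."""
    best = None
    for start in permutations(SETS):
        order, count = hit(start, x), 0
        while victim(order) != x:
            order, count = miss(order), count + 1
        best = count if best is None else min(best, count)
    return best


def mhbr(x):
    """Fewest consecutive hits to other sets before x is the victim."""
    best = None
    for start in permutations(SETS):
        frontier, seen, depth = {hit(start, x)}, set(), 0
        # Breadth-first search over every sequence of hits to sets other than x.
        while frontier and not any(victim(o) == x for o in frontier):
            seen |= frontier
            frontier = {hit(o, s) for o in frontier for s in SETS if s != x} - seen
            depth += 1
        if frontier:
            best = depth if best is None else min(best, depth)
    return best


results = {x: (mmbr(x), mhbr(x)) for x in SETS}
```

Under these assumptions every set has an MMBR of three and an MHBR of three, consistent with the statement that a full 4-way LRU requires at least three consecutive accesses to other sets before a set becomes replaceable.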
As is further known in the art, a locking cache allows software (e.g., the operating system) to reserve certain sets of the cache for a particular application. When the given application is running, only those sets which are unlocked can be replaced by data needed by the application. Other sets cannot be replaced. This is useful for “pinning” an application in the cache, so that it is not forced out by accesses made by other applications. Also, the locking cache provides isolation for streaming applications, which load and then discard data.
In one example of a locking cache, a 4-way cache comprises sets A, B, C and D. Set A is reserved for a high-bandwidth streaming application called P, and sets B, C and D are reserved for all other applications, including an application Q. This creates a virtual 3-way cache for the majority of applications. However, as described hereinafter in further detail, under certain conditions, one of the sets B, C and D of the virtual 3-way cache would be replaced after only one access to another set. That is, both MMBR and MHBR for the virtual cache, indicating worst-case performance, would be only one. It would be very desirable to increase both the MMBR and MHBR for such 3-way cache arrangement, from one to at least two.
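The worst case mentioned above can be illustrated with a hedged Python sketch of a 3-bit binary tree LRU with set A locked. The tree layout (sets A and B under one node, sets C and D under the other), the bit polarity, and the lock override (choosing the unlocked sibling when the tree points at a locked set) are assumptions made for illustration, not details taken from the text. The class `TreeLRU4` is hypothetical and does not handle the case where both sets of a pair are locked.

```python
class TreeLRU4:
    """3-bit binary tree pseudo-LRU over sets A, B, C, D (assumed layout)."""

    def __init__(self, locked=()):
        self.root = 0    # 0: next victim is in pair (A, B); 1: in pair (C, D)
        self.left = 0    # 0: A is the victim within (A, B); 1: B
        self.right = 0   # 0: C is the victim within (C, D); 1: D
        self.locked = set(locked)

    def access(self, s):
        """Point every bit on the path to set s away from s."""
        if s in ("A", "B"):
            self.root = 1
            self.left = 1 if s == "A" else 0
        else:
            self.root = 0
            self.right = 1 if s == "C" else 0

    def victim(self):
        """Follow the tree bits; skip a locked set in favor of its sibling."""
        if self.root == 0:
            pair, bit = ("A", "B"), self.left
        else:
            pair, bit = ("C", "D"), self.right
        choice = pair[bit]
        if choice in self.locked:
            choice = pair[1 - bit]   # assumed override for a locked set
        return choice

    def miss(self):
        """Replace the chosen victim; the refilled set counts as accessed."""
        v = self.victim()
        self.access(v)
        return v


# MHBR scenario: application Q hits set B, then hits set C once.
t = TreeLRU4(locked={"A"})
t.access("B")
t.access("C")
victim_after_one_hit = t.victim()

# MMBR scenario: application Q hits set B, then takes one miss.
t2 = TreeLRU4(locked={"A"})
t2.access("B")
first_victim = t2.miss()
victim_after_one_miss = t2.victim()
```

In both scenarios set B is selected for replacement after only a single intervening access, because the root bit points back at the pair containing the locked set A and B is the only unlocked set in that pair. Under the stated assumptions this reproduces the MMBR and MHBR of one described for the virtual 3-way cache.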