The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also correspond to embodiments of the claimed subject matter.
Least Recently Used (LRU) algorithms discard the least recently used items first, such as the least recently used element in a computing cache. LRU algorithms must therefore keep track of which element was used when. Such tracking can be computationally expensive and space intensive on an implementing circuit if the cache replacement scheme is to operate in accordance with a true LRU algorithm, which requires that the so-called “least recently used” elements are always discarded first. Some LRU-compliant implementations utilize “age bits” for cache lines and track the “Least Recently Used” cache line based on the age bits. For example, each time a cache line is used, the age of all other cache lines changes.
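By way of illustration only, the age-bit bookkeeping described above may be sketched as follows. The class and method names are hypothetical and are not taken from any particular implementation. The sketch assumes one common variant in which only lines younger than the accessed line are aged, so that ages remain a permutation of 0 to n−1; in the worst case, where the oldest line is accessed, the age of every other line changes.

```python
class AgeLRUSet:
    """One cache set tracked with per-line age counters (hypothetical names)."""

    def __init__(self, num_ways):
        # Age 0 means most recently used. Distinct starting ages give a total order.
        self.ages = list(range(num_ways))

    def touch(self, way):
        """Record a use of `way`: every line younger than it ages by one."""
        old_age = self.ages[way]
        for i in range(len(self.ages)):
            if self.ages[i] < old_age:
                self.ages[i] += 1
        self.ages[way] = 0

    def victim(self):
        """Return the way with the largest age, i.e. the least recently used."""
        return self.ages.index(max(self.ages))


lru = AgeLRUSet(4)
lru.touch(3)          # the oldest line is used, so all three other ages change
print(lru.ages)       # [1, 2, 3, 0]
print(lru.victim())   # 2
```

Each access may therefore update up to n−1 counters of roughly log2(n) bits each, which is the cost that motivates the cheaper schemes discussed next.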
A true LRU-compliant implementation may become infeasible due to computing expense and space constraints. Pseudo-LRU (PLRU) may instead be utilized for caches with large associativity. For example, where a scheme that “almost always discards” one of the least recently used items is sufficient, a PLRU algorithm may be desirable because design constraints may be met at lower computational expense and with reduced circuit space requirements.
Pseudo-LRU generally refers to one of two cache replacement algorithms: tree-PLRU and bit-PLRU.
Tree-PLRU is an efficient algorithm to find an item that most likely has not been accessed very recently, given a set of items and a sequence of access events to the items. A tree-PLRU algorithm operates with the use of a binary search tree for the items in question. Each node of the binary search tree has a one-bit flag indicating “go left” or “go right” to find the searched for pseudo-LRU element. Traversing the binary search tree according to the values of the flags eventually yields the searched for element.
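A minimal sketch of the tree traversal described above is given below, assuming a power-of-two number of ways and a heap-style array layout for the flags. The class name, the layout, and the convention that a flag value of 0 means “go left” are assumptions made for illustration.

```python
class TreePLRU:
    """Tree-PLRU for one cache set (illustrative sketch, hypothetical names)."""

    def __init__(self, num_ways):
        if num_ways < 2 or num_ways & (num_ways - 1):
            raise ValueError("this sketch assumes a power-of-two number of ways")
        self.num_ways = num_ways
        # Heap layout: node i has children 2*i+1 (left) and 2*i+2 (right).
        # There are num_ways - 1 internal nodes, each holding a one-bit flag:
        # 0 means "go left" to find the victim, 1 means "go right".
        self.bits = [0] * (num_ways - 1)

    def victim(self):
        """Follow the flags from the root down to a leaf and return its way."""
        node = 0
        while node < self.num_ways - 1:
            node = 2 * node + 1 + self.bits[node]
        return node - (self.num_ways - 1)

    def touch(self, way):
        """Record a use of `way`: point every flag on its path away from it."""
        node = way + self.num_ways - 1  # position of the leaf in the heap layout
        while node > 0:
            parent = (node - 1) // 2
            came_from_left = node == 2 * parent + 1
            self.bits[parent] = 1 if came_from_left else 0
            node = parent


plru = TreePLRU(4)
plru.touch(0)
print(plru.victim())  # 2
```

After way 0 is used, both flags on its path point away from it, so the next victim is taken from the other half of the set.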
Bit-PLRU stores one status bit for each cache line, sometimes referred to as most recently used bits (MRU-bits). Every access to a line sets its MRU-bit to 1, indicating that the line was recently used. Whenever the last remaining 0 bit of a set of status bits is set to 1, all other bits are reset to 0. On a cache miss, the replacement policy then targets the line with the lowest index whose MRU-bit is 0 (e.g., the pseudo-LRU element is discarded and the location is available for a new cache element).
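The bit-PLRU rules above may be sketched as follows. The class and method names are hypothetical, and the sketch assumes at least two ways so that a line with a 0 bit always exists after an access.

```python
class BitPLRU:
    """Bit-PLRU for one cache set (illustrative sketch, hypothetical names)."""

    def __init__(self, num_ways):
        if num_ways < 2:
            raise ValueError("this sketch assumes at least two ways")
        self.mru = [0] * num_ways  # one MRU-bit per cache line

    def touch(self, way):
        """Record a use of `way` by setting its MRU-bit to 1."""
        self.mru[way] = 1
        if all(self.mru):
            # The last remaining 0 bit was just set: reset all other bits.
            self.mru = [0] * len(self.mru)
            self.mru[way] = 1

    def victim(self):
        """Return the lowest-index line whose MRU-bit is 0."""
        return self.mru.index(0)


plru = BitPLRU(4)
for way in (0, 2, 1):
    plru.touch(way)
print(plru.victim())  # 3
plru.touch(3)
print(plru.mru)       # [0, 0, 0, 1]
```

Only one bit per line is stored, at the cost of losing the relative order among lines whose bits are equal.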
Conventional Pseudo-LRU implementations for a “power of two” number of ways enjoy well-known algorithms and implementations which operate efficiently at performance levels near those of a real or fully compliant LRU. Such Pseudo-LRUs require only a single bit for any decision node. Such PLRU implementations therefore require a total of n−1 bits, in which n is the number of ways. A tree is built with the n−1 bits, resulting in a well-balanced tree with performance close to that of a real LRU. Nodes are placed into two groups, and one bit is used to decide which group was more recently used; each sub-group is then placed into two groups, and one bit is again used to decide which is more recent, and so on, resulting in a structure having exactly n−1 bits for this power-of-two tree structure.
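The n−1 bit count follows directly from the repeated halving described above: each split consumes one bit and leaves two half-size groups to be split in turn. A short sketch, using a hypothetical helper name, counts the bits by performing that recursion.

```python
def plru_tree_bits(num_ways):
    """Count the decision bits used when a group is repeatedly split in half."""
    if num_ways < 1 or num_ways & (num_ways - 1):
        raise ValueError("this sketch assumes a power-of-two number of ways")
    if num_ways == 1:
        return 0  # a single way needs no decision
    # One bit to choose between the two halves, plus the bits inside each half.
    return 1 + 2 * plru_tree_bits(num_ways // 2)


for n in (2, 4, 8, 16):
    print(n, plru_tree_bits(n))  # prints 2 1, then 4 3, then 8 7, then 16 15
```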
Unfortunately, such a structure does not and cannot support a number of ways that is not a power of two because the nodes cannot be split evenly at every level, and thus, a single bit cannot support a non-even split.
Notwithstanding the existence of conventional power-of-two PLRU implementation models, the optimum point for performance-to-power and performance-to-area on a circuit is not necessarily a power of two. For example, the optimum point may be an odd number, which does not fit the available power-of-two models, or may be a multiple of an odd number which is not a power of two. Presently envisioned for new products are 6-way and 12-way caching models, neither of which conforms to a power-of-two model as presently available to industry.
The present state of the art may therefore benefit from systems and methods for implementing a balanced P-LRU tree for a “multiple of 3” number of ways cache as described herein. A “multiple of 5” number of ways cache implementation is additionally described as are variations of the “multiple of 3” number of cache ways.