The present invention relates in general to data processing systems, and in particular, to cache set ordering according to the recency of use.
In order to reduce penalties in system performance due to accesses to and from relatively slow system memory, modern data processing systems employ memory caches constructed from high speed memory cells as an intermediate memory store between a central processing unit (CPU) and system memory. Data and instructions are loaded from system memory into cache and then fetched from cache by the CPU.
The CPU first looks to the cache for data and instructions. If the instructions or data required by the CPU are not in the cache, a so-called “cache miss” has occurred. The CPU then loads the data or instructions from memory into the cache. In order to provide space in the cache to store the incoming data or instructions, one or more cache lines need to be moved from the cache, or “cast out,” to system memory. To facilitate selection of a cache line for casting out, a history of use, that is, of access to each line in a predetermined class of lines, may be encoded and maintained in a history array. A cast out strategy may then use the history to select the lines to be cast out. If a class of cache line sets includes four sets, there are twenty-four possible permutations of accesses to the lines constituting the class. Typically, eight bits are used to encode the use history via a 32-to-5 encoder. Likewise, a 5-to-32 bit decoder is used to determine a set to be selected for the cast out.
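The count of twenty-four orderings, and the smallest number of bits that can tell them apart, can be checked with a short sketch. The variable names are illustrative only and do not appear in the specification.

```python
from itertools import permutations
from math import ceil, log2

# Every possible recency ordering of four cache line sets, numbered 0..3.
orders = list(permutations(range(4)))

num_orders = len(orders)            # 4! = 24 distinct use histories
min_bits = ceil(log2(num_orders))   # smallest bit count that distinguishes them

print(num_orders, min_bits)         # prints "24 5"
```

Five bits provide thirty-two code points, so eight of them go unused when only twenty-four histories need to be represented.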
Additionally, a prefetch strategy may be based on a most recently used (MRU) approach. Data paths in the cache memory allow only one set to be accessed at a time. However, a fast decode of the MRU set would permit the MRU set in a level two (L2) cache to be speculatively brought into the level one (L1) cache in the same cycle as cache tags are read.
The history encoding and decoding operations represent an overhead in cache memory accesses. With increasing CPU speed, there is a need in the art for a reduction in the overhead represented by the implementation of a cache cast out strategy, as well as by speculative loads from an L2 cache to an L1 cache. Thus, there is a need in the art for apparatus and methods for faster encoding and decoding of cache set use histories.
The aforementioned needs are addressed by the present invention. Accordingly there is provided, in a first form, a method of encoding a use history in which a least recently used (LRU) set is encoded with a first preselected bit pair. The method also encodes a most recently used (MRU) set with a second preselected bit pair, and encodes a next least recently used (NLRU) set and a next most recently used (NMRU) set with a preselected single bit.
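By way of illustration only, the following sketch shows one way a five-bit encoding of this kind could be laid out for four sets numbered 0 to 3. The bit positions, the function names `encode` and `decode`, and the convention that the single bit is 0 when the NLRU set has the lower set number of the two middle sets are all assumptions made for the example; the summary above does not fix them.

```python
from itertools import permutations

def encode(order):
    """Pack [LRU, NLRU, NMRU, MRU] (a permutation of 0..3) into five bits.

    Assumed layout: bits 4-3 = LRU set, bits 2-1 = MRU set, bit 0 = single bit.
    """
    lru, nlru, nmru, mru = order
    # Assumed convention: 0 if the NLRU set is the lower-numbered middle set.
    bit = 0 if nlru < nmru else 1
    return (lru << 3) | (mru << 1) | bit

def decode(code):
    """Recover the complete ordering [LRU, NLRU, NMRU, MRU] from five bits."""
    lru = (code >> 3) & 0b11
    mru = (code >> 1) & 0b11
    bit = code & 1
    # The two sets that are neither LRU nor MRU, in ascending set number.
    low, high = sorted({0, 1, 2, 3} - {lru, mru})
    nlru, nmru = (low, high) if bit == 0 else (high, low)
    return [lru, nlru, nmru, mru]

# Each of the 24 orderings maps to a distinct code and decodes back to itself.
codes = {encode(p) for p in permutations(range(4))}
print(len(codes))               # prints "24"
print(decode(encode([2, 0, 3, 1])))   # prints "[2, 0, 3, 1]"
```

Under this assumed layout the LRU and MRU sets are read directly from their bit pairs with no table lookup, which is the property that makes a fast cast out or speculative MRU selection plausible.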
There is also provided, in a second form, a data processing system. The data processing system includes a cache memory including a plurality of cache line sets; and circuitry operable for generating a cache set use history encoding. Additionally, circuitry is coupled to the cache memory operable for decoding the encoding. The encoding comprises no more than five bits, the encoding being operable for recovering a complete use history.
Additionally, there is provided, in a third form, a method of cache set history generation. The method includes the steps of decoding a next least recently used (NLRU) set and a next most recently used (NMRU) set in a previous use history in response to first and second bit pairs and a single bit encoding the previous history, and decoding a most recently used (MRU) set in the previous history in response to the second bit pair encoding the previous history. The decoded sets are used to generate a current history by encoding a first bit pair in the current history in response to the previous NLRU, encoding a second bit pair in the current history in response to a cache hit, and encoding a single bit in the current history in response to the NMRU and the MRU in the previous history.
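The history generation steps may be pictured with the following sketch, offered as one reading rather than as the claimed circuitry. It assumes four sets numbered 0 to 3, a hypothetical layout with the LRU set in bits 4-3, the MRU set in bits 2-1, and a single bit that is 0 when the NLRU set is the lower-numbered middle set. The helper names are invented for the example. When the accessed set was previously the LRU set, the new first bit pair comes from the previous NLRU, the new second bit pair from the hit set, and the new single bit from the previous NMRU and MRU, as described above.

```python
def decode(code):
    """Unpack five bits into [LRU, NLRU, NMRU, MRU] (assumed layout)."""
    lru, mru, bit = (code >> 3) & 0b11, (code >> 1) & 0b11, code & 1
    low, high = sorted({0, 1, 2, 3} - {lru, mru})
    return [lru, low, high, mru] if bit == 0 else [lru, high, low, mru]

def encode(order):
    """Pack [LRU, NLRU, NMRU, MRU] into five bits (assumed layout)."""
    lru, nlru, nmru, mru = order
    return (lru << 3) | (mru << 1) | (0 if nlru < nmru else 1)

def update_on_hit(prev_code, hit_set):
    """Generate the current history from the previous one after a hit."""
    order = decode(prev_code)
    order.remove(hit_set)   # the accessed set leaves its old position
    order.append(hit_set)   # and becomes the most recently used set
    return encode(order)

prev = encode([2, 0, 3, 1])          # LRU=2, NLRU=0, NMRU=3, MRU=1
curr = update_on_hit(prev, 2)        # hit on the previous LRU set
print(decode(curr))                  # prints "[0, 3, 1, 2]"
```

In the printed result the new LRU is the previous NLRU (set 0), the new MRU is the hit set (set 2), and the middle pair is the previous NMRU and MRU (sets 3 and 1), which together determine the new single bit.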
There is additionally provided, in a fourth form, a data processing system containing a cache memory including a plurality of cache line sets, circuitry operable for generating a cache set use history encoding, and circuitry coupled to the cache memory operable for decoding the encoding. The decoding circuitry includes circuitry operable for forming first, second, third and fourth intermediate signals in response to first and second bit pairs in the encoding. Also included is circuitry operable for forming a first logical combination of the first and second intermediate signals and a single bit in the encoding, circuitry operable for forming a second logical combination of the first and second intermediate signals and the single bit, and circuitry operable for forming a third logical combination of the third and fourth intermediate signals. A first decoded history signal is formed by circuitry operable for decoding the first and third logical combinations and a second decoded history signal is formed by circuitry operable for decoding the second and third logical combinations.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.