1. Field of the Invention
The invention relates to processor caches and more particularly to determining which data in a cache to overwrite or write back in the event of a cache miss.
2. Discussion of Related Art
Processors have attained widespread use throughout many industries. A goal of any processor is to process information quickly. One technique which is used to increase the speed with which the processor processes information is to provide the processor with an architecture which includes a fast local memory called a cache. Another technique which is used to increase the speed with which the processor processes information is to provide a processor architecture with multiple processing units.
A cache is used by the processor to temporarily store instructions and data. A cache which stores both instructions and data is referred to as a unified cache; a cache which stores only instructions is an instruction cache, and a cache which stores only data is a data cache. Providing a processor architecture with either a unified cache or an instruction cache and a data cache is a matter of design choice.
A factor in the performance of the processor is the probability that a processor-requested data item is already in the cache. When a processor attempts to access an item of information, it is either present in the cache or not. If present, a cache xe2x80x9chitxe2x80x9d occurs. If the item is not in the cache when requested by the processor, a cache xe2x80x9cmissxe2x80x9d occurs. It is desirable when designing a cache system to achieve a high cache hit rate, or xe2x80x9chit ratioxe2x80x9d.
After a cache miss occurs, the information requested by the processor must then be retrieved from memory and brought into the cache so that it may be accessed by the processor. A search for an item of information that is not stored in the cache after a cache miss usually results in an expensive and time-consuming effort to retrieve the item of information from the main memory of the system. To maximize the number of cache hits, data that is likely to be referenced in the near future operation of the processor is stored in the cache. Two common strategies for maximizing cache hits are storing the most recently referenced data and storing the most commonly referenced data.
In most existing systems, a cache is subdivided into sets of cache line slots. When each set contains only one line, then each main memory line can only be stored in one specific line slot in the cache. This is called direct mapping. In contrast, each set in most modern processors contains a number of lines. Because each set contains several lines, a main memory line mapped to a given set may be stored in any of the lines, or xe2x80x9cwaysxe2x80x9d, in the set.
When a cache miss occurs, the line of memory containing the missing item is loaded into the cache, replacing another cache line. This process is called cache replacement. In a direct mapping system, each line from main memory is restricted to be placed in a single line slot in the cache. This direct mapping approach simplifies the cache replacement process, but tends to limit the hit ratio due to the lack of flexibility with line mapping. In contrast, flexibility of line mapping, and therefore a higher hit ratio, can be achieved by increasing the level of associativity. Increased associativity means that the number of lines per set is increased so that each line in main memory can be placed in any of the line slots (xe2x80x9cwaysxe2x80x9d) within the set. During cache replacement, one of the lines in the set must be replaced. The method for deciding which line in the set is to be replaced after a cache miss is called a cache replacement policy.
Several conventional cache replacement policies for selecting a datum in the cache to overwrite include Random, Least-Recently Used (LRU), Pseudo-LRU, and Not-Most-Recently-Used (NMRU). Random is the simplest cache replacement policy to implement, since the line to be replaced in the set is chosen at random. The LRU method is more complex, as it requires a logic circuit to keep track of actual access of each line in the set by the processor. According to the LRU algorithm, if a line has not been accessed recently, chances are that it will not be accessed any more, and therefore it is a good candidate for replacement. Another replacement policy, NMRU, keeps track of the most recently accessed line. This most recently accessed line is not chosen for replacement, since the principle of spatial locality says that there is a high probability that, once an information item is accessed, other nearby items in the same line will be accessed in the near future. The NMRU method requires a logic circuit to keep track of the most recently accessed line within a set. In all cache replacement policies, the line selected for replacement may be referred to as a xe2x80x9ccandidatexe2x80x9d.
Once a candidate is selected, further processing must occur in the cache in order to ensure the preservation of memory coherency. If the contents of the candidate has been altered in the cache since it was retrieved from memory, then the candidate is xe2x80x9cdirtyxe2x80x9d and a memory incoherency exists. Before the contents of the dirty candidate can be replaced with the new information requested by the processor, the current contents of the dirty candidate must be updated to memory. This operation is called a xe2x80x9cwrite backxe2x80x9d operation. While the implementation of such a scheme allows reduced bus traffic because multiple changes to a cache line need be loaded into memory only when the cache line is about to be replaced, a drawback to the write back operation is delay. That is, access to the cache is slowed or even halted during a write back operation.
A method selects a candidate to mark as overwritable in the event of a cache miss while attempting to avoid a write back operation. The selection includes associating a set of data with the cache access request, each datum of the set is associated with a way, then choosing an invalid way among the set. Where no invalid ways exist among the set, the processor next determines a way that is not most recently used among the set. Next, the method determines whether a shared resource is crowded. When the shared resource is not crowded, the not most recently used way is chosen as the candidate. Where the shared resource is crowded, the method next determines whether the not most recently used way differs from an associated source in the memory and where the not most recently used way is the same as an associated source in the memory, the not most recently used way is chosen as the candidate.
In one embodiment, the method for selecting a candidate to mark as overwritable in the event of a cache miss includes receiving the cache access requests, where the request is associated with a main memory address, determining whether the contents of the main memory address are present in a data cache, when the access xe2x80x9cmissesxe2x80x9d, associating the main memory address with a set within the data cache, determining whether any way in the set is invalid, and choosing an invalid way, if one exists, as the candidate. If no invalid way exists, a cache replacement policy is applied to select a way as a preliminary candidate. The cache replacement policy is based on the state of at least one affected resource, such as the xe2x80x9ccrowdedxe2x80x9d state of a write back buffer, the state of a cross bar switch, or the state of a memory controller buffer.
When the shared resource is a write back buffer, when the not most recently used way differs from an associated source in the memory, the candidate is chosen as the way among the set that does not differ from an associated source in the memory. Where all ways among the set differ from respective sources in the memory, the not most recently used way is chosen as the candidate and the not most recently used way is stored in the write back buffer.
When the back buffer is crowded, one embodiment determines whether the preliminary candidate is dirty. Where the write back buffer is not crowded or the write back buffer is crowded and the preliminary candidate is not dirty, the preliminary candidate is chosen as the candidate. Where, however, the write back buffer is crowded and the preliminary candidate is dirty, then the contents of the preliminary candidate are stored in the write back buffer and the preliminary candidate is chosen s the candidate.
In one embodiment, all or some of the method described above is implemented in a computer system having a data cache that is shared by a plurality of processing units.
The present invention will be more fully understood in light of the following detailed description taken together with the accompanying drawings.