This invention relates generally to computer systems and more specifically to cache memory systems.
Most computer systems employ a multilevel hierarchy of memory systems, with relatively fast, expensive, limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost, higher-capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit, or mounted physically close to the processor for speed. There may be separate instruction caches and data caches. There may be multiple levels of caches. Many computer systems employ multiple processors, each of which may have multiple levels of caches. Some caches may be shared by multiple processors. All processors and caches may share a common main memory.
Typically, a memory is organized into words (for example, 32 bits or 64 bits per word). Typically, the minimum amount of memory that can be transferred between a cache and a next lower level of the memory hierarchy is called a line, or sometimes a block. A line is typically multiple words (for example, 16 words per line). Memory may also be divided into pages (also called segments), with many lines per page. In some systems, page size may be variable. The present patent document uses the term xe2x80x9clinexe2x80x9d for a cache entry, but the invention is equally applicable to blocks or other memory organizations.
Many computer systems employ multiple processors, each of which may have multiple levels of caches. Some caches may be shared by multiple processors. All processors and caches may share a common main memory. A particular line may simultaneously exist in memory and in the cache hierarchies for multiple processors. All copies of a line in the caches must be identical, a property called coherency. The protocols for maintaining coherence for multiple processors are called cache coherence protocols.
To improve performance, the computer system tries to keep data that will be used soon in the fastest memory, which is usually a cache high in the hierarchy. Typically, when a processor requests a line, if the line is not in a cache for the processor (cache miss), then the line is copied from main memory, or from a cache of another processor. A line from main memory, or a line from another processor""s cache, is also typically copied into a cache for the requesting processor, assuming that the line will need to be accessed again soon. If a cache is full, then a new line must replace some existing line in the cache. If a line to be replaced is clean (the copy in cache is identical to the copy in main memory), it may be simply overwritten. If a line to be replaced is dirty (the copy in cache is different than the copy in main memory), then the line must be evicted (copied to main memory). A replacement algorithm is used to determine which line in the cache is replaced. A common replacement algorithm is to replace the least-recently-used line in the cache.
One particular performance concern for large multiple processor systems is the impact on latency when one processor requests a line that is cached by another processor. If a modified (dirty) line is cached by a first processor, and the line is requested by a second processor, the line is written to main memory, and is also transferred to the requesting cache (called a cache-to-cache transfer). For a large multiple-processor system, a cache-to-cache transfer may require a longer latency than a transfer from main memory. In addition, for a large multiple-processor system, a cache-to-cache transfer may generate traffic on local buses that would not be required for a transfer from main memory. Accordingly, there is a need to improve performance by reducing the average latency, which can be improved by reducing the number of cache-to-cache transfers.
Performance is particularly affected by thrashing, where a line is transferred from a first cache to a second cache, and then subsequently requested by the first cache (or some other cache). There is a need to improve performance by reducing thrashing.
Still another performance concern is the predictability of a system response to real-time events. Typically, the time required to evict a dirty line from a cache is longer than the time required to evict a clean line, because a dirty line must be written externally to the cache, whereas a clean line may only need a state change. In the case of real-time event, such as a processor interrupt, multiple lines may need to be evicted from a cache. The overall time required to write new lines into the cache and to evict lines from the cache depends on the number of evicted lines that are dirty. There is a need to be able to bound the uncertainty in the time required to respond to a real-time event.
Systems for determining the age of dirty lines are known. For example, U.S. Pat. No. 6,134,634 describes a system in which each line in a cache has an associated counter that is used to count cycles during which the line has not been written. If the count exceeds a predetermined number, the line is determined to be stale and may be evicted. There is a need for a lower cost system for preemptive eviction.
Cache-to-cache transfers, and the timing uncertainty associated with evictions during the execution of real time code, are both reduced by limiting the number of dirty entries in a cache. Cache-to-cache transfers are further reduced by limiting the number of entries in a cache that might be subject to a cache-to-cache transfer. In a first example embodiment of the invention, a cache system counts the total number of dirty entries in the cache. In a variation, the cache system counts the total number of entries in the cache that result from a cache-to-cache transfer. In each embodiment, when the number exceeds a predetermined threshold, at least one entry is preemptively evicted. For each embodiment, the threshold may optionally be dynamically optimized by an operating system.