As is known in the art, a computer system may include a central processing unit that operates on instructions and data received from a coupled memory device. As data and instructions are required by the central processing unit, they are transferred from the memory to the central processing unit.
The latency inherent in obtaining data from the memory device may be quite large. Accordingly, smaller, faster memories, referred to as caches, are typically placed between the memory and the central processing unit. Caches provide temporary storage of portions of memory that are required by the central processing unit. Because the caches are faster than the memory and located relatively closer to the central processing unit, references to the cache are serviced faster than references to the memory. Accordingly, the use of caches increases the overall performance of the computer system by reducing the latency associated with accessing memory data and instructions.
Because the cache is smaller than the memory and stores only a subset of the memory data, an item of memory data requested by the central processing unit may not be located in the cache. When a request is made for data that is not in the cache, the request is said to "miss" in the cache. When there is a miss in the cache, the item of memory data is retrieved from the memory and stored in the cache. Thus, the full latency associated with obtaining the item of memory data is incurred for requests that miss in the cache. It is therefore desirable to minimize the number of cache misses that occur during operation.
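The effect of the miss rate on latency can be illustrated with a short sketch. It assumes a simplified model in which a hit costs only the cache latency and a miss costs the full memory latency; the function name `average_latency` and the cycle counts are hypothetical values chosen for illustration, not figures from any particular system.

```python
def average_latency(cache_latency: float, memory_latency: float, miss_rate: float) -> float:
    """Average latency per request under the simplified model described above.

    Hits are serviced at cache_latency; misses incur the full memory_latency.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return (1.0 - miss_rate) * cache_latency + miss_rate * memory_latency


# Assumed latencies, in cycles, for illustration only.
CACHE_LATENCY = 1
MEMORY_LATENCY = 100

for rate in (0.0, 0.05, 0.5, 1.0):
    print(rate, average_latency(CACHE_LATENCY, MEMORY_LATENCY, rate))
```

Under these assumptions the average latency grows linearly with the miss rate and reaches the full memory latency when every request misses.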
There are a variety of cache architectures, some of which are designed to minimize the number of cache misses that occur during operation. In all of these architectures, the cache is generally apportioned into a number of blocks, where each block comprises a fixed number of bytes of data. Each block of data is a subset of a page of physical memory. Data and instructions associated with processes are allocated pages of physical memory as the processes are introduced into the system.
In a direct mapped cache architecture, when a block of data is copied to the cache, and for subsequent references, selected bits of the physical address of the page at which the block resides are used to form an index to the cache. Because the index is only a part of the physical address, many blocks may map to the same index in the cache. Because multiple blocks may map to a common cache index, a problem referred to as thrashing may occur. Thrashing occurs when two processes or instructions executing on the central processing unit repeatedly access different blocks that map to the same cache index. Each time one instruction requests a block of data that is not located at the associated cache index, a miss occurs and the requested block of data is retrieved from memory, displacing the block previously stored at that index. Thus, the cache may thrash between two different blocks of data at a given cache index, incurring the full memory latency each time the given cache index is accessed. When excessive thrashing occurs, the advantages of having a cache are eliminated.
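The thrashing behavior described above can be reproduced with a minimal model of a direct mapped cache. This is a sketch under stated assumptions: the block size, the number of cache indexes, and the names `cache_index` and `count_misses` are hypothetical, and the index is formed from the low-order bits of the block number, which is one common choice rather than a requirement.

```python
# Illustrative direct mapped cache model; all sizes below are assumed values.
BLOCK_SIZE = 64      # bytes per block (assumption)
NUM_INDEXES = 256    # number of cache indexes (assumption)


def cache_index(address: int) -> int:
    """Form the cache index from selected bits of the physical address."""
    return (address // BLOCK_SIZE) % NUM_INDEXES


def count_misses(addresses) -> int:
    """Count the misses incurred by a sequence of physical addresses."""
    resident = {}  # cache index -> block number currently stored there
    misses = 0
    for address in addresses:
        block = address // BLOCK_SIZE
        index = cache_index(address)
        if resident.get(index) != block:
            misses += 1              # block must be retrieved from memory
            resident[index] = block  # it displaces whatever was stored there
    return misses


# Two addresses exactly one cache size apart share a cache index.
a = 0x0000
b = a + BLOCK_SIZE * NUM_INDEXES
conflicting = [a, b] * 4                 # alternate between the two blocks
friendly = [a, a + BLOCK_SIZE] * 4       # neighbouring blocks, different indexes

print(count_misses(conflicting), count_misses(friendly))
```

In this model the alternating accesses to `a` and `b` miss on every reference, while the same number of accesses to two blocks with different indexes miss only on first use.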
A set associative architecture helps to minimize the amount of thrashing that occurs in the cache by providing more than one cache location for each cache index. In a set associative cache, the cache is apportioned into a number of sets of data blocks, and a cache index may map to an entry in any one of the sets of data blocks of the cache. Set associative caches can therefore reduce the amount of thrashing by allowing different instructions that map to the same cache index to be mapped to different sets within the cache. Although set associative architectures help to alleviate thrashing by providing more than one cache location for each cache index, thrashing may still occur when more data blocks that map to the same cache index are in use than there are sets available in the cache.
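A small simulation can illustrate both the benefit and the limit described above. It is only a sketch: `num_sets` follows this document's usage (the number of candidate locations available for each cache index), the sizes are assumed values, and least-recently-used replacement is an assumption made for the example, since no replacement policy is specified here.

```python
from collections import OrderedDict

BLOCK_SIZE = 64     # bytes per block (assumption)
NUM_INDEXES = 256   # number of cache indexes (assumption)


def count_misses(addresses, num_sets: int) -> int:
    """Count misses when each cache index has one entry in each of num_sets sets."""
    # cache index -> blocks resident at that index, least recently used first
    entries = {}
    misses = 0
    for address in addresses:
        block = address // BLOCK_SIZE
        index = block % NUM_INDEXES
        resident = entries.setdefault(index, OrderedDict())
        if block in resident:
            resident.move_to_end(block)   # hit: mark as most recently used
            continue
        misses += 1                       # miss: retrieve block from memory
        if len(resident) >= num_sets:
            resident.popitem(last=False)  # evict least recently used (assumed policy)
        resident[block] = True
    return misses


stride = BLOCK_SIZE * NUM_INDEXES         # addresses this far apart share an index
two_blocks = [0, stride] * 4
three_blocks = [0, stride, 2 * stride] * 4

print(count_misses(two_blocks, 1), count_misses(two_blocks, 2))
print(count_misses(three_blocks, 2), count_misses(three_blocks, 3))
```

With these assumptions, two conflicting blocks thrash when only one location is available per index but coexist when there are two. Three conflicting blocks cycled through two locations miss on every access, which is the residual thrashing case noted above.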
When thrashing occurs consistently in an executing process, the overall performance of the system may be reduced to below that of a system without a cache memory. Therefore, it is desirable to provide a method and apparatus that would minimize thrashing in cache memory to allow the potential performance of the computer system to be realized.