In general, the processing speed of memory has improved much more slowly than that of a central processing unit (CPU). As a result, there is a speed gap between the CPU and the lower memory. To bridge this gap, most systems adopt a cache memory structure.
In a structure using a cache memory, when data is accessed and the content requested by the CPU is present in the cache memory (a cache hit), the access completes without any problem. However, if the requested content is not present in the cache memory, a cache miss occurs.
When a cache miss occurs, the requested content must be brought into the cache memory: a block is fetched from the lower memory and stored in the cache memory, replacing a cache block already held there.
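The hit, miss, and replacement sequence described above can be illustrated with a minimal sketch. It assumes, for simplicity, a direct-mapped cache (one block per set, one word per block) and models the lower memory as a Python dictionary; the class name `DirectMappedCache` is hypothetical and not part of the described system.

```python
# Minimal sketch of cache hit / miss / block replacement.
# Assumptions (not from the text): direct-mapped organization, one word per
# block, lower memory modeled as a dict. DirectMappedCache is hypothetical.

class DirectMappedCache:
    def __init__(self, num_sets, lower_memory):
        self.num_sets = num_sets
        self.lower = lower_memory          # address -> data
        self.tags = [None] * num_sets      # tag held in each set (None = empty)
        self.data = [None] * num_sets
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        index = addr % self.num_sets       # set the address maps to
        tag = addr // self.num_sets        # identifies which block occupies the set
        if self.tags[index] == tag:
            self.hits += 1                 # cache hit: served directly from the cache
            return self.data[index]
        # Cache miss: fetch the block from lower memory and replace the
        # block currently stored in this set.
        self.misses += 1
        self.tags[index] = tag
        self.data[index] = self.lower[addr]
        return self.data[index]


lower = {a: a * 10 for a in range(16)}
cache = DirectMappedCache(num_sets=4, lower_memory=lower)
for a in [1, 1, 5, 1]:
    cache.read(a)
print(cache.hits, cache.misses)  # 1 3
```

Addresses 1 and 5 map to the same set, so each access to one evicts the other; only the second access to address 1 hits.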
Cache block replacement algorithms include LRU (Least Recently Used), Random, and Pseudo LRU, among others. They are designed to improve system performance by replacing the cache block that is least likely to be reused.
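Pseudo LRU is commonly realized as a small binary tree of bits per set. The following is a minimal sketch of one such variant (tree-PLRU) for a 4-way set, offered only as an illustration; the class name `TreePLRU4` and the bit convention (0 means the replacement candidate lies in the left half, 1 in the right half) are assumptions, not details from this document.

```python
# Minimal sketch of tree-based Pseudo LRU for one 4-way set.
# Assumption: bits[0] is the root, bits[1] covers ways 0/1, bits[2] covers
# ways 2/3; 0 = candidate on the left, 1 = candidate on the right.

class TreePLRU4:
    def __init__(self):
        self.bits = [0, 0, 0]

    def touch(self, way):
        """On a reference, point every bit on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1                      # candidate moves to right half
            self.bits[1] = 1 if way == 0 else 0   # candidate is the other way of the pair
        else:
            self.bits[0] = 0                      # candidate moves to left half
            self.bits[2] = 1 if way == 2 else 0

    def victim(self):
        """Follow the bits from the root to pick the way to replace."""
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3


plru = TreePLRU4()
for w in [0, 1, 2, 3]:
    plru.touch(w)
print(plru.victim())  # 0 (same as true LRU)
plru.touch(0)
print(plru.victim())  # 2 (true LRU would pick way 1)
```

With only three bits per set instead of a full recency order, the tree approximates LRU; the second query shows a case where it selects a different block than true LRU would.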
Among them, the LRU algorithm replaces the block that has been least recently referenced among the replaceable cache blocks in the same set. It is based on the idea that a block that has not been referenced recently is less likely to be reused in the future.
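A minimal sketch of LRU replacement within a single set of an N-way set-associative cache is given below. It assumes that each way holds only a tag and that recency is tracked as an ordered list of way numbers; the class name `LRUSet` is hypothetical.

```python
# Minimal sketch of true LRU replacement within one N-way set.
# Assumption: recency is kept as a list of way numbers, oldest first.
# LRUSet is a hypothetical name used only for illustration.

class LRUSet:
    def __init__(self, ways):
        self.tags = [None] * ways
        self.order = list(range(ways))   # order[0] = LRU way, order[-1] = MRU way

    def _touch(self, way):
        self.order.remove(way)
        self.order.append(way)           # the referenced way becomes MRU

    def access(self, tag):
        """Return (hit, way). On a miss, the least recently referenced way is replaced.
        Ways never used stay at the front of `order`, so they are filled first."""
        if tag in self.tags:
            way = self.tags.index(tag)
            self._touch(way)
            return True, way
        way = self.order[0]
        self.tags[way] = tag
        self._touch(way)
        return False, way


s = LRUSet(4)
for t in ["A", "B", "C", "D", "A", "E"]:
    s.access(t)
print(s.tags)  # ['A', 'E', 'C', 'D']
```

Because `A` was referenced again before `E` arrived, `B` became the least recently referenced block and was the one replaced.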
In these conventional LRU algorithms, when the CPU references a cache block, that block becomes the most recently referenced block in the set, or the MRU (Most Recently Used) block. In a 2-way set-associative cache, the other block in the set then becomes the least recently referenced block, or the LRU (Least Recently Used) block. Therefore, one LRU bit per set in the tag array is sufficient to record which block is the LRU block. For example, if the LRU block is stored in way 0, 0 is stored in the LRU bit, and if the LRU block is stored in way 1, 1 is stored in the LRU bit. When a cache miss occurs and a block must be fetched from the lower memory and stored in the cache, the LRU bit is consulted to determine which block currently in the cache memory is to be replaced by the newly fetched block. If the LRU bit holds 0, the cache block stored in way 0 is selected for replacement; if it holds 1, the cache block stored in way 1 is selected.
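The LRU-bit mechanism for a 2-way set-associative cache can be sketched as follows. This is an illustration under the assumption that each way stores only a tag; the class name `TwoWaySet` is hypothetical.

```python
# Minimal sketch of a 2-way set with a single LRU bit.
# Assumption: each way holds only a tag. TwoWaySet is a hypothetical name.

class TwoWaySet:
    def __init__(self):
        self.tags = [None, None]
        self.lru_bit = 0                 # number of the way holding the LRU block

    def access(self, tag):
        """Return (hit, way) for a reference to `tag`."""
        for way in (0, 1):
            if self.tags[way] == tag:
                # Referenced way becomes MRU, so the other way becomes LRU.
                self.lru_bit = 1 - way
                return True, way
        # Miss: the LRU bit selects the way whose block is replaced.
        victim = self.lru_bit
        self.tags[victim] = tag
        self.lru_bit = 1 - victim        # newly stored block is now MRU
        return False, victim


s2 = TwoWaySet()
print([s2.access(t) for t in ["A", "B", "A", "C"]])
# [(False, 0), (False, 1), (True, 0), (False, 1)]
```

After `A` is referenced again, the LRU bit points at way 1, so the miss on `C` replaces `B`.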
However, these conventional cache block replacement algorithms do not consider whether the block to be replaced has been modified since it was stored in the cache memory and therefore must be written back to the lower memory through the write buffer. The block to be replaced is determined solely from information about which block has been accessed least recently.
Also, in the conventional cache block replacement algorithms, when a cache miss occurs and a cache block must be replaced, the block selected for replacement may be a dirty block that must be stored in the write buffer. If the write buffer is full, the system must wait until space becomes available in it, which causes a time delay.
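The stall caused by a full write buffer can be illustrated with a simple timing model. The sketch assumes a write buffer of fixed capacity that writes one entry to the lower memory every `drain_cycles` cycles, and that only dirty victim blocks are pushed into it; the class `WriteBuffer`, its parameters, and the cycle figures are hypothetical.

```python
from collections import deque

# Hypothetical write-buffer timing model: fixed capacity, entries are written
# to lower memory one at a time, each write taking `drain_cycles` cycles.

class WriteBuffer:
    def __init__(self, capacity, drain_cycles):
        self.capacity = capacity
        self.drain_cycles = drain_cycles
        self.entries = deque()
        self.next_drain = None           # cycle at which the oldest entry is written out

    def _drain_until(self, now):
        # Remove every entry whose write to lower memory has completed by `now`.
        while self.entries and self.next_drain <= now:
            self.entries.popleft()
            self.next_drain = self.next_drain + self.drain_cycles if self.entries else None

    def push(self, block, now):
        """Insert an evicted dirty block at cycle `now`; return the stall in cycles."""
        self._drain_until(now)
        stall = 0
        if len(self.entries) == self.capacity:
            stall = self.next_drain - now        # wait for the oldest write to finish
            self._drain_until(self.next_drain)
        if not self.entries:
            self.next_drain = now + stall + self.drain_cycles
        self.entries.append(block)
        return stall


wb = WriteBuffer(capacity=2, drain_cycles=10)
stalls = [wb.push(f"blk{i}", now=i) for i in range(3)]
print(stalls)  # [0, 0, 8]
```

Three dirty evictions in consecutive cycles overflow a two-entry buffer, so the third replacement stalls until the first write completes at cycle 10. A clean victim would not be pushed at all and would cause no such stall.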
These problems of the conventional cache block replacement algorithms may occur more frequently in recent systems using a plurality of CPUs, because CPU operating speeds have increased, the gap in operating speed between the CPU and the memory has widened, and cache misses occur more intensively within a short period when data with burst characteristics is processed.