Since the beginning of computing, processors have been faster than memories. Although memory technology has evolved and improved over time, processor technology has improved as well. As a result, processors often have to remain idle for substantial amounts of time while waiting for the memory to respond to a request for data, and system performance may be negatively impacted.
Computer systems have evolved to include memory hierarchies comprising various types of long-term storage, main memory, and one or more levels of cache. However, as one moves down the memory hierarchy from caches to long-term storage, device access times increase dramatically. An ideal solution is to have enough cache memory or fast main memory available to service the currently executing program. But in most systems, such memory is present in only limited amounts, or the program demands more memory than is available.
Caches generally function by keeping frequently used or recently used data close to or within the processor. The idea is that by storing recently used data in close proximity to the processor, the next time a memory request is made for that particular data, a long memory access to main memory or the hard disk drive is not necessary. When a computer starts up, the cache is empty. Over time, the cache fills up until there are no longer any empty entries for new data. This is not a problem as long as invalid entries are available for replacement. But if all existing entries are valid, the cache replacement logic may evict valid entries to make room for incoming data.
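By way of illustration only, the fill-then-replace behavior described above may be sketched as follows. The sketch assumes a fully associative cache with a least-recently-used (LRU) replacement policy; the text above does not specify a policy, and the class name `LRUCache`, its fields, and the address sequence are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache model; LRU replacement is an assumption."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> data, least recently used first
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def access(self, address):
        """Return True on a hit, False on a miss (after filling the entry)."""
        if address in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(address)
            self.hits += 1
            return True
        self.misses += 1
        if len(self.entries) >= self.capacity:
            # No empty entry is left, so a valid entry must be evicted.
            self.entries.popitem(last=False)
            self.evictions += 1
        # Stand-in for fetching the data from main memory.
        self.entries[address] = f"data@{address}"
        return False


cache = LRUCache(capacity=4)
for address in [0, 1, 2, 3, 0, 1, 4, 0]:
    cache.access(address)

print(cache.hits, cache.misses, cache.evictions, list(cache.entries))
```

In this run the first four accesses fill the empty cache, the repeated accesses to addresses 0 and 1 hit, and the access to address 4 finds no empty entry and evicts address 2, the least recently used valid entry.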
Normally, caches work very well, as many programs exhibit temporal and spatial locality and may thus re-use cached data multiple times. However, some programs have a memory access pattern with little data re-use, which makes caches less effective (or even ineffective). Examples are streaming media programs (audio, video, etc.), which access data one time and in sequence, and programs that iterate over data sets that are large relative to the cache size. When multiple programs share a cache, a program having poor data re-use may compete with another program having good data re-use, causing useful data to be pushed out of the cache. Thus the useful data may be brought in and then pushed out, again and again. This is an example of what is called cache thrashing, and it may have a detrimental effect on overall system performance.
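The interference described above may be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any particular system: the shared cache is assumed to be fully associative with LRU replacement, and the function names (`run_trace`, `reuse_trace`, `shared_trace`), the program labels, the capacity, and the trace sizes are all hypothetical values chosen for the example.

```python
from collections import OrderedDict


def run_trace(capacity, trace):
    """Replay (program, address) pairs through one shared LRU cache.

    Returns a dict mapping each program label to its number of hits.
    """
    cache = OrderedDict()  # (program, address) keys, least recently used first
    hits = {}
    for program, address in trace:
        key = (program, address)
        hits.setdefault(program, 0)
        if key in cache:
            cache.move_to_end(key)
            hits[program] += 1
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[key] = True
    return hits


def reuse_trace(rounds, working_set):
    """A program with good re-use: it loops over a small working set."""
    return [("reuse", i % working_set) for i in range(rounds * working_set)]


def shared_trace(rounds, working_set, stream_per_access):
    """The same looping program interleaved with a streaming program
    that touches each of its addresses exactly once."""
    trace = []
    next_stream = 0
    for i in range(rounds * working_set):
        trace.append(("reuse", i % working_set))
        for _ in range(stream_per_access):
            trace.append(("stream", next_stream))
            next_stream += 1
    return trace


CAPACITY = 8
alone = run_trace(CAPACITY, reuse_trace(rounds=10, working_set=4))
shared = run_trace(
    CAPACITY, shared_trace(rounds=10, working_set=4, stream_per_access=3)
)
print("alone:", alone)
print("shared:", shared)
```

Running alone, the looping program hits on 36 of its 40 accesses. When it shares the cache with the streaming program, each of its entries is pushed out before it is needed again, and it records no hits at all, even though its own access pattern has not changed.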
In light of the foregoing and other deficiencies, it is known in the prior art to improve cache performance utilizing a variety of techniques, including: (1) profile-based compilation, (2) increasing the associativity of the cache, or (3) using stream buffers (FIFOs) to speculatively store sequential data. Profile-based compilation is a static optimization based on results obtained for a certain system configuration and mix of programs running on that system, and may not be effective in a different system configuration or under a different mix of programs. Increasing the associativity of the cache allows it to store data more flexibly, but in some cases this only delays cache thrashing and does not solve the fundamental problem. It also adds cost and complexity, which may make it impractical. A stream buffer is not effective if the data is not sequential (because stream buffers do not provide random access), and it occupies space in the design whether it is used or not.
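The observation that increased associativity may only delay thrashing can be illustrated with a toy set-associative model. The sketch assumes LRU replacement within each set and a set index equal to the address modulo the number of sets; the function name `count_hits`, the cache geometry, and the address patterns are hypothetical choices made for the example.

```python
from collections import OrderedDict


def count_hits(num_sets, ways, addresses):
    """Count hits in a set-associative cache with LRU replacement per set."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for address in addresses:
        # Assumed index function: address modulo the number of sets.
        lines = sets[address % num_sets]
        if address in lines:
            lines.move_to_end(address)
            hits += 1
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the set's LRU line
            lines[address] = True
    return hits


# Both configurations below hold eight lines in total.
two_conflicting = [0, 8] * 10        # two addresses that map to the same set
three_conflicting = [0, 8, 16] * 10  # three addresses that map to the same set

direct_mapped_two = count_hits(num_sets=8, ways=1, addresses=two_conflicting)
two_way_two = count_hits(num_sets=4, ways=2, addresses=two_conflicting)
two_way_three = count_hits(num_sets=4, ways=2, addresses=three_conflicting)
print(direct_mapped_two, two_way_two, two_way_three)
```

With two conflicting addresses, the direct-mapped configuration never hits, while the two-way configuration hits on 18 of 20 accesses. Adding a third conflicting address brings the two-way configuration back to zero hits, so the extra associativity has moved the point at which thrashing begins without removing it.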
Therefore, there is a need for improved systems and methods which address these and other shortcomings of the prior art.