1. Field of the Invention
The present invention relates generally to the data processing field and more specifically to a computer implemented method, system and computer usable program code for providing optimal cache management.
2. Background Description
It is anticipated that cache performance, particularly the cache miss rate, will play a much greater role in determining the performance of multi-core processors, or chip multi-processors, than it currently does on single-core systems. Reasons for this include the limited memory bandwidth and the longer memory latency relative to CPU speed in today's machines. On known multi-processor systems, the available memory bandwidth usually increases as processors are added because each processor adds its own connections. On chip multi-processors, by contrast, all the CPUs share the same connection. A recent report has shown that while a single thread on an Intel Core 2 Quad Q6600 machine sustained a 5.8 GB/s memory data transfer rate, four threads achieved only 5.3 GB/s in total memory transfer.
Not only is the memory bandwidth inadequate, given that each core on the Intel Core 2 Quad Q6600 is capable of 19 billion 64-bit floating point operations a second, but the same bandwidth is also shared by all cores. If one thread has a high miss rate, therefore, it may saturate the memory bus and render the other cores useless.
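The gap between compute rate and memory bandwidth can be made concrete with a rough calculation. The following is a minimal sketch that uses only the figures quoted above; it assumes that 1 GB/s means 10^9 bytes per second and that every operand is an 8-byte (64-bit) value fetched from memory. The variable names are illustrative only.

```python
# Figures quoted in the text for the Intel Core 2 Quad Q6600.
TOTAL_BANDWIDTH_BYTES_PER_S = 5.3e9   # four threads combined (assumes 1 GB = 1e9 bytes)
FLOPS_PER_CORE = 19e9                 # 64-bit floating point operations per second
NUM_CORES = 4
BYTES_PER_OPERAND = 8                 # one 64-bit value

# How many 64-bit operands per second the shared memory connection can deliver.
operands_per_s = TOTAL_BANDWIDTH_BYTES_PER_S / BYTES_PER_OPERAND

# Peak arithmetic rate of the whole chip.
peak_flops = FLOPS_PER_CORE * NUM_CORES

# Operations that must be performed per operand fetched from memory
# for the cores to stay busy at peak rate.
flops_per_operand = peak_flops / operands_per_s

print(f"operands delivered per second: {operands_per_s:.4g}")
print(f"peak operations per second:    {peak_flops:.4g}")
print(f"operations needed per operand: {flops_per_operand:.1f}")
```

Under these assumptions, a program would need on the order of one hundred floating point operations for every 64-bit value it brings in from memory in order to keep all four cores busy, which is why reuse of data already in cache matters so much.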
Unlike the problem of memory latency, bandwidth limitations cannot be alleviated by data prefetching or multi-threading. The primary solution is to reduce the amount of memory transfer by reducing the miss rate of a program. The problem of optimal caching is NP-hard if reorganization of both the computation and the data is considered. If the problem is limited by assuming that the computation order and the data layout are fixed, the best caching is given by the optimal replacement strategy “MIN”. The MIN procedure, however, requires arbitrarily long lookahead into future accesses and, as a result, cannot be implemented efficiently in hardware. Accordingly, today's machines frequently use the well-known “LRU” (least recently used) replacement strategy. It is known, however, that LRU replacement can be worse than MIN by a factor proportional to the cache size.
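The difference between the two strategies can be illustrated with a small simulation. The following is a minimal sketch, not a hardware design: it assumes a fully associative cache holding a fixed number of blocks and an access trace that is known in advance. The function names `lru_misses` and `min_misses` are hypothetical and chosen only for this illustration.

```python
from collections import OrderedDict

def lru_misses(trace, capacity):
    """Count misses when the least recently used block is replaced."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return misses

def min_misses(trace, capacity):
    """Count misses when the block reused furthest in the future is replaced."""
    # next_use[i] is the position of the next access to trace[i], or infinity.
    next_use = [0] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # block -> position of its next use
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)  # furthest next use
                del cache[victim]
        cache[block] = next_use[i]
    return misses

# Cyclic access to one more block than the cache can hold.
trace = [0, 1, 2, 3] * 3
print("LRU misses:", lru_misses(trace, capacity=3))
print("MIN misses:", min_misses(trace, capacity=3))
```

On this cyclic trace LRU misses on every one of the 12 accesses, while MIN misses only 6 times. The sketch also shows why MIN is impractical in hardware: the `next_use` table is built by scanning the entire future trace before the first access is served.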
Recent architecture designs have added an interface that allows a compiler, when generating machine code, to influence hardware cache management during execution. Techniques include using available cache-hint instructions to specify which level of cache to load a block into, and using an evict-me bit which, if set, tells the hardware to replace that block first when space is needed. These two techniques are based on the observation that a program has multiple working sets, some larger than cache and some smaller. The goal of both methods is to keep the large working sets out of cache so that the small working sets remain in cache undisturbed.
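The effect of an evict-me bit can be sketched in software. The following is a simplified model under stated assumptions, not a description of any particular hardware: the cache is fully associative, replacement is LRU unless some resident block has its evict-me bit set, and the bit is supplied with each access as though a compiler had tagged it. The function `simulate` and the trace layout are hypothetical.

```python
from collections import OrderedDict

def simulate(trace, capacity, use_hints):
    """Count misses for a trace of (block, evict_me) pairs.

    With use_hints=False the evict-me bits are ignored and the cache
    behaves as plain LRU.
    """
    cache = OrderedDict()  # block -> evict-me bit, ordered LRU to MRU
    misses = 0
    for block, evict_me in trace:
        bit = evict_me and use_hints
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            cache[block] = bit
            continue
        misses += 1
        if len(cache) >= capacity:
            # Prefer a block flagged evict-me; otherwise fall back to LRU.
            victim = next((b for b, flagged in cache.items() if flagged), None)
            if victim is None:
                victim = next(iter(cache))
            del cache[victim]
        cache[block] = bit
    return misses

# A small working set ("a", "b") that is reused, interleaved with a
# streaming working set ("s0", "t0", "s1", ...) that is never reused.
trace = []
for i in range(5):
    trace += [("a", False), ("b", False), (f"s{i}", True), (f"t{i}", True)]

print("plain LRU misses:     ", simulate(trace, capacity=3, use_hints=False))
print("with evict-me misses: ", simulate(trace, capacity=3, use_hints=True))
```

In this trace, plain LRU misses on all 20 accesses because the streaming blocks keep pushing "a" and "b" out of the three-block cache. With the evict-me bit honored, the streaming blocks replace one another and the small working set stays resident, giving 12 misses.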
There is, accordingly, a need for a cache management mechanism that can be efficiently implemented and, at the same time, provide an optimal replacement strategy.