1. Field of the Invention
The present invention is directed to an improved data processing system and, in particular, to a computer implemented method and apparatus for data caching. Still more particularly, the present invention is directed to a computer implemented method, apparatus, and computer usable program code for software-assisted data cache and prefetch control.
2. Description of the Related Art
A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data take longer to fetch or re-compute than the cached copy takes to read. Caching is a technique of temporarily storing frequently accessed data in random access memory (RAM), the cache. In modern computer systems, multiple layers of cache may be used, each of which has a smaller storage size than the main memory.
Caching uses a faster but smaller memory type, or a special area of a hard disk drive, to accelerate a slower but larger memory type and to reduce the time required to read and write data. Once the data are stored in the cache, future requests can be served from the cached copy rather than by re-fetching or re-computing the original data, so that the average access time is lower.
The hierarchical arrangement of memory helps bridge a widening gap between processor speeds and memory access rates. Despite large caches, memory access latencies still cause significant performance losses in many applications. The average data load latency for technical computing applications is large because caches are often ineffective for these programs. It is very common for computer program data structures to be too large to fit completely inside an L1 cache. It is also very common for a computer program to access more than one of these large data structures, causing competition for the resources of the L1 cache. While many programs can obtain acceptable performance by simply letting the processor manage its own caching, data prefetching may dramatically improve performance.
Hardware-based prefetch methods typically rely on the prediction of addresses to prefetch and thus are limited by prediction accuracy. These methods also may require memory to hold tables or histories of memory references. A hardware prefetch engine can identify only a limited number of streams to prefetch. Also, different behaviors are present with data access. Some data accesses show high spatial or temporal data locality and some data accesses have no reuse. An example of the latter is a data access that is performed in a stride fashion. Stride refers to the number of memory array elements that are stepped through each time an operation repeats. Spatial locality means that if a memory location is accessed, then most likely a location near this location will be accessed in the near future. Temporal locality means that if a memory location is accessed, then most likely this memory location will be accessed again in the near future.
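As an illustration of stride only, the following minimal C sketch steps through an array a fixed number of elements at a time. The function name sum_strided and its parameters are hypothetical and do not come from the described system. With a stride of 1, consecutive accesses touch neighboring elements and show high spatial locality; with a large stride, each access may fall in a different cache line, and no element is read twice.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Sums every `stride`-th element of `a`, starting at index 0.
 * stride == 1 visits adjacent elements (high spatial locality);
 * a larger stride skips elements, so successive accesses land
 * farther apart in memory. Each visited element is read exactly once. */
long sum_strided(const int *a, size_t n, size_t stride)
{
    long total = 0;

    if (stride == 0) {
        return 0; /* a zero stride would never advance; treat as empty */
    }
    for (size_t i = 0; i < n; i += stride) {
        total += a[i];
    }
    return total;
}
```

For an eight-element array, a stride of 2 visits indices 0, 2, 4, and 6.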
As an alternative, software-based prefetch methods are usually able to determine which address to prefetch but are hampered by the overhead from additional prefetch instructions inserted in an application.
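The trade-off described for software-based prefetching can be sketched in C as follows. This is an illustration under stated assumptions, not the method of the present invention: it relies on the GCC and Clang builtin __builtin_prefetch, and the function name sum_with_prefetch and the constant PREFETCH_DISTANCE are hypothetical, with the distance value chosen arbitrarily. The software knows exactly which address will be needed, but the loop pays for an extra comparison and prefetch instruction on every iteration.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead to prefetch.
 * The number 16 is an arbitrary assumption for this sketch. */
#define PREFETCH_DISTANCE 16

/* Hypothetical helper, for illustration only.
 * Sums `n` elements of `a`, issuing a prefetch hint for the element
 * PREFETCH_DISTANCE positions ahead of the one being read. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Extra work added by software prefetching: one bounds check
         * and one prefetch instruction per iteration. */
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0: prefetch for a read.
             * Third argument 0: the data has no expected temporal reuse. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += a[i];
    }
    return total;
}
```

The prefetch is only a hint and does not change the computed result; whether it reduces load latency enough to offset the added instructions depends on the processor and the access pattern.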