1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a system and method to efficiently prefetch and batch compiler-assisted software cache accesses.
2. Description of Related Art
Shared memory multiprocessor systems are typically composed of a plurality of processors and one or more memories, e.g., a global memory or memories, which are linked by an interconnection bus or network. In such shared memory multiprocessor systems, because memory accesses must go through this interconnection bus or network, memory access latency is introduced into the system. Such memory access latency becomes an important factor in the performance of the multiprocessor system.
Various approaches have been attempted to minimize this access latency. Such approaches generally involve multithreading techniques and caching techniques.
Of particular relevance to the present invention, when caching is used in a multiprocessor system, the need to maintain cache coherence is an important consideration. That is, in order to avoid changing the semantics of a program execution through the use of caches, the memory must retain the appearance of sequential consistency. Most approaches to this cache coherence problem have focused on hardware mechanisms to maintain coherence. However, the overhead of maintaining coherence in hardware can be high, and scaling systems based on hardware coherence can be a difficult problem.
An alternative to hardware-based solutions for coherence is to use compilers to analyze programs and automatically augment them with calls to coherence operations, e.g., updates and invalidates, where necessary. Compiler based coherence techniques require only minimal support, or even no support, from cache hardware. The hardware need only provide a mechanism to enable software control of the cache. Such compiler based coherence techniques that make use of software control of caches are typically referred to as “software caches.” The “software cache” essentially provides a structure that enables a machine with long latency access to a shared memory, e.g., a global memory, to cache frequently accessed data in a local, software controlled store/scratch memory. More information regarding cache coherence and software cache coherence mechanisms may be found in Darnell et al., “Automatic Software Cache Coherence through Vectorization,” Proceedings of the 1992 International Conference on Supercomputing.
According to typical directory-based cache coherence protocols, when a lookup operation for a portion of data results in the portion of data being located in the software cache, the result is a software cache “hit.” When a lookup operation for a portion of data results in the portion of data not being located in the software cache, the result is a software cache “miss.” A typical software cache “hit” access consists essentially of first locating the cache directory data associated with a global address of the data that is requested. The software cache directory is a data structure, used in directory based cache coherence protocols, that tracks where each “page,” or block of memory, has been cached. The global address may be used with a software cache directory to identify where the page of memory containing the requested data is stored.
After locating the storage location of the page of memory containing the requested data using the cache directory and the global address of the data that is requested, the software cache directory data is accessed and checked for a software cache “hit.” If there is a software cache “hit,” meaning that the data corresponding to the global address is present in the software cache, the data in the cache line associated with the software cache “hit” is accessed to thereby deliver the data to the requesting application.
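The hit path described above can be illustrated with a minimal sketch. It assumes a direct-mapped organization, a 128-byte line, and a four-line local store, none of which are specified in the text; the names SoftwareCache, install, lookup, and read_byte are hypothetical and chosen only for illustration.

```python
LINE_SIZE = 128   # bytes per cache line (assumed value)
NUM_LINES = 4     # line slots in the local store (assumed value)


class SoftwareCache:
    """Hypothetical direct-mapped software cache with a tag directory."""

    def __init__(self, global_memory):
        self.global_memory = global_memory
        # Directory: one tag per slot; None marks an empty slot.
        self.tags = [None] * NUM_LINES
        self.lines = [bytearray(LINE_SIZE) for _ in range(NUM_LINES)]

    def install(self, line_addr):
        """Copy one line from global memory into its slot (setup helper)."""
        slot = (line_addr // LINE_SIZE) % NUM_LINES
        self.lines[slot][:] = self.global_memory[line_addr:line_addr + LINE_SIZE]
        self.tags[slot] = line_addr

    def lookup(self, global_addr):
        """Directory access: map a global address to (hit, slot, offset)."""
        offset = global_addr % LINE_SIZE
        line_addr = global_addr - offset
        slot = (line_addr // LINE_SIZE) % NUM_LINES
        hit = self.tags[slot] == line_addr
        return hit, slot, offset

    def read_byte(self, global_addr):
        """Deliver the data on a hit; return None on a miss."""
        hit, slot, offset = self.lookup(global_addr)
        if not hit:
            return None  # a real cache would invoke the miss handler here
        return self.lines[slot][offset]


memory = bytes(range(256)) * 2      # 512 bytes of stand-in global memory
cache = SoftwareCache(memory)
cache.install(128)                  # line covering addresses 128..255
hit_value = cache.read_byte(130)    # resident line: hit
miss_value = cache.read_byte(0)     # line never installed: miss
```

Even in this simplified form, each access pays for the address arithmetic, the directory read, and the tag comparison before any data is delivered.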
When accessing the software cache directory data results in a software cache “miss,” i.e. the requested data for the global address is not present in the software cache, a miss handler is invoked. The miss handler is a software routine that operates to retrieve the requested data from another cache, the shared or global memory, or even physical storage such as a hard disk, in the event that the data is not present in the software cache.
After being invoked, the cache miss handler typically finds a cache line of the software cache to evict and writes any dirty data, i.e. data that has been modified, in the evicted cache line to memory or physical storage, such as by way of a direct memory access (DMA) operation or by way of a series of explicit memory copy instructions. The new cache line may then be obtained and written into the software cache. The write-back of the evicted cache line and the writing in of the new cache line, such as by way of a DMA operation, may be performed in a substantially parallel manner if an additional temporary cache line is used with appropriate bookkeeping being performed in the software cache directory. Appropriate synchronization mechanisms need to be used between writing the new cache line to the software cache and allowing the data in this new cache line to be accessed and delivered to the requesting application, so that the data being cached is not used before it has been stored in the software cache.
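The eviction and write-back sequence performed by a miss handler can be sketched as follows. This is an illustrative model only: slice copies stand in for DMA transfers, the copies are synchronous (so no separate synchronization step is needed), and the direct-mapped layout, the sizes, and all names are assumptions rather than details taken from the text.

```python
LINE_SIZE = 16   # bytes per cache line (assumed value)
NUM_LINES = 2    # line slots in the local store (assumed value)


class SoftwareCache:
    """Hypothetical write-back software cache with a simple miss handler."""

    def __init__(self, global_memory):
        self.mem = global_memory              # bytearray standing in for global memory
        self.tags = [None] * NUM_LINES        # directory tags
        self.dirty = [False] * NUM_LINES      # per-line modified flags
        self.lines = [bytearray(LINE_SIZE) for _ in range(NUM_LINES)]
        self.misses = 0

    def _locate(self, addr):
        offset = addr % LINE_SIZE
        line_addr = addr - offset
        return line_addr, (line_addr // LINE_SIZE) % NUM_LINES, offset

    def _miss_handler(self, line_addr, slot):
        self.misses += 1
        victim = self.tags[slot]
        if victim is not None and self.dirty[slot]:
            # Write the dirty victim line back (stand-in for a DMA put).
            self.mem[victim:victim + LINE_SIZE] = self.lines[slot]
        # Load the new line (stand-in for a DMA get), then update the directory.
        self.lines[slot][:] = self.mem[line_addr:line_addr + LINE_SIZE]
        self.tags[slot] = line_addr
        self.dirty[slot] = False

    def read(self, addr):
        line_addr, slot, offset = self._locate(addr)
        if self.tags[slot] != line_addr:
            self._miss_handler(line_addr, slot)
        return self.lines[slot][offset]

    def write(self, addr, value):
        line_addr, slot, offset = self._locate(addr)
        if self.tags[slot] != line_addr:
            self._miss_handler(line_addr, slot)
        self.lines[slot][offset] = value
        self.dirty[slot] = True


memory = bytearray(64)
cache = SoftwareCache(memory)
cache.write(3, 7)                  # miss, then the line at address 0 becomes dirty
before_eviction = memory[3]        # global memory not yet updated
cache.read(35)                     # maps to the same slot: evicts and writes back
after_eviction = memory[3]
```

In this sketch the write-back and the load happen one after the other; overlapping them, as the text describes, would need an extra temporary line and a way to wait for the outstanding transfers before the new line is read.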
It can be seen that when a software cache “miss” occurs, a large latency may be experienced while the miss handler performs the necessary operations for evicting a cache line, writing dirty data to another storage, locating a new cache line to load into the software cache, and actually loading that cache line into the software cache. With hardware caches, such latency can be somewhat hidden by scheduling a hardware data pre-fetch instruction early enough ahead of the actual use of the data. That is, the hardware may pre-fetch data into the hardware cache in anticipation of the data being used at a later time and thereby decrease the number of cache “misses” encountered. Various hardware based pre-fetching mechanisms are generally known in the art.
If a straightforward analogy of the hardware pre-fetching mechanisms is applied to software caching, the result is that two software cache directory accesses are performed, one for the pre-fetching and one for the actual data access. This overhead may be small in hardware; however, it is very significant for software caches, where the software cache directory access is typically on the order of tens of instructions with a total latency on the order of tens of processor cycles.
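The doubled directory traffic can be made concrete with a small counting sketch. It assumes an unbounded cache keyed by line address and a pre-fetch distance of one element, both simplifications introduced here for illustration; CountingCache and the two summing functions are hypothetical names.

```python
LINE_SIZE = 16   # bytes per cache line (assumed value)


class CountingCache:
    """Hypothetical software cache that counts directory accesses."""

    def __init__(self, global_memory):
        self.mem = global_memory
        self.lines = {}                 # directory: line address -> line data
        self.directory_accesses = 0

    def _directory_lookup(self, addr):
        self.directory_accesses += 1
        line_addr = addr - addr % LINE_SIZE
        return line_addr, self.lines.get(line_addr)

    def _load(self, line_addr):
        line = bytearray(self.mem[line_addr:line_addr + LINE_SIZE])
        self.lines[line_addr] = line
        return line

    def prefetch(self, addr):
        # A naive pre-fetch needs its own directory access.
        line_addr, line = self._directory_lookup(addr)
        if line is None:
            self._load(line_addr)

    def read(self, addr):
        # The actual use performs a second directory access for the same data.
        line_addr, line = self._directory_lookup(addr)
        if line is None:
            line = self._load(line_addr)
        return line[addr % LINE_SIZE]


def sum_plain(cache, addrs):
    return sum(cache.read(a) for a in addrs)


def sum_with_naive_prefetch(cache, addrs):
    total = 0
    for i, addr in enumerate(addrs):
        if i + 1 < len(addrs):
            cache.prefetch(addrs[i + 1])   # pre-fetch one element ahead
        total += cache.read(addr)
    return total


memory = bytearray(range(64))
addrs = list(range(0, 64, LINE_SIZE))      # one address in each of four lines

plain = CountingCache(memory)
plain_total = sum_plain(plain, addrs)

naive = CountingCache(memory)
naive_total = sum_with_naive_prefetch(naive, addrs)
```

Both loops compute the same sum, but the pre-fetching version performs nearly twice as many directory accesses, which is the overhead the paragraph above identifies.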