1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a compiler implemented software cache apparatus and method in which non-aliased explicitly fetched data are excluded from the software cache.
2. Description of Related Art
Shared memory multiprocessor systems are typically composed of a plurality of processors and memories that are linked by an interconnection bus or network. In such shared memory multiprocessor systems, because memory accesses must go through this interconnection bus or network, memory access latency becomes important to the performance of the multiprocessor system. Various approaches have been attempted to minimize this access latency. Such approaches generally involve multithreading techniques and caching techniques.
Of particular relevance to the present invention, maintaining cache coherence is an important consideration when caching is used in a multiprocessor system. That is, so that the use of caches does not change the semantics of program execution, the memory must retain the appearance of sequential consistency. Most approaches to the cache coherence problem have focused on hardware mechanisms to maintain coherence. However, the overhead of maintaining coherence in hardware can be high, and scaling systems based on hardware coherence can be difficult.
An alternative to hardware-based solutions for coherence is to use compilers to analyze programs and automatically augment them with calls to coherence operations, e.g., updates and invalidates, where necessary. Compiler based coherence techniques require only minimal support from cache hardware. The hardware need only provide a mechanism to enable software control of the cache. Such compiler based coherence techniques that make use of software control of caches are typically referred to as “software caches.” More information regarding cache coherence and software cache coherence mechanisms may be found in Darnell et al., “Automatic Software Cache Coherence Through Vectorization,” Proceedings of the 1992 International Conference on Supercomputing.
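By way of illustration only, the following sketch shows what compiler-augmented code of this kind might look like. The names sc_invalidate, sc_update, and sc_access, the single-array cache, and the producer/consumer split are all assumptions made for this example; they are not taken from the cited reference or from any particular compiler.

```c
#include <assert.h>
#include <string.h>

#define N 8

static int shared_mem[N];            /* stand-in for shared system memory */

/* A private, software-controlled cache holding a copy of the whole array. */
typedef struct {
    int valid;
    int copy[N];
} sw_cache_t;

/* Hypothetical coherence operations that a compiler would insert. */
static void sc_invalidate(sw_cache_t *c) { c->valid = 0; }

static void sc_update(const sw_cache_t *c)   /* write the cached copy back */
{
    if (c->valid)
        memcpy(shared_mem, c->copy, sizeof shared_mem);
}

/* Return the local copy, filling it from shared memory on first use. */
static int *sc_access(sw_cache_t *c)
{
    assert(c != NULL);
    if (!c->valid) {
        memcpy(c->copy, shared_mem, sizeof shared_mem);
        c->valid = 1;
    }
    return c->copy;
}

/* Producer after augmentation: the original loop plus an inserted update. */
static void producer(sw_cache_t *c)
{
    int *a = sc_access(c);
    for (int i = 0; i < N; i++)
        a[i] = i * i;
    sc_update(c);                    /* inserted by the compiler */
}

/* Consumer after augmentation: an inserted invalidate, then the original loop. */
static int consumer(sw_cache_t *c)
{
    sc_invalidate(c);                /* inserted by the compiler */
    const int *a = sc_access(c);
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += a[i];
    return sum;
}
```

Without the inserted invalidate, a consumer that had cached the array before the producer ran would keep reading its stale copy; without the inserted update, the producer's writes would never reach shared memory.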
In a program compiled to use a compiler implemented software cache, there are also opportunities to fetch data explicitly and thereby avoid the overhead of a cache lookup. That is, the compiler may explicitly fetch data that is used often in a program and place this data in an explicitly fetched data buffer. The data is then available locally and may be accessed directly, without performing a software cache lookup operation and without re-fetching the data from system memory and incurring the associated access latency.
Explicit fetching of data is beneficial for a number of reasons. First, with explicitly fetched data, the compiler can be certain that the entire bundle of data that is explicitly fetched will be utilized by the program. For software cache data, on the other hand, operations on the software cache must be performed on a cache line by cache line basis, so some data in a cache line may not actually be utilized by the program. Thus, a larger bundle of data can be transferred together by explicit fetching, which reduces the setup overhead of the data transfer. Second, with software cache data, a cache lookup operation must be performed in order to locate the required data in the software cache, and then either the data is retrieved from the software cache or miss handling is performed if the data is not present. With explicitly fetched data, no such cache lookup operation is required, since the data is known to be present in the explicitly fetched data buffer and specific references to the explicitly fetched data buffer are utilized.
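The difference between the two access paths may be sketched as follows. This is a minimal model only: the direct-mapped organization, the line size, the memcpy-based stand-in for a bulk transfer from system memory, and all identifiers (cached_read_i32, explicit_fetch, and so on) are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 128u   /* assumed cache line size */
#define NUM_LINES  64u    /* assumed line count; direct-mapped for brevity */
#define BUF_INTS   256u   /* assumed capacity of the explicit-fetch buffer */

/* Stand-in for system memory; a real system would use DMA or similar. */
static uint8_t system_memory[1u << 16];

typedef struct {
    int      valid;
    uint32_t tag;                 /* line-aligned system-memory address */
    uint8_t  data[LINE_BYTES];
} cache_line_t;

static cache_line_t sw_cache[NUM_LINES];
static unsigned long lookup_count;   /* number of software cache lookups */

/* Software cache read: every access pays for a lookup and possibly a miss. */
static int32_t cached_read_i32(uint32_t addr)
{
    uint32_t line_addr = addr & ~(LINE_BYTES - 1u);
    cache_line_t *line = &sw_cache[(line_addr / LINE_BYTES) % NUM_LINES];
    int32_t value;

    assert(addr < sizeof system_memory);
    assert(addr % sizeof(int32_t) == 0);   /* aligned, so never straddles a line */

    lookup_count++;
    if (!line->valid || line->tag != line_addr) {   /* miss handling */
        memcpy(line->data, &system_memory[line_addr], LINE_BYTES);
        line->tag = line_addr;
        line->valid = 1;
    }
    memcpy(&value, &line->data[addr - line_addr], sizeof value);
    return value;
}

/* Explicit fetch: one bulk transfer into a local buffer. */
static void explicit_fetch(void *local_buf, uint32_t addr, size_t bytes)
{
    assert(addr + bytes <= sizeof system_memory);
    memcpy(local_buf, &system_memory[addr], bytes);
}

/* Sum n ints through the software cache: one lookup per element. */
static int64_t sum_cached(uint32_t base, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += cached_read_i32(base + (uint32_t)(i * sizeof(int32_t)));
    return sum;
}

/* Sum n ints from an explicitly fetched buffer: no lookups at all. */
static int64_t sum_explicit(uint32_t base, size_t n)
{
    int32_t buf[BUF_INTS];
    int64_t sum = 0;

    assert(n <= BUF_INTS);
    explicit_fetch(buf, base, n * sizeof buf[0]);
    for (size_t i = 0; i < n; i++)
        sum += buf[i];               /* direct reference to the buffer */
    return sum;
}
```

In this model, sum_cached increments lookup_count once per element and moves data one line at a time, whereas sum_explicit performs a single transfer sized to exactly the data it uses and never touches the lookup path.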
To keep the two copies of the original data, i.e. the software cache data and the explicitly fetched data, correct with respect to one another, extra operations must be invoked. Each explicit fetch of data must be recorded in the associated cache directory, i.e. the addresses of the explicitly fetched data are recorded in the cache directory, so that the corresponding cached data may be kept consistent with the explicitly fetched data. Recording this address information in the cache directory requires additional processor cycles, thereby increasing the execution time of the code.
In addition, when a software cache becomes full and additional data is to be loaded into the software cache, existing data in the software cache must be evicted in order to make space available for the new data. When choosing candidates for eviction, the software cache must not evict explicitly fetched data, since subsequent instructions may refer to this data without the use of a cache lookup operation. Evicting such data may lead to an inconsistency between the explicitly fetched data and the cached data and thus to errors in the execution of program instructions. Thus, again, when loading data into the software cache, additional processor cycles are required to determine which data may and may not be evicted to provide space for the new data.
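The two bookkeeping steps described above, recording explicit fetches in the cache directory and excluding them from eviction, may be sketched as follows. The directory layout, the pinned flag, the least-recently-used policy, and every identifier here are assumptions of this example rather than a description of any particular implementation.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 128u   /* assumed cache line size */
#define NUM_LINES  4u     /* deliberately tiny so the cache fills quickly */

typedef struct {
    int      valid;
    int      pinned;      /* set for explicitly fetched data: must not be evicted */
    uint32_t tag;         /* line-aligned address */
    unsigned last_use;    /* timestamp for least-recently-used selection */
} dir_entry_t;

static dir_entry_t directory[NUM_LINES];
static unsigned now;

/* Return the slot holding line_addr, or -1 if it is not in the directory. */
static int find_line(uint32_t line_addr)
{
    for (unsigned i = 0; i < NUM_LINES; i++)
        if (directory[i].valid && directory[i].tag == line_addr)
            return (int)i;
    return -1;
}

/* Pick a free slot, else the least recently used unpinned slot, else -1. */
static int choose_victim(void)
{
    int victim = -1;
    for (unsigned i = 0; i < NUM_LINES; i++) {
        const dir_entry_t *e = &directory[i];
        if (!e->valid)
            return (int)i;
        if (e->pinned)               /* extra check required on every eviction */
            continue;
        if (victim < 0 || e->last_use < directory[victim].last_use)
            victim = (int)i;
    }
    return victim;
}

/* Enter a line in the directory, evicting an unpinned line if necessary. */
static int install(uint32_t line_addr, int pinned)
{
    assert(line_addr % LINE_BYTES == 0);
    int slot = find_line(line_addr);
    if (slot >= 0) {                 /* already present: refresh the entry */
        directory[slot].pinned |= pinned;
        directory[slot].last_use = ++now;
        return slot;
    }
    slot = choose_victim();
    if (slot < 0)
        return -1;                   /* every line is pinned */
    directory[slot] = (dir_entry_t){ .valid = 1, .pinned = pinned,
                                     .tag = line_addr, .last_use = ++now };
    return slot;
}

/* Extra work on every explicit fetch: record its address in the directory. */
static int record_explicit_fetch(uint32_t line_addr)
{
    return install(line_addr, 1);
}

/* Ordinary software cache load. */
static int cache_load(uint32_t line_addr)
{
    return install(line_addr, 0);
}
```

In this model, if an explicit fetch is recorded first and three ordinary loads then fill the remaining slots, a fifth load evicts the oldest ordinary line rather than the older pinned one. Both the directory update in record_explicit_fetch and the pinned test in choose_victim are work that would be unnecessary if explicitly fetched data were kept out of the software cache altogether.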
Thus, these two constraints on the use of a software cache increase the execution time of program code that uses a combination of software caching and explicit fetching.