1. Field of the Invention
The present application relates generally to an improved data processing apparatus and method and more specifically to an apparatus and method for performing efficient software cache accessing with handle reuse.
2. Background of the Invention
A cache is a mechanism for expediting access to data/instructions that a processor would otherwise need to fetch from main memory. The cache serves as a temporary storage area where frequently accessed data/instructions can be stored for rapid access. Once the data/instructions are stored in the cache, future use may be made by accessing the cached copy of the data/instructions rather than re-fetching or re-computing the original data so that the average access time is shorter.
Caches may be managed by hardware or software. Software caches are used, for example, in computer architectures with non-coherent local memories in which frequently accessed global data structures can be temporarily stored. A typical sequence of actions for accessing a software cache, assuming a reference to a variable a[i] mapped in a global address space, comprises attempting to obtain the data from the software cache, determining if a software cache miss occurs, and if a software cache miss occurs, performing miss handling to load a software cache line having the data from main memory into the software cache. This sequence of actions, provided in pseudocode, looks like:
getDataFromCache(addr);
  if (miss(addr)) missHandler(addr);
  ptr = lineInCache(addr);
  data = load(ptr + offset(addr));
With this pseudocode, when attempting to get the data pointed to by the global address “addr,” the first task is to check whether the data pointed to by addr is in the local cache. This check typically mimics the actions of a hardware cache by first determining the set, i.e., the portion of the software cache, in which the data is expected, and then comparing the address to the tags associated with that set. If there is a match among the tags, a cache hit is determined to have occurred. Otherwise, a cache miss is determined. The task of the miss handler is to evict one line from the set, possibly writing it back to the global address space, and then to load the cache line associated with the requested address. The local cache line that holds the requested data corresponding to addr is then determined and identified by a pointer value ptr. This pointer value is used to retrieve the data by accessing the data value indexed by the offset of addr within the cache line pointed to by ptr. On one known processor architecture, the above actions may take approximately 11 instructions and 27 processor cycles to complete, assuming that there is a cache hit, i.e., the miss handler is not invoked.
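The lookup sequence described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular architecture: the line size, the number of sets and ways, the round-robin eviction policy, the use of the line's base address as its tag, and the array global_mem standing in for the global address space are all hypothetical choices made for the example. The function names follow the pseudocode (miss, missHandler, lineInCache, offset), written in snake case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE   128u    /* bytes per cache line (assumed) */
#define NUM_SETS    16u     /* number of sets (assumed) */
#define NUM_WAYS    4u      /* lines per set (assumed) */
#define GLOBAL_SIZE 65536u  /* size of the simulated global address space */

/* Hypothetical stand-in for main memory / the global address space. */
static uint8_t global_mem[GLOBAL_SIZE];

typedef struct {
    uint32_t tag[NUM_WAYS];    /* base global address of each cached line */
    uint8_t  valid[NUM_WAYS];
    uint8_t  dirty[NUM_WAYS];
    uint8_t  next_victim;      /* round-robin eviction pointer (assumed policy) */
    uint8_t  data[NUM_WAYS][LINE_SIZE];
} cache_set;

static cache_set cache[NUM_SETS];
static unsigned miss_count;    /* number of miss-handler invocations */

static uint32_t line_base(uint32_t addr) { return addr & ~(LINE_SIZE - 1u); }
static uint32_t offset(uint32_t addr)    { return addr & (LINE_SIZE - 1u); }
static uint32_t set_index(uint32_t addr) { return (addr / LINE_SIZE) % NUM_SETS; }

/* Compare addr against the tags of its set; return the matching way or -1. */
static int find_way(uint32_t addr) {
    const cache_set *s = &cache[set_index(addr)];
    for (unsigned w = 0; w < NUM_WAYS; w++)
        if (s->valid[w] && s->tag[w] == line_base(addr))
            return (int)w;
    return -1;
}

static int miss(uint32_t addr) { return find_way(addr) < 0; }

/* Evict one line from the set (writing it back if dirty), then load the
 * line that contains addr from the global address space. */
static void miss_handler(uint32_t addr) {
    cache_set *s = &cache[set_index(addr)];
    unsigned w = s->next_victim;
    s->next_victim = (uint8_t)((w + 1u) % NUM_WAYS);
    if (s->valid[w] && s->dirty[w])
        memcpy(&global_mem[s->tag[w]], s->data[w], LINE_SIZE);
    memcpy(s->data[w], &global_mem[line_base(addr)], LINE_SIZE);
    s->tag[w] = line_base(addr);
    s->valid[w] = 1;
    s->dirty[w] = 0;
    miss_count++;
}

/* Pointer to the local copy of the line holding addr (must be a hit). */
static uint8_t *line_in_cache(uint32_t addr) {
    int w = find_way(addr);
    assert(w >= 0);
    return cache[set_index(addr)].data[w];
}

/* The full sequence: check for a miss, handle it, locate the line, load. */
static uint8_t get_data_from_cache(uint32_t addr) {
    assert(addr < GLOBAL_SIZE);
    if (miss(addr))
        miss_handler(addr);
    uint8_t *ptr = line_in_cache(addr);
    return ptr[offset(addr)];
}
```

In this sketch, a first call such as get_data_from_cache(300) misses and loads the line starting at address 256, while a following call for address 301 hits the same line and does not invoke the miss handler. Even on a hit, every access repeats the set computation and tag comparison, which is the per-access cost discussed here.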
Now consider the following code sequence involving integer computations:
= a[i] + a[i+4] + a[i+8] + a[i+16]
Using known mechanisms, this code sequence is transformed into the following code sequence by the compiler in which cache lookup code is inserted:
if (miss(&a[i+0])) missHandler(&a[i+0]);
c0 = lineInCache(&a[i+0]);
if (miss(&a[i+4])) missHandler(&a[i+4]);
c4 = lineInCache(&a[i+4]);
if (miss(&a[i+8])) missHandler(&a[i+8]);
c8 = lineInCache(&a[i+8]);
if (miss(&a[i+16])) missHandler(&a[i+16]);
c16 = lineInCache(&a[i+16]);
= load(c0 + offset(&a[i+0])) +
  load(c4 + offset(&a[i+4])) +
  load(c8 + offset(&a[i+8])) +
  load(c16 + offset(&a[i+16]))
As shown above, this transformed code uses 4 cache lookup operations with each cache lookup operation taking approximately 27 processor cycles assuming no cache misses. Thus, assuming no cache misses, this code may take 27*4=108 processor cycles to complete. If there is a cache miss, the number of processor cycles may be dramatically increased as the miss handler must evict a cache line and load the required cache line using a main memory access. For example, in one processor architecture, a cache miss may require approximately 35 processor cycles to detect the cache miss and approximately 400 processor cycles to get the required data from main memory. Thus, both cache hits and cache misses represent a large cost in processor cycles.
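The cycle estimates above can be reproduced with a small cost model. The following C sketch uses only the approximate figures quoted in the text (27 cycles per hit, 35 cycles to detect a miss, 400 cycles to fetch from main memory). It assumes, for illustration only, that a lookup which misses costs the detection time plus the fetch time in place of the hit cost; the function name lookup_cycles is hypothetical.

```c
#include <assert.h>

/* Approximate per-lookup costs quoted in the text, in processor cycles. */
enum {
    HIT_CYCLES          = 27,
    MISS_DETECT_CYCLES  = 35,
    MEMORY_FETCH_CYCLES = 400
};

/* Estimated cycles for `lookups` cache lookups of which `misses` miss.
 * Assumption: a missing lookup costs detection plus fetch instead of the
 * hit cost. */
static unsigned lookup_cycles(unsigned lookups, unsigned misses) {
    assert(misses <= lookups);
    return (lookups - misses) * HIT_CYCLES
         + misses * (MISS_DETECT_CYCLES + MEMORY_FETCH_CYCLES);
}
```

Under this model, lookup_cycles(4, 0) gives the 108 cycles stated above for four hits, and a single miss among the four lookups raises the estimate to 3 * 27 + 435 = 516 cycles.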