1. Field of the Invention
The present invention relates to the field of memory caching and more particularly to the management of load and store instructions during program code compilation, optimization and execution.
2. Description of the Related Art
Memory cache technologies have formed an integral part of computer engineering and computer science for well over two decades. Initially embodied as part of the underlying hardware architecture of a data processing system, data caches and program instruction caches store often-accessed data and program instructions in fast memory for subsequent retrieval in lieu of retrieving the same data and instructions from slower memory stores. Consequently, substantial performance advantages have been obtained through the routine incorporation of cache technologies in computer main board designs.
Data and instruction caches are hardware structures that are traditionally transparent to software (i.e., the hardware manages them without software intervention or knowledge). Data caching technologies have become particularly important in the context of program code compilation and optimization. Compiler technology can be used to generate instructions that provide cache management hints, such as pre-fetching, or to generate code whose memory access patterns favor cache reuse. In program code compilation and optimization, program code can be tooled to encourage processor caching of required data so as to avoid the persistent retrieval of data from main memory.
In this regard, the effective use of a processor cache can be crucial to the performance of an application. To wit, it has been shown that cache misses are not evenly distributed throughout a program. In fact, academic study has shown that a small number of load instructions, referred to as delinquent loads in the remainder of this description, are responsible for most cache misses in an application. A “delinquent load” is a load instruction whose execution consistently results in a cache miss. The identification of delinquent loads, therefore, can be essential to the success of many cache optimization and pre-fetching techniques.
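The notion of a delinquent load can be illustrated by profiling a memory-access trace against a simple cache model and computing a per-load-site miss ratio. The following sketch is purely illustrative and is not part of the disclosed invention; the cache parameters, the `DirectMappedCache` model, and the miss-ratio threshold are all hypothetical assumptions chosen for the example.

```python
from collections import defaultdict

class DirectMappedCache:
    """Hypothetical minimal direct-mapped cache model: `n_lines` lines
    of `line_size` bytes each (parameters chosen for illustration)."""
    def __init__(self, n_lines=64, line_size=64):
        self.n_lines, self.line_size = n_lines, line_size
        self.tags = [None] * n_lines

    def access(self, addr):
        """Return True on a hit, False on a miss (filling the line)."""
        line = addr // self.line_size
        idx, tag = line % self.n_lines, line // self.n_lines
        hit = self.tags[idx] == tag
        self.tags[idx] = tag
        return hit

def find_delinquent_loads(trace, miss_threshold=0.5):
    """Given a trace of (load_site, address) pairs, return the load
    sites whose miss ratio exceeds the threshold -- in the terminology
    of the description, the delinquent loads."""
    cache = DirectMappedCache()
    hits, total = defaultdict(int), defaultdict(int)
    for site, addr in trace:
        total[site] += 1
        if cache.access(addr):
            hits[site] += 1
    return {site for site in total
            if 1 - hits[site] / total[site] > miss_threshold}

# Load site "A" strides through a large array one cache line at a time,
# missing on every access; load site "B" reuses a single location and
# hits after warm-up. Only "A" is reported as delinquent.
trace = [("A", i * 64) for i in range(1000)] + [("B", 0)] * 1000
print(find_delinquent_loads(trace))
```

In practice such miss statistics would come from hardware performance counters or a profiling run rather than a software cache model; the sketch only shows why the per-load distribution of misses, not the aggregate miss count, identifies the few instructions worth optimizing.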
There are a number of compilation techniques used to statically analyze program sections and discover load instructions which have a high probability of resulting in cache misses. Once identified, there are steps that the compiler can take to pre-fetch the memory locations into the cache and thus ameliorate the dilatory effects of these instructions.
Pre-fetching is a technique used to hide the latency of a cache miss by making a memory reference far in advance of when that data is required. Pre-fetching consists of providing a hint to the processor that a datum at a specific address will be needed in the very near future. Inserting a pre-fetch instruction requires two attributes: the data address of the datum needed and how far in advance to insert the pre-fetch, such that the data is in the cache when it is needed. Pre-fetching is most often done in loops because it is easier to predict that a data element will be required in the future. How far in advance a microprocessor must fetch or “pre-fetch” is determined by the stride distance (S), the latency (L) between main memory and the cache, the loop iteration time (T), and the cache line size (N). According to S, L, T and N, a pre-fetch distance (P) can be computed as P=S(L/T)/N, where L and T are measured in cycles, N is expressed in terms of the number of data elements in the cache line, and P is expressed in units of cache line size. Thus, as the latency increases, the compiler will have to fetch farther in advance to allow sufficient time for the element to be brought from main memory to the cache.
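The pre-fetch distance computation described above can be sketched as follows. This is an illustrative restatement of the formula P=S(L/T)/N only; the function name, the rounding up to a whole cache line, and the example parameter values (a 300-cycle memory latency, a 10-cycle loop body, 16 elements per line) are assumptions made for the example and are not taken from the description.

```python
import math

def prefetch_distance(stride, latency_cycles, iter_cycles, line_elems):
    """Compute the pre-fetch distance P = S * (L / T) / N, where
    stride (S) is in data elements per iteration, latency_cycles (L)
    and iter_cycles (T) are in cycles, and line_elems (N) is the number
    of data elements per cache line. P is rounded up to a whole number
    of cache lines so the data arrives no later than it is needed."""
    return math.ceil(stride * (latency_cycles / iter_cycles) / line_elems)

# Example (hypothetical values): a unit-stride loop with a 10-cycle
# body, a 300-cycle main-memory latency, and 16 elements per line
# must pre-fetch ceil(1 * (300 / 10) / 16) = 2 cache lines ahead.
print(prefetch_distance(1, 300, 10, 16))  # prints 2
```

The rounding up reflects the point made above: underestimating the distance leaves the load waiting on memory, whereas pre-fetching slightly farther ahead merely brings the line in a little early.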
Despite the promise of pre-fetching, there can be delinquent load instructions for which any static analysis can only conclude that a load instruction has a consistent probability of failing to find its datum in the cache. In this circumstance, it is inadvisable to pre-fetch the memory locations into the cache, since doing so can result in performance degradation due either to cache pollution or to the consumption of memory bandwidth on pre-fetches that do not succeed. Furthermore, there are cases where a static analysis can determine that a load instruction is delinquent but cannot determine how far in advance data should be pre-fetched in order to satisfy the load in time.