1. Field of the Invention
The present invention relates generally to data processing systems and, in particular, to a method and apparatus for a compiler. More particularly, the present invention is directed to a computer implemented method, apparatus, and computer usable program code for providing a uniform external and internal interface that enables a compiler to communicate information about delinquent memory operations with external user annotations and external tools, and internally between compiler passes, for use in optimization.
2. Description of the Related Art
Memory latency dominates the performance of many applications on modern computer systems, despite continued advances in memory hierarchy techniques. Memory latency is the time that it takes a processor to retrieve or transfer requested data, such as a byte or word in memory, after the request is made.
A delinquent memory operation is a load or store operation that frequently incurs long memory latency because of cache misses. The storage of a computer system is typically organized as a hierarchy of levels, ranging from smaller, faster levels to larger, slower ones. A cache is a memory hierarchy level that can be accessed more rapidly than other storage areas, such as main memory or a hard disk, and a memory hierarchy may contain several levels of cache with varying latencies and sizes. A cache miss occurs when a given level of cache does not contain a data value needed by an executing instruction. Cache misses occur because cache capacity is generally small compared with other storage types, such as hard disk space. If the requested data is available in cache, memory latency is usually significantly shorter than if the data must be retrieved from another level of the memory hierarchy. The lower the memory latency, the faster memory retrieval operations complete for an executing program.
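To make the notion of a delinquent memory operation concrete, the following Python sketch replays a memory trace through a simple direct-mapped cache and flags load instructions whose miss ratio exceeds a threshold. The cache geometry, the trace format of (instruction, address) pairs, the instruction labels `load_a` and `load_b`, and the 0.5 threshold are all illustrative assumptions; real memory hierarchies have several set-associative cache levels.

```python
from collections import Counter

# Hypothetical cache geometry: 64 sets of 64-byte lines, direct-mapped (4 KiB total).
LINE_SIZE = 64
NUM_SETS = 64


def simulate(trace, line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Replay (instruction_id, address) pairs through a direct-mapped cache.

    Returns per-instruction access counts and miss counts.
    """
    tags = [None] * num_sets  # one resident tag per set (direct-mapped)
    accesses, misses = Counter(), Counter()
    for insn, addr in trace:
        line = addr // line_size
        index = line % num_sets
        tag = line // num_sets
        accesses[insn] += 1
        if tags[index] != tag:  # cache miss: fetch the line and install it
            misses[insn] += 1
            tags[index] = tag
    return accesses, misses


def delinquent(accesses, misses, threshold=0.5):
    """Return instructions whose miss ratio exceeds a (hypothetical) threshold."""
    return sorted(i for i in accesses if misses[i] / accesses[i] > threshold)


# load_a streams through memory with an 8-byte stride (one miss per 64-byte line);
# load_b strides by 4096 bytes, so every access maps to the same set with a new tag.
trace = []
for k in range(64):
    trace.append(("load_a", 8 * k))
    trace.append(("load_b", 1_000_000 + 4096 * k))

acc, miss = simulate(trace)
print(delinquent(acc, miss))  # ['load_b']
```

Here `load_a` misses only 8 times in 64 accesses, while `load_b` misses on every access and is therefore reported as delinquent.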
Modern computing systems employ many techniques to increase the speed with which software executes. These techniques can be implemented in hardware as changes to the processor design, or in software as compiler optimizations. A compiler is a computer program that translates a series of statements written in a human-readable language into machine language, or otherwise modifies the code of a computer program. Compilers can reduce the latency of memory operations during program execution through optimizations such as program data reorganization or the insertion of software prefetching. These optimizations may be guided by static analysis at compile time or by dynamic analysis of cache misses using performance measurement tools. However, with currently available performance measurement tools, it can be difficult to identify delinquent memory operations precisely.
This problem is particularly pronounced in modern processors, in which instructions are grouped and tracked as a group throughout the execution pipeline. In such processors, several memory operation instructions may fall into the same instruction group, and performance monitoring tools may be unable to determine precisely which instruction in the group is causing the cache misses. Moreover, currently available compilers do not provide an interface through which users and/or other software programs can communicate information about delinquent memory operations to the compiler for use in reducing memory latency.
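The loss of precision can be illustrated with a small Python model. It assumes, purely for illustration, that the monitoring hardware counts cache misses per instruction group rather than per instruction; the `Group` structure, the instruction names, and the miss counts are hypothetical. Because only the group total is observable, every memory operation in the group remains a candidate for the delinquent instruction.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A hypothetical instruction group tracked as a unit through the pipeline."""
    instructions: list[str]
    memory_ops: set[str]


def attribute(groups, true_misses):
    """Aggregate per-instruction misses into per-group totals, as a
    group-granular monitor would observe them, and list the memory
    operations that could have caused those misses."""
    report = []
    for g in groups:
        observed = sum(true_misses.get(i, 0) for i in g.instructions)
        candidates = sorted(i for i in g.instructions if i in g.memory_ops)
        report.append((observed, candidates))
    return report


groups = [Group(["ld r1", "ld r2", "add", "st r3"], {"ld r1", "ld r2", "st r3"})]
true_misses = {"ld r2": 40}  # only the second load actually misses
print(attribute(groups, true_misses))  # [(40, ['ld r1', 'ld r2', 'st r3'])]
```

The monitor observes 40 misses for the group but cannot distinguish `ld r2` from the other two memory operations, which is the ambiguity that external annotations or tools could resolve for the compiler.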