1. Field of the Invention
The present invention relates generally to computer processor performance, and more particularly to automated structures and methods to support profile-guided and dynamic optimizations that in turn improve computer processor performance.
2. Description of Related Art
The performance of today's processors is often limited by the impact of cache misses. Many hardware prefetching schemes have been proposed to reduce cache miss penalties. Some of them are simple and easy to implement, such as next line prefetching, tagged prefetching and stride-based prefetching (See for example, S. P. Vanderwiel and D. J. Lilja, “Data Prefetch Mechanisms,” ACM Computing Surveys, June 2000, which is incorporated herein by reference as a demonstration of the level of skill in the art), but they are ineffective for more complex but commonly used memory reference patterns such as indirect references and references to dynamic structures. Markov prefetching (See for example, D. Joseph and D. Grunwald, “Prefetching using Markov Predictors,” 24th International Symposium on Computer Architecture, 1997, which also is incorporated herein by reference as a demonstration of the level of skill in the art) can learn about miss correlations and yield more effective prefetching based on observed correlations among misses. However, such a scheme requires a large amount of on-chip storage to track and remember frequent memory references. This approach may not be cost-effective, since the large amount of storage could instead be used to increase the size of the cache memory itself.
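Stride-based prefetching, mentioned above as one of the simpler hardware schemes, can be illustrated by the following minimal sketch. It is not taken from any cited reference; the table layout, entry fields, and the function name `stride_predict` are illustrative assumptions. The idea is that a per-load-instruction table remembers the last address each load referenced and the last observed stride, and when the same stride repeats, the next address in the sequence is predicted and prefetched:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative reference-prediction-table entry: one per load PC.
   Field names and table size are assumptions for this sketch. */
typedef struct {
    uintptr_t pc;        /* instruction address of the load */
    uintptr_t last_addr; /* last data address it referenced */
    intptr_t  stride;    /* last observed address delta */
} StrideEntry;

#define TABLE_SIZE 64
static StrideEntry table[TABLE_SIZE];

/* On each memory reference, update the entry for this load PC and,
   if the stride has repeated, return the predicted next address to
   prefetch; returns 0 when no prefetch would be issued. */
uintptr_t stride_predict(uintptr_t pc, uintptr_t addr)
{
    StrideEntry *e = &table[(pc >> 2) % TABLE_SIZE];
    uintptr_t prefetch = 0;

    if (e->pc == pc) {
        intptr_t s = (intptr_t)(addr - e->last_addr);
        if (s != 0 && s == e->stride)
            prefetch = addr + (uintptr_t)s; /* stride confirmed */
        e->stride = s;
    } else {
        /* new load PC: (re)allocate the entry */
        e->pc = pc;
        e->stride = 0;
    }
    e->last_addr = addr;
    return prefetch;
}
```

For example, a load at PC 0x400 touching addresses 100, 108, 116 in turn establishes a stride of 8 on the second reference and triggers a prediction of address 124 on the third. As the passage notes, such schemes fail for indirect references, where consecutive addresses exhibit no fixed stride.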
Most existing processors support software prefetching. See, for example, Vanderwiel and Lilja, referenced above. For example, compilers or programmers may insert prefetch instructions into the generated code to conduct cache prefetching. Software cache prefetching has often been used in scientific and engineering applications, where program behavior and data reference patterns are more predictable.
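The insertion of prefetch instructions described above can be sketched as follows, using the GCC/Clang builtin `__builtin_prefetch`, which compiles to the target processor's prefetch instruction. The prefetch distance of 16 elements is an illustrative tuning parameter, not a value from the source; in practice it must cover the memory latency without evicting useful data:

```c
#include <stddef.h>

/* Sum an array while prefetching a fixed distance ahead, so each
   cache line is requested before the loop actually reaches it.
   The distance (16 elements) is an assumed tuning parameter. */
long sum_with_prefetch(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            /* args: address, rw hint (0 = read), temporal locality (3 = high) */
            __builtin_prefetch(&a[i + 16], 0, 3);
        total += a[i];
    }
    return total;
}
```

A regular loop like this is the favorable case for software prefetching noted in the passage: the reference pattern is predictable at compile time. The difficulty discussed next arises when it is not.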
For commercial applications, the effectiveness of software prefetching has been very limited. Although compiler analysis techniques have been proposed to identify possible performance-critical cache misses, such analyses are only suitable when the data set size is known at compile time or when the size changes infrequently from one run to the next. See for example, T. Mowry, M. Lam, and A. Gupta, “Design and Evaluation of a Compiler Algorithm for Prefetching,” 5th International Conference on Architectural Support for Programming Languages and Operating Systems, 1992, which also is incorporated herein by reference as a demonstration of the level of skill in the art.
Currently, software prefetching performed by the compiler is guided mostly by compiler options/flags. It is difficult for a compiler to determine which regions/loops are likely to miss the caches. Indiscriminately inserting cache prefetches can significantly degrade performance if the execution of the code actually incurs few cache misses. One way to improve automatic software cache prefetching is to provide accurate cache miss profiles to the compiler or to a runtime optimizer. Current state-of-the-art Hardware Performance Monitors (HPM) provide cache miss frequency, instruction locations for cache miss events, and addresses of cache miss references. Such information is based on individual events, and can be used to determine the regions/loops and the specific instructions that are responsible for frequent cache misses. However, to improve the effectiveness of software cache prefetching, the optimizer needs information about a cluster of misses.
Some new microprocessors, such as the Dual-Core Itanium 2 Processor, available from Intel Corporation, support cache Event Address Registers (EAR) to track the instruction and data addresses of the offending instruction and other useful information about the latest cache miss event. Such event address registers can also be used to track translation look-aside buffer (TLB) miss events or data speculation check events. However, an event address register records only a single event, namely the most recent one.