1. Field of the Invention
This invention relates to computer systems, and more particularly to monitoring the performance of a microprocessor.
2. Description of the Relevant Art
Most computer systems include a microprocessor which functions as a central processing unit (CPU). Modern microprocessors, including the Intel Pentium™ processor, have hardware dedicated to measuring and monitoring various parameters which contribute to the performance of the microprocessor. In the case of the Pentium™ processor, the dedicated hardware includes several model specific registers (MSRs): a 64-bit time stamp counter (TSC) incremented every clock cycle, a control and event select register (CESR), and two 40-bit performance monitor counters (CTRs). The TSC, CESR, and the two CTRs are addressable registers, and their contents may be read or changed by software instructions. Each CTR may be individually programmed, via values stored within the CESR, to count the total number (or duration in clock cycles) of specific "events" occurring within the microprocessor during operation. Such events include memory accesses (e.g., data/code reads and data writes), data/code cache misses, pipeline flushes, and locked bus cycles. The information provided by the dedicated hardware may be used to improve the overall performance of the computer system by "tuning" the memory system or software programs generated by compilers.
Several problems limit the usefulness of the existing performance monitoring hardware. First, there are only two CTRs, so at most two events may be monitored at any given time. The CTRs are programmed by values stored within the CESR, and there is a fixed number of events to choose from. For example, there are 38 documented events from which to choose for the Pentium™ processor. In order to obtain counts for all events which may be monitored, it is necessary to repeat a test program 19 times (38 events divided between two counters), gathering counts for two events during each execution of the test program.
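The arithmetic behind the 19 executions can be sketched as follows. This is an illustrative calculation only; the function names `runs_needed` and `schedule` are hypothetical and are not part of any processor interface.

```python
import math


def runs_needed(num_events: int, num_counters: int) -> int:
    """Executions of the test program needed to count every event once."""
    if num_counters <= 0:
        raise ValueError("num_counters must be positive")
    return math.ceil(num_events / num_counters)


def schedule(event_ids, num_counters):
    """Group event identifiers into per-run batches of at most num_counters."""
    event_ids = list(event_ids)
    return [event_ids[i:i + num_counters]
            for i in range(0, len(event_ids), num_counters)]


# 38 documented events and two counters, as stated in the text.
print(runs_needed(38, 2))
```

With 38 events and two counters the sketch yields 19 executions; the number of executions falls in inverse proportion to the number of counters available.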
Second, and most importantly, there is no way to correlate the occurrence of an event with the time at which the event occurred. In cases where several factors affect a given aspect of system performance, the total number of events may indicate the presence or absence of a problem, but may not be particularly useful in determining the best solution to a problem. In some cases, a graph of the frequency distribution of an event is much more useful than the total number of events which occurred during execution of a test program.
A histogram is a bar graph of a frequency distribution in which the height of each bar represents the total number of events occurring within a corresponding time interval. Forming a histogram involves dividing a time period of interest into time intervals of equal length, and counting the total number of events occurring within each time interval. As a practical matter, summing numbers of events occurring within time intervals reduces the data storage requirements of a data acquisition system performing the counting operation while still providing useful event frequency information.
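The histogram-forming procedure described above can be sketched in software as follows. The sketch assumes event times are available as integer clock-cycle values; the function name `build_histogram` and its parameters are hypothetical. Only one count per time interval is stored, rather than one time stamp per event, which is the storage saving noted above.

```python
def build_histogram(event_times, start, stop, interval):
    """Count events falling within each equal-length interval of [start, stop).

    All arguments are assumed to be integers (e.g., clock cycles).
    """
    if interval <= 0 or stop <= start:
        raise ValueError("need interval > 0 and stop > start")
    num_bins = -(-(stop - start) // interval)  # ceiling division
    counts = [0] * num_bins
    for t in event_times:
        if start <= t < stop:  # ignore events outside the period of interest
            counts[(t - start) // interval] += 1
    return counts


# Six hypothetical event times; the period of interest is cycles 0 to 29,
# divided into three intervals of 10 cycles each.
print(build_histogram([0, 5, 9, 10, 25, 99], start=0, stop=30, interval=10))
```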
A good example illustrating the utility of a graph of the frequency distribution of an event is cache misses occurring during execution of a test program. FIGS. 1 and 2 will now be used to illustrate how such a graph may suggest which of several factors is the most likely cause of a problem. As described above, a desired data acquisition time is divided into time intervals (i.e., histogram time periods) of length t, and the total number of cache misses occurring within each histogram time period t is counted and graphed. FIG. 1 is a histogram showing the frequency of cache misses occurring within a first memory system during execution of the test program. In the first memory system, the frequency of cache misses follows a trend. The frequency of cache misses is initially high as the empty cache is filled, decreases relatively quickly at an initial rate 10, then continues to decrease as more needed instructions are located within the cache. Eventually a lowest number of cache misses "M1" is achieved by the first memory system. Sudden increases or "spikes" (e.g., spike 12) in the frequency of cache misses occur when new sections of program code are loaded into memory and executed.
FIG. 2 is a histogram showing the frequency of cache misses occurring within a second memory system during execution of the same test program. As in the first memory system, the frequency of cache misses within the second memory system is initially high as the empty cache is filled, and decreases with time as more needed instructions are found within the cache. The initial rate of the decrease 14 is not as great as that of the first memory system, however, and the lowest number of cache misses M2 achieved by the second memory system is substantially greater than M1. Spike 16 corresponds to spike 12, and occurs as the same section of program code is loaded into memory and executed. Spike 16 occurs later in time than spike 12 as the second memory system is less efficient than the first.
Key factors which affect the frequency of cache misses within a memory system include cache size and the technique used to select information stored within the cache for replacement by "newer" data (i.e., the cache replacement algorithm). FIG. 1 indicates the cache replacement algorithm of the first memory system is adequate. The best way to reduce the frequency of cache misses and thereby improve the performance of the first memory system is to increase the size of the cache. On the other hand, FIG. 2 indicates the cache replacement algorithm of the second memory system is probably not working well. Increasing the size of the cache would not be the best way to improve the performance of the second memory system; improving the cache replacement algorithm would probably be more effective.
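The features compared in FIGS. 1 and 2 (the initial rate of decrease, the lowest number of cache misses, and the positions of spikes) might be extracted from histogram counts roughly as sketched below. The function name `summarize`, the `window` and `spike_jump` thresholds, and both data series are invented for illustration; they are not taken from the figures and do not represent measured results.

```python
def summarize(counts, window=3, spike_jump=5):
    """Return (lowest count, initial rate of decrease, spike positions).

    window and spike_jump are arbitrary illustrative thresholds.
    """
    if len(counts) <= window:
        raise ValueError("need more than `window` histogram time periods")
    floor = min(counts)
    # Average drop per histogram time period over the first `window` periods.
    initial_rate = (counts[0] - counts[window]) / window
    # A spike is a period whose count jumps well above the preceding period.
    spikes = [i for i in range(1, len(counts))
              if counts[i] - counts[i - 1] >= spike_jump]
    return floor, initial_rate, spikes


# Made-up counts shaped like the trends described for FIG. 1 and FIG. 2.
first_system = [50, 30, 18, 11, 8, 20, 7, 5, 5]
second_system = [50, 42, 36, 31, 27, 24, 22, 34, 21, 20]

print(summarize(first_system))
print(summarize(second_system))
```

In this invented data the second series shows a slower initial decrease, a higher floor, and a later spike, which is the pattern the text associates with a poorly performing cache replacement algorithm.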
It would be beneficial to have a microprocessor which includes performance monitoring hardware allowing more than two events to be monitored at any given time and correlating the occurrence of an event with the time at which the event occurred. Such a microprocessor would reduce the number of times a test program must be executed in order to gather performance monitoring information. Such a microprocessor would also allow graphs of numbers of events versus time to be created, greatly enhancing the ability to increase the overall performance of the computer system by "tuning" the memory system or instruction sequences generated by compilers.