In typical computer systems, optimizing software execution leads to more effective system design by delivering higher system performance. Memory system usage is one area that offers significant opportunities for improvement: careful investigation of memory hierarchy traffic has been shown to reveal details of system behavior that can aid in developing higher-performing software algorithms. Performance monitors are useful in such investigations, but existing performance monitors have shortcomings for these purposes.
A performance monitor is a facility, incorporated into a computer system, that monitors selected internal characteristics affecting the rate at which the computer system does useful work. When properly configured, a performance monitor reports the internal state of the computer system at particular points in time. This form of measurement provides insights that are otherwise difficult to obtain.
Performance monitors are important because they expose underlying hardware details that are normally hidden behind abstract interfaces. To promote software portability, it is desirable to present abstract interfaces to software so that the software need not deal with system details. Yet this abstraction, so valuable in promoting portability, obscures certain system states that are critical to achieving optimal performance.
Typically, a performance monitor produces implementation-dependent information relating to the processor's instruction execution control and storage control. Enhancements to the performance monitor can provide critical information about the cost, in time, of retrieving data from computer system memory. Such information guides system architects toward ways of enhancing the performance of a given system or of improving the design of a new system.
Because memory systems are hierarchical, the time required to retrieve data from memory is not constant, and measuring these timings requires special consideration of the details of computer memory structures. In most software, some data is used frequently while other data is seldom used. Most computer memory systems exploit this fact by incorporating small staging areas that store frequently used data. These areas are usually smaller and faster to access than system memory, which allows the system to complete work faster and thus achieve higher performance. Similarly, some items held in a staging area are used more frequently than others, which leads to the use of an additional, secondary staging area. If the required datum is not in the first staging area, the second staging area is checked next; if the datum is not there either, system memory is checked. Because the probability of finding a required datum in some staging area is high, the average time to retrieve a datum is typically lower in hierarchically configured memory systems. Consequently, current memory systems are structured as hierarchies of staging areas that become larger and slower in the order in which they are accessed.
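As a minimal illustration of the lookup order just described, the Python sketch below models each staging area as a hypothetical `Level` with an assumed per-probe latency and a set of resident addresses. `lookup` probes the levels in order and falls through to system memory; `average_access_time` computes the expected cost per access from assumed local miss ratios. All names and cycle counts here are illustrative assumptions, not values taken from any real system.

```python
from dataclasses import dataclass

@dataclass
class Level:
    name: str
    latency_cycles: int  # assumed (hypothetical) cost of probing this staging area
    contents: set        # addresses currently held in this staging area

def lookup(levels, memory_latency, address):
    """Probe staging areas in order; return (where found, cycles spent)."""
    cycles = 0
    for level in levels:
        cycles += level.latency_cycles
        if address in level.contents:
            return level.name, cycles
    # Not found in any staging area, so system memory is accessed last.
    return "memory", cycles + memory_latency

def average_access_time(latencies, local_miss_ratios, memory_latency):
    """Expected cycles per access for a sequentially probed hierarchy."""
    total = 0.0
    reach = 1.0  # probability that an access gets as far as this level
    for latency, miss in zip(latencies, local_miss_ratios):
        total += reach * latency
        reach *= miss
    return total + reach * memory_latency

levels = [Level("L1", 4, {0x10, 0x20}), Level("L2", 12, {0x10, 0x20, 0x30})]
print(lookup(levels, 200, 0x30))                       # ('L2', 16)
print(average_access_time([4, 12], [0.5, 0.25], 200))  # 35.0
```

With these assumed numbers the hierarchy averages 35 cycles per access, compared with 200 cycles if every access went directly to system memory, which is the effect the paragraph above describes.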
It is clearly desirable to keep the most frequently reused data in the staging areas (hereinafter referred to as "caches") closest to the point of use. The degree to which this is achieved is crucial to system performance, since it affects the average time to access the data. This aspect of the memory hierarchy can be quantified by considering the likelihood that a given datum will not be found at a particular level of the cache hierarchy. An access in which the required datum is "missing" from the cache is called a "cache miss," and the ratio of cache misses to cache accesses is called the "cache miss ratio." Accompanying the cache miss ratio is the amount of time required to obtain the missing datum. The hierarchical cache levels are accessed sequentially: the first caches are usually the fastest but also the least capacious, and each subsequent cache is larger in capacity but slower to access. A metric for the cost of accessing a given datum can therefore be defined by considering the order in which the caches are accessed. The caches accessed first are considered the closest, in that they require the least time to access; caches accessed later are correspondingly considered more distant. Thus the time required to access a needed datum depends on how distant the datum is, measured by the number of caches that must be accessed to obtain it.
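The miss-ratio definition and the notion of distance can be sketched in a few lines of Python. The sketch assumes a hypothetical trace, `found_at`, in which each entry records the 0-based level at which an access was satisfied, with the value `num_caches` standing for system memory; the function names are illustrative only. An access reaches level k if it was not satisfied at any closer level, and it misses at level k if it had to go further.

```python
from collections import Counter

def miss_ratios(found_at, num_caches):
    """Return the local miss ratio of each cache level (0 = closest).

    found_at[i] is the level that satisfied access i; a value equal to
    num_caches means the datum had to come from system memory.
    """
    ratios = []
    for level in range(num_caches):
        accesses = sum(1 for d in found_at if d >= level)  # reached this level
        misses = sum(1 for d in found_at if d > level)     # had to go further
        ratios.append(misses / accesses if accesses else 0.0)
    return ratios

def distance_histogram(found_at):
    """Count how many accesses were satisfied at each distance."""
    return dict(sorted(Counter(found_at).items()))

found_at = [0, 0, 0, 1, 0, 2, 1, 0]         # hypothetical trace, 2 caches
print(miss_ratios(found_at, num_caches=2))  # [0.375, 0.3333333333333333]
print(distance_histogram(found_at))         # {0: 5, 1: 2, 2: 1}
```

The histogram is one simple way to express cost by distance: accesses at distance d had to probe d + 1 levels before the datum was obtained.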
In computers with hierarchical memory systems, the actual location of a needed datum is time-dependent and essentially unpredictable; consequently, the time to obtain a given datum is variable. The standard method of evaluating the cost of a cache miss considers both the number of cache misses and the total duration (in cycles) of each cache miss. However, the highly concurrent execution hardware employed in current computer systems allows a substantial degree of out-of-order execution, so this approach does not give a correct view of the effect. Often the time needed to access a datum has less impact on forward progress than expected, because other useful work can be done while the datum is being accessed; forward progress may be occurring in a variety of units. Counting the number of cycles in which cache misses are in progress therefore does not accurately reflect the cost of a cache miss, and simply counting events does not give a full picture of the state of the machine. This situation is not unique to the retrieval of memory data; similar situations arise when time-consuming calculations impede forward progress. For these and similar reasons that will be apparent to individuals skilled in the art, it is desirable to provide an improved method and system for performance monitoring that supplies additional information regarding the causes and effects of event occurrences within a data processing system.
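The distinction between cycles with a miss in flight and cycles actually lost to a miss can be illustrated with a toy per-cycle trace. This sketch is not the improved method the passage calls for; it assumes a hypothetical `CycleSample` record and uses "at least one instruction completed this cycle" as a simplified stand-in for forward progress, whereas the passage notes that progress may occur in a variety of units.

```python
from dataclasses import dataclass

@dataclass
class CycleSample:
    miss_outstanding: bool       # at least one cache miss is in progress
    instructions_completed: int  # hypothetical proxy for forward progress

def naive_miss_cycles(trace):
    """Conventional count: every cycle with a miss in flight is charged."""
    return sum(1 for c in trace if c.miss_outstanding)

def stalled_miss_cycles(trace):
    """Charge only cycles with a miss in flight and no completed instruction."""
    return sum(1 for c in trace
               if c.miss_outstanding and c.instructions_completed == 0)

trace = [
    CycleSample(True, 2),   # miss in flight, but work still completes
    CycleSample(True, 1),
    CycleSample(True, 0),   # miss in flight and nothing completes
    CycleSample(True, 0),
    CycleSample(False, 3),  # no miss outstanding
]
print(naive_miss_cycles(trace))    # 4
print(stalled_miss_cycles(trace))  # 2
```

Here the conventional count charges four cycles to cache misses, but in two of them the machine still completed instructions, so under this simplified model only two cycles were actually lost.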