1. Technical Field
The present invention relates generally to an improved data processing system and, in particular, to a method and system for monitoring performance within a data processing system. Specifically, the present invention relates to a method and system for monitoring performance of storage access and control.
2. Description of Related Art
In typical computer systems, system developers desire optimization of software execution for more effective system design. Usually, studies are performed to determine system efficiency in a program's access patterns to memory and interaction with a system's memory hierarchy. Understanding the memory hierarchy behavior helps optimize the system through the development of algorithms that schedule and/or partition tasks as well as distribute and structure data.
Within state-of-the-art processors, facilities are often provided which enable the processor to count occurrences of software-selectable events and to time the execution of processes within an associated data processing system. These facilities are known as the performance monitor of the processor. Performance monitoring is often used to optimize the use of software in a system. A performance monitor is generally regarded as a facility incorporated into a processor to monitor selected characteristics to assist in the debugging and analyzing of systems by determining a machine's state at a particular point in time. Often, the performance monitor produces information relating to the utilization of a processor's instruction execution and storage control. For example, the performance monitor can be utilized to provide information regarding the amount of time that has passed between events in a processing system. As another example, software engineers may utilize timing data from the performance monitor to optimize programs by relocating branch instructions and memory accesses. In addition, the performance monitor may be utilized to gather data about the access times to the data processing system's L1 cache, L2 cache, and main memory. Utilizing this data, system designers may identify performance bottlenecks specific to particular software or hardware environments. The information produced usually guides system designers toward ways of enhancing performance of a given system or of developing improvements in the design of a new system.
Events within the data processing system are counted by one or more counters within the performance monitor. The operation of such counters is managed by control registers, which are comprised of a plurality of bit fields. In general, both control registers and the counters are readable and writable by software. Thus, by writing values to the control register, a user may select the events within the data processing system to be monitored and specify the conditions under which the counters are enabled.
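By way of illustration only, the selection of an event through bit fields of a control register might be sketched as follows. The register layout, field widths, and event code shown here are hypothetical and do not correspond to any particular processor; they are assumed solely to show how software could pack and unpack such fields.

```python
# Hypothetical control register layout (an assumption for illustration):
#   bit 0     : counter enable
#   bits 1-7  : event selector for a single counter
ENABLE_BIT = 0x1
EVENT_SHIFT = 1
EVENT_MASK = 0x7F

EVENT_L1_MISS = 0x05  # hypothetical event code


def encode_control(event, enabled):
    """Pack an event selector and an enable flag into one register value."""
    if not 0 <= event <= EVENT_MASK:
        raise ValueError("event selector does not fit in the event field")
    value = (event & EVENT_MASK) << EVENT_SHIFT
    if enabled:
        value |= ENABLE_BIT
    return value


def decode_control(value):
    """Unpack a register value into (event selector, enable flag)."""
    event = (value >> EVENT_SHIFT) & EVENT_MASK
    enabled = bool(value & ENABLE_BIT)
    return event, enabled


control_value = encode_control(EVENT_L1_MISS, enabled=True)
```

In this sketch, writing `control_value` to the control register would correspond to selecting the monitored event and enabling the counter in a single store.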
To evaluate the behavior of memory accesses by a processor, it is necessary to determine the locations of those memory accesses and the number of accesses that are consumed on behalf of executing instructions. In computer systems with hierarchical memory systems, the time required to access a given memory item depends on where in the memory hierarchy the memory item resides. Items that reside in the highest levels of the hierarchy tend to require less time to access than those in lower levels of the hierarchy. Since system performance tends to be decreased by increases in the average time to access memory items, it follows that the most frequently accessed memory items should be in the highest (fastest) points in the hierarchy.
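The relationship between item placement and average access time may be illustrated with a short calculation. The access fractions and latencies below are assumed values chosen only for the example; they are not measurements of any system.

```python
def average_access_time(levels):
    """Return the weighted average access time.

    levels: list of (fraction_of_accesses, latency_in_cycles) pairs,
    one pair per level of the memory hierarchy. Fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in levels)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(fraction * latency for fraction, latency in levels)


# Assumed example: (fraction of accesses, latency in cycles) for
# L1 cache, L2 cache, and main memory.
before = average_access_time([(0.90, 2), (0.08, 10), (0.02, 100)])

# Same latencies, with more of the frequently accessed items served
# from the highest level of the hierarchy.
after = average_access_time([(0.95, 2), (0.04, 10), (0.01, 100)])
```

Under these assumed numbers, shifting a small fraction of accesses from main memory to the L1 cache lowers the average from about 4.6 cycles to about 3.3 cycles.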
Since the highest levels of the hierarchy usually have much less capacity than the lowest levels of the hierarchy, it usually is the case that not all of the most frequently accessed memory items will fit into the highest levels of the memory hierarchy. This may occur because of the alignment of the memory items in the memory address space, because of the stride of data items, because the number of frequently accessed memory items is too large to fit into the highest levels of the hierarchy, or for other reasons.
If the most frequently accessed memory items can be identified, engineers can focus on redesigning those aspects of the software responsible for the memory references in order to utilize the hierarchy more efficiently and thereby increase system performance. This is especially true in parallel or multiprocessor systems.
Previous solutions to identifying the most frequently accessed memory items have utilized simulation, software tracing (i.e., single step tracing), and hardware probes that access internal system busses. While all of these methods have some advantages, they all suffer from drawbacks. Tracing and simulation are very difficult and can induce distortion in the system execution. Hardware probe schemes can be very effective but must be designed into the system in a way that can detrimentally impact system physical packaging and cost.
Therefore, it would be advantageous to have a method and system for accurately monitoring the use of memory resources within a processor. It would be further advantageous to have a method and system for detecting a set of frequently accessed memory items using support structures within a processor.
The present invention provides a method and system for monitoring the performance of a processor to detect a set of frequently accessed memory items. A memory region to be monitored is selected and divided into an upper half monitored memory region and a lower half monitored memory region. Memory accesses to the upper half monitored memory region and memory accesses to the lower half monitored memory region are counted during a measurable interval. In response to the count of memory accesses to the upper half monitored memory region being greater than the count of memory accesses to the lower half monitored memory region, the monitored memory region is updated to be equal to the upper half monitored memory region. In response to the count of memory accesses to the lower half monitored memory region being greater than the count of memory accesses to the upper half monitored memory region, the monitored memory region is updated to be equal to the lower half monitored memory region. The steps of updating, dividing, and counting memory accesses to the monitored memory region during a measurable interval are repeated for a number of iterations in order to identify a frequently accessed memory region. As a set of instructions executes in the processor, a performance monitor may count the memory accesses and provide the counts for optimization analysis.
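The halving procedure summarized above may be sketched in software as follows. This is a minimal illustration, not the claimed implementation: the function names, the use of a fixed address trace in place of hardware counters, and the choice to stop when the two counts are equal are all assumptions made for the example.

```python
def make_trace_counter(trace):
    """Build a stand-in for the performance monitor counters.

    Assumption: a real system would count live accesses during each
    measurement interval; here the same recorded trace is replayed.
    """
    def count_accesses(low, high):
        # Count accesses whose address lies in the half-open range [low, high).
        return sum(1 for address in trace if low <= address < high)
    return count_accesses


def find_hot_region(count_accesses, start, end, iterations):
    """Narrow [start, end) toward the more frequently accessed half."""
    for _ in range(iterations):
        if end - start < 2:
            break  # region can no longer be divided
        middle = start + (end - start) // 2
        lower_count = count_accesses(start, middle)
        upper_count = count_accesses(middle, end)
        if upper_count > lower_count:
            start = middle      # keep the upper half
        elif lower_count > upper_count:
            end = middle        # keep the lower half
        else:
            break  # tie: behavior not specified above; stop (assumption)
    return start, end


# Assumed example trace: one heavily accessed address and two others.
trace = [0x1040] * 50 + [0x0100] * 10 + [0x1F00] * 5
counter = make_trace_counter(trace)
hot_region = find_hot_region(counter, 0x0000, 0x2000, iterations=7)
```

With this assumed trace, seven iterations reduce the 0x2000-byte region to the 0x40-byte range beginning at address 0x1040, which contains the most frequently accessed item. Each iteration halves the monitored region, so the number of iterations needed grows with the logarithm of the ratio between the initial region size and the desired resolution.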