The present invention relates in general to data processing systems, and in particular, to performance monitoring of events in data processing systems.
In typical computer systems utilizing processors, system developers desire optimization of execution software for more effective system design. Usually, studies of a program's access patterns to memory and interaction with a system's memory hierarchy are performed to determine system efficiency. Understanding the memory hierarchy behavior aids in developing algorithms that schedule and/or partition tasks, as well as distribute and structure data for optimizing the system.
Performance monitoring is often used in optimizing the use of software in a system. A performance monitor is generally regarded as a facility incorporated into a processor to monitor selected characteristics to assist in the debugging and analyzing of systems by determining a machine's state at a particular point in time. Often, the performance monitor produces information relating to the utilization of a processor's instruction execution and storage control. For example, the performance monitor can be utilized to provide information regarding the amount of time that has passed between events in a processing system. The information produced usually guides system architects toward ways of enhancing performance of a given system or of developing improvements in the design of a new system.
The present invention provides a representation of the use of software-directed asynchronous prefetch instructions that occur during execution of a program within a processing system. Ideally, the instructions are used in perfect synchronization with the actual memory fetches that they are trying to speed up. In practical situations, it is difficult to predict ahead of time all side effects of these instructions and memory access latencies/throughput during the execution of any large program. Incorrect usage of such software-directed asynchronous prefetch instructions can cause degraded performance of the system.
Understanding the efficient use of these instructions is not enough in itself to solve all memory access performance problems. It is necessary to identify the most prevalent causes for limitations in the memory subsystem bandwidth. Then, the most appropriate solutions to increase memory bandwidth can be determined.
The present invention concerns the measuring of the effectiveness of such software-directed asynchronous prefetch instructions ("sdapis"). The sdapis are used in a context such as video streaming. Prefetching data in this context is unlike prefetching instructions based on an instruction sequence or branch instruction history. It is assumed in the video streaming context that data location is virtually unknowable without software direction. One consequence, then, is that it is reasonable to assume that virtually every software-directed prefetch results in a cache hit that would not have been a hit in the absence of the software-directed prefetch.
Assume that a program, or a simulation of a program, is running with sdapis (program execution without sdapis is expected to be slower). The number of clock cycles for running the program is counted. In a first aspect, the invention deduces that performance is improved, compared to not running sdapis, according to the reduction in memory access misses, i.e., the increase in cache hits, wherein it is assumed that each instance of an sdapi causes a cache hit that otherwise would have been a cache miss. In terms of cycles, this improvement is expressed as the average cache miss penalty, in cycles, multiplied by the number of cache misses avoided (i.e., the increase in cache hits). Another aspect concerns measuring well-timed sdapis and poorly-timed sdapis. The extent of well-timed and poorly-timed sdapis is deduced in two ways: by counting certain events, as described herein, that concern instances where sdapis result in loading data and the data is not used at all, or is not used soon enough to avoid being cast out; and by measuring certain time intervals in instances where sdapis result in loading data and the data is used. Another aspect concerns measuring an extent to which sdapis impede certain memory management functions. This extent is deduced by counting certain disclosed events involving tablewalks and translation lookaside buffer castouts. Another aspect concerns measuring an extent to which sdapis are contemplated, but stopped. Events concerning cancellations and suspensions are disclosed. In another aspect, the above measurements are included in numerous streams.
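By way of illustration only, the cycle estimate of the first aspect and the event-count ratio of the second aspect may be sketched as follows. The function names, parameter names, and numeric values are hypothetical and are not taken from the disclosure; the sketch assumes that the relevant counts have already been read from a performance monitor.

```python
def estimated_cycles_saved(avg_miss_penalty_cycles: float,
                           misses_avoided: int) -> float:
    """Average cache miss penalty (in cycles) times cache misses avoided.

    Under the stated assumption that each sdapi turns a would-be miss
    into a hit, misses_avoided equals the number of sdapi instances.
    """
    return avg_miss_penalty_cycles * misses_avoided


def poorly_timed_fraction(sdapi_loads: int, unused_or_cast_out: int) -> float:
    """Fraction of sdapi-initiated loads whose data was never used,
    or was cast out before it could be used (hypothetical event counts)."""
    if sdapi_loads == 0:
        return 0.0  # no sdapi loads observed, so nothing was poorly timed
    return unused_or_cast_out / sdapi_loads


# Illustrative values only: a 40-cycle average penalty and 1000 sdapi loads,
# of which 250 loaded data that was unused or cast out.
saved = estimated_cycles_saved(40.0, 1000)
wasted = poorly_timed_fraction(1000, 250)
print(saved, wasted)
```

In such a sketch, a high poorly-timed fraction would suggest that the simple estimate of cycles saved overstates the actual benefit, since it assumes every sdapi avoids a miss.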
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.