In typical computer systems utilizing processors, system developers desire optimization of execution software for more effective system design. Usually, studies of a program's access patterns to memory and interaction with a system's memory hierarchy are performed to determine system efficiency. Understanding the memory hierarchy behavior aids in developing algorithms that schedule and/or partition tasks, as well as distribute and structure data for optimizing the system.
Performance monitoring is often used in optimizing the use of software in a system. A performance monitor is generally regarded as a facility incorporated into a processor to monitor selected characteristics to assist in the debugging and analyzing of systems by determining a machine's state at a particular point in time. Often, the performance monitor produces information relating to the utilization of a processor's instruction execution and storage control. For example, the performance monitor can be utilized to provide information regarding the amount of time that has passed between events in a processing system. The information produced usually guides system architects toward ways of enhancing performance of a given system or of developing improvements in the design of a new system.
Prior art approaches to performance monitoring include the use of external test instruments. Unfortunately, this approach is not completely satisfactory. Test instruments can be attached to the external processor interface, but these cannot determine the nature of internal operations of a processor. Test instruments attached to the external processor interface cannot distinguish between instructions executing in the processor. Test instruments designed to probe the internal components of a processor are typically considered prohibitively expensive because of the difficulty associated with monitoring the many busses and probe points of complex processor systems that employ pipelines, instruction prefetching, data buffering, and more than one level of memory hierarchy within the processors. A common approach for providing performance data is to change or instrument the software. This approach however, significantly affects the path of execution and may invalidate any results collected. Consequently, software-accessible counters are incorporated into processors. Most software-accessible counters, however, are limited in the amount of granularity of information they provide.
Further, a conventional performance monitor is usually unable to capture machine state data until an interrupt is signaled, so that results may be biased toward certain machine conditions that are present when the processor allows interrupts to be serviced. Also, interrupt handlers may cancel some instruction execution in a processing system where, typically, several instructions are in progress at one time. Further, many interdependencies exist in a processing system, so that in order to obtain any meaningful data and profile, the state of the processing system must be obtained at the same time across all system elements. Accordingly, control of the sample rate is important because this control allows the processing system to capture the appropriate state. It is also important that the effect that the previous sample has on the sample being monitored is negligible to ensure the performance monitor does not affect the performance of the processor. Accordingly, there exists a need for a system and method for effectively monitoring processing system performance that will efficiently and noninvasively identify potential areas for improvement. A more effective performance monitoring system has been disclosed in the cross-referenced applications noted above.
Instrumentation of processors is now becoming popular. But providing information known to the processor does not provide for a full system analysis. In order to analyze the performance of the entire system (and not just the processor), it is important to provide information related to system components. The typical approach to providing information between system components and the processor is via signals, which translate into pins. The more pins, the more the processor cost. For this reason, it is usually prohibitive to require that the individual components provide count information to the processor, which can in turn, provide information to the application(s) running the processor.
Providing a cost effective means to control and capture the information related to the system components will allow for a better analysis of system performance for a wider variety of systems, including those built at a lower cost.