Performance monitoring is an integral aspect of computational-system development. Modern computational systems are extremely complex electro-optico-mechanical systems with thousands of individual components, many of which include integrated circuits that may each contain millions of submicroscale active and passive electronic subcomponents. To manage this complexity, modern computational systems feature many layers of hierarchical control and organization, from low-level hardware controllers and control circuits, through firmware controllers, up to computer-instruction-implemented subsystems, including virtualization layers, operating systems, and application programs, which often comprise millions, tens of millions, or more computer instructions compiled from complex computer programs. In general, there is an essentially limitless number of different ways in which these control subsystems can be implemented and deployed to provide any number of different sets of features and operational behaviors. In many cases, even small changes in the sequence of computer-instruction execution can lead to large changes in the computational efficiency, accuracy, robustness, and latencies of a complex computational system.
While careful design and implementation of the many different layers of control systems and organizations of components within complex computational systems can lead to reasonable levels of performance, the complexity of the hierarchical levels of control and organization, together with the unpredictable nature of workloads, often makes it impossible to anticipate the various problems and pitfalls that arise when those levels are deployed in a physical system. As a result, many thousands, hundreds of thousands, or more man-hours of tuning, partial redesign, and optimization are often needed to achieve desired performance levels. These activities all rely on various types of performance monitoring to observe and evaluate the operation of the complex computational systems. Performance monitoring is itself generally hierarchically structured, ranging from high-level benchmark tests that measure the efficiency and throughput of the computational systems to highly specific, targeted testing of small subassemblies of components and individual routines within complex control programs.
In recent decades, computer processors have been enhanced with performance-monitoring units (“PMUs”) that allow various types of events and operational activities that occur during processor operation to be counted over defined time intervals. PMUs generally comprise register-and-instruction interfaces to underlying event-monitoring hardware features. The low-level performance monitoring provided by PMUs can often reveal inefficiencies and deficiencies in the design and operation of higher-level control systems, including virtualization layers and operating systems. Unfortunately, PMUs are generally specific to a processor type, and even to a processor model, with great variation in the interfaces to, and capabilities of, the many different types of PMUs. As a result, use of PMUs may require complex, system-specific development of higher-level performance-monitoring tools tailored to the specific PMUs within a system. Even more problematic, in modern complex computational systems, computational tasks, including processes and threads, may often move among many different types of individual systems with different types of processors during execution of high-level tasks, such as benchmark computing tasks. It is therefore a significant challenge to use the capabilities of the various different PMUs within complex computational systems that feature virtualization layers. For these and many other reasons, designers and developers of complex computational systems, and of the program-implemented control systems that execute within them, continue to seek new methods for accessing the capabilities of processor-embedded PMUs.