Performance monitoring is an integral aspect of computational-system development. Modern computational systems are extremely complex electro-optico-mechanical systems with thousands of individual components, many of which include integrated circuits that may each include millions of submicroscale active and passive electronic subcomponents. To manage this complexity, modern computational systems feature many layers of hierarchical control and organization, from low-level hardware controllers and control circuits up to complex control components and subsystems. These include firmware controllers and computer-instruction-implemented subsystems, such as virtualization layers, operating systems, and application programs, that often comprise millions, tens of millions, or more computer instructions compiled from complex computer programs. In general, there is an essentially limitless number of different ways in which these control subsystems can be implemented and deployed to provide any number of different sets of features and operational behaviors. In many cases, even small changes in the sequence of computer-instruction execution can lead to large changes in the computational efficiency, accuracy, robustness, and latencies associated with the complex computational systems.
While careful design and implementation of the many different layers of control systems and organizations of components within complex computational systems can lead to reasonable levels of performance, it is often not possible to anticipate the various problems and pitfalls that arise when the hierarchical levels of control and organization are deployed in a physical system, both because of the complexity of those levels and because of the unpredictable nature of workloads. As a result, many thousands, hundreds of thousands, or more man-hours of tuning, partial redesign, and optimization are often needed to achieve desired performance levels. These activities are all based on various types of performance-monitoring efforts that are used to monitor and evaluate operation of the complex computational systems. Performance monitoring is also generally hierarchically structured, ranging from high-level benchmark tests that measure the efficiency and throughput of the computational systems to targeted testing of smaller subassemblies of components and of individual routines within complex control programs.
In the past decades, computer processors have been enhanced with performance-monitoring units (“PMUs”) that allow various types of events and operational activities that occur during processor operation to be counted over defined time intervals. The performance-monitoring units generally comprise register-and-instruction interfaces to underlying event-monitoring hardware features. The type of low-level performance monitoring provided by PMUs can often reveal inefficiencies and deficiencies in the design and operation of higher-level control systems, including virtualization layers and operating systems. Unfortunately, the PMUs are generally implemented to count the occurrences of specific low-level processor events, such as memory accesses and cache misses. In many cases, there is no straightforward connection between the counts of particular events and higher-level system problems and anomalies, including bottlenecks. By contrast, higher-level benchmark testing and software programs devised to test computer systems may often fail to capture data at sufficient granularity and frequency to reveal underlying causes of system problems and anomalies, including bottlenecks. As a result, system analysts, designers, and administrators may currently be unable to acquire and access, through system testing and monitoring, much potentially valuable information that would allow for the identification and diagnosis of many types of system problems and anomalies.
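The interval-based event counting attributed to PMUs above can be illustrated with a minimal software sketch. It is not a model of any particular processor: the class SimulatedPMU, the event names, and the tick-based trace are all hypothetical, chosen only to show a select-register-and-counter interface being read at interval boundaries.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical event names for illustration; real PMUs define their own event codes.
EVENTS = ("memory_access", "cache_miss")


@dataclass
class SimulatedPMU:
    """Toy stand-in for a PMU: one event-select register and one counter register."""
    selected_event: str = "cache_miss"
    counter: int = 0

    def select(self, event: str) -> None:
        """Program the event-select register and clear the counter."""
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        self.selected_event = event
        self.counter = 0

    def observe(self, event: str) -> None:
        """Increment the counter only when the selected event occurs."""
        if event == self.selected_event:
            self.counter += 1

    def read(self) -> int:
        """Read the current counter value."""
        return self.counter


def counts_per_interval(pmu: SimulatedPMU,
                        trace: List[Optional[str]],
                        ticks_per_interval: int) -> List[int]:
    """Return the selected event's count for each complete interval.

    Each trace entry is the event seen on one tick (None means no event).
    A trailing partial interval is ignored.
    """
    counts = []
    last = pmu.read()
    for tick, event in enumerate(trace, start=1):
        if event is not None:
            pmu.observe(event)
        if tick % ticks_per_interval == 0:
            now = pmu.read()
            counts.append(now - last)  # delta since the previous boundary
            last = now
    return counts


if __name__ == "__main__":
    trace = ["memory_access", "cache_miss", None, "cache_miss",
             "memory_access", "memory_access", "cache_miss", None]
    pmu = SimulatedPMU()
    pmu.select("cache_miss")
    print(counts_per_interval(pmu, trace, ticks_per_interval=4))
```

The output is only a list of per-interval counts for one selected event. Nothing in it identifies which routine, virtual machine, or workload produced the events, which is the gap between low-level counts and higher-level diagnosis described above.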