Many modern processors include a large number of architected performance-monitoring registers that count the occurrences of performance-impacting events, including resource-exhaustion-related events, and that measure computational throughput. Resource-exhaustion events include many different types of performance-impacting conditions that arise during processor operation, such as cache-line evictions, cache misses, delays in storing data or launching operations due to full queues, and delays in accessing data and launching operations due to empty queues. In many modern processors, performance-monitoring registers accumulate counts of the number of unstalled processor cycles and the number of pipeline-executed instructions successfully retired, which allows performance-monitoring programs to compute general performance metrics, including the number of instructions executed per hardware cycle (“IPC”) and/or the number of processor cycles per successfully retired instruction (“CPI”), with IPC and CPI inversely related. Generally, when the IPC falls below a first threshold value and/or the CPI rises above a second threshold value, performance-monitoring components of virtual-machine monitors (“VMMs”), operating systems (“OSs”), and various higher-level performance-analysis systems may invoke a variety of different performance-monitoring analyses in an attempt to determine likely causes of the observed decrease in computational throughput and then to take or suggest various types of ameliorative procedures, such as alternative scheduling of the execution of virtual machines (“VMs”) or tasks, flagging particular tasks for redesign and reimplementation for performance optimization, and reallocating computational and data-storage resources among VMs and tasks.
These analyses generally consider the many different types of accumulated counts and metrics provided by various additional performance-monitoring registers, such as performance-monitoring registers that accumulate counts of the number of cache misses, execution-pipeline stalls, full-queue and empty-queue events, and other types of potentially performance-degrading events. In addition, performance analysis may involve a variety of different types of data collected through software instrumentation, operating-system performance monitoring, and VMM performance monitoring, as well as other types of performance-related data.
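The IPC and CPI computations and the threshold test described above can be illustrated with a minimal Python sketch. The names `CounterSample`, `ipc`, `cpi`, and `needs_analysis`, as well as the default threshold values, are hypothetical and chosen only for illustration; the counter values are supplied as plain integers rather than read from actual performance-monitoring registers, and both metrics are computed from the same pair of counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Hypothetical snapshot of two accumulated counter values."""
    cycles: int                 # unstalled processor cycles in the interval
    instructions_retired: int   # instructions successfully retired in the interval


def ipc(sample: CounterSample) -> float:
    """Instructions per cycle; defined as 0.0 when no cycles were counted."""
    if sample.cycles == 0:
        return 0.0
    return sample.instructions_retired / sample.cycles


def cpi(sample: CounterSample) -> float:
    """Cycles per retired instruction; infinite when nothing was retired."""
    if sample.instructions_retired == 0:
        return float("inf")
    return sample.cycles / sample.instructions_retired


def needs_analysis(sample: CounterSample,
                   min_ipc: float = 0.5,
                   max_cpi: float = 2.0) -> bool:
    """Return True when IPC falls below the first threshold and/or CPI
    rises above the second threshold (both thresholds are assumed values)."""
    return ipc(sample) < min_ipc or cpi(sample) > max_cpi


if __name__ == "__main__":
    healthy = CounterSample(cycles=1000, instructions_retired=2000)
    degraded = CounterSample(cycles=4000, instructions_retired=1000)
    for name, sample in (("healthy", healthy), ("degraded", degraded)):
        print(name, ipc(sample), cpi(sample), needs_analysis(sample))
```

In practice, a monitoring component would sample the counters periodically and apply such a test to the difference between successive readings, so that the metrics reflect a recent interval rather than the totals accumulated since the counters were last reset.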
Both computing hardware and software have evolved at rapid rates over the past 50 years. Early computer systems generally included a single, relatively slow von Neumann-type processor implemented from many different discrete electronic components distributed across multiple printed circuit boards. Currently, even modestly priced personal computers (“PCs”) contain extremely complex, fast, and powerful multi-core processors with simultaneous multi-threading (“SMT”) cores integrated within single integrated circuits. While early computers had primitive control programs that provided rudimentary execution environments for executing single programs, from start to finish, in batch-mode processing, even relatively low-end workstations and servers may currently feature virtual-machine monitors that provide execution environments for multiple guest operating systems, each, in turn, providing execution environments for concurrent or simultaneous execution of large numbers of multi-threaded processes. These complex hardware and software computing systems present numerous performance-monitoring and accounting challenges that current performance-monitoring hardware features of modern processors do not fully address.