Modern processing systems are complex and difficult to design. One of the more challenging aspects of system design is analyzing system performance, both for speed and for reliability. One important performance analysis tool is a hardware-collected trace. Typically, traces provide data used to simulate system performance, to make hardware design tradeoffs, to tune software, and to characterize workloads. Hardware traces are virtually independent of operating system, application, and workload. This attribute makes these traces especially well suited for characterizing the On-Demand and Virtual-Server-Hosting environments now supported on recent servers.
For example, a symmetric multiprocessing (SMP) data processing server has multiple processors with multiple symmetric cores, such that each processor has more or less the same processing speed and latency. An SMP system could have multiple operating systems running on different processors (a “logically partitioned” system), or multiple operating systems running on the same processors one at a time (a “virtual server” hosting environment). Generally, operating systems divide processing work into tasks that can be distributed among the various cores by dispatching one or more software threads of work to each processor. Multiple operating system environments complicate hardware trace operations and performance analysis.
Among SMP systems, thread handling further complicates performance analysis. For example, a single-thread SMP system includes multiple cores that can each execute only one thread at a time. A simultaneous multi-threading (SMT) SMP system includes multiple cores that can each execute more than one thread concurrently. SMT systems can also favor one thread over another when both threads are running on the same processor. To analyze this behavior, many designers use a hardware trace facility to capture various hardware signatures within a processor as trace data for analysis. This trace data may be collected from events occurring on processor cores, busses, caches, or other processing units included within the processor. Typical hardware trace facilities collect hardware traces from a trace source within the processor and then store the traces in a predefined memory location.
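By way of illustration only, the collect-and-store behavior of a typical trace facility can be modeled in software. The following is a minimal sketch, not a description of any particular hardware: the names TraceRecord and TraceBuffer, the record fields, and the policy of overwriting the oldest record when the region is full are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TraceRecord:
    """One captured hardware event (all fields are hypothetical)."""
    source: str        # trace source, e.g. "core0", "bus", "l2_cache"
    thread_id: int     # hardware thread associated with the event
    real_address: int  # real address involved in the event
    event: str         # event type, e.g. "cache_miss"


class TraceBuffer:
    """Fixed-size region standing in for the predefined memory location."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Optional[TraceRecord]] = [None] * capacity
        self._next = 0   # index of the next slot to write
        self._count = 0  # number of valid records, at most capacity

    def store(self, record: TraceRecord) -> None:
        # Assumed policy: once the region is full, overwrite the oldest record.
        self._slots[self._next] = record
        self._next = (self._next + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))

    def records(self) -> List[Optional[TraceRecord]]:
        # Return the valid records ordered from oldest to newest.
        start = (self._next - self._count) % len(self._slots)
        return [self._slots[(start + i) % len(self._slots)]
                for i in range(self._count)]
```

In this sketch, a buffer created with TraceBuffer(2) that receives three records retains only the two most recent ones, which hints at how a fixed memory region can lose trace data when many sources generate events at once.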
In the case of a multi-processor server, however, there are often many processes running across the multiple processing cores, which complicates performance analysis generally and hardware trace operations specifically. Particularly in multi-core, virtualized computer systems, many processor threads can generate cache misses simultaneously, thereby generating a relatively high volume of system bus traffic to unrelated real addresses in the main memory. Current hardware trace systems cannot adequately trace these multiple bus transactions or capture the relevant trace data in a form useful to design and test engineers.
Therefore, there is a need for a system and/or method for hardware process tracing that addresses at least some of the problems and disadvantages associated with conventional systems and methods.