This invention generally relates to computer system performance monitoring. More specifically, the invention relates to sampling computer system performance data without impacting reliability, availability and serviceability of the computer system.
Computer system performance measurement enables detection of issues that can result in reduced throughput of the computer system. One approach to measuring performance is to repeatedly execute workload instruction streams, which are often segments of customer workload code targeted to stress particular hardware and/or software functions, and collect data relevant to the system's performance. Initially, hardware captures selected signals and stores them in hardware arrays for further analysis. Each group of the selected signals is called a “sample”. When enough samples have been captured to fill the arrays, a hardware interrupt invokes firmware to move the data from the arrays to storage. A set of controls provides flexibility for a user (e.g., a measurement team member) in selecting which signals are captured and when the selected data are captured. The captured data are later used for calculating performance analysis metrics such as cycles per instruction (CPI), cache misses/hits, pipeline stalls, and the like. Basic mechanisms for data capturing and performance measurement, also referred to as “instrumentation”, are described in U.S. Pat. Nos. 4,590,550 and 4,821,178, each of which is hereby incorporated herein by reference in its entirety.
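By way of illustration only, the following minimal sketch shows how metrics such as CPI and a cache miss ratio might be derived from captured samples. It assumes, hypothetically, that each sample carries cumulative counters; the field names (cycles, instructions, cache_misses, cache_accesses) are not taken from the referenced patents, and the actual signals captured depend on the user's control settings.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # All fields are assumed cumulative counters read at capture time.
    cycles: int
    instructions: int
    cache_misses: int
    cache_accesses: int


def cpi(samples):
    """Cycles per instruction over the span from the first to the last sample."""
    first, last = samples[0], samples[-1]
    instructions = last.instructions - first.instructions
    if instructions == 0:
        raise ValueError("no instructions completed between samples")
    return (last.cycles - first.cycles) / instructions


def cache_miss_ratio(samples):
    """Fraction of cache accesses that missed over the sampled span."""
    first, last = samples[0], samples[-1]
    accesses = last.cache_accesses - first.cache_accesses
    if accesses == 0:
        raise ValueError("no cache accesses between samples")
    return (last.cache_misses - first.cache_misses) / accesses


# Two hypothetical samples copied from storage after a run.
samples = [Sample(1000, 500, 10, 200), Sample(3000, 1500, 30, 600)]
run_cpi = cpi(samples)                  # (3000 - 1000) / (1500 - 500)
run_miss = cache_miss_ratio(samples)    # (30 - 10) / (600 - 200)
```

Computing the metrics from counter differences, rather than from absolute values, keeps the result independent of when the run started relative to counter reset.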
Historically, to reduce hardware footprint, instrumentation has taken advantage of hardware arrays already existing in a design. These arrays were originally intended for hardware tracing to capture machine states over a period of time as debug data. When a failure occurs, the data in the arrays, once extracted, serve as a record of events leading up to the failure. Along with providing debug data in a lab environment, hardware tracing can greatly enhance computer system serviceability. In the event of a failure in the field (e.g., at a customer location), this capability facilitates problem isolation and resolution. Further, once a problem is understood, design changes can be implemented to improve future reliability. Hardware controls, such as multiplexers, are provided to allow the user to select which signals are routed to the hardware arrays. Several hardware tracing modes may be defined to assist in debugging particular scenarios, and the multiplexers provide switching between the modes. A further mode is defined to facilitate instrumentation. Different sets of signals are routed, via the multiplexers, to the hardware arrays for hardware tracing and instrumentation. The instrumentation signals can be used for evaluating system performance.
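The mode selection described above can be modeled, as a rough sketch only, by a function that passes exactly one group of signals to the shared arrays. The mode names and signal names below are hypothetical and chosen purely for illustration; an actual design defines its own modes and signal groups.

```python
from enum import Enum


class Mode(Enum):
    TRACE_PIPELINE = 1      # hypothetical debug tracing mode
    TRACE_CACHE = 2         # hypothetical debug tracing mode
    INSTRUMENTATION = 3     # mode defined for performance sampling


# Hypothetical signal names; each mode routes a different group to the arrays.
SIGNAL_GROUPS = {
    Mode.TRACE_PIPELINE: ("instr_addr", "pipe_state"),
    Mode.TRACE_CACHE: ("instr_addr", "cache_state"),
    Mode.INSTRUMENTATION: ("cycle_count", "instr_count", "cache_miss_count"),
}


def route(mode, signals):
    """Model the multiplexers: return only the signals selected by `mode`."""
    return {name: signals[name] for name in SIGNAL_GROUPS[mode]}


# Hypothetical machine state at one capture point.
machine_signals = {
    "instr_addr": 0x4000, "pipe_state": 0b101, "cache_state": 0b010,
    "cycle_count": 1200, "instr_count": 800, "cache_miss_count": 12,
}

# One array row as it would be written in instrumentation mode.
row = route(Mode.INSTRUMENTATION, machine_signals)
```

Because the arrays are shared, the row written in any one mode contains only that mode's signal group; in this model the trace signals are absent from a row captured in instrumentation mode.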
Additional controls are provided to define the events upon which data are captured. Typical settings for collecting debug data using hardware tracing include starting on an instruction address. For instrumentation, it is desirable to start collecting data on a time increment and to capture a set of data at regular time intervals. During a typical instrumentation run, the multiplexers are set to route instrumentation signals to the hardware arrays, and the event controls are set to collect data on a time interval. Each time the interval expires, the instrumentation signals are captured and saved in the next available row within the hardware arrays. When all rows of the arrays have been filled, an interrupt invokes firmware to copy the data to a buffer in storage. Upon exiting the firmware routine, the run continues.
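The instrumentation run described above may be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware or firmware implementation: the class name SamplingArray, the tick method, and the use of a cycle count as the time base are all hypothetical.

```python
class SamplingArray:
    """Simplified model of interval sampling into a fixed-size hardware array."""

    def __init__(self, rows, interval):
        self.rows = rows            # number of rows in the hardware array
        self.interval = interval    # sampling interval, in cycles (assumed unit)
        self.array = []             # rows captured since the last interrupt
        self.storage = []           # buffer in storage, filled by "firmware"
        self.interrupts = 0

    def tick(self, cycle, read_signals):
        """Advance one cycle; capture a sample each time the interval expires."""
        if cycle % self.interval != 0:
            return
        self.array.append(read_signals(cycle))   # save in next available row
        if len(self.array) == self.rows:
            self._interrupt()

    def _interrupt(self):
        """Model the firmware routine: copy the array to storage, then resume."""
        self.storage.extend(self.array)
        self.array.clear()
        self.interrupts += 1


# Hypothetical run: 4-row array, one sample every 10 cycles, 100 cycles total.
sampler = SamplingArray(rows=4, interval=10)
for cycle in range(1, 101):
    sampler.tick(cycle, lambda c: {"cycle_count": c})
```

In this run ten samples are captured; the array fills twice, so eight samples have been copied to storage and two remain in the array awaiting the next interrupt.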
A major drawback to this existing approach is that running instrumentation entails switching the hardware controls into instrumentation mode, thereby disabling hardware tracing. Without hardware tracing enabled, failure analysis is extremely difficult, and reliability/availability/serviceability (RAS) is compromised. Thus, instrumentation is not typically run in the field at a customer site due to the resulting reduction in RAS.
It would be beneficial to allow instrumentation data to be captured while simultaneously running hardware tracing. To keep complexity and hardware costs to a minimum, it would be desirable to support instrumentation and hardware tracing without duplicating the entire collection of hardware currently used for hardware tracing. Additionally, it would be advantageous to allow sampling of instrumentation data in the field without reducing RAS. Accordingly, there is a need in the art for sampling computer system performance data without impacting RAS of the computer system.