Performance of modern computer systems, including networked computer servers, may degrade for a variety of reasons, many of which relate to the use of shared resources such as disk bandwidth, memory capacity, and central processing unit (CPU) utilization. Information technology (IT) system administrators track the performance of their computer systems to ensure optimum allocation of these and other shared resources. Performance monitoring software provides system administrators with the tools necessary to track system performance and to diagnose problems. The performance monitoring software may provide immediate performance information about a computer system, allowing the administrator to examine computer system activities, identify and resolve bottlenecks, and tune the computer system for more efficient operation. The performance monitoring software may also keep a history of computer system performance, monitor performance as a background task, and send alarms for impending performance problems. Using the performance monitoring software, the administrator can pinpoint trends in computer system activities, and can use this information to balance workloads and to plan accurately for computer system growth.
To examine performance, the performance monitoring software must first collect performance information. This performance information, or instrumentation, may be provided by the operating system, software probes, or applications. Metrics derived from this instrumentation may be organized in several different ways, including by resource, or hierarchically from a global level down to an application level (groups of processes), and then to a process or individual thread level. Metrics derived by performance monitoring software can include CPU and memory utilization, time spent waiting for different system resources, queue lengths, application-specific table and status information, and application response time. These metrics may be used by the administrator to tune the system for optimal performance, and the performance monitoring software may generate alerts and warnings whenever a threshold value is approached or exceeded. The thresholds may be adjustable, and the alerts and warnings may be delivered by an e-mail message, for example.
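The threshold behavior described above can be illustrated with a minimal sketch. It is not taken from any particular monitoring product: the names `Threshold`, `warn_fraction`, and `evaluate`, the metric names, and the choice of 90% of the limit as the point at which a threshold counts as "approached" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Threshold:
    """Adjustable limit for one metric (hypothetical structure)."""
    metric: str
    limit: float
    warn_fraction: float = 0.9  # assumption: "approached" means >= 90% of limit


def evaluate(metrics: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return one message per threshold that is approached or exceeded."""
    messages = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            continue  # no instrumentation available for this metric
        if value > t.limit:
            messages.append(f"ALERT: {t.metric}={value} exceeds limit {t.limit}")
        elif value >= t.limit * t.warn_fraction:
            messages.append(f"WARNING: {t.metric}={value} is approaching limit {t.limit}")
    return messages


# Illustrative sample values, not measurements from a real system.
sample_metrics = {
    "cpu_utilization": 95.0,
    "run_queue_length": 4.6,
    "memory_utilization": 50.0,
}
sample_thresholds = [
    Threshold("cpu_utilization", 90.0),
    Threshold("run_queue_length", 5.0),
    Threshold("memory_utilization", 80.0),
]
sample_messages = evaluate(sample_metrics, sample_thresholds)
for message in sample_messages:
    print(message)
```

In this sketch the messages are only printed; a real tool might instead hand each message to an e-mail or paging mechanism, which is left out here.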
Computer systems provide services to their users and to other services in a computing environment. The goal of tuning is to optimize the services that reside on a particular system. A service (such as a data repository or Internet web service) may be composed of one or more specific applications, which are in turn composed of one or more processes instantiated on the computer system. Users of the computer system observe its behavior in terms of the service they access, whereas internally the computer system differentiates performance in terms of specific resources, processes, and applications.
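The relationship among services, applications, and processes can be sketched as a simple hierarchy in which a process-level metric is rolled up to the level a user actually observes. The class names, fields, and the use of summed CPU percentage as the rolled-up metric are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    pid: int
    cpu_percent: float  # CPU use measured for this one process


@dataclass
class Application:
    name: str
    processes: List[Process] = field(default_factory=list)

    def cpu_percent(self) -> float:
        # Application-level metric: sum over its constituent processes.
        return sum(p.cpu_percent for p in self.processes)


@dataclass
class Service:
    name: str
    applications: List[Application] = field(default_factory=list)

    def cpu_percent(self) -> float:
        # Service-level metric: sum over its constituent applications.
        return sum(a.cpu_percent() for a in self.applications)


# Illustrative values only.
web_server = Application("web_server", [Process(101, 12.5), Process(102, 7.5)])
database = Application("database", [Process(201, 30.0)])
web_service = Service("web_service", [web_server, database])

print(web_service.cpu_percent())  # 50.0
```

The same structure could carry other metrics, such as memory use or response time, although not every metric rolls up by simple addition.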
Unfortunately, current computing services do not have a consistent way to report their status to the tools that monitor performance. Each service, or its constituent applications and processes, may have internal status measures and instrumentation that would be useful to the performance monitoring software. However, there is no consistency in the way in which performance instrumentation is made available or reported. Furthermore, most applications do not generate their own performance information. Finally, services rarely receive “external” information related to the complex computer environment in which they operate. Bottlenecks external to the service itself, such as limited network bandwidth and shortfalls in services on which it depends, may affect service health and responsiveness, yet these potential external bottlenecks are not monitored and managed in a cohesive way. As a result, the health of a service often cannot be managed, or even characterized and reported. Most often, service status is characterized only as “up” or “down.” In evolving complex computer systems, a consistent method is required to analyze service health at a finer level of detail. Greater granularity of service status would enable construction of useful service goals and management of service level objectives in order to achieve greater consistency of performance and availability.
Yet another problem with service health performance monitoring is that the rules related to data collection and analysis of performance instrumentation are constantly changing. Every time a new version of an application is introduced into an environment, the performance information related to the application may change. Likewise, the performance monitoring software itself can change. These changes may, for example, result in new ways to access or process instrumentation, and may introduce new data sources. The system administrator must constantly adapt the way performance monitoring is configured because of the built-in dependencies among the different layers: the applications, the instrumentation, the performance monitoring software, and the computer operating system environment. A change in the environment often mandates a change in performance monitoring.