Computer operating systems sometimes fail to deliver the expected level of performance. The reasons are many: changes in workload, the occurrence of hardware or software errors, under-configuration (e.g., too little memory), configuration errors, lack of tuning, and over-commitment of resources. Addressing these problems requires first that they be detected (or anticipated, if possible); then that the reasons for their occurrence be identified; next, that steps be taken to remedy the situation; and finally, that the fix be verified.
Detection of performance problems is difficult. The relevant data is not centrally available in many computer systems such as UNIX systems or the like. Further, the interpretation of that data often requires expertise not commonly associated with system administration. However, identifying performance problems is important, for their presence diminishes a customer's investment in a computer system by robbing the customer of purchased resources.
Performance and configuration data are required to effectively diagnose the performance of the computer system. This data is typically available on a computer system from a large number of sources. In a UNIX computer system, the data is typically provided in many different formats.
In order to obtain this diagnostic information, the diagnostic system must be able to collect the data from the resource manager regardless of the format. Accordingly, additional complexity must be built into each of the data sources to provide integrated (seamless) access to each of the different formats that could be provided. This additional complexity can considerably increase the cost of providing performance diagnosis and, consequently, the cost of the overall system.
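The cost described above can be illustrated with a minimal sketch in which each format requires its own parser before the data can be merged into one view. The two formats ("kv" and "columns"), the metric names, and the functions parse_key_value, parse_columns, and collect are all hypothetical and are not taken from any particular UNIX utility; wherever such conversion logic resides, every additional format adds another parser to build and maintain.

```python
from typing import Callable, Dict, Iterable, Tuple

Sample = Dict[str, float]


def parse_key_value(text: str) -> Sample:
    """Parse lines of the form 'name=value' (hypothetical format)."""
    sample: Sample = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("=", 1)
        sample[name.strip()] = float(value)
    return sample


def parse_columns(text: str) -> Sample:
    """Parse a header row followed by one row of values (hypothetical format)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, values = lines[0].split(), lines[1].split()
    return {name: float(value) for name, value in zip(header, values)}


# One entry per supported format; each new format needs a new parser.
PARSERS: Dict[str, Callable[[str], Sample]] = {
    "kv": parse_key_value,
    "columns": parse_columns,
}


def collect(sources: Iterable[Tuple[str, str]]) -> Sample:
    """Merge (format, raw_text) pairs into a single dictionary of metrics."""
    merged: Sample = {}
    for fmt, text in sources:
        merged.update(PARSERS[fmt](text))
    return merged
```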
A common type of parameter in operating system control is a simple threshold parameter. Typically, such a threshold defines an absolute limit on the resource consumption of a process, a user, or the overall system. It is often the case, however, that no statistics are maintained on the number of times that processes (or users, or whatever resource consumer) reached that limit.
Knowledge of these limits, and their effects on workloads, is important for several reasons. The threshold might be set too low; that is, there might be sufficient resources on the system to support a higher level of use, but the threshold is exerting an unnecessary constraint on resource consumption. The threshold might not be having any effect at all (that is, it might be set too high). With no statistics maintained on (attempted) threshold violations, it is difficult to determine whether or not the threshold has an appropriate setting.
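The kind of statistics whose absence is noted above can be sketched as follows. This is only an illustration under stated assumptions, not a description of any existing operating system facility: the class CountedThreshold, its fields, the spare_capacity argument, and the cutoff values used in assess are all hypothetical and chosen for demonstration.

```python
from dataclasses import dataclass


@dataclass
class CountedThreshold:
    """A resource limit that also records how often it is reached (hypothetical)."""
    limit: int
    requests: int = 0  # total requests checked against the limit
    denials: int = 0   # requests that would have exceeded the limit
    peak: int = 0      # highest usage level ever requested

    def request(self, current: int, amount: int) -> bool:
        """Return True if raising usage from `current` by `amount` stays within the limit."""
        self.requests += 1
        wanted = current + amount
        self.peak = max(self.peak, wanted)
        if wanted > self.limit:
            self.denials += 1  # record the attempted violation
            return False
        return True

    def assess(self, spare_capacity: int, denial_rate_cutoff: float = 0.05) -> str:
        """Rough heuristic; the cutoffs are illustrative assumptions only."""
        if self.requests == 0:
            return "no data"
        rate = self.denials / self.requests
        if rate > denial_rate_cutoff and spare_capacity > 0:
            return "possibly too low"   # limit binds although resources remain
        if self.denials == 0 and self.peak < self.limit // 2:
            return "possibly too high"  # limit never comes close to binding
        return "plausible"


# Usage: a consumer asks for resources in steps against a limit of 100.
threshold = CountedThreshold(limit=100)
usage = 0
for amount in [40, 40, 40, 10]:
    if threshold.request(usage, amount):
        usage += amount
print(threshold.denials, threshold.assess(spare_capacity=50))
```

With these counters, a limit that is frequently hit while spare capacity exists can be distinguished from one that is never approached, which is the distinction the preceding paragraph says is difficult to draw without statistics.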
Accordingly, what is needed is a method and system that detects whether a particular threshold setting of a computer system is appropriately set. The process should be such that it does not add significant cost and complexity to the operation of the computer system. The present invention addresses such a need.