Transaction processing systems, such as enterprise-class computer systems and e-commerce servers, require monitoring and analysis in order to ensure efficient utilisation of hardware resources. That is, it is desirable to maximise the number of transactions processed by a computing system within a given time.
In monitoring and analysing the hardware and software resource usage of a computing system, a system monitor will generally observe and record characteristics of the transaction load and other characteristics of system behaviour. The data gathered by the monitor is used by users (such as system administrators) to identify problem areas and reduce performance bottlenecks.
For example, a system administrator will generally attempt to balance system load between system elements by switching off less essential services in order to provide more resources to critical services.
In order to make an informed decision on how to balance load or change the operating parameters of a computing system, a system administrator will generally be provided with a large number of characteristics that are monitored by the computing system. These characteristics are monitored by “counters”, which are typically software modules that collect statistics on the performance of various hardware and software sub-systems within a computing system.
A typical server will have over a thousand counters, each counter describing a different aspect of system behaviour. The counters may include characteristics such as processor (CPU) utilisation, interrupt rate, memory usage, number of disk reads within a given time, and number of disk writes within a given time.
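By way of illustration only, a counter of the kind described above can be sketched as a small software module that reads one statistic and records it against a timestamp. The sketch below is written in Python under stated assumptions: the names PerfCounter and sample_all are hypothetical, and the two statistics read (CPU time of the current process and the number of active threads) are stand-ins chosen because they are available in the standard library, not counters taken from any particular monitoring tool.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PerfCounter:
    """Hypothetical counter: a named statistic plus its recorded samples."""
    name: str
    read: Callable[[], float]  # returns the current value of the statistic
    samples: List[Tuple[float, float]] = field(default_factory=list)

    def sample(self, timestamp: float) -> float:
        """Read the statistic once and record it as (timestamp, value)."""
        value = float(self.read())
        self.samples.append((timestamp, value))
        return value


def sample_all(counters: List[PerfCounter]) -> Dict[str, float]:
    """Take one reading from every counter using a shared timestamp."""
    now = time.time()
    return {c.name: c.sample(now) for c in counters}


# Stand-in sources (assumption); a real monitor would read OS-level statistics.
counters = [
    PerfCounter("process_cpu_seconds", time.process_time),
    PerfCounter("active_threads", threading.active_count),
]

snapshot = sample_all(counters)
print(snapshot)
```

With over a thousand such counters on a typical server, each call to sample_all would yield over a thousand values, which indicates the scale of data an administrator is faced with.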
In the art, the abovementioned counters are commonly split into two general “types”.
The first type of counter is utilised for system monitoring. These counters are generally associated with on-line display of counter values. The Windows™ operating system performance monitor “perfmon” and the “sar” software package on the Unix™ operating system are examples of software packages that monitor counters of the first type.
The second counter type is generally employed for system analysis. That is, these counters are generally employed off-line, for analysis of daily and weekly patterns of load, response time, and gauging the effect of hardware and/or software upgrades.
Whilst these two types of counters utilise different methodologies, they attempt to achieve the same aim, namely to provide an indicator of how computer resources are utilised within a computing system.
Existing tools provide no mechanism to organise the large number of counters (“characteristics”) present in contemporary computer systems. Existing tools are capable of displaying any required characteristic, but do not offer any guidance to the system administrator as to which characteristics are important. That is, the system administrator has to specify which characteristics they wish to monitor and/or analyse.
Traditionally, characteristics selected for monitoring/analysis are chosen on the basis of whether they are “thought” to be important. For example, it is generally held by persons skilled in the art that the daily average CPU utilisation and the daily average throughput are important characteristics that should be monitored closely.
By employing such a methodology, hundreds or potentially thousands of other characteristics are ignored, primarily because it is too time-consuming to monitor or analyse every system characteristic. In order to ameliorate this problem, some contemporary monitoring tools allow the user to set an alarm for a particular counter. The alarm will alert the system administrator when the value of the counter passes a predetermined value. This approach provides some indication of which characteristics should be displayed and/or analysed, but still requires system administrators to manually configure the alarm levels. As there are potentially thousands of separate counters, many system administrators will not set alarm levels for every characteristic. Therefore, this system of providing alarm levels does not satisfactorily solve the problem.
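The limitation of manually configured alarms can be illustrated with a minimal Python sketch. It assumes that "passes a predetermined value" means the counter value exceeds the configured level; the function name check_alarms, the counter names and the sample values are all hypothetical and are not drawn from any existing monitoring tool.

```python
from typing import Dict, List, Tuple


def check_alarms(
    values: Dict[str, float],
    thresholds: Dict[str, float],
) -> Tuple[List[str], List[str]]:
    """Return (alarms raised, counters with no alarm level configured).

    Assumption: an alarm is raised when a value exceeds its threshold.
    """
    raised = sorted(
        name
        for name, limit in thresholds.items()
        if name in values and values[name] > limit
    )
    # Counters without a configured level can never raise an alarm.
    unconfigured = sorted(name for name in values if name not in thresholds)
    return raised, unconfigured


# Illustrative values only; the administrator has configured a single alarm.
values = {
    "cpu_utilisation": 93.0,
    "disk_reads_per_sec": 120.0,
    "context_switches_per_sec": 50000.0,
}
thresholds = {"cpu_utilisation": 90.0}

raised, unconfigured = check_alarms(values, thresholds)
print(raised)        # ['cpu_utilisation']
print(unconfigured)  # ['context_switches_per_sec', 'disk_reads_per_sec']
```

In this sketch the two counters without a configured level are never examined, whatever their values, which is the gap described above.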
In addition, some contemporary monitoring tools allow for two selected characteristics to be plotted against each other. Once again, the system administrator is required to select which characteristics they wish to view. Thus, this feature does not ameliorate the problem of determining which characteristics are important to a computing system.
Similarly, it is difficult for the producers of monitoring tools to predict and pre-select system characteristics which will be of particular importance on a defined computing system. For some computing systems, the important characteristic may be processor time; for others, it may be disk access time.
In addition, different computing systems will have different daily usage profiles and application mixes, so each computing system will require individual customisation. However, during the installation and customisation phase, it is easy to accidentally omit characteristics which are important for a given installation. For example, the number of context switches per second, a counter which is rarely examined, may be important for a particular computing system.
In other words, the problem of determining which characteristics are important to a particular computing system is circular. The user is required to know which counters should be included to adequately analyse the system, yet to analyse the system, the correct counters must be specified to ensure adequate data collection.
Moreover, despite careful initial analysis, the dynamic nature of system load frequently results in a situation where new characteristics become important only during certain periods of time.
There is a need to provide a system or method which assists the system administrator in determining which system performance characteristics are important for a given computing system.