1. Field of the Invention
The present invention relates to techniques for improving the performance of computer systems. More specifically, the present invention relates to a method and an apparatus for proactively identifying runaway processes in computer systems.
2. Related Art
System administrators for enterprise-wide computer systems typically handle dozens (or even hundreds) of heterogeneous computer systems that service thousands of end users. Hence, a system administrator usually deals with an extremely large volume of system information, making it almost impossible for the administrator to manually detect precursors of system performance degradation. Consequently, problems in an enterprise computer system are typically detected only after they have already caused a significant amount of performance degradation.
Runaway processes are one of the main causes of performance degradation. A runaway process is a process that no longer provides service to the user who initiated it, but continues to use system resources. For example, a runaway process can be caused by an application that crashes or is not properly terminated. In such situations, the user often has no knowledge of the continued existence of the runaway process, and the runaway process continues to use resources until someone manually identifies and terminates it or the machine is rebooted.
Runaway processes can cause dramatic performance degradation in enterprise computer systems. Even runaway processes that are not large enough to completely shut down a server can cause significant problems, particularly when multiple runaway processes are running on a single server. For example, a single small runaway process using 5% of the server's resources only marginally affects the operation of the server. However, five such runaway processes together consume 25% of the server's resources, which can seriously degrade the performance of the server.
Runaway processes are often hard to detect because, superficially, they are indistinguishable from normal processes. Hence, a system administrator typically needs to spend a considerable amount of time carefully examining several process parameters before concluding that a process is a runaway process. Unfortunately, system administrators are usually hard-pressed for time. As a result, runaway processes typically run unimpeded until they create serious performance problems.
Hence, what is needed is an accurate runaway-process-detection mechanism to assist system administrators in identifying runaway processes before they significantly degrade system performance.