Embodiments of the invention are directed to an approach for determining whether a computing system/application is in an unresponsive “hang” state and for distinguishing a hang state from an idle state.
Computing systems are generally used to process work on behalf of users or other resource consumers. The users and consumers of the system issue work requests, which are sent to processing entities that handle execution and processing of those requests. Such processing entities include, for example, processes, threads, tasks, nodes, and various types of distributed entities. For the purposes of explanation, such processing entities will be referred to herein, without limitation, as “processes.”
Any type of work may be suitably performed by processes within the computing system. As just one common example, the computing system may be utilized to perform work relating to database processing. One or more users may desire to query data within a database system, where the query processing work is sent for processing by one or more processes at a server running a database management system.
Various resources may be consumed or allocated during the process of performing work in a computing system. Examples of hardware resources that may be consumed or allocated include the CPU (central processing unit), networking resources, I/O (input/output) resources, memory, and persistent storage space. Examples of system and application resources include database objects, locks, and processes. These resources are often allocated based upon requests and actions taken by the processes to perform the work requested by users.
The performance and responsiveness of the computing system often depend upon the availability of sufficient resources to handle the work and upon the general level of operating health of the resources within the system. If there are sufficient available resources and if there are not otherwise any operating problems with the system, then the user requests should be processed in a timely manner. If, however, there are insufficient resources or if the system experiences operating problems, then it is quite possible that user requests will not be handled in a timely manner. In this circumstance, the operating problems or resource insufficiencies may need to be addressed before further work can be adequately performed in the system.
However, the perception of low activity in handling user requests does not necessarily mean that there is a system-related problem that needs to be corrected. In some cases, it is possible that there is no system problem at all, even though there is little ongoing processing activity. For example, periods of lower activity may be merely a symptom of a system that is in an idle state, where the system is perfectly capable of processing work once the work is actually requested by users.
Some embodiments of the present invention provide approaches for distinguishing between a computing system that is in a hang state and a system that is in an idle or otherwise non-hang state, the latter not needing intervention before regaining the ability to adequately process work. According to some embodiments, heuristics are employed to perform hung and idle system detection and differentiation. Data representative of system resources are analyzed and transformed in order to identify systems that are in a hang state.
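By way of illustration only, and without limitation, the distinction drawn above between a hang state and an idle state can be sketched in Python as follows. The sketch assumes, purely for explanation, that the system can be sampled periodically for two hypothetical quantities that do not appear in this description: the number of work requests awaiting processing and the number of requests completed since the prior sample. The names `Sample`, `SystemState`, and `classify` are likewise hypothetical, and the rule shown is one simple heuristic rather than the specific heuristics of any embodiment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class SystemState(Enum):
    ACTIVE = "active"  # work is being completed
    IDLE = "idle"      # no progress, but work was not continuously outstanding
    HANG = "hang"      # work outstanding throughout, yet none completed


@dataclass
class Sample:
    """One periodic observation of the system (hypothetical fields)."""
    pending_requests: int    # work requests waiting to be processed
    completed_requests: int  # requests completed since the previous sample


def classify(samples: Sequence[Sample]) -> SystemState:
    """Classify a window of consecutive samples with a simple heuristic."""
    if not samples:
        raise ValueError("at least one sample is required")
    # Any completed work in the window means the system is making progress.
    if any(s.completed_requests > 0 for s in samples):
        return SystemState.ACTIVE
    # No progress while requests were outstanding in every sample:
    # low activity here reflects a problem, not a lack of requested work.
    if all(s.pending_requests > 0 for s in samples):
        return SystemState.HANG
    # No progress, but work was not continuously outstanding: treat as idle
    # (or otherwise non-hang), since the system may simply have nothing to do.
    return SystemState.IDLE
```

Under these assumptions, a window in which requests remain pending in every sample while none complete would be classified as a hang, whereas a window with neither pending nor completed requests would be classified as idle. The window length and sampling interval are left to the caller, since no particular values are specified herein.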
Other and additional objects, features, and advantages of the invention are described in the detailed description, figures, and claims.