This invention relates to the monitoring and control of concurrent processes in a multiprocessing, multiprogramming computing environment, and more particularly, to detection and monitoring of resource contention between multiple processes thereof.
As used herein, the term “computing environment” includes any single system or multi-system computing environment as available or known in the art. A “task” or “process” means an independent unit of work that can compete for the “resources” of a computing environment. A “task control block” is a consolidation of control information pertaining to a task including any user-assigned priority and its state, i.e., active or waiting. The “wait state” is a condition of a task that is dependent upon the execution of other tasks in order for the “waiting” task to become “active”.
Also in this specification, a “resource” is any facility of a computing environment or of an “operating system” running thereon which is required for the execution of a task. Typical resources include main store, input/output devices, the central processing unit (CPU), data sets, and control or processing programs. In this regard, an “operating system” is a set of supervisory routines running on a computing system for providing, for example, one or more of the following functions: determining the order in which requesting tasks or their computations will be carried out, providing long-term storage of data sets including programs, protecting data sets from unauthorized access or usage, and/or system logging and recovery.
“Multiprogramming”, which pertains to the concurrent execution of two or more programs by a computing environment, can be managed on a computer running under, for example, OS/390 offered by International Business Machines Corporation. Modern operating systems, by permitting more than one task to be performed concurrently, make possible more efficient use of resources. For example, if a program that is being executed to accomplish a task must be delayed (for instance, until more data is read into the CPU), then performance of some other completely independent task may proceed. The CPU can execute another program or even execute the same program so as to satisfy another task.
In today's computing environments, mutual exclusion (or resource serialization) is often provided within the operating system itself. With IBM's OS/390 system, a customer has the option of configuring a multi-image environment to increase capacity and enhance availability. To allow these images to co-exist, resources shared between systems need to be serialized to ensure integrity. OS/390 uses a Global Resource Serialization (GRS) component to serialize both single system and multi-system resources. These resources can number in the thousands, if not millions. For more information on GRS, see the IBM publication entitled “OS/390 MVS Planning: Global Resource Serialization”; doc. #GC28-1759-OS (September, 1998) (6th edition), the entirety of which is hereby incorporated herein by reference.
In the allocation and use of these resources, contention for a resource can occasionally cause progress of the workload to be negatively impacted for a number of reasons. For example: (1) a resource allocation deadlock might occur; (2) a long-running task might hold a resource (resource starvation); or (3) a task holding resources may have ceased to respond (“enabled hang”).
A task is said to be “deadlocked” if its progress is blocked indefinitely because it is stuck in a “circular wait” upon other tasks. In this circumstance, each task is holding a “non-preemptable” resource which must be acquired by some other task in order to proceed, i.e., each task in the circle is waiting upon some other task to release its claim on a resource. The characteristics of deadlock then are mutual exclusion, non-preemption, and resource waiting. In the case of resource starvation, a long-running task or job holds one or more critical resources, in which case workload also requiring those resources must wait until the job ends. In severe cases, software errors can cause tasks that hold resources to fail without ending, causing the resource to be permanently held, thereby blocking workload that requires the resource.
In view of the above, resource contention monitoring and analysis can be significant functions in today's computing environments.
In certain systems, resource serialization managers have an ability to report on resource contention, and document blocking requests and waiting requests for resources. However, such systems do not provide for any intelligent ordering of the assembled information. For example, the current GRS implementation assembles the contended resources in alphabetical order of resource name. Thus, provided herein is an enhanced approach wherein blocking requests and waiting requests are explicitly listed in a time-based manner.
Briefly summarized then, this invention comprises a method for analyzing resource contention in a computing environment. This method includes: selecting a current waiting request for a resource; using a resource queue for the resource, chaining to a current top blocker request for the resource; chaining to a task related waiter queue (TRWQ) for the current top blocker request, wherein any requests waiting for a computer environment resource are listed in a first-in/first-out manner; and searching the TRWQ for any waiting request made by a task generating the current top blocker request, and if there are no waiting requests associated with the current top blocker, dependency analysis is complete.
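By way of illustration only, the dependency analysis summarized above might be sketched as follows. The sketch assumes that each resource queue lists its top blocker first and that each TRWQ is keyed by task name and lists that task's waiting requests oldest first. The names Request, dependency_chain, resource_queues and trwq are hypothetical and are not taken from GRS or from any actual implementation. The sketch also treats a chain that returns to a task already visited as a circular wait.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    task: str       # task that issued the request (hypothetical field)
    resource: str   # resource being requested (hypothetical field)


def dependency_chain(start, resource_queues, trwq):
    """Follow waiter -> top blocker -> that blocker's oldest waiting request.

    resource_queues: resource name -> deque of Requests, top blocker first.
    trwq: task name -> deque of that task's waiting Requests, oldest first.
    Returns (list of task names in the chain, True if the chain is circular).
    """
    chain = [start.task]
    seen = {start.task}
    current = start
    while True:
        blocker = resource_queues[current.resource][0]  # current top blocker
        chain.append(blocker.task)
        if blocker.task in seen:
            return chain, True    # chain loops back on itself: circular wait
        seen.add(blocker.task)
        waiting = trwq.get(blocker.task)
        if not waiting:
            return chain, False   # top blocker is not waiting: analysis complete
        current = waiting[0]      # oldest waiting request of the blocker's task


# Task A holds R1 and waits for R2; task B holds R2 and waits for R1.
# Task D holds R3 and is not waiting; task C waits for R3.
resource_queues = {
    "R1": deque([Request("A", "R1"), Request("B", "R1")]),
    "R2": deque([Request("B", "R2"), Request("A", "R2")]),
    "R3": deque([Request("D", "R3"), Request("C", "R3")]),
}
trwq = {
    "A": deque([Request("A", "R2")]),
    "B": deque([Request("B", "R1")]),
    "C": deque([Request("C", "R3")]),
}

deadlocked = dependency_chain(Request("B", "R1"), resource_queues, trwq)
# deadlocked == (["B", "A", "B"], True)
simple = dependency_chain(Request("C", "R3"), resource_queues, trwq)
# simple == (["C", "D"], False)
```

In this sketch, each step of the walk reads only the head of one resource queue and the head of one TRWQ, so the cost of the analysis grows with the length of the dependency chain rather than with the total number of outstanding requests.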
In a further aspect, a method for analyzing contention in a computing environment is provided. This method includes identifying at least one of a longest blocking process or a longest waiting process for a resource of the computing environment; and wherein the identifying comprises examining one of a blocking queue or a waiting queue for the resource, wherein the blocking queue comprises a time-ordered listing of all currently blocking processes requesting the resource, and wherein the waiting queue comprises a time-ordered listing of all currently waiting processes requesting the resource.
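As a minimal sketch of the time-ordered listing described above, a blocking queue or waiting queue might be kept in insertion order so that the longest blocker or waiter is always at the head. The class name TimeOrderedList, its methods, and the job and data set names below are hypothetical. The sketch further assumes that entries are added in non-decreasing order of start time, so that appending at the tail preserves time order.

```python
from collections import OrderedDict


class TimeOrderedList:
    """Entries kept in order of arrival, so the head is always the oldest."""

    def __init__(self):
        # request id -> (task, resource, start time), in insertion order
        self._entries = OrderedDict()

    def add(self, request_id, task, resource, start_time):
        # Assumption: start times never decrease, so appending keeps time order.
        self._entries[request_id] = (task, resource, start_time)

    def remove(self, request_id):
        # Called when the request stops blocking or waiting.
        self._entries.pop(request_id, None)

    def longest(self):
        # Head of the list: the entry that has been present the longest.
        if not self._entries:
            return None
        return next(iter(self._entries.values()))


waiters = TimeOrderedList()
waiters.add(1, "JOBA", "DATASET.X", 100)
waiters.add(2, "JOBB", "DATASET.Y", 105)
waiters.add(3, "JOBC", "DATASET.X", 110)

first = waiters.longest()    # ("JOBA", "DATASET.X", 100)
waiters.remove(1)            # JOBA obtains the resource and stops waiting
second = waiters.longest()   # ("JOBB", "DATASET.Y", 105)
```

An ordered mapping is used here so that both removal of an arbitrary entry and retrieval of the head take constant time; a doubly linked list with a per-request pointer would serve equally well.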
Systems and computer program products corresponding to the above-summarized methods are also described and claimed herein.
To restate, provided herein is an enhanced resource contention analysis technique which provides an ability to readily report on: (1) tasks (and resources) that have been blocking requests for the longest period of time; (2) tasks (and resources) that have been waiting for the longest period of time; and (3) tasks (and resources) involved in a request dependency chain, and whether or not that chain represents a deadlock. With the information provided by the enhanced contention analysis disclosed herein, an installation can determine if a high volume of contention is actually a problem or not. If the contention is a problem, then the tasks involved in that contention are apparent, allowing the installation to take action against a task, subsystem, or system, to alleviate the problem. With the current art, a customer would have to take the output from multiple instances of the contention display to determine whether or not the systems are making progress and then by hand build the dependency graph, and determine which resources and tasks are at fault. Obviously, the problem is nearly insolvable when hundreds of resources and tasks are involved in contention.
As noted, the blocker and waiter lists disclosed herein comprise lists sorted by the time of the event (i.e., the longest blocker or waiter is at the head of the list). The advantages of this approach are that:
(1) Finding the most affected resource/request is simplified. The element at the front of the list is the request that has been blocking/waiting (depending on the list) for the longest period of time. This means that an analysis of the resources does not have to query the state of all resources. Generally, only a very small number of resources (much less than 1%) are in contention at any one time.
(2) Deadlock analysis is simplified. Without maintaining a separate list of requests in contention, a complete search of the resource requests would be required to determine if a blocking request is, in turn, blocked by another, associated resource request. With the list, it is simple to interrogate a blocked request, go to the front of the resource request list containing the request to find the blocker, then go up to that blocker's unit of work waiter list to see if that unit of work is blocked. This reduces the search time, since rather than interrogating every request from a task, the existence of an element indicates the oldest waiting request from a particular task.