This invention relates to the control of concurrent processes in a multiprocessing, multiprogramming CPU environment, and more particularly, to the detection of deadlocks among waiting tasks thereof.
As used in this specification, the term "computing system" includes a CPU with main store, input/output channel control units, direct access storage devices, and other I/O devices coupled thereto such as described in G. M. Amdahl, et al, U.S. Pat. No. 3,400,371, issued Sept. 3, 1968 and entitled, "Data Processing System". A "task" is taken to mean an independent unit of work that can compete for the "resources" of a computing system. A "task control block" is a consolidation of control information pertaining to a task including any user assigned priority and its state i.e. active or waiting. The "wait state" is a condition of a task that is dependent upon the execution of other tasks in order for said "waiting" task to become "active".
Also, in this specification, a "resource" is any facility of a computing system or of an "operating system" running thereon which is required for the execution of a task. Typical resources include main store, I/O devices, the CPU, data sets, and control or processing programs. In this regard, an "operating system" consists of a set of supervisory routines running on a computing system for providing at least one of the following functions: determining the order in which requesting tasks or their computations will be carried out, providing long term storage of data sets including programs, protecting said data sets from unauthorized access or usage, and/or system logging and recovery.
"Multiprogramming", which pertains to the concurrent execution of two or more programs by a computing system can be managed on a computer running under IBM System/360 Operating System as described in IBM Publication GC28-6646, July 1973 and listed in IBM System/360 Bibliography GA22-6822. Relatedly, such modern operating systems, by permitting more than one task to be performed concurrently, make possible more efficient use of resources. If a program that is being executed to accomplish a task must be delayed, for example, until more data is read into the CPU, then performance of some other completely independent task can proceed. The CPU can execute another program or even execute the same program so as to satisfy another task.
In the competition for serially reusable resources, a task is said to be "deadlocked" if its progress is blocked indefinitely because it is stuck in a "circular wait" upon other tasks. In this circumstance, each task is holding a "non-preemptible" resource which must be acquired by some other task in order to proceed, i.e., each task in the circle is waiting upon some other task to release its claim on a resource. The characteristics of deadlock then are mutual exclusion, non-preemption, and resource waiting. Mutual exclusion implies that a task claims exclusive control over the resources it uses. Non-preemption connotes that a task does not release resources it holds until it completes use of them. Lastly, resource waiting occurs because each task holds resources while waiting for others to release resources.
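By way of illustration only, a circular wait of the kind just described can be found by following the wait relation from task to task until a task repeats. The following minimal sketch assumes, for simplicity, that each task waits upon at most one other task. The names `waits_on` and `find_circular_wait` are hypothetical and do not appear in any of the cited references.

```python
from typing import Dict, List, Optional


def find_circular_wait(waits_on: Dict[str, Optional[str]], start: str) -> List[str]:
    """Return the tasks forming a circular wait reachable from `start`, or [].

    Assumption: each task waits on at most one other task; None means the
    task is active (not waiting).
    """
    seen_at: Dict[str, int] = {}   # task -> position at which it was first visited
    path: List[str] = []           # tasks visited so far, in order
    task: Optional[str] = start
    while task is not None:
        if task in seen_at:
            # The chain returned to an earlier task: everything from that
            # position onward forms the circle.
            return path[seen_at[task]:]
        seen_at[task] = len(path)
        path.append(task)
        task = waits_on.get(task)
    return []                      # the chain ended at an active task


# A waits on B, B on C, C on A; D waits on A; E is active.
waits_on = {"A": "B", "B": "C", "C": "A", "D": "A", "E": None}
print(find_circular_wait(waits_on, "A"))  # ['A', 'B', 'C']
print(find_circular_wait(waits_on, "D"))  # ['A', 'B', 'C']
print(find_circular_wait(waits_on, "E"))  # []
```

Task D in this example is not itself a member of the circle, yet it cannot proceed either, since the task it waits upon never becomes active.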
There are, as pointed out by Coffman and Denning, "Operating Systems Theory", 1973, Prentice Hall, at page 46, several approaches to dealing with deadlocks. These approaches may respectively involve prevention, detection and recovery, or avoidance. This is particularized by A. C. Shaw, "The Logical Design of Operating Systems", 1974, Prentice Hall at pages 227-232 and 215-224. Shaw observes that the general approach of the art to deadlock prevention is to restrict the system such as by permitting only one task at a time to utilize resources. However, to permit multiprogramming operation, a more practical restriction would be to require each task to name its resources only at its creation. Tasks with allocated resources then would never be blocked because they cannot reference other resources and eventually will release them to the resource pool. The disadvantages of this deadlock prevention/avoidance policy are that it presupposes that the extent and order of resource use can be completely specified beforehand and that the resources are tied up for unnecessarily long times. For instance, a task may specify resources in the order a, b, c when the order actually required may be b, c, a. Also, resource c may be used only in the last portion of time allotted to the task. Some of the disadvantages have been overcome by J. W. Havender, "Avoiding Deadlock in Multitasking Systems", 1968, IBM Systems Journal at pages 74-84 by an "ordered" resource policy.
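The general idea behind an ordered resource policy may be sketched as follows: every resource is given a fixed rank, and every task acquires the resources it needs in ascending rank, so that no circle of waiting tasks can form. This is a simplified illustration under stated assumptions and not a reproduction of the scheme in the cited article. The ranking `RANK`, the lock table `LOCKS`, and the helper names are hypothetical.

```python
import threading
from contextlib import ExitStack
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")

# Hypothetical fixed ranking of the resources a, b, c used in the example above.
RANK: Dict[str, int] = {"a": 0, "b": 1, "c": 2}
LOCKS: Dict[str, threading.Lock] = {name: threading.Lock() for name in RANK}


def acquisition_order(needed: Iterable[str]) -> List[str]:
    """Return the needed resources sorted by their fixed rank, without duplicates."""
    return sorted(set(needed), key=RANK.__getitem__)


def run_with_resources(needed: Iterable[str], work: Callable[[], T]) -> T:
    """Acquire the needed resources in rank order, run `work`, then release them all."""
    with ExitStack() as stack:
        for name in acquisition_order(needed):
            stack.enter_context(LOCKS[name])  # blocks until the lock is free
        return work()


# A task that names its resources as b, c, a still acquires them as a, b, c.
print(acquisition_order(["b", "c", "a"]))          # ['a', 'b', 'c']
print(run_with_resources(["c", "a"], lambda: 42))  # 42
```

Because every task climbs the same ranking, a task holding a higher-ranked resource never waits upon a lower-ranked one. The price is that a resource may still be held earlier than it is actually used.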
Both Coffman and Shaw emphasize the role of "detection" in the success of deadlock resolution in multiprogramming computer systems. Indeed, Coffman in a subsequent discussion, "System Deadlocks", June 1971, Computing Surveys at pages 67-78 considers wait relations among tasks including those in which tasks may directly wait on two or more other tasks. In the general case, Coffman's execution time for deadlock detection among N tasks is proportional to N². Also, in Coffman's modified version, the detection time is a linear function of N+(R log R), where (R log R) is the time required to sort R resources. Lastly, the Coffman method requires at detection time the knowledge of resources in addition to data concerning task identity and their wait relations.
Obermarck in the IBM Technical Disclosure Bulletin, Vol. 12 at pages 2338-2339 in 1971 described the use of a matrix in which the row and column coincidence between a requestor and resource owner, together with the table entry, determine deadlock. Also, P. Roever in Vol. 16 of the IBM Technical Disclosure Bulletin at pages 1243-1244 in 1973 utilized a matrix only where the tasks were waiting in a circle A→B→C→D→A. In this latter case, the method involved the step of detecting a submatrix having nonzero rows and columns.
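One way to picture the detection of a submatrix having nonzero rows and columns is to discard, repeatedly, every task whose row or column is entirely zero among the tasks still under consideration. Whatever remains is such a submatrix. The sketch below is an illustration of that idea only and is not taken from the cited bulletins. It assumes that a nonzero entry `wait[i][j]` means task i waits upon task j, and the function name is hypothetical.

```python
from typing import List, Sequence


def circular_wait_candidates(wait: Sequence[Sequence[int]]) -> List[int]:
    """Return the task indices left after discarding zero rows and columns.

    Assumption: wait[i][j] != 0 means task i waits on task j. A task is
    discarded when, among the remaining tasks, it waits on no one (zero row)
    or no one waits on it (zero column). A non-empty result means a circular
    wait exists among the surviving tasks.
    """
    alive = set(range(len(wait)))
    changed = True
    while changed:
        changed = False
        for t in list(alive):
            row_empty = not any(wait[t][j] for j in alive)
            col_empty = not any(wait[i][t] for i in alive)
            if row_empty or col_empty:
                alive.discard(t)
                changed = True
    return sorted(alive)


# Tasks 0..3 stand for A, B, C, D waiting in the circle A→B→C→D→A;
# task 4 waits on A but is not waited upon.
wait = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
print(circular_wait_candidates(wait))  # [0, 1, 2, 3]
```

A simple chain with no circle, such as A→B→C, is eliminated entirely and yields an empty result.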