1. Field of the Invention
The present invention relates to operating systems for computers. More specifically, the present invention relates to a method and an apparatus for rapidly detecting cycles in a set of dependencies between resources in a computer system.
2. Related Art
Computer systems often support large numbers of resources that work together in performing computational tasks. This is particularly true for distributed computing systems in which numerous resources are distributed across multiple computing nodes.
Resources often depend on other resources, which can complicate the task of coordinating the resources so that they can work together. For example, an enterprise computing application may depend on an underlying database application in order to operate. Hence, if the underlying database application is unable to operate, the enterprise computing application is unable to operate.
Note that dependencies can be chained together. For example, the underlying database application may itself depend on a storage system in order to operate. Hence, if the storage system is unable to operate, the underlying database application is unable to operate and the enterprise computing application is unable to operate.
A deadlock condition can occur if a cycle of dependencies exists between resources in the computer system. For example, if the above-described storage system requires parameters from the enterprise computing application in order to operate, a cycle of dependencies exists between the storage system, the underlying database application and the enterprise computing application. Hence, a deadlock condition exists and none of the resources is able to operate.
Deadlock can be avoided (or at least diagnosed) by running a cycle detection algorithm on dependencies between resources in the computer system. A cycle detection algorithm typically operates on a directed graph in which resources are represented by nodes and dependencies are represented as directed edges (arcs) between the nodes.
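The directed-graph representation can be illustrated with a minimal sketch. It assumes an adjacency mapping from each resource to the set of resources it depends on. The function name add_dependency and the resource names are hypothetical, chosen to mirror the running example of the enterprise computing application, the database application and the storage system.

```python
def add_dependency(graph, dependent, dependency):
    """Record a directed edge: `dependent` cannot operate without `dependency`."""
    graph.setdefault(dependent, set()).add(dependency)
    # Make sure the target node exists even if it has no outgoing edges yet.
    graph.setdefault(dependency, set())


# Nodes are resources; each edge points from a resource to what it depends on.
graph = {}
add_dependency(graph, "enterprise_app", "database")
add_dependency(graph, "database", "storage")
# This last edge closes the cycle described in the deadlock example.
add_dependency(graph, "storage", "enterprise_app")
```

With this layout, following edges out of a node enumerates everything that must be operable before that node can operate.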
Unfortunately, the process of detecting cycles can be very time-consuming in systems containing large numbers of resources and large numbers of dependencies. Hence, the cycle detection process can cause significant delays.
What is needed is a method and an apparatus that rapidly detects cycles of dependencies between resources in a computer system without incurring a large amount of computational overhead.
One embodiment of the present invention provides a system that detects cycles in a set of dependencies between a set of resources in a computer system. The system operates by receiving a new dependency indicating that a first resource cannot proceed unless a second resource is able to proceed. The system determines if the new dependency creates a cycle in the set of dependencies by performing a search, which looks for cycles of dependencies starting from the first resource and ending at the first resource. If the search detects such a cycle, the system indicates that the new dependency creates the cycle. The system may also send an error message when the cycle is detected, and may abort further processing.
In a variation on this embodiment, performing the search involves initializing a list to include the first resource. The system then cycles through all resources in the list. It sets a current resource to be the head of the list, and then marks the current resource as "visited." Next, the system examines each resource that the current resource depends on. If the resource is the first resource, the system indicates that a cycle is created. Otherwise, if the resource is not already in the list or has not been marked as visited, the system adds the resource to the list. At some time during this process, the system also removes the current resource from the list.
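The list-based search described above can be sketched as follows. This is an illustrative reading of the steps, not a definitive implementation. It assumes the dependencies are held in a mapping from each resource to the set of resources it depends on, and it interprets the enqueue condition as "neither already in the list nor already visited" so that each resource is processed at most once. The names creates_cycle, add_dependency and DependencyCycleError are hypothetical.

```python
from collections import deque


class DependencyCycleError(ValueError):
    """Hypothetical error raised when a new dependency would create a cycle."""


def creates_cycle(graph, first, second):
    """Return True if adding the dependency first -> second closes a cycle."""

    def depends_on(resource):
        # View the edges as they would be once the new dependency is added.
        found = set(graph.get(resource, ()))
        if resource == first:
            found.add(second)
        return found

    pending = deque([first])  # the list, initialized with the first resource
    in_list = {first}
    visited = set()

    while pending:
        current = pending.popleft()  # take the head and remove it from the list
        in_list.discard(current)
        visited.add(current)
        for resource in depends_on(current):
            if resource == first:
                return True  # the search came back to where it started
            if resource not in in_list and resource not in visited:
                pending.append(resource)
                in_list.add(resource)
    return False


def add_dependency(graph, first, second):
    """Add first -> second, or raise if the new dependency creates a cycle."""
    if creates_cycle(graph, first, second):
        raise DependencyCycleError(f"{first} -> {second} creates a cycle")
    graph.setdefault(first, set()).add(second)
    graph.setdefault(second, set())
```

Because every resource enters the list at most once, the work done by the search in this sketch is bounded by the number of resources and dependencies reachable from the first resource.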
In one embodiment of the present invention, the computer system is a distributed computing system, and the resources are distributed across computing nodes in the distributed computing system.
In one embodiment of the present invention, the set of resources can include applications, devices and processes within the computer system.
In one embodiment of the present invention, the method is performed by a resource manager that starts and stops resources in the computer system.
In one embodiment of the present invention, the set of resources is divided into a plurality of resource groups, wherein each resource group includes a plurality of related resources.
In one embodiment of the present invention, performing the search includes looking for cycles of dependencies between resources within each resource group and looking for cycles of dependencies between resource groups.
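A two-level check of this kind might look like the sketch below. The data layout is an assumption made for illustration: resource_deps holds only dependencies between resources in the same group, group_deps holds dependencies between groups, and group_of maps each resource to its group. The search here is a plain stack-based reachability test rather than the list-based search described earlier, and all names are hypothetical.

```python
def reaches(graph, start, target):
    """Return True if `target` can be reached from `start` along dependency edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, ()):
            if nxt == target:
                return True
            stack.append(nxt)
    return False


def new_dependency_creates_cycle(resource_deps, group_deps, group_of, first, second):
    """Check a proposed dependency first -> second at the appropriate level."""
    first_group, second_group = group_of[first], group_of[second]
    if first_group == second_group:
        # Within one group: a cycle exists if second already leads back to first.
        return first == second or reaches(resource_deps, second, first)
    # Across groups: a cycle exists if second's group already leads back
    # to first's group.
    return reaches(group_deps, second_group, first_group)


# Hypothetical example data.
group_of = {"app": "web", "cache": "web", "db": "data", "storage": "data"}
resource_deps = {"app": {"cache"}, "db": {"storage"}}
group_deps = {"web": {"data"}}
```

Splitting the check this way keeps each search confined to either one group's resources or the (typically much smaller) graph of groups.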
In one embodiment of the present invention, the new dependency between the first resource and the second resource indicates that the second resource must be started before the first resource is started.
In one embodiment of the present invention, the new dependency between the first resource and the second resource indicates that the first resource must be stopped before the second resource is stopped.
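The two ordering interpretations of a dependency can be illustrated with a short sketch. It assumes an acyclic set of dependencies stored as a mapping from each resource to the resources it depends on, and it uses the Python standard library's graphlib.TopologicalSorter (Python 3.9 or later) to derive an order. The function names and resource names are hypothetical, and the source does not state that a resource manager computes orders this way.

```python
from graphlib import TopologicalSorter


def start_order(deps):
    """Order in which resources can be started: dependencies come first."""
    # TopologicalSorter treats each mapping value as the set of predecessors,
    # so a resource is emitted only after everything it depends on.
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps):
    """Order in which resources can be stopped: dependents come first."""
    return list(reversed(start_order(deps)))


deps = {
    "enterprise_app": {"database"},
    "database": {"storage"},
    "storage": set(),
}
starts = start_order(deps)  # storage, then database, then enterprise_app
stops = stop_order(deps)    # enterprise_app, then database, then storage
```

If the dependencies contain a cycle, static_order() raises graphlib.CycleError, which is one reason to reject a cycle-creating dependency before it is recorded.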