1. Technical Field of the Invention
The present invention relates in general to the field of network systems, and in particular, by way of example, and not limitation, to detection of a deadlocked resource condition in a pool of shared resources.
2. Description of Related Art
In a typical network system, such as a telecommunication system or data network system, the network maintains centralized control over available resources, and allocates these resources to individual devices or processes in response to a resource request. This centralized control enables the network to share the available resources among a larger number of users, thereby allowing network operators to offer a greater level of service at a lower cost. The sharing of resources among potentially competing devices or processes, however, causes network systems to be susceptible to resource conflict. This problem arises when two or more devices or processes attempt to access the same resource during the same time interval. These potentially conflicting uses of shared resources require the network system to maintain an established protocol that enables either the network or individual devices or processes to determine when access to a given resource is permissible.
Resource locking is one technique commonly employed in network systems to avoid resource conflicts. This technique utilizes a protocol in which the network signals the use of a resource by marking the resource as “used” or “busy.” This notifies the network and/or competing processes that the resource cannot be accessed at that time. In operation, a process desiring access to a particular resource inspects the lock or flag associated with that particular resource before submitting a resource request. Alternatively, the network inspects the flag associated with the resource before allocating the resource to the particular process. If the flag indicates that the resource is currently “free” or “idle,” the process is allocated the resource, uses the resource, and then releases the resource when the process is finished. On the other hand, if the flag indicates that the resource is “used” or “busy,” the process and/or network waits for the resource to be released by the prior acquiring process before allowing access to the resource.
If the network system configures the available resources into one or more pools of similar or identical resources, the resource locking technique may employ a “seizure” mechanism that attempts to seize a free (available) instance of a resource from the resource pool and allocate the seized resource to the requesting device or process. This seizure mechanism typically utilizes a search strategy or a linked-list approach to identify free instances of the requested type of resource. If a search strategy is used, for example, the pool of the appropriate type of resource is examined in a specified order until a free resource is found and seized or the search is abandoned because it is determined that there are no free resources within the resource pool. If the linked-list approach is used, on the other hand, the first item of the linked list (if there is an item in the linked list) represents an available instance of the resource. This resource may then be seized by the seizure mechanism by changing the state of the found resource from idle to busy. Regardless of the type of seizure approach used, the end result will be either identification of an available resource within the resource pool or a report back to the device or process originally requesting the resource that “congestion” has occurred and, thus, no free resources of the type requested are currently available. If an instance of the resource is found and seized by the seizure mechanism, the seized resource is effectively locked or otherwise prevented from being seized again until the seized resource is freed by the reverse of the seizure mechanism.
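The linked-list seizure approach described above can be sketched as follows. This is only an illustrative sketch: the class name `ResourcePool`, its methods, and the use of a Python `deque` as the free list are assumptions made for the example and are not taken from any particular network system.

```python
from collections import deque


class ResourcePool:
    """Pool of identical resources with a free list (hypothetical names)."""

    def __init__(self, size):
        self.busy = [False] * size            # per-resource "busy" flag
        self.free_list = deque(range(size))   # ids of idle resources

    def seize(self):
        """Seize the first free resource; return None to report congestion."""
        if not self.free_list:
            return None                       # no free resource: congestion
        rid = self.free_list.popleft()        # first item is an idle resource
        self.busy[rid] = True                 # state changes from idle to busy
        return rid

    def release(self, rid):
        """Reverse of seizure: mark the resource idle and return it to the list."""
        if not self.busy[rid]:
            raise ValueError("resource %d is not seized" % rid)
        self.busy[rid] = False
        self.free_list.append(rid)

    def utilization(self):
        """Number of resources currently marked busy."""
        return sum(self.busy)
```

A resource whose holder never calls `release` stays out of the free list indefinitely, which is the deadlocked resource condition discussed below; the `utilization` count is the kind of quantity that a detection mechanism could sample periodically.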
One undesirable situation that can arise in a resource locking scheme is a “deadlocked resource condition.” For the purposes of the present invention, a deadlocked resource condition occurs when a device or process that has been allocated a shared resource fails to release the resource for illegitimate reasons, such as an error in communications, an error in system or application software, or an undesired operational state. For example, an error in the system or application software may cause the process or network to fail to release the resource when the process is complete, or the signal releasing the shared resource may not be received due to an error in communications. Moreover, the process to which the resource is allocated may encounter an unexpected error and self-terminate, or the process may be terminated (“killed”) by the operating system or another process without releasing the resource. This situation is referred to as process death, and may cause the shared resource to remain locked by the dead process potentially forever.
A deadlocked resource condition may also occur due to conventional deadlock between competing processes. This conventional deadlock situation arises when a first process attempts to acquire a resource that is already locked by a second process, and the second process likewise attempts to acquire a resource that is already locked by the first process. Since neither process is able to release the lock sought by the other, neither process can proceed. Both resources will remain locked until one of the conventionally deadlocked processes is terminated to allow the other process to continue.
If a deadlocked resource condition is not detected and corrected, and if the process or processes continue to cause deadlocked resource conditions due to a reoccurring error or reoccurring operational state, all available resources will be gradually consumed. This situation leads to a gradual degradation in the quality of service offered by the network, until the network eventually ceases to function. Therefore, in a network system which employs a resource locking scheme, there must be a mechanism for detecting a deadlocked resource condition in time for network operators or system software to take corrective action, such as temporarily assigning additional resources to allow the network some ability to function, restarting the network to clear deadlocked resource conditions, and actually fixing the problem causing the deadlocked resource condition.
Existing approaches have attempted to alleviate the problems described above by utilizing either a timer-based or logic-based solution or by monitoring traffic congestion. In a timer-based solution, a separate timer is initiated for each resource allocation. If the resource remains locked after a predetermined amount of time has expired, the network assumes that a deadlocked resource condition has occurred, terminates the process, and releases the resource. Although a timer-based solution has the potential to correctly detect deadlocked resource conditions in situations involving a definite upper time limit of resource usage, this solution proves to be inadequate in situations where the upper time limit is relatively long or indefinite. For example, because users will normally tolerate only a call setup that is measured in seconds and is well under a minute in maximum duration, the timer-based solution may perform well in situations where the resources are allocated for call setup only and are not used once the call has been completely established. The call itself, however, can last indefinitely, and multi-day calls are not impossible. Thus, in order to avoid disconnecting a valid, but relatively long, call, a deadlock detection timer must be configured with a time interval that is so long as to be impractical for early detection of deadlocked resource conditions. If deadlocked resource conditions were to start occurring frequently, the whole pool of resources may be consumed before the first timer expires. Furthermore, at least one timer must be separately maintained for each seized resource, thus adding to the cost of implementation, maintenance, and the cost of processor time per call.
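The timer-based check described above can be illustrated with a minimal sketch. The function name `expired_allocations`, the dictionary of seizure timestamps, and the `max_hold` parameter are hypothetical and chosen only for this example.

```python
def expired_allocations(seize_times, now, max_hold):
    """Return ids of resources held longer than max_hold (all names hypothetical).

    seize_times maps a resource id to the time at which it was seized,
    i.e. one timer record per seized resource.
    """
    return sorted(rid for rid, seized_at in seize_times.items()
                  if now - seized_at > max_hold)
```

The sketch makes the drawback visible: `max_hold` must exceed the longest legitimate holding time, so a pool whose resources are held for the duration of a call would need a limit far too long to give early warning.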
A logic-based solution, on the other hand, detects conventional deadlock among competing processes by monitoring resources allocated to and requested by each process. Conventional deadlock is detected if a cyclical pattern exists among competing processes. For example, if process A is allocated a first resource and requests a second resource, and process B is allocated the second resource and requests the first resource, the solution will detect the occurrence of conventional deadlock. This logic-based solution, however, is not only complex and difficult to implement in practice, but also fails to detect other causes of a deadlocked resource condition, such as errors in system or application software and errors in communication.
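The cyclical pattern that a logic-based solution looks for can be sketched as a walk through a wait-for chain. The sketch assumes, for simplicity, that each resource has at most one holder and each process waits for at most one resource; the function and argument names are hypothetical.

```python
def has_conventional_deadlock(holder_of, wanted_by):
    """Detect a cycle of processes waiting on each other (illustrative only).

    holder_of: maps a resource to the process currently holding it.
    wanted_by: maps a process to the single resource it is waiting for.
    """
    for start in wanted_by:
        seen = set()
        proc = start
        # Follow: process -> resource it wants -> process holding that resource.
        while proc in wanted_by:
            if proc in seen:
                return True          # returned to a process already visited
            seen.add(proc)
            proc = holder_of.get(wanted_by[proc])
    return False
```

For the example in the text, `holder_of = {"r1": "A", "r2": "B"}` and `wanted_by = {"A": "r2", "B": "r1"}` form a cycle. A resource held by a dead process that requests nothing never appears in such a cycle, which illustrates why this kind of check misses the other causes listed above.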
Finally, an approach that monitors traffic congestion initiates an alarm signal when all available resources are utilized. This approach not only fails to warn network operators or system software of a deadlocked resource condition in time to take corrective action, but also fails to resolve the problems mentioned above due to the inability to distinguish between a true deadlocked resource condition and a temporarily high traffic load.
Therefore, in light of the deficiencies of existing approaches, there is a need for a detection mechanism that can detect a deadlocked resource condition in a pool of shared resources and that can be easily implemented within existing network systems in a cost effective manner.
The deficiencies of the prior art are overcome by the method and system of the present invention. For example, as heretofore unrecognized, it would be beneficial to exploit known or predictable statistical properties of traffic-based resource utilization to detect a deadlocked resource condition in a pool of shared resources by utilizing known or predictable statistical relationship(s). In fact, it would be beneficial to periodically measure resource utilization over a predefined time interval and compare the measured resource utilization to a predicted resource utilization. A deadlocked resource condition is then determined to have occurred if the measured resource utilization is inconsistent with the predicted resource utilization at a predefined confidence level.
In a first embodiment of the present invention, samples of resource utilization are periodically measured over a predefined time interval to determine a mean resource utilization and a variance of the resource utilization. The determined mean and the determined variance are then compared in accordance with a known or predictable statistical relationship. For example, assuming resource utilization (e.g., offered traffic) of the network approximates a Poisson distribution, for which the variance equals the mean, the measured variance is compared to k times the measured mean (where k may vary between zero and one depending on the degree of closeness to a Poisson model). If the determined variance is less than k times the determined mean, then a deadlocked resource condition is determined to exist. Conversely, if the determined variance is greater than k times the determined mean, then a deadlocked resource condition is determined not to exist.
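The variance-to-mean comparison of the first embodiment can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name, the default `k=0.5`, the use of the population variance, and the treatment of the equality case as "no deadlock" are all assumptions made for the example.

```python
from statistics import fmean, pvariance


def deadlock_suspected_by_variance(samples, k=0.5):
    """Flag a deadlocked resource condition when variance < k * mean.

    samples: utilization counts measured over the interval (non-empty).
    k: closeness-to-Poisson factor between zero and one (assumed default).
    """
    mean = fmean(samples)
    variance = pvariance(samples, mu=mean)   # population variance (assumption)
    return variance < k * mean
```

Utilization that sits almost motionless at a high level, such as `[10, 10, 10, 10, 11, 10]`, is flagged, whereas samples that fluctuate as ordinary traffic would, such as `[3, 7, 12, 5, 9, 14, 6, 10]`, are not.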
In a second embodiment of the present invention, samples of resource utilization are periodically measured over a predefined time interval. The measured samples are then compared to a threshold minimum which comprises, for example, a predetermined value, a fraction of the average of the measured samples, a fraction of the maximum of the measured samples, or a fraction of the historical average resource utilization. If no measured samples fall below the threshold minimum during the predefined time interval, a deadlocked resource condition is determined to exist.
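The threshold-minimum test of the second embodiment can be sketched as follows, here with the threshold taken either as a predetermined value or as a fraction of the average of the measured samples. The function name, parameter names, and the default fraction of 0.5 are hypothetical.

```python
def deadlock_suspected_by_floor(samples, fraction=0.5, threshold=None):
    """Flag a deadlocked resource condition when no sample falls below the threshold.

    threshold: predetermined minimum; if None, fraction * average is used
    (the default fraction is an assumption for illustration).
    """
    if threshold is None:
        threshold = fraction * (sum(samples) / len(samples))
    return all(sample >= threshold for sample in samples)
```

The underlying idea is that normal traffic occasionally dips to a low level, so a pool whose utilization never dips during the interval is likely carrying resources that are never released. Note that with a relative threshold an entirely idle pool (all samples zero) would also satisfy the condition, so a practical version would probably need an additional guard that the text does not address.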
A third embodiment of the present invention detects a deadlocked resource condition in a resource experiencing an increased traffic load by exploiting the fact that deadlocked resource conditions do not contribute to the variance of the resource utilization. This third embodiment periodically measures resource utilization over a predefined time interval and determines a trend line of the measured resource utilization. This trend line is determined, for example, by performing linear regression analysis or by maintaining a running average over the predefined time interval. If the trend line is positively inclined (indicating an increased traffic load and/or increased deadlocked resource conditions) and there is no increase in the standard error of the samples beyond a predetermined limit, then a deadlocked resource condition is determined to exist.
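One way to read the third embodiment is sketched below, using an ordinary least-squares trend line. The interpretation of "no increase in the standard error beyond a predetermined limit" as a comparison against a baseline standard error from an earlier interval is an assumption, as are the function names and the `baseline_se` and `max_increase` parameters.

```python
import math


def fit_trend(samples):
    """Least-squares line through equally spaced samples (needs at least 3).

    Returns (slope, residual standard error).
    """
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(samples))
    return slope, math.sqrt(sse / (n - 2))


def deadlock_suspected_by_trend(samples, baseline_se, max_increase):
    """Flag a rising trend whose scatter has not grown beyond the limit.

    baseline_se: standard error from an earlier interval (hypothetical input).
    max_increase: predetermined limit on the growth of the standard error.
    """
    slope, std_error = fit_trend(samples)
    return slope > 0 and (std_error - baseline_se) <= max_increase
```

The reasoning is that genuinely increasing traffic raises both the level and the scatter of utilization, whereas accumulating deadlocked resources raise the level without adding scatter.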
In a fourth embodiment of the present invention, resource utilization for multiple pools of resources is periodically measured over a predefined time interval. If traffic is increasing within the network, then there will tend to be a corresponding increase in utilization for a set of positively correlated resource pools. The patterns of correlation between resource pools can be determined by experience. If a particular resource pool demonstrates an upward trend in resource utilization during the predefined time interval, the trend is compared with the set of resource pools against which the particular resource pool has historically been shown to be correlated. If the trend of the particular resource pool is greater than the average trend of the set of resource pools, then a deadlocked resource condition is determined to exist.
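The comparison of the fourth embodiment can be sketched as follows. The sketch assumes that each pool's samples are expressed as a fraction of the pool in use, so that trends of pools of different sizes are comparable, and that the set of correlated pools is supplied by the caller; both assumptions, and all names, are illustrative only.

```python
def trend_slope(samples):
    """Least-squares slope of equally spaced samples (needs at least 2)."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    return sxy / sxx


def deadlock_suspected_by_peers(pool_samples, correlated_pools_samples):
    """Flag a pool whose upward trend exceeds the average trend of its peers.

    correlated_pools_samples: one sample list per historically correlated pool.
    """
    own = trend_slope(pool_samples)
    if own <= 0:
        return False                 # only upward trends are examined
    peer_slopes = [trend_slope(s) for s in correlated_pools_samples]
    return own > sum(peer_slopes) / len(peer_slopes)
```

A pool that climbs in step with its correlated pools is attributed to rising traffic, while a pool that climbs noticeably faster than its peers is flagged.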
The technical advantages of the present invention include, but are not limited to, the following exemplary technical advantages. It should be understood that particular embodiments may not involve any, much less all, of the following exemplary technical advantages.
An important technical advantage of the present invention is that it enables network operators or system application software to detect deadlocked resource conditions and thereby take corrective action.
Another important technical advantage of the present invention is the ability to exploit known or predictable statistical properties of traffic-based resource utilization to advantageously detect deadlocked resource conditions in a pool of shared resources.
Yet another important technical advantage of the present invention is that it is both logically simpler and more generally applicable to a wide variety of network systems and network applications than existing approaches.
Yet still another important technical advantage of the present invention is that it can be easily implemented within existing network systems in a cost effective manner.
The above-described and other features of the present invention are explained in detail hereinafter with reference to the illustrative examples shown in the accompanying drawings. Those skilled in the art will appreciate that the described embodiments are provided for purposes of illustration and understanding and that numerous equivalent embodiments are contemplated herein.