1. Technical Field
This invention concerns a method for locking of critical memory regions in a multiple workstation environment utilizing shared data.
2. Description of the Related Art
Satellite ground stations today are required to process and distribute more data from more space vehicles (SVs) than ever before. In the past, a ground station would have a single SV in view at any given point in time or, in the case of a station supporting geosynchronous systems, separate hardware systems each dedicated to a single SV would perform the isolated processing to determine vehicle state. Newer satellite constellations containing up to 80 SVs are being supported by ground stations consisting of up to 50 workstations, which constantly share data and perform activities that must be serialized among all processes on all the workstations.
The multiple workstation environment described above results in a distributed system where the shared data is typically replicated within the context of each workstation's memory. The distributed memory and processing create critical regions that must be protected such that only one process can operate in a critical region at a time. These critical regions may be shared data or instructions to be executed. The problem of guaranteeing a process solitary access to a critical region in a distributed environment is called distributed locking. Significantly, the problem is not unique to the satellite command and control environment and can be encountered in any distributed system of workstations utilizing shared data.
Distributed locking is the concept of guaranteeing serialized access to a critical region in a multiple processor environment where each processor is operating within the context of its own memory and cannot access the memory of another processor. For the purpose of understanding the present invention, it is helpful to understand some lock specific terminology as will be used herein. For example, a "request" is a solicitation to be granted a lock for access to a critical region. A "grant" is an issuance of a lock in response to a lock request. A "holder" is an instance of a process or thread that has been granted a lock and has not yet released the lock. Finally, a "release" is the act of freeing the lock so that other requests for that lock may be granted. A release is typically performed when access to the critical region is no longer required.
One of the problems associated with distributed locking concerns efficient lock granting. In particular, a difficult obstacle to overcome in designing a distributed locking system is minimizing the amount of work involved in granting a lock. Most solutions for distributed locking present an algorithm for lock granting that has a complexity of N², where N is the number of workstations (nodes) in the system. This means that each time a lock is granted there are N² inter-process messages or communications between the different nodes to agree on the granting of the lock. In distributed systems, where there is already a lot of network traffic, this solution is inefficient and prohibitive to real time processing. The cost of obtaining a lock grant can exceed time critical windows in which the lock must be granted. The reason for the high amount of communication is to ensure that each node in the system agrees on the state of the lock. There are other algorithms which require a lower number of inter-process messages or communications between different nodes in order to agree on granting the lock. For example, one such system has a complexity of N, where N is the number of nodes in the system. While these alternatives are less expensive in terms of messaging traffic requirements, there are tradeoffs associated with such systems with respect to recovering from network or communication failures.
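The difference between the two complexities can be illustrated with a short calculation. The following is only a sketch: it treats the stated complexity as an exact message count, which is a simplifying assumption, and the function name `messages_per_grant` is hypothetical. The node count of 50 is the upper workstation figure cited in the background above.

```python
def messages_per_grant(num_nodes: int, exponent: int) -> int:
    """Messages exchanged to grant one lock under an N**exponent scheme.

    Assumption: the complexity is treated as an exact message count.
    """
    return num_nodes ** exponent


nodes = 50  # upper workstation count cited in the background

quadratic = messages_per_grant(nodes, 2)  # every node agrees with every node
linear = messages_per_grant(nodes, 1)     # one message per node

print(f"N^2 scheme: {quadratic} messages per grant")
print(f"N scheme:   {linear} messages per grant")
```

Under these assumptions, a single grant in a 50-node system costs 2500 messages with an N² scheme and 50 messages with an N scheme.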
Deadlock is another problem associated with any distributed resource such as locks. Deadlock is a state where no forward progress can be made due to contention for a specific resource. In the case of distributed locks, deadlock is most common when a request is made for a lock that will not be released by a previous holder of the lock. The absence of a release may be due to several different reasons. For example, (1) the holder process may have terminated abnormally, (2) the holder process may have deliberately not released the lock, or (3) the node on which the holder process is running may have lost connectivity on the network.
Another problem encountered in distributed locking systems relates to recovery from network failures. A failure in the network can cause a node or group of nodes to become isolated. The loss of network connectivity usually results in either locks that will never be granted until the network is connected again or an expensive process of re-establishing the lock resources when the failure is corrected.
If locks are not granted because of a network failure, the result can be deadlock, reduced performance, or the inability to perform time critical tasks that require the granting of locks. The process of re-establishing lock resources upon regaining connectivity to isolated nodes is very often expensive, requiring that all nodes once again communicate with all other nodes to determine the state of the lock. This again is typically an N² algorithm, where N is the number of nodes.
Yet another problem with distributed locking systems concerns static lock definitions. In particular, most solutions to the distributed lock problem involve the pre-runtime or static definition of all locks in the system. This means that prior to system startup all locks are usually defined, and while the system is running no new locks may be created. This can be burdensome in circumstances where it is necessary to add functionality to a system, particularly where system downtime must be minimized. For example, if a new, previously undefined lock is needed to perform a time critical task, then with static lock definitions this could mean bringing the distributed system down and restarting it. The act of stopping and restarting the system can take an excessive amount of time that a real time system cannot afford.
The present invention concerns a method for dynamic distributed memory locking in a computer network. According to the method a local lock process executing on a node receives from an application program a lock request for exclusive access to a memory location. The lock process identifies a first lock process executing on any node of the network that is currently a designated lock manager for granting locks for the particular memory location for which the lock was requested. Subsequently, the local lock process requests from the designated lock manager a lock for the particular memory location. Finally, the local lock process notifies the requesting application program when a lock has been granted by the lock manager.
The method can further include the step of designating a local lock process executing on a selected node as the lock manager when no lock manager is currently designated for the particular memory location for which a lock has been requested. The application program and the local lock process can both be executing on the same selected node, but can also be executing on separate nodes.
According to another aspect of the invention, the method can include the step of determining whether an option to create a lock manager is enabled prior to designating the local lock process as the lock manager, and causing the lock request to fail if the option is not enabled. This feature permits a user to control the operation of the distributed locking system by limiting the complexity of a return to connectivity situation after nodes have been disconnected.
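The manager lookup and the optional creation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `LockProcess`, `LockRequestFailed`, `find_or_create_manager`, and `create_enabled` are hypothetical, and the table of lock managers is modeled as a plain in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict


class LockRequestFailed(Exception):
    """Raised when no lock manager exists and creation is not enabled."""


@dataclass
class LockProcess:
    node_id: str
    # Hypothetical table: memory location -> node id of its lock manager.
    managers: Dict[str, str]
    # Hypothetical flag modeling the "option to create a lock manager".
    create_enabled: bool = True


def find_or_create_manager(local: LockProcess, location: str) -> str:
    """Return the node id of the lock manager for a memory location."""
    manager = local.managers.get(location)
    if manager is not None:
        # A lock manager is already designated; the request goes to it.
        return manager
    if not local.create_enabled:
        # No manager exists and creation is disabled: the request fails.
        raise LockRequestFailed(f"no lock manager for {location!r}")
    # Designate the local lock process as the manager for this location.
    local.managers[location] = local.node_id
    return local.node_id


proc = LockProcess("node-b", {"telemetry": "node-a"})
print(find_or_create_manager(proc, "telemetry"))  # existing manager
print(find_or_create_manager(proc, "ephemeris"))  # newly designated manager
```

Because a manager is designated on first request, a lock for a new memory location can be introduced while the system is running, in contrast to the static lock definitions discussed in the background.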
According to another aspect of the invention, in response to detecting a loss of connectivity with a node, a local node can determine whether a disconnected lock process executing on the disconnected node has been designated as the lock manager for the particular memory location. In this situation, the local lock process updates a lock manager file to remove the designation of the disconnected lock process as lock manager for the memory location.
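The update performed on loss of connectivity might look like the following sketch. The lock manager file is modeled here as a dictionary from memory location to manager node id; that representation and the function name `remove_disconnected_manager` are assumptions made for illustration only.

```python
from typing import Dict, List


def remove_disconnected_manager(manager_file: Dict[str, str],
                                lost_node: str) -> List[str]:
    """Drop every designation held by a node that is no longer reachable.

    Returns the memory locations left without a lock manager.
    """
    orphaned = [loc for loc, node in manager_file.items() if node == lost_node]
    for loc in orphaned:
        del manager_file[loc]
    return orphaned


# Hypothetical local lock manager file before the disconnect.
manager_file = {"telemetry": "node-a", "ephemeris": "node-b", "orbit": "node-a"}
orphaned = remove_disconnected_manager(manager_file, "node-a")
print(orphaned)       # locations that now need a new lock manager
print(manager_file)   # designations held by still-connected nodes
```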
Upon detecting a return of connectivity with any node which is a lock manager, the method further includes the steps of determining whether any conflicting lock manager designations exist with a local list of lock managers and resolving conflicting lock manager designations by a priority designation assigned to each network node. The priority designation is preferably a statically assigned enumerated value which permits a higher priority node to be reassigned responsibility as lock manager upon a return to connectivity.
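One way to picture the priority-based resolution is the sketch below. It assumes that a lower enumerated value means higher priority; the source states only that the priority is a statically assigned enumerated value. The names `NodePriority` and `resolve_conflicts` are hypothetical.

```python
from enum import IntEnum
from typing import Dict


class NodePriority(IntEnum):
    # Statically assigned values; lower value = higher priority (assumption).
    NODE_A = 1
    NODE_B = 2
    NODE_C = 3


def resolve_conflicts(local: Dict[str, NodePriority],
                      remote: Dict[str, NodePriority]) -> Dict[str, NodePriority]:
    """Merge two lock manager lists, keeping the higher priority manager."""
    merged = dict(local)
    for location, remote_node in remote.items():
        local_node = merged.get(location)
        if local_node is None or remote_node < local_node:
            # No local designation, or the reconnecting node outranks it.
            merged[location] = remote_node
    return merged


local_list = {"telemetry": NodePriority.NODE_B, "orbit": NodePriority.NODE_A}
remote_list = {"telemetry": NodePriority.NODE_A, "ephemeris": NodePriority.NODE_C}
merged = resolve_conflicts(local_list, remote_list)
for location, node in sorted(merged.items()):
    print(location, node.name)
```

In this example only "telemetry" is in conflict, and responsibility for it passes to NODE_A because NODE_A has the higher assumed priority. Each conflict is settled by a single comparison of static priority values.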
According to another aspect of the invention, any lock which has been granted by a lock manager can be treated as terminated after a predetermined period of time. However, the system provides users with a further option so that after the predetermined period of time has expired, a lock process requesting a new lock will continue to wait for a release of the previously granted lock to occur before accessing the memory location which is the subject of the existing lock.
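The timeout behavior and the wait option can be sketched as a single decision function. This is an illustrative model only: `GrantedLock`, `may_grant_new_lock`, and the `wait_for_release` flag are hypothetical names, and times are plain numbers in arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class GrantedLock:
    holder: str
    granted_at: float
    released: bool = False


def may_grant_new_lock(lock: GrantedLock, now: float, timeout: float,
                       wait_for_release: bool) -> bool:
    """Decide whether a new request for the same location may proceed."""
    if lock.released:
        return True                  # the holder released the lock normally
    expired = (now - lock.granted_at) >= timeout
    if not expired:
        return False                 # the lock is still held and still valid
    # The period has expired: treat the lock as terminated unless the
    # requester opted to keep waiting for an explicit release.
    return not wait_for_release


lock = GrantedLock(holder="process-1", granted_at=0.0)
print(may_grant_new_lock(lock, now=5.0, timeout=10.0, wait_for_release=False))
print(may_grant_new_lock(lock, now=10.0, timeout=10.0, wait_for_release=False))
print(may_grant_new_lock(lock, now=10.0, timeout=10.0, wait_for_release=True))
```

Treating an expired lock as terminated addresses the deadlock cases described in the background, where a holder terminates abnormally or loses connectivity and never issues a release.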
The invention can also include a computer network having a dynamic distributed memory locking system. Programming is provided for receiving from an application program a lock request for exclusive access to a memory location; for identifying a first lock process executing on any node of the network that is currently a designated lock manager for granting locks for the memory location; for requesting from the lock manager a lock for the memory location; and for notifying the application program when a lock has been granted.
Suitable programming is similarly provided for designating a local lock process executing on a selected node as the lock manager when no lock manager is currently designated for the memory location. According to one embodiment, the application program and the local lock process are both installed and execute on the selected node. However, the application program and the local lock process can also be executing on separate nodes, and the invention is not intended to be limited in this regard.
According to another aspect of the invention, programming is provided for determining whether an option to create a lock manager is enabled prior to designating the local lock process as the lock manager. In that case, programming is also preferably provided for causing the lock request to fail if the option is not enabled and no lock manager is currently designated.
According to yet another aspect of the invention, programming is provided which is responsive to detecting a loss of connectivity with a node. Such programming is configured for determining whether a disconnected lock process executing on the disconnected node has been designated as the lock manager for the requested memory location. If so, the system updates a lock manager file on the local node to remove the designation of the disconnected lock process as lock manager for the memory location.
Finally, the computer network system preferably includes programming responsive to detecting a return of connectivity with any node which is a lock manager. Such programming determines whether any conflicting lock manager designations exist by comparing the connecting node's lock manager file with a local list of lock managers. In addition, such programming preferably resolves conflicting lock manager designations by a priority designation of each network node. The priority designation is preferably a statically assigned enumerated value. However, it will be appreciated that the invention is not limited in this regard and other priority designation schemes may also be used.