1. Field of the Invention
The present invention relates to the operation of distributed processing computer systems. In particular, it relates to systems having a plurality of processing nodes, each with access to a number of shared resources, that require apparatus and methods for managing access to those shared resources. Still more particularly, the present invention relates to the management of a shared control file that designates one of a number of distributed processes as the master process for controlling access to a shared resource.
2. Background and Related Art
Distributed computer systems are created by linking a number of computer systems using a communications network. Distributed systems frequently have the ability to share data resident on the individual systems. Replicated data systems implement data sharing by providing a replica copy of a data object to each process using that data object. Replication reduces the access time for each processor by eliminating the need to send messages over the network to retrieve and supply the necessary data. A replicated object is a logical unit of data existing in one of the computer systems but physically replicated to multiple distributed computer systems. Replicated copies are typically maintained in the memories of the distributed systems.
Replicated data objects also speed the update process by allowing immediate local update of a data object. Replication introduces a control problem, however, because many copies of the data object exist. The distributed system must have some means for controlling data update to ensure that all copies of the data remain consistent.
Prior art systems control data consistency by establishing a master data object copy in one of the distributed systems. The master copy is always assumed to be valid. An update by any system other than the one holding the master copy requires sending the update request to the master, which applies the update and propagates it to all replicas. This approach has the disadvantage of slowing local response time while the master data object update and propagation are performed.
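The master-copy scheme described above can be illustrated with a minimal sketch. The class and method names (`Master`, `Replica`, `request_update`) are hypothetical and chosen only for illustration, and a direct method call stands in for the network message; this is not taken from any particular prior art system.

```python
class Master:
    """Holds the master copy, which is always assumed to be valid."""

    def __init__(self, value):
        self.value = value
        self.replicas = []

    def attach(self, replica):
        # Register a replica and hand it the current master data.
        self.replicas.append(replica)
        replica.value = self.value

    def update(self, new_value):
        # Apply the update to the master copy first, then propagate it
        # to every registered replica.
        self.value = new_value
        for replica in self.replicas:
            replica.value = new_value


class Replica:
    """A local copy that never changes its own data directly."""

    def __init__(self, master):
        self.value = None
        self.master = master
        master.attach(self)

    def request_update(self, new_value):
        # Stands in for a network message to the master (an assumption of
        # this sketch). The caller waits for the full round trip, which is
        # the local response time cost of this approach.
        self.master.update(new_value)


master = Master("v1")
a, b = Replica(master), Replica(master)
a.request_update("v2")  # replica a asks the master to perform the update
```

After the request completes, the master and both replicas hold the same value, but replica `a` only sees its own change once the master has applied and propagated it.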
Another means for controlling replicated data is described in Moving Write Lock for Replicated Objects, commonly assigned, filed on Oct. 16, 1992 as Ser. No. 07/961,757, now U.S. Pat. No. 5,418,966. The apparatus and method of that invention require that a single "write lock" exist in a distributed system and be passed to each process on request. Data object updates can only be performed by the holder of the "write lock." The "write lock" holder may update the local object copy and then send that update to the master processor for its update and propagation to other processes. The above patent application is incorporated by reference.
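The write lock behavior summarized above might be sketched as follows. This is a simplified, single-machine illustration under stated assumptions, not the method of the referenced patent: the names `WriteLockSystem`, `Process`, `request_lock`, and `propagate` are hypothetical, and the lock is handed over immediately on request with no queuing or messaging.

```python
class WriteLockSystem:
    """Tracks the master copy, the processes, and the single write lock."""

    def __init__(self, initial):
        self.master_copy = initial
        self.processes = []
        self.lock_holder = None  # at most one holder exists at any time

    def request_lock(self, process):
        # Assumption of this sketch: the lock moves to the requester at once.
        self.lock_holder = process

    def propagate(self, source, new_value):
        # The master updates its own copy, then every other replica.
        self.master_copy = new_value
        for p in self.processes:
            if p is not source:
                p.local_copy = new_value


class Process:
    def __init__(self, name, system):
        self.name = name
        self.system = system
        self.local_copy = system.master_copy
        system.processes.append(self)

    def update(self, new_value):
        # Only the holder of the write lock may update the data object.
        if self.system.lock_holder is not self:
            raise PermissionError(f"{self.name} does not hold the write lock")
        self.local_copy = new_value              # immediate local update
        self.system.propagate(self, new_value)   # then send to the master


system = WriteLockSystem("v1")
p1, p2 = Process("p1", system), Process("p2", system)
system.request_lock(p1)
p1.update("v2")          # allowed: p1 holds the lock

try:
    p2.update("v3")      # rejected: p2 does not hold the lock
    p2_rejected = False
except PermissionError:
    p2_rejected = True
```

The lock holder's local copy changes before the master is contacted, which is what gives the fast local update, and it is also the window in which an unpropagated update exists only at the holder.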
A method for determining which of a number of distributed processes is to be master is described in commonly assigned patent application Ser. No. 07/961,750 filed Oct. 16, 1992 and entitled Determining a Winner of a Race in a Data Processing System, now U.S. Pat. No. 5,469,575. The "race" among the processes potentially controlling a resource results in the assignment of master status to the process that first establishes write control over a Share Control File. Once control has been established by one process, the other processes are assigned "shadow" status. Master process failure causes reevaluation of master status. This patent application is also incorporated by reference.
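One way to picture such a race is sketched below. The background does not say how write control over the Share Control File is established, so this sketch substitutes exclusive file creation (`os.O_CREAT | os.O_EXCL`) as a stand-in; the function name `race_for_master`, the file name, and the process identifiers are all hypothetical. Removing the file to model master failure is likewise an assumption of the sketch.

```python
import os
import tempfile


def race_for_master(control_path, process_id):
    """Return "master" if this process wins the race, otherwise "shadow"."""
    try:
        # Exclusive creation succeeds for exactly one caller; every later
        # caller gets FileExistsError. This stands in for "establishing
        # write control" and is an assumption of this sketch.
        fd = os.open(control_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return "shadow"
    with os.fdopen(fd, "w") as f:
        f.write(process_id)  # record which process became master
    return "master"


workdir = tempfile.mkdtemp()
control_path = os.path.join(workdir, "share_control_file")

# Three hypothetical processes race in turn; the first one wins.
roles = {pid: race_for_master(control_path, pid) for pid in ("p1", "p2", "p3")}

# Model a master failure by releasing the control file, then let the
# surviving processes race again to reevaluate master status.
os.remove(control_path)
roles_after_failure = {
    pid: race_for_master(control_path, pid) for pid in ("p2", "p3")
}
```

In this sketch the first caller becomes master and the rest become shadows, and after the simulated failure the race is simply run again from scratch among the survivors.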
The technical problem addressed by the present invention is providing fault-tolerant features to a distributed processing system using write lock management of replicated data objects. Fault tolerance is required to ensure that no data or updates are lost due to the failure of a master process. Prior art systems, including those referenced above, require the master determination and write lock control to be reinitialized after such a failure. This could result in loss of data if a locally updated data object replica has not yet been propagated to the master or to the other replicas.