1. Field of Invention
This invention relates to synchronization of application programs which execute on host processors which are not tightly-coupled, and which process shared data.
2. General Discussion
In the discussion of the invention, the following terms will have the meanings set forth:
a. Centralized Data Base Management System--a Data Base Management System in which the processing logic executes on a single Host Processor.

b. Closely-Coupled--the relationship of Instruction Processors in which the Instruction Processors do not share the same addressable memory space but do share some Mass Storage, and the relationship of Host Processors in which the Host Processors share some Mass Storage between them.

c. Distributed Data Base Management System--a Data Base Management System for controlling a collection of multiple, logically interrelated data bases distributed over Host Processors interconnected by a communications network.

d. Host Processor--a data processing system which has one or more Instruction Processors, intercoupled and executing an Operating System, for executing programmed instructions, and an addressable memory which is shared among the Instruction Processors.

e. Instruction Processor--that functional portion of a data processing system in which machine instructions for the data processing system are executed.

f. Loosely-Coupled--the relationship of Instruction Processors and associated addressable memory in which the Instruction Processors share neither the same addressable memory space nor Mass Storage, but have some type of communication link between them.

g. Mass Storage--addressable memory which is used for secondary and long-term storage of data, formatted as files, blocks, and records, and which is accessible to programs executing on a particular Host Processor.

h. Shared Data Base Management System--a Data Base Management System in which the processing logic is distributed among several Host Processors while the data base is stored on Mass Storage which is shared by each of the Host Processors.

i. Tightly-Coupled--the relationship between Instruction Processors and associated addressable memory in which the Instruction Processors share the same addressable memory space.
In today's computing environments, application programs are sometimes distributed over one or more Host Processors to enhance performance. To the extent the distributed application programs share resources and data, they must coordinate their activities to avoid deadlock situations and data corruption. The application programs accomplish this coordination by passing pertinent information amongst themselves. One area in which application programs are typically distributed is Data Base Management Systems (DBMSs).
The usefulness of the present invention and how it can be used to enhance the performance of application programs, and DBMSs in particular, can be appreciated from the following brief overview of the various approaches to deploying Data Base Management Systems. Briefly, Data Base Management Systems are classified as 1) Centralized, 2) Distributed, and 3) Shared.
A Centralized DBMS can be defined as a DBMS in which the processing logic executes on a single Host Processor. The data base is stored on Mass Storage which is dedicated to the Host Processor, and users interact with the DBMS through remote terminals which are coupled to the Host Processor through communications hardware and software.
While the Centralized DBMS architecture offers the advantage of suitability for transaction-intensive applications--such as airline reservation systems--disadvantages inherent in the design are largely a result of contention for memory and input/output resources by the various data base application programs. Because the Instruction Processors comprising the Host Processor share the same addressable memory space, expansion of the processing capacity to meet additional demand becomes costly in that entire Host Processors may have to be replaced. Furthermore, with all Instruction Processors sharing input/output channels and memory, the risk of a single point of failure that will disable the entire system is high.
The Distributed DBMS model seeks to alleviate some of the weaknesses present in the Centralized DBMS approach. A distributed data base is a collection of multiple, logically interrelated data bases distributed over Host Processors interconnected by a communications network. A Distributed DBMS is a software system that permits the management of the distributed data bases and makes the distribution transparent to the users, as described by M. Tamer Ozsu and Patrick Valduriez, Principles of Distributed Data Base Systems, (New Jersey: Prentice Hall, 1991) p. 4.
The advantages offered by the Distributed DBMS model cited by Ozsu include: 1) frequently used data is close to the user; 2) distributing the processing logic increases performance through parallel processing; 3) processing capacity can be increased in a modular fashion; and 4) distribution of data and processing decreases the likelihood of a single point of failure.
While the Distributed DBMS approach solves some of the problems presented by the Centralized DBMS approach, it brings with it new problems. First, the complexity of a Distributed DBMS is compounded because each processing component must synchronize and coordinate with other components to ensure that every change in a local copy of a data base is reflected in all other copies of the data base. Second, in a transaction-intensive application, a Distributed DBMS approach may be slow because of the necessary coordination between the DBMSs on each of the Host Processors. Finally, many businesses have invested heavily in DBMSs utilizing a Centralized approach. Distributing a centralized application would prove costly and difficult, as noted by Ozsu and Valduriez, Principles of Distributed Data Base Systems, at page 9.
A Shared DBMS has characteristics common to both the Centralized and Distributed approaches to DBMSs. The Shared DBMS model consists of a plurality of Closely-Coupled Host Processors, each of which has its own private Mass Storage. The data base processing logic is distributed among the Host Processors while the data base is stored on Mass Storage which is shared by each of the Host Processors.
The Shared DBMS approach in which the present invention is used is distinguishable from those approaches where a data base is shared among a plurality of Loosely-Coupled Host Processors and the DBMS communication is via an inter-Host Processor communications network. In other shared data base approaches, one Host Processor, the server, is responsible for providing access to the data base. The other Host Processors direct all data base queries to the server through the communications network.
The DBMS in which the present invention is used, in contrast to other shared data base approaches, is implemented in an environment in which the Host Processors are Closely-Coupled. The Closely-Coupled system has all Host Processors sharing a data base and directly coupled to a Multi-Host File Sharing system which is commercially available from Unisys Corporation. The Multi-Host File Sharing system controls Mass Storage and allows the coupled Host Processors to share the data available on the Mass Storage. Coordination of data base update activities is accomplished through a Record Lock Processor (RLP), which is commercially available from the Unisys Corporation and is described in the cross-referenced patent application, "Record Lock Processing for a Multiprocessing Data System", which is incorporated by reference.
The Shared DBMS approach in which the present invention is used offers the advantages of parallel processing power, limited risk for a single point of failure, incrementally expandable processing power, and compatibility with existing data base applications.
Having described the various models for DBMSs and the DBMS in which the present invention is used, two particular instances where the present invention may be used in a DBMS in which the respective Host Processors are Closely-Coupled will be described next.
In any DBMS where there are multiple applications seeking access to a common data base, one problem that must be addressed is that of concurrency control. Where there are multiple applications seeking to update a data base, access to the data base must be controlled in such a manner so as to ensure data base integrity. In the Shared DBMS approach, concurrency control is accomplished with the previously mentioned Record Lock Processor (RLP). The RLP is directly coupled to each Host Processor sharing a data base. The respective DBMSs can gain exclusive, update protected, or shared access to files, blocks, and records by using the RLP, thereby ensuring a consistent view of the shared data.
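The exclusive, update-protected, and shared access modes described above can be illustrated with a minimal lock-compatibility sketch. The mode names, the compatibility table, and the LockTable class below are illustrative assumptions for exposition only; they are not the RLP's actual interface.

```python
# Minimal sketch of lock-mode compatibility of the kind the RLP enforces.
# The LockTable class and mode names are hypothetical, not the RLP API.

SHARED, UPDATE, EXCLUSIVE = "shared", "update", "exclusive"

# A holder in the first mode is compatible with a requester in the second.
COMPATIBLE = {
    (SHARED, SHARED): True,
    (SHARED, UPDATE): True,      # update intent can coexist with readers
    (SHARED, EXCLUSIVE): False,
    (UPDATE, SHARED): True,
    (UPDATE, UPDATE): False,     # only one update-intent holder at a time
    (UPDATE, EXCLUSIVE): False,
    (EXCLUSIVE, SHARED): False,
    (EXCLUSIVE, UPDATE): False,
    (EXCLUSIVE, EXCLUSIVE): False,
}

class LockTable:
    """Tracks which Host Processors hold which locks on a named resource."""
    def __init__(self):
        self.holders = {}  # resource name -> list of (host, mode)

    def request(self, host, resource, mode):
        """Grant the lock only if compatible with every current holder."""
        current = self.holders.setdefault(resource, [])
        if all(COMPATIBLE[(held, mode)] for _, held in current):
            current.append((host, mode))
            return True
        return False

    def release(self, host, resource):
        """Drop all locks this host holds on the resource."""
        self.holders[resource] = [
            (h, m) for h, m in self.holders.get(resource, []) if h != host
        ]
```

Under this sketch, two hosts may read the same record concurrently, but a request for exclusive access is denied until all other holders release their locks, which is the consistent-view guarantee the RLP provides to the sharing DBMSs.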
While the RLP solves one set of problems relating to concurrency control between DBMSs on Closely-Coupled Host Processors, the locking approach is not appropriate for another set of problems relating to concurrency control. In particular, certain situations encountered by one DBMS may be such that the other DBMSs should be notified of the situation. In these instances, a message is sent to the other DBMSs to call attention to the particular situation. Two specific examples where one DBMS needs to notify another DBMS are set forth below.
One situation where a fast message passing scheme is desirable is where the DBMS user determines that, for possible recovery needs, a file is to be made unavailable to normal DBMS access while the user is making a backup copy of the file to tape, reloading an older copy of the file from tape, or performing some other maintenance function on the file. This cannot be accomplished with the RLP locking mechanism because locks are associated with DBMS activities and the file must remain in an unavailable state even after the user making the request has terminated its DBMS access. Because there may be a person sitting at a terminal, waiting for a response from the DBMS, the message passing between DBMSs on the Closely-Coupled Host Processors must be done as quickly as possible.
The scenario in the foregoing situation would be as follows: A first DBMS on a Host Processor marks a file as unavailable, so that access to the file is denied to normal DBMS transactions. The first DBMS then sends a message to the DBMSs on other Host Processors which are sharing the file. After receiving the message, each of the receiving DBMSs marks the file as unavailable, thereby denying access to the file by local applications, and each of the receiving DBMSs sends a response back to the first DBMS to acknowledge that the operation is complete. When the first DBMS has received a response message from each of the DBMSs to which the message was sent, it then notifies the user that the operation has completed, allowing the user to continue with whatever maintenance function is desired.
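The mark-unavailable scenario above can be sketched as a broadcast-and-acknowledge exchange. The Host class, the message shapes, and the function names below are illustrative assumptions; the actual transport is the fast inter-host mechanism the invention provides, which is not shown here.

```python
# Sketch of the mark-unavailable scenario: the first DBMS denies local
# access, notifies its peers, and waits for every acknowledgement before
# telling the user to proceed. Class and message names are hypothetical.

class Host:
    def __init__(self, name):
        self.name = name
        self.unavailable_files = set()

    def handle_mark_unavailable(self, filename):
        """A receiving DBMS marks the file unavailable and acknowledges."""
        self.unavailable_files.add(filename)
        return ("ACK", self.name)

def mark_file_unavailable(initiator, peers, filename):
    """First DBMS marks the file locally, notifies peers, awaits all ACKs."""
    initiator.unavailable_files.add(filename)   # deny local access first
    acks = [p.handle_mark_unavailable(filename) for p in peers]
    # Only when every peer has acknowledged is the user told the
    # operation is complete and maintenance may begin.
    return all(status == "ACK" for status, _ in acks)
```

The point of the scenario is the final step: the user at the terminal is not released until every sharing host has confirmed that local access is denied.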
A second situation in which it is desirable that a fast message passing mechanism be used between DBMSs is when a dynamic dump is requested by a DBMS user. A dynamic dump saves audit information to tape regarding changes to a data base while allowing DBMS application programs continued access to the data base. The results of the dynamic dump can then be used at a later time should the need arise to reconstruct the data base. For a dynamic dump to be usable, the start time of the oldest DBMS application program currently processing the data base needs to be identified and stored with the dump information. The oldest start time of all the DBMS application programs on the Closely-Coupled Host Processors must be obtained. The precise reason for this identification is beyond the scope of the discussion for the present invention and will be omitted for the sake of clarity.
The scenario in the second case would be as follows: A first DBMS receives a request to take a dynamic dump. Before performing the dynamic dump, the first DBMS must determine the start time of the oldest DBMS application program. To determine this, the first DBMS sends a message to the DBMSs on each of the Closely-Coupled Host Processors. Each of the DBMSs receiving the message determines the oldest start time of the DBMS application programs on its respective Host Processor, and sends a response message back to the first DBMS which includes the oldest start time obtained. When the first DBMS has received a response message from each of the DBMSs to which the message was sent, it determines, using the information returned in the response messages, the oldest start time. The oldest start time is then recorded with the dump information. Because there may be a user waiting at a terminal for the dump to complete, the message passing process must be completed as quickly as possible.
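The dynamic-dump scenario above reduces to a query-and-minimum exchange, sketched below. The HostDBMS class and its attributes are illustrative assumptions introduced only to make the steps concrete.

```python
# Sketch of the dynamic-dump scenario: the first DBMS polls each peer for
# the oldest application start time on that host, then takes the overall
# minimum to record with the dump. Class and names are hypothetical.

class HostDBMS:
    def __init__(self, name, app_start_times):
        self.name = name
        # Start times of the DBMS application programs on this host,
        # e.g. as seconds since an epoch.
        self.app_start_times = app_start_times

    def oldest_local_start_time(self):
        """Response to the query: the oldest start time on this host."""
        return min(self.app_start_times)

def oldest_start_time(initiator, peers):
    """Collect each host's oldest start time; return the overall oldest."""
    times = [initiator.oldest_local_start_time()]
    times += [p.oldest_local_start_time() for p in peers]
    return min(times)
```

As in the first scenario, the initiating DBMS cannot complete its work, here recording the dump information, until every peer has responded, which is why the round trip must be fast.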
The preceding problems are typically addressed using some sort of message passing mechanism. In a data processing environment where the Instruction Processors are Tightly-Coupled, a message can be passed from one Instruction Processor to another via a shared memory segment. In a data processing environment where the Instruction Processors are Loosely-Coupled, messages can be passed via an inter-Host Processor communications network. With the architecture in which the present invention is used, the shared memory message passing scheme is not an option because the Instruction Processors on which the DBMSs execute are not Tightly-Coupled, and the communication network method to pass messages is not a viable option due to the speed requirements of a transaction intensive environment. Thus, it can be seen that a fast method of communication between DBMSs on Closely-Coupled Host Processors is required.
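The shared-memory alternative mentioned above, available only when the Instruction Processors are Tightly-Coupled, can be sketched as follows. Threads stand in for Instruction Processors sharing one addressable memory space, and the SharedSegment class is an illustrative assumption, not an actual system interface.

```python
# Sketch of message passing through a shared memory segment, the option
# available only to Tightly-Coupled Instruction Processors. Threads model
# processors sharing an address space; SharedSegment is hypothetical.
import threading

class SharedSegment:
    """A memory area visible to both sides, guarded by a ready flag."""
    def __init__(self):
        self.buffer = None
        self.ready = threading.Event()

    def post(self, message):
        self.buffer = message   # writer places the message in shared memory
        self.ready.set()        # then signals the reader

    def take(self):
        self.ready.wait()       # reader blocks until a message is posted
        return self.buffer

segment = SharedSegment()
received = []

# The "receiving processor" waits on the shared segment.
reader = threading.Thread(target=lambda: received.append(segment.take()))
reader.start()

# The "sending processor" deposits a message; no network is involved.
segment.post("file DB1 unavailable")
reader.join()
```

Because the Host Processors in the environment of the present invention do not share addressable memory, no such segment exists between them, and the network alternative is too slow, hence the need for the fast mechanism the invention provides.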