The proliferation of inexpensive workstations and networks has created a new era in distributed computing. At the same time, non-traditional applications such as computer aided design (CAD), computer aided software engineering (CASE), geographic information systems (GIS), and office information systems (OIS) have placed increased demands for high-performance transaction processing on database systems. The combination of these factors gives rise to significant performance opportunities in the area of distributed transaction processing.
Today, the major distributed database management computer systems follow "client-server", "shared nothing" and "shared disks" architectures. Most of these architectures use logging for recovery. In a client-server system, both the database and the log are stored with the server, and all log records generated by the clients are sent to the server. In a shared nothing system, the database is partitioned among several nodes and each node has its own log file. Each database partition is accessed only by the owning node, and a distributed commit protocol is required for committing transactions that access multiple partitions. In a shared disks system, the database is shared among the different nodes. Some shared disks systems use only one log file and require system-wide synchronization for appending log records to the log. An example of this known type of system is disclosed in T. Rengarajan et al., High Availability Mechanisms of VAX DBMS Software, Digital Technical Journal 8, pages 88-98, February 1989. Other shared disks systems use a log file per node. An example of this known type of system is disclosed in D. Lomet, Recovery for Shared Disk Systems Using Multiple Redo Logs, Technical Report CLR 90/4, Digital Equipment Corp., Cambridge Research Lab, Cambridge, Mass., October 1990. However, these systems either force pages to disk when the pages are exchanged between two nodes, or they merge the log files during a node crash.
FIG. 1 is a flowchart illustrating the steps performed by most known systems when a database page "P" is updated by an application running on a node "N." These steps are performed in most known client-server database systems that implement logging, as well as in any other known distributed database management computer systems with multiple nodes N.
In step 50 of FIG. 1, the database page P is updated by node N and stored in N's cache. In step 52, a log record of the update is generated by node N. In step 54, node N determines if page P is managed by node N. If it is, then in step 56, node N writes the log record to a local log disk. However, if at step 54 node N determines that page P is managed by another node, then at step 58 node N sends the log record to the node or server that manages page P.
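The flow of steps 50 through 58 can be sketched roughly as follows. This is only an illustrative model of the conventional behavior described above, not an implementation taken from any of the cited systems. The names Node, LogRecord, update_page, receive_log_record and the managers lookup table are hypothetical, and an in-memory list stands in for the local log disk.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    page_id: str
    writer: str    # name of the node that performed the update
    payload: str   # new page contents (a real record would hold redo/undo data)


@dataclass
class Node:
    name: str
    managed_pages: set = field(default_factory=set)
    cache: dict = field(default_factory=dict)      # page_id -> contents
    local_log: list = field(default_factory=list)  # stands in for the local log disk

    def update_page(self, page_id, new_contents, managers):
        """Update a page the conventional way; managers maps page_id -> managing Node."""
        # Step 50: update page P and keep it in N's cache.
        self.cache[page_id] = new_contents
        # Step 52: generate a log record of the update.
        record = LogRecord(page_id, self.name, new_contents)
        # Step 54: is page P managed by this node?
        if page_id in self.managed_pages:
            # Step 56: write the log record to the local log disk.
            self.local_log.append(record)
        else:
            # Step 58: send the log record to the node or server that manages P.
            managers[page_id].receive_log_record(record)
        return record

    def receive_log_record(self, record):
        # The managing node stores records shipped to it by other nodes.
        self.local_log.append(record)


# Usage: a client updates a page that the server manages.
server = Node("server", managed_pages={"P"})
client = Node("client")
managers = {"P": server}
client.update_page("P", "v1", managers)
```

In this sketch the client's own log stays empty: every record for page P ends up in the log of the node that manages P, regardless of which node made the update.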
As shown in FIG. 1, in most known distributed database management computer systems, a log record is always stored local to the node that manages the database page whose update generated the log record.
Further, in existing client-server database systems, transaction management is carried out exclusively by the server. The main argument for not allowing clients to offer transactional facilities is twofold. First, client machines may not be powerful enough to handle such tasks; the high cost of main memory and disks in the past made it more cost effective to increase the resources of the server rather than the resources of each client. The second, and more important, argument concerns data availability and client reliability: client machines could be connected to or disconnected from the network, or simply turned off, at arbitrary times.
Today, advances in hardware and software have resulted in both reliable network connections and reliable workstations that approach server machines in terms of resources. Thus, client reliability concerns are becoming less and less important. Concerns related to availability are more a function of the computing environment than of the technology. In many computing environments, such as corporate, engineering, and software development, client workstations are connected to the server(s) all the time. Of course, these machines are occasionally disconnected from the network, but this is a rare event (say, once a month) and can be handled in an orderly fashion. In such environments, additional performance and scalability gains are realized when clients offer transactional facilities, because dependencies on server resources are reduced considerably.
The following sections (sections A-C) discuss known client-server, shared disks, and distributed file systems and the shortcomings of each.