1. Field of Invention
The present invention relates generally to the field of relational databases. More specifically, the present invention is related to a shared memory device aiding in the implementation of robust 2-phase commit protocols.
2. Discussion of Prior Art
The Open Group's XA protocol has become a computer industry standard for performing 2-phase commit operations between transaction managers and resource managers. FIG. 1 illustrates a functional relationship between transaction manager (e.g., WebSphere®, WebLogic®, etc.) 102 and resource manager (e.g., DB2®, Oracle®, SQL Server, etc.) 104 on the Unix® and Microsoft Windows® platforms. Resource manager (RM) 104 is responsible for managing a part of a computer's shared resources (i.e., software entities can request access to a resource from time to time, using services that the RM provides), while transaction manager 102 is responsible for managing global transactions, coordinating the decision to commit them or roll them back, and coordinating failure recovery.
Transaction manager 102 and resource manager 104 use a 2-phase commit with presumed rollback. In a first phase, transaction manager 102 asks resource manager 104 to prepare to commit transaction branches (i.e., resource manager 104 is queried to see if it can guarantee the ability to commit a transaction branch). If resource manager 104 is able to commit, it records any pertinent information it needs to do so, then replies affirmatively. A negative reply indicates failure of a transaction. After making a negative reply and rolling back its work, resource manager 104 can discard any knowledge it has of the transaction branch.
In a second phase, transaction manager 102 issues resource manager 104 an actual request to commit or roll back the transaction branch. Prior to issuing requests to commit, transaction manager 102 records decisions to commit, as well as a list of all involved resource managers (in this case, resource manager 104). Resource manager 104 either commits or rolls back changes to resources and then returns status to the transaction manager 102. Transaction manager 102 can then delete entries related to the global transaction.
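The two phases described above, including the "presumed rollback" rule, can be illustrated with a minimal in-memory sketch. The class names `ResourceManager` and `TransactionManager`, the `can_commit` flag, and the use of Python sets and dicts in place of durable logs are all hypothetical simplifications. This is not the XA API and not the method of the present invention.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical RM; `can_commit` simulates whether prepare succeeds."""
    name: str
    can_commit: bool = True
    prepared: set = field(default_factory=set)   # branches this RM promised to commit
    committed: set = field(default_factory=set)

    def prepare(self, xid: str) -> bool:
        if not self.can_commit:
            # Negative vote: roll back local work and forget the branch entirely.
            self.prepared.discard(xid)
            return False
        self.prepared.add(xid)  # stands in for recording prepare info durably
        return True

    def commit(self, xid: str) -> None:
        self.prepared.discard(xid)
        self.committed.add(xid)

    def rollback(self, xid: str) -> None:
        self.prepared.discard(xid)


@dataclass
class TransactionManager:
    """Hypothetical TM; `decision_log` stands in for the TM's durable log."""
    decision_log: dict = field(default_factory=dict)  # xid -> list of RM names

    def run(self, xid: str, rms: list) -> str:
        # Phase 1: ask every RM whether it can guarantee to commit its branch.
        votes = [rm.prepare(xid) for rm in rms]
        if not all(votes):
            # Presumed rollback: nothing is logged for an aborted transaction.
            for rm, vote in zip(rms, votes):
                if vote:
                    rm.rollback(xid)
            return "rolled back"
        # Record the commit decision and the participant list BEFORE phase 2.
        self.decision_log[xid] = [rm.name for rm in rms]
        # Phase 2: issue the actual commit request to every RM.
        for rm in rms:
            rm.commit(xid)
        # Every RM has returned status, so the TM may forget the transaction.
        del self.decision_log[xid]
        return "committed"

    def outcome(self, xid: str) -> str:
        # Consulted for a branch that is still in doubt: under presumed
        # rollback, the absence of a commit record means "roll back".
        return "commit" if xid in self.decision_log else "rollback"
```

Because only commit decisions are logged, the abort path costs no log write. This is the usual reason for choosing presumed rollback.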
Although XA is an industry standard, it is not nearly as robust as some of the proprietary 2-phase commit protocols that have been developed on other platforms such as OS/390 (e.g., IBM's Systems Network Architecture (SNA) 2-phase commit used by IBM's Information Management System (IMS) and Customer Information Control System (CICS), the resource recovery services (RRS) 2-phase commit used by WebSphere and DB2, and the Distributed Relational Database Architecture (DRDA) 2-phase commit used by the DB2 family of products).
Provided below, and as depicted in FIG. 2, are examples of scenarios in which the XA protocol is less robust than some of the proprietary 2-phase commit protocols:

XA requires that transaction manager 102 drive the XA RECOVER algorithm 206 to resolve indoubt units of work. Transaction manager 102 calls the XA_RECOVER() function during recovery to obtain a list of transaction branches that are currently in a prepared or heuristically completed state. It should be noted that there is no provision for resource manager 104 to initiate the resolution of an indoubt unit of work.

XA RECOVER 206 requires that resource manager 104 provide a full list of indoubt transactions, but it has no provision by which the members of a database server cluster can resolve indoubt units of work individually with transaction manager 102. FIG. 3 specifically illustrates this scenario, wherein a full list of indoubt transactions 302 is passed to XA RECOVER algorithm 303, but a member 304 of a database server cluster is unable to resolve individual indoubt units of work 306, 308, and 310 with transaction manager 102.

Indoubt units of work are typically resolved in XA during the restart of transaction manager 102, as there is very little support for automatically resolving an indoubt unit of work caused by a communication failure on a single network connection (i.e., only 1 of the “n” communication connections failed) without restarting transaction manager 102.
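The first scenario above, where the transaction manager drives recovery from a single full list, can be sketched as follows. `tm_driven_recovery` is a hypothetical helper. Its `recover` argument is an assumed stand-in for the resource manager's XA RECOVER entry point, not the real XA C signature. The sketch handles only prepared branches and omits heuristically completed ones.

```python
from typing import Callable, Dict, Iterable, Set


def tm_driven_recovery(recover: Callable[[], Iterable[str]],
                       commit_log: Set[str]) -> Dict[str, str]:
    """Decide the outcome of every in-doubt branch one RM reports.

    `recover` (hypothetical) returns the FULL list of prepared, in-doubt
    XIDs known to the resource manager. `commit_log` holds the XIDs for
    which the TM recorded a commit decision before phase 2.
    """
    decisions: Dict[str, str] = {}
    for xid in recover():
        # Presumed rollback: no logged commit decision means roll back.
        decisions[xid] = "commit" if xid in commit_log else "rollback"
    return decisions
```

For example, `tm_driven_recovery(lambda: ["x1", "x2"], {"x1"})` returns `{"x1": "commit", "x2": "rollback"}`. The resource manager only answers the call and never initiates it, which reflects the limitation noted above.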
Furthermore, database systems are increasingly using hardware clustering technology to improve the overall availability of database servers. When database systems exploit clustering, they strive to provide a single-system image for the cluster of server machines, so that clients (such as an XA transaction manager) are unaware that multiple physical machines are being used to run the database product. This creates a dilemma. The above-mentioned XA RECOVER requirement must be satisfied (i.e., any member of the database cluster must be able to provide, upon demand, a full list of indoubt transactions that includes the indoubt transactions of all members of the database server cluster), yet XA RECOVER must still be able to proceed when one or more members of the database server cluster are unavailable. Listed below are a few techniques currently available to address this requirement; it should be stressed that each of these techniques has its own limitations:

a) client-side logging: with this approach, the database client middleware writes special log records on the client that record the list of indoubt transactions, and consults that log to obtain a full list of indoubt transactions without relying upon the availability of any of the database server members (one disadvantage of this approach is that the database client log becomes an object that must be handled in application server failover planning, backup, recovery, etc.; moreover, this approach introduces considerable additional administrative overhead for the customer).
b) server-side indoubt table: with this approach, the database client middleware performs INSERTs and DELETEs against a special table at the database server to keep track of indoubt transactions across the members of the database cluster (although this approach solves the above-mentioned administrative overhead issues, it has negative performance implications, because the additional INSERT and DELETE operations on the relational table introduce additional logging, etc.).

c) database cluster support for XA RECOVER: the database engine can provide the merged list of indoubt transactions through various means:
a single log stream that contains the log records from all the members of the database cluster;
a special table containing the indoubt units of work at any given point in time; and
special XA RECOVER logic that merges the logs produced by all the members of the database cluster to produce a unified list of indoubt transactions for the cluster.
Each of the above-mentioned techniques would have negative performance or scalability implications for the database cluster. Hence, there exists a need to satisfy the XA RECOVER requirement in a cluster of database servers:
without requiring all the members of the database cluster to be active during XA RECOVER;
without introducing significant added CPU or elapsed time cost for processing the database transactions; and
without limiting the scalability of the database cluster.
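The conflict between the merging approach in c) and the first requirement above can be sketched briefly. The function `cluster_xa_recover`, the exception `MemberUnavailable`, and the representation of each member's log as a set of XIDs (with `None` for a member that is down) are all hypothetical.

```python
from typing import Dict, Optional, Set


class MemberUnavailable(Exception):
    """Hypothetical: raised when a member's in-doubt list cannot be read."""


def cluster_xa_recover(member_logs: Dict[str, Optional[Set[str]]]) -> Set[str]:
    """Merge each member's in-doubt XIDs into one cluster-wide list.

    `member_logs` maps a member name to the set of XIDs that member has
    prepared but not yet resolved, or to None if the member is down.
    """
    merged: Set[str] = set()
    for member, indoubt in member_logs.items():
        if indoubt is None:
            # XA RECOVER must return the complete list, so a single
            # unavailable member means no correct answer can be produced.
            raise MemberUnavailable(member)
        merged |= indoubt
    return merged
```

The sketch shows why such a merge requires every member to be reachable (or its log to be readable by a surviving member) at the moment XA RECOVER is called.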
The following references provide for a general teaching in the area of distributed computing and database configuration.
The U.S. patent to Slaughter et al. (U.S. Pat. No. 6,014,669), assigned to Sun Microsystems, provides for a highly-available distributed cluster configuration database. The cluster configuration database is a distributed configuration database wherein a consistent copy of the configuration database is maintained on each active node of the cluster. Each node in the cluster maintains its own copy of the configuration database, and configuration database operations can be performed from any node. Configuration database updates are automatically propagated to each node in a lock-step manner. If any node experiences a failure, the configuration database uses a reconfiguration protocol to ensure consistent data in each node of the cluster.
The U.S. patent to Badovinatz et al. (U.S. Pat. No. 5,805,786), assigned to International Business Machines, provides for the recovery of a name server managing membership of a domain of processors in a distributed computer environment. The recovery includes detecting the failure of the name server node and consulting a membership list of nodes in the domain to determine the crown prince (CP) node, which is next in line to become the name server. The other available nodes in the domain periodically send recover messages to the CP node, and, responsive to receiving the recover messages from all the other available nodes in the domain, the CP node performs a two-phase takeover whereby the CP node becomes the name server for managing said processors in the domain. After the CP node becomes the name server, the other available nodes in the domain send the new name server the data it needs to manage them. All request messages requesting management by the name server are stored locally until the CP has become the name server. The locally stored request messages are then processed by the other available nodes such that no request messages are lost during recovery. U.S. Pat. Nos. 5,896,503 and 5,790,788, also assigned to International Business Machines, provide for similar teachings.
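The crown-prince succession summarized above can be approximated by a minimal sketch. The `Domain` class and its methods are hypothetical and not taken from the Badovinatz patent. The sketch collapses the two-phase takeover into a single step and assumes the caller removes the failed name server from `available`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Domain:
    membership: List[str]                 # ordered line of succession
    available: Set[str]                   # reachable nodes (failed server removed)
    name_server: str = ""
    recover_msgs: Set[str] = field(default_factory=set)
    pending: Dict[str, List[str]] = field(default_factory=dict)
    processed: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.name_server:
            self.name_server = self.membership[0]

    def crown_prince(self) -> str:
        # Next available node in membership order, skipping the failed server.
        return next(n for n in self.membership
                    if n != self.name_server and n in self.available)

    def queue_request(self, node: str, request: str) -> None:
        # Requests are held locally until a new name server exists.
        self.pending.setdefault(node, []).append(request)

    def receive_recover(self, node: str) -> bool:
        """CP records a recover message; returns True once it has taken over."""
        self.recover_msgs.add(node)
        cp = self.crown_prince()
        if (self.available - {cp}) <= self.recover_msgs:
            self.name_server = cp                 # takeover simplified to one step
            self.recover_msgs.clear()
            for queued in self.pending.values():  # replay locally held requests
                self.processed.extend(queued)
            self.pending.clear()
            return True
        return False
```

Under these assumptions, no queued request is dropped: each one stays in `pending` until the takeover completes and is then moved to `processed`.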
The patent to Attanasio et al. (U.S. Pat. No. 5,668,943), assigned to International Business Machines, provides for a system and method for recovering from failures in the disk access path of a clustered computing system. Each node of the clustered computing system is provided with proxy software for handling physical disk access requests from applications executing on the node and for directing the disk access requests to an appropriate server to which the disk is physically attached. The proxy software on each node maintains state information for all pending requests originating from that node. In response to detection of a failure along the disk access path, the proxy software on all of the nodes directs all further requests for disk access to a secondary node physically attached to the same disk.
The patent publication to Jacobs et al. (No. 2003/0018732) discloses a method for replicating data over a network using a one-phase or two-phase method. For the one-phase method, a master server containing an original copy of the data sends a version number for the current state of the data to each slave on the network so that each slave can request a delta from the master. The delta that is requested contains the data necessary to update the slave to the appropriate version of the data. For the two-phase method, the master server sends a packet of information to each slave. The packet of information can be committed by the slaves if each slave is able to process the commit. Patent publication No. 2003/0023898, also by Jacobs et al., provides for a similar teaching.
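The one-phase method summarized above can be shown in a minimal sketch. It assumes the master keeps an ordered change history, so that version n is the state after the first n changes. `Master`, `Slave`, `update`, `delta`, and `on_version` are hypothetical names that do not come from the Jacobs publication.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Change = Tuple[str, str]  # (key, new value)


@dataclass
class Master:
    data: Dict[str, str] = field(default_factory=dict)
    history: List[Change] = field(default_factory=list)  # change i yields version i+1

    @property
    def version(self) -> int:
        return len(self.history)

    def update(self, key: str, value: str) -> None:
        self.data[key] = value
        self.history.append((key, value))

    def delta(self, from_version: int, to_version: int) -> List[Change]:
        # Only the changes a slave at `from_version` is missing.
        return self.history[from_version:to_version]


@dataclass
class Slave:
    data: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def on_version(self, master: Master, announced: int) -> None:
        """Handle the master's version announcement by pulling a delta."""
        if announced <= self.version:
            return  # already current; nothing to request
        for key, value in master.delta(self.version, announced):
            self.data[key] = value
        self.version = announced
```

Here the master broadcasts only a version number. Each slave pulls the delta it is missing, so slaves that are already current do no work.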
The Japanese patent to Brockmeyer et al., assigned to International Business Machines, discloses an expansion function of the two-phase commit protocol which attains the subscription of distributed subscribers between physically separated agents without relying upon the communication mechanism used in data processing systems.
The non-patent literature to Svobodova entitled, “File Servers for Network-Based Distributed Systems,” discloses a file server that provides remote centralized storage with options for performing an atomic update of data stored in the file server.
The non-patent literature to Mohan et al., entitled “Method for Distributed Transaction Commit and Recovery Using Byzantine Agreement Within Clusters of Processors,” replaces the second phase of one of the commit algorithms with a Byzantine agreement, allowing for certain trade-offs and advantages at the time of commit (thereby providing speed advantages at the time of recovery from failure).
The non-patent literature to Wang et al. entitled, “A Mobile Agent Based Protocol for Distributed Database Access,” provides for a three-tier protocol to improve data transmission while accessing distributed databases.
The non-patent literature to Hsiao entitled, “DLFM: A Transactional Resource Manager,” provides for a two-phase commit protocol and a scheme for rolling back a transaction update after a commit to the local database.
Chapter 14 of the book entitled “Advanced Database Systems” provides a review of parallel recovery in replicated databases.
Whatever the precise merits, features, and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention.