The present invention relates to the field of data replication. "Bidirectional Database Replication" is defined as the application of database deltas (i.e., the results of transactions being performed against a database) from either of two databases in a pair to the other one. Transaction I/O operations (e.g., inserts, updates, and deletes) applied to one database are applied to the other database and vice versa. Both databases are "live" and are receiving transactions from applications and/or end users. "Bidirectional Database Replication" takes two forms--"bidirectional homogeneous database replication" and "bidirectional heterogeneous replication." In homogeneous database replication, the two databases in the pair are identical. In heterogeneous replication, while there must necessarily be some commonality between the two databases in the data they contain, there are differences in the databases. For example, the databases may be of the same commercial brand but have differing physical and logical structures, or the databases may be essentially identical but be of different commercial brands.
The typical purpose of "Bidirectional Homogeneous Database Replication" is to provide a higher degree of fault tolerance than can be provided through conventional unidirectional database replication from a production database to a backup database ("hot backup replication"). In traditional hot backup replication, database deltas are captured from the production database and are applied to a backup database that is usually located remotely. In the event of a failure of the primary system, the applications on the backup system must be started and the end users must be routed to the backup database. Even in the best implementations, this process takes several minutes. Furthermore, the computer operators must be moved from the production location to the backup location, which assumes that the failure of the primary system was not caused by a severe catastrophe. For example, in severe natural disasters or terrorist incidents, it may not be possible to transport the computer operators to the backup site. In typical "Bidirectional Homogeneous Database Replication," both databases (and both sites if they are remote from one another) are live at all times. All applications are running on both databases, end users are already enabled to perform transactions on either database, and each data center is fully staffed. In the event of a failure of one of the two databases, all processing is performed on the remaining live database. The application incurs no downtime.
The typical purpose of "Bidirectional Heterogeneous Database Replication" is to allow heterogeneous computer systems and databases to share data in one enterprise with minimal impact on the production processing occurring on either system. Traditional Open Database Connectivity solutions for this problem allow applications or end users on one system to run queries against the databases on the other systems. This causes massive read activity on the other system that hampers production transaction processing. "Bidirectional Heterogeneous Database Replication," however, uses very few system resources on either system. Changes to data on one system which are needed on the other system are replicated as they occur, allowing the applications and end users on each system access to the local data in a near real-time manner. Many financial institutions perform much of their core transaction processing and all of their settlement activity on legacy mainframe systems. Most of the new applications that these institutions would like to use, however, are written for newer computer platforms and modern SQL databases. By using "Bidirectional Heterogeneous Database Replication," these institutions can utilize both their legacy systems and modern systems, databases, and applications and propagate data changes from either system/database to the other.
While the business applications of "Bidirectional Homogeneous Database Replication" and "Bidirectional Heterogeneous Database Replication" are quite different, the technology that enables each is the same. The technical difference between unidirectional database replication and "Bidirectional Database Replication" is that "Bidirectional Database Replication" can recognize the origin of a given transaction and apply the transaction from one database to the other without then necessarily applying it back to the database from which it originated.
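By way of illustration only, the origin check described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the mechanism of the present invention: the names `Change`, `Database`, and `replicate`, and the idea of carrying an `origin` field on each change, are hypothetical and were chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    key: str
    value: int
    origin: str  # hypothetical tag: name of the database where the change was first made


@dataclass
class Database:
    name: str
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # every change applied here, in order

    def apply(self, change: Change) -> None:
        self.rows[change.key] = change.value
        self.log.append(change)


def replicate(source: Database, target: Database, start: int = 0) -> int:
    """Forward changes logged on source to target, skipping any change that
    originated on target. Returns the next log position to read."""
    for change in source.log[start:]:
        if change.origin != target.name:
            target.apply(change)
    return len(source.log)


a, b = Database("A"), Database("B")
a.apply(Change("acct-1", 100, origin="A"))
replicate(a, b)  # A -> B: the change is applied on B
replicate(b, a)  # B -> A: the change originated on A, so it is not applied again
```

In this sketch, a unidirectional replicator would lack the `origin` comparison and would forward every logged change, including those it had itself received from the other database.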
FIG. 1 shows a diagram of a low-latency unidirectional data replication system 10 used in one commercially available product called Shadowbase™, available from ITI, Inc., Paoli, Pa. Shadowbase runs on Tandem's Non-Stop Kernel D20 or higher operating system. Shadowbase captures changes made by applications (application programs 12) to audited database files or tables, known as the source database 14, and applies those changes to another version(s) of that database, known as the target database(s) 16 (hereafter, "target database 16"). The source database(s) 14 can be in Enscribe or SQL or a combination of both formats. The target database 16 can be in the same format as the source database 14 or another format, such as Oracle, Microsoft SQL Server or Sybase, or the system 10 can have multiple target databases 16 in a combination of formats. The target database 16 may be elsewhere on the same system 10, or on another Tandem node, or in a UNIX or NT environment, or a combination of the above. The target database 16 does not have to be an exact replica of the source database 14. Target fields/columns can be in a different order within a file/table; fields/columns can appear in the target that do not exist in the source; and fields/columns that appear in the source do not have to appear in the target. Source rows can be filtered off and thus may not appear in the target database 16 at all. Also, source rows can be aggregated, meaning multiple source rows are collapsed into one or more target rows. In addition, with the inclusion of specially written custom code (via User Exits), a target database 16 may be created which is very different from the source database 14.
As applications 12 make modifications (e.g., inserts, updates and deletes) to the data in the audited source database 14, TMF (transaction monitoring facility) or TM/MP (transaction monitoring/massively parallel) records the details of the transactions in audit trail files 18. A Shadowbase object or process, known as a "collector" (collector 20), reads the audit trails in the audit trail files 18 and collects changes made to the source database 14. These changes are then sent via interprocess messages to another Shadowbase object or process, known as a "consumer" (consumer 22). The consumer 22 applies the changes to the target database 16 on the Tandem NonStop Kernel system or formats messages that are then sent to the Shadowbase open server running in a UNIX or NT environment. Any custom code for User Exits becomes part of the consumer 22. The functioning of the respective collector and consumer objects 20 and 22 is constantly monitored by a third Shadowbase object known as AUDMON 24. All of the Shadowbase objects send informational messages to Tandem's Event Management System (EMS) to facilitate operating the system. To allow users to configure, control and gain additional object status information, Shadowbase comes with a command interface, AUDCOM 26. The audit trail is a log of the changes which have been made to the source database 14. The audit trail contains an Event Sequence list of the changes that have been made to the source database 14. The list of changes may comprise individual operations performed on the source database (e.g., insert a row, update a row) or operations performed on all or a set of records in the source database 14 (e.g., increase the price column by 10%). Any operations logged to the database must be reversible. If the operations are not reversible, the main use of the audit trail, which is to back out aborted operations, would fail.
Events in the audit trail are usually tagged with a transaction identifier and multiple transactions may overlap.
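By way of illustration only, the following Python sketch shows one way a reader of such a log could group interleaved events by transaction identifier, release the changes of a transaction when its commit is seen, and discard the changes of an aborted transaction. It is an assumption made for the example and does not describe how the Shadowbase collector 20 or consumer 22 is implemented; the event layout and the name `committed_changes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical audit trail: (transaction id, operation, key, value).
# The events of transactions T1 and T2 overlap in the log.
audit_trail = [
    ("T1", "insert", "row-1", 10),
    ("T2", "insert", "row-2", 20),
    ("T1", "update", "row-1", 11),
    ("T2", "abort", None, None),
    ("T1", "commit", None, None),
]


def committed_changes(trail):
    """Yield the data changes of each transaction once its commit is seen;
    drop the buffered changes of aborted transactions."""
    pending = defaultdict(list)
    for txid, op, key, value in trail:
        if op == "commit":
            yield from pending.pop(txid, [])
        elif op == "abort":
            pending.pop(txid, None)
        else:
            pending[txid].append((op, key, value))


# Apply the released changes to a target held as a plain dictionary.
target = {}
for op, key, value in committed_changes(audit_trail):
    if op == "delete":
        target.pop(key, None)
    else:  # insert or update
        target[key] = value
```

Here the insert made by the aborted transaction T2 never reaches the target, while both changes made by T1 are applied in log order.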
The system 10 also includes a restart file 27 connected to the collector 20. The function of the restart file 27 is described below in the "Definitions" section.
Bidirectional replication simplifies the manual procedures necessary to manage outages of a system during planned and unplanned switchovers. These procedures are currently required due to a replication side effect, which is referred to herein as "ping-pong." That is, if one attempted to configure a "bidirectional" and live stand-by environment with two unidirectional schemes, transaction audit events would oscillate indefinitely between the two systems. This is because events applied on one system would be captured and would continually be bounced back and forth, thereby resulting in a "ping-pong" effect. Conventional unidirectional replication requires manual procedures to turn on the flow or start a stand-by copy following a fail-over (once the original primary system has come back on line as the backup system), due to operational complexities in managing this environment. Resolving the bidirectional "ping-pong" issue would then provide the capability for a "Sizzling Hot Stand-By" environment, particularly if the latency time is low. ("Latency" is defined as the time between a commit taking place on one system and that commit being applied on the peer or other system.) Once a solution is provided to remove the "ping-pong," two-way flow is then possible. Note that it is not necessary to provide a means to detect or eliminate collisions, in the event that replication is enabled in both directions simultaneously. Collision avoidance is primarily an application-related issue that requires some method of version identification in the database record and is complementary to bidirectional replication.
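By way of illustration only, the oscillation can be reproduced with a small Python simulation of two naive unidirectional replicators, each of which forwards every event it finds in its own log regardless of where the event came from. The function name `naive_round_trips` and the `max_hops` cap are hypothetical; the cap exists only so that the demonstration terminates.

```python
def naive_round_trips(max_hops: int) -> int:
    """Count how many times one change crosses the link when each side
    forwards everything in its own log, whether replicated or not."""
    logs = {"A": ["set row-1 = 5"], "B": []}  # the change is first made on A
    cursor = {"A": 0, "B": 0}                 # next unread log position per system
    hops = 0
    source, target = "A", "B"
    while hops < max_hops and cursor[source] < len(logs[source]):
        event = logs[source][cursor[source]]
        cursor[source] += 1
        logs[target].append(event)            # applying the event logs it on the target
        hops += 1
        source, target = target, source       # the target now forwards it back
    return hops


hops = naive_round_trips(max_hops=10)
```

In this sketch a single change made on system A keeps crossing the link until the cap is reached, because each application of the event creates a new log entry that the other replicator then picks up.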
Conventional data replication systems typically consist of two or more computers that use the same program(s) or type(s) of program(s) to communicate and share data. Each computer, or peer, is considered equal in terms of responsibilities and each acts as a server to clients in the network.
Conventional peer-to-peer data replication methodologies are bidirectional and often allow updates to a particular data row to occur on either system, and thus involve a conflict resolution mechanism to prevent ping-pong.
Conventional peer-to-peer, bidirectional data replication systems rely on one of the following schemes:
1. Database Partitioning
2. Master copy
3. Row versioning
4. Time resolving updates
5. Oracle Corporation patented schemes
6. Pass the Book
A general discussion of these schemes follows below.
1. Database Partitioning--This approach to application systems design is appropriate when the users and their activity can be naturally partitioned and directed to a single server in the network. Examples would be telephone credit card processing, cellular phone processing, etc. The work is usually partitioned by physical geography. Other partitioning schemes are by logon name or ID, client name, client group, etc. Updates for a customer never actually happen on more than one system at a time. Thus, collisions and ping-pong never occur and two unidirectional systems suffice.
Database partitioning is not a very practical solution, unless a user is willing to repartition their database to accommodate this approach. In general, most users are unable to repartition their database. In fact, in some instances, there might be technical performance reasons (e.g., load balancing, throughput, etc.) for partitioning their database using keys different from the keys useful for bidirectional partitioning.
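By way of illustration only, the partitioning approach can be sketched in Python as follows. The region-to-system table, the function names, and the customer keys are all hypothetical. Each update is routed to the single system that owns its partition, and each unidirectional flow carries only the rows owned by its source system.

```python
# Hypothetical assignment of customer regions to the one system that owns them.
OWNER_BY_REGION = {"east": "A", "west": "B"}


def owning_system(region: str) -> str:
    """Return the only system allowed to update rows for this region."""
    return OWNER_BY_REGION[region]


def route_update(databases: dict, region: str, key: str, value: int) -> str:
    """Apply an update on the owning system only and return that system's name."""
    system = owning_system(region)
    databases[system][key] = value
    return system


def replicate_owned(databases: dict, source: str, target: str, region_of_key: dict) -> None:
    """Unidirectional flow: copy only the rows that the source system owns."""
    for key, value in databases[source].items():
        if owning_system(region_of_key[key]) == source:
            databases[target][key] = value


databases = {"A": {}, "B": {}}
region_of_key = {"cust-17": "east", "cust-42": "west"}
route_update(databases, "east", "cust-17", 250)
route_update(databases, "west", "cust-42", 90)
replicate_owned(databases, "A", "B", region_of_key)  # carries cust-17 only
replicate_owned(databases, "B", "A", region_of_key)  # carries cust-42 only
```

Because no row is ever updated on more than one system, neither flow in this sketch has anything to send back, which is why two unidirectional systems suffice under this scheme.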
2. Master copy--If the database of one system is designated as the master copy, then all updates are funneled through this system in this scheme. Updates then always flow from the peer system containing the master database copy, whereas reads are often done locally.
The master copy approach can cause LAN utilization delays and has operational issues with designating the master copy, especially when the master has been offline for a period of time and a new master copy needs to be chosen. Also, requiring that the updates flow through one master copy means that some updates may wait on row or table locks and thus updates may have a high latency. One example of a master copy scheme is described in U.S. Pat. No. 5,781,910 (Gostanian et al.) assigned to Stratus Computer, Inc.
3. Row versioning--Some applications need to maintain version numbers in records and carry the version number from machine to machine. When the version number that an application updates is inconsistent with the version number that is stored in the database, corrective action is required. In such cases, a user exit on the target system receives a record from a source system, locks and reads the appropriate record on the target system and compares version numbers. If the version numbers are consistent, then the application updates and unlocks the target system record. If the versions are not consistent, one of the following rules is selected:
1. The user exit can take corrective action according to a predefined business rule.
2. The user exit can notify a human for manual intervention.
3. The user exit can reject the change and log the event.
4. The user exit can accept the change and log the appropriate information so that another process or human can ensure that the proper action was taken.
A major challenge of this scheme is the creation of efficient collision or conflict resolution stubs according to one or more of the above rules. Conflict resolution is an application-dictated piece of code. In some applications there is no need for conflict resolution because the application only inserts records with unique keys or simply assigns a value to a total without regard to the previous total. When the insert comes over to the target system, a user exit inserts the record and updates a total field as one unit of work. When the update to the total record comes over from the other machine, it is ignored because the transaction has already accounted for the increment or decrement in the total. Ping-pong is avoided when the particular conflict detected is that the row version numbers match and the change is then discarded.
In summary, row versioning requires modifications to the database and poses a fairly challenging operational issue for most users. It is thus difficult to implement and often has a high latency.
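By way of illustration only, the version comparison performed by such a user exit can be sketched in Python as follows. The function name, the storage of each row as a (value, version) pair, and the conflict log are hypothetical, and record locking is omitted for brevity. The sketch implements only rule 3 above (reject the change and log the event).

```python
def apply_with_version_check(target_rows: dict, key: str, new_value,
                             expected_version: int, conflict_log: list) -> bool:
    """Apply an incoming change if the version it was based on matches the
    version stored on the target; otherwise reject the change and log it."""
    _stored_value, stored_version = target_rows[key]
    if stored_version == expected_version:
        target_rows[key] = (new_value, stored_version + 1)
        return True
    conflict_log.append((key, expected_version, stored_version))
    return False


rows = {"row-1": ("old", 3)}
conflicts = []
first = apply_with_version_check(rows, "row-1", "new", 3, conflicts)     # versions match
second = apply_with_version_check(rows, "row-1", "stale", 3, conflicts)  # based on an old version
```

In this example the first change is applied and advances the stored version, so the second change, which was based on the earlier version, is rejected and recorded in the conflict log.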
4. Time resolving updates--In this scheme, the row update with either the lowest or the highest timestamp or sequence number wins, depending upon what is important to the user. This mechanism is useful if the future state of a row does not depend upon the current state of a record. For example, an update that adds two to a total column causes a problem if it is rejected in favor of a later update, whereas an update that sets the total column to an absolute value does not. This technique is often used with synchronizing e-mail systems. Often, this technique will prompt the user for which row to take (e.g., the update from system A or B). For many reasons, this technique has severe operational issues with database consistency and restartability. In reality, this scheme is a variation on the previous (row versioning) scheme and has many of the same operational problems.
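By way of illustration only, the following Python sketch shows timestamp-based resolution and the difficulty it has with relative updates. The function name `resolve_by_timestamp`, the representation of an update as a (timestamp, value) pair, and the numeric values are hypothetical.

```python
def resolve_by_timestamp(update_a: tuple, update_b: tuple,
                         prefer_latest: bool = True) -> tuple:
    """Each update is a (timestamp, value) pair. Return the winning update."""
    pick = max if prefer_latest else min
    return pick(update_a, update_b, key=lambda u: u[0])


# Absolute assignments: discarding the losing update is harmless.
winner = resolve_by_timestamp((100, 50), (105, 70))  # the later update sets the total to 70

# Relative updates: both systems start from a total of 10.
start = 10
add_two = (100, +2)   # system A adds 2
add_five = (105, +5)  # system B adds 5
kept = resolve_by_timestamp(add_two, add_five)
total_after_resolution = start + kept[1]                  # the "+2" is lost
total_if_both_applied = start + add_two[1] + add_five[1]  # what the user intended
```

With the relative updates, resolution leaves a total of 15 rather than the intended 17, because the increment from system A is discarded in favor of the later increment from system B.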
5. Oracle Corporation patented schemes--The scheme in U.S. Pat. No. 5,737,601 (Jain et al.) places information regarding database modifications in a set of replication tables. Ping-pong of row updates is prevented by the usage of global variables or special log tables in conjunction with database triggers. Oracle's replication scheme has many limitations. Also, various restart problems result from its design. Furthermore, the replication becomes tightly bound to the original application update which results in a higher latency, a slowing down of the application, or an application failure if the communication channel is unavailable.
6. Pass the Book--Simple unidirectional replication can be used in a bidirectional capability if it is only turned on in one direction at a time, e.g., on every odd hour from system A to B, and on every even hour from system B to A. In this scenario, the source system must always be the system on which updates are made. This approach is not low-latency and has numerous operational problems, including the inability to fully utilize both systems at all times.
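By way of illustration only, the hourly alternation in the example above can be sketched in Python as follows. The function names are hypothetical, and the odd/even rule is taken directly from the example rather than from any particular product.

```python
def replication_direction(hour: int) -> tuple:
    """Return (source, target) for the given hour of the day (0-23):
    odd hours replicate A -> B, even hours replicate B -> A."""
    return ("A", "B") if hour % 2 == 1 else ("B", "A")


def may_update(system: str, hour: int) -> bool:
    """Updates are allowed only on the system that is currently the source."""
    source, _target = replication_direction(hour)
    return system == source
```

For instance, `may_update("A", 14)` is false in this sketch, which reflects the inability to make use of both systems for updates at the same time.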
Conventional peer-to-peer schemes, including those discussed above, have a high latency, are limited to the row update level, and/or have significant operational restrictions. Critical to all peer-to-peer, bidirectional data replication methodologies is the requirement to control updates on one system which are replicated to the other system from replicating back to the original system. This problem, presently called "ping-pong," results in an oscillation which would waste considerable computer resources and communication bandwidth and may result in data loss or corruption (a "dirty" database). Likewise, a useful replication scheme must not impact the primary system and must be restartable without data loss or corruption in the event that one or more of the replication peers are not available for a period of time. Accordingly, there is still an unmet need for a data replication scheme which avoids ping-pong, has a low latency, is restartable, and provides operational ease of use. In addition, in some instances, a limited amount of ping-pong may be useful. For example, one may need to see the "pong" reply to verify that all important transactions get applied. Accordingly, there is a need for a bidirectional data replication scheme which allows for selective ping-pong of transactions. The present invention fulfills all of the needs discussed above.