In the current business environment, companies often need to maintain multiple copies of their corporate data in different databases in a distributed network. For instance, a company might want to keep a backup copy of its data for a disaster-recovery scenario, or the company might need to build a data warehouse from online databases, or it might want different physical locations to be able to access the same data without incurring the costs of long-distance network connections.
FIG. 1A is a simple block diagram illustrating one model, a master/replica network system 10, of such a business environment. As is shown, a master database 20 is maintained at a central location 30, such as the company's corporate headquarters, and copies or replicas of the database 25a–25c are accessible at one or more remote sites 40a–40c. Typically, the remote sites 40a–40c communicate with the central location 30 via any variety of communication links, such as LAN, WAN, or the Internet.
One way to maintain copies of the database is to download the master database 20 onto a tape or disk, send the tape or disk to the remote sites 40a–40c, and upload the data at the remote sites 40a–40c. For some applications, this method works well. For other applications, however, where frequent updates of the data are required, this method is impractical. In those cases, most conventional database management systems (DBMS) offer automated copying. Automated copying of a database is generally called replicating the data, and the task of copying the data is referred to as replication.
The replicas of the database 25a–25c can be “read only,” that is, a user accessing the replica from a remote site, e.g., 40a, cannot manipulate the data locally. Updates to the database are performed on the master 20 only. A capture program 45 in the central location 30 reads the updates to a log 47 associated with the master copy 20 and writes them to a staging table 49. In an asynchronous replication system, an apply program 50 will extract the updates from the staging table 49 periodically, and propagate the updates to the remote sites 40a–40c, thereby updating the replicas 25a–25c therein.
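By way of illustration only, the capture and apply steps described above for read-only replicas might be sketched as follows. The names `Site`, `Update`, `capture`, and `apply_updates` are hypothetical and do not correspond to any element shown in FIG. 1A; the in-memory lists merely stand in for the log 47 and the staging table 49.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    key: str
    value: int


@dataclass
class Site:
    """A database copy with its log and staging table (hypothetical model)."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)       # stands in for log 47
    staging: list = field(default_factory=list)   # stands in for staging table 49

    def write(self, key, value):
        # A local update changes the data and is recorded in the log.
        self.data[key] = value
        self.log.append(Update(key, value))


def capture(site):
    """Read logged updates and write them to the staging table."""
    site.staging.extend(site.log)
    site.log.clear()


def apply_updates(master, replicas):
    """Extract staged updates from the master and propagate them to each replica.

    Returns the number of updates propagated in this cycle.
    """
    pending = list(master.staging)
    master.staging.clear()
    for replica in replicas:
        for update in pending:
            replica.data[update.key] = update.value
    return len(pending)
```

In this sketch, an update written at the master reaches the replicas only after `capture` and then `apply_updates` have run, which mirrors the periodic, asynchronous behavior described above.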
The user can also be allowed to update the replica locally at a remote site, e.g., 40a. Here, a capture program 45a in the remote site 40a writes the update to a staging table 49a associated with the replica 25a. When the apply program 50 runs, it extracts, from each of the staging tables 49, 49a–49c, the updates to the database, and propagates the updates to the remote sites 40a–40c. If updates to the replicas 25a–25c as well as the master 20 are performed, the apply program 50 must be able to detect and resolve potential conflicts in the replicated data. For example, a conflict in the replicated data will arise if two applications, one at a remote site, e.g., 40a, and one at the central location 30, attempt to update a value for a certain part number at essentially the same time. In this situation, the apply program 50 will detect a conflict, record an error message (not shown) for the user's review, and resolve the conflict by propagating the value recorded in the master database 20.
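The master-wins resolution described above might be sketched, purely as an example, in the following manner. The function name `apply_with_master_wins` and the representation of staged updates as dictionaries keyed by part number are assumptions made for illustration; the actual apply program 50 is not limited to this form.

```python
def apply_with_master_wins(master_staged, replica_staged):
    """Merge one cycle of staged updates, letting the master value prevail.

    Both arguments map a key (e.g., a part number) to the value written
    during the current replication cycle. Returns a tuple of
    (resolved_values, conflicts), where each conflict is recorded as
    (key, master_value, replica_value) for later review.
    """
    resolved = dict(replica_staged)
    conflicts = []
    for key, master_value in master_staged.items():
        if key in replica_staged and replica_staged[key] != master_value:
            # Both copies changed the same key to different values.
            conflicts.append((key, master_value, replica_staged[key]))
        # The value recorded in the master is the one propagated.
        resolved[key] = master_value
    return resolved, conflicts
```

For example, if the master stages `{"P1": 10}` and a replica stages `{"P1": 12}` in the same cycle, the sketch records one conflict for `"P1"` and propagates the value 10.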
In a peer-to-peer network system 10′, illustrated in FIG. 1B, one copy of the database is not designated as the master, but rather, all database copies 25a′–25c′ in the network are treated as equals. In such an environment, remote sites 40a–40c are replaced with local systems or members 40a′–40c′ where the user can access and manipulate data in the respective database locally. As in the master/replica network system 10, each of the members 40a′–40c′ includes a capture program 45a′–45c′ that reads updates to the local database 25a′–25c′ from a log 47a′–47c′ associated with each database 25a′–25c′ and writes them to staging tables 49a′–49c′. An apply program 50′ extracts the updates from the staging tables 49a′–49c′ during an asynchronous replication cycle and propagates the updates throughout the members 40a′–40c′ of the network. While the apply program 50′ is shown as a separate stand-alone module, those skilled in the art will appreciate that the apply program 50′ can exist in one or all of the local systems 40a′–40c′, depending on the design implementation of the replication network.
The peer-to-peer network system 10′ provides several advantages over the master/replica network system 10 illustrated in FIG. 1A. First, because it is not hierarchical, data flow between peers or members 40a′–40c′ is simplified. In addition, the non-hierarchical system is readily scalable, i.e., a new member can be added simply by connecting it to the network. Moreover, because all members are equal, the failure of one member will not catastrophically affect the other members of the network. In such a situation, the other members will continue to communicate normally with one another while the failed member recovers.
While providing such desirable features, the peer-to-peer network system also presents some difficulties. One such difficulty is in detecting and resolving conflicts in replicated data. First, detecting conflicts becomes difficult as the number of peers increases. For instance, if one member wishes to update a record in a replicated table, the member has no way of determining whether the existing record holds the most current value, because another member may have updated that value earlier. Moreover, resolving such conflicts is difficult because no one copy of the database is designated as the authoritative copy.
Presently, if conflict detection in a peer-to-peer network system 10′ is conducted, it is performed during the replication cycle, that is, as the data is being copied from one member to another. In an environment with many thousands or millions of rows being replicated, but with few actual conflicts, such conflict detection is very costly because each row must be checked. Moreover, conflict resolution is generally performed manually by an administrator who examines an error flag when a conflict is detected, or in the alternative, by a rules-based application. Both methods are costly and cumbersome.
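The cost of checking each row during the replication cycle can be made concrete with a minimal sketch. The function `detect_conflicts_during_replication` and its dictionary-based inputs are hypothetical and are offered only to illustrate that the number of checks grows with the number of rows replicated, not with the number of actual conflicts.

```python
def detect_conflicts_during_replication(incoming_rows, local_changes):
    """Check every incoming row against locally changed rows.

    incoming_rows maps a row key to the value being replicated in;
    local_changes maps a row key to a value changed locally during
    the same cycle. Returns (rows_checked, conflicting_keys).
    """
    rows_checked = 0
    conflicting_keys = []
    for key, value in incoming_rows.items():
        rows_checked += 1  # every row is examined, conflict or not
        if key in local_changes and local_changes[key] != value:
            conflicting_keys.append(key)
    return rows_checked, conflicting_keys
```

With 100,000 incoming rows and a single locally changed row, the sketch performs 100,000 checks in order to find one conflict.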
Accordingly, a need exists for a method and system that can detect and resolve conflicts in replicated data in a peer-to-peer database replication system. The method and system should detect and resolve conflicts in a cost-effective manner, while improving the performance of the replication system. The present invention addresses such a need.