A database typically contains one or more tables, each organized in rows (also called records) and columns (also called fields). Each cell of the table contains a data value. Cells in the same column typically share a common data type, e.g., integers, floating-point numbers, alphanumeric strings, or bit patterns representing images or other data types. Data in a column can be compared for equality but need not be comparable for sort order.
In contrast to a centralized database where database information is stored and modified at a single server, the data in a distributed database is stored and modified at multiple sites (or nodes). For example, FIG. 1 illustrates a database distributed among nodes 100, 102, 104, each containing a local copy of Table 1 of a database and each receiving local commands 106, 108, 110, respectively, to change the data in the local copies of the database. The local changes may be, for example, commands to insert (add), update (change), or delete one or more rows (or records).
Each row of a database table typically has a column containing a primary key, a value that uniquely identifies the row. In a distributed database, this key is the same for corresponding copies of the same row stored on other nodes. For example, this key might contain a row identifier value assigned sequentially by the local node within a unique range that is assigned to each node, or might be the composite of a row identifier value assigned sequentially by the local node together with a unique node identifier value.
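The two key-assignment schemes described above can be sketched as follows. This is a minimal illustration only; the class names `RangeKeyAllocator` and `CompositeKeyAllocator`, the range boundaries, and the use of the node reference numerals as node identifiers are assumptions made for the example and are not taken from any particular system.

```python
from itertools import count


class RangeKeyAllocator:
    """Assigns sequential row identifiers from a range reserved for one node.

    Hypothetical helper: the range [start, stop) is assumed to have been
    assigned to this node so that it does not overlap any other node's range.
    """

    def __init__(self, start, stop):
        self.stop = stop
        self._next = start

    def next_key(self):
        if self._next >= self.stop:
            raise RuntimeError("key range assigned to this node is exhausted")
        key = self._next
        self._next += 1
        return key


class CompositeKeyAllocator:
    """Builds keys as (node identifier, locally sequential row identifier)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._counter = count(1)  # local sequence starts at 1 (an assumption)

    def next_key(self):
        return (self.node_id, next(self._counter))


# Two nodes inserting rows independently never produce the same key.
range_a = RangeKeyAllocator(0, 1000)
range_b = RangeKeyAllocator(1000, 2000)
composite_a = CompositeKeyAllocator(100)
composite_b = CompositeKeyAllocator(102)
```

In both schemes a node can assign keys to new rows without consulting any other node, which is what allows inserts to be made locally between replications.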
Replication typically involves periodically (e.g., at scheduled replication intervals) collecting the changes made at each node, transporting descriptions of the changes to other nodes (e.g., via network communication links 112 and 114), and applying the changes at each of nodes 100, 102, 104, resulting in changed records 116, 118, 120, respectively. If all changes have been propagated to all nodes and resolved consistently and uniformly, then the replication has resulted in synchronization of the database copies at all the nodes.
Collisions occur in replication when, between replication intervals, multiple changes are made on different nodes to data in the same cells. Depending on the details of how the replication is performed, a collision may result in one of the changes persisting, in the replication stopping for manual intervention, or, undesirably, in the nodes never converging to the same copy.
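As an illustration of the definition above, a collision can be detected by grouping the changes collected in one replication interval by cell and checking whether more than one node touched the same cell. The `Change` record and the `find_collisions` function are hypothetical names introduced for this sketch.

```python
from collections import defaultdict
from typing import Any, NamedTuple


class Change(NamedTuple):
    """One change to one cell (hypothetical record layout)."""
    node: int     # node on which the change was made
    key: Any      # primary key of the row
    column: str   # column name
    value: Any    # new value written to the cell


def find_collisions(changes):
    """Return the cells changed on more than one node in the interval.

    The result maps (key, column) to {node: last value written on that node}.
    """
    by_cell = defaultdict(dict)
    for change in changes:
        by_cell[(change.key, change.column)][change.node] = change.value
    return {cell: nodes for cell, nodes in by_cell.items() if len(nodes) > 1}


interval_changes = [
    Change(node=100, key=7, column="name", value="Ann"),
    Change(node=102, key=7, column="name", value="Anne"),   # same cell as above
    Change(node=104, key=7, column="phone", value="555"),   # different cell
]
collisions = find_collisions(interval_changes)
```

Here only the cell identified by key 7 and column "name" is reported, because it was changed on both node 100 and node 102; the change to the "phone" column does not collide with anything.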
Existing replication techniques provide several alternatives for resolving collisions. One collision resolution process prescribes the behavior of pairs of nodes. Each node participating in the replication collapses all changes made to a column during the replication interval into one overall change. During the replication process, both nodes compare changes. If a collision is detected, it is resolved by selecting a winning value for the column, and an additional record documenting the collision resolution is generated for transmission to other nodes.
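The pairwise process described above might be sketched as follows. The rule used to select the winning value is not specified in the description; the sketch assumes, purely for illustration, that the change from the node with the higher node identifier wins. The function names `collapse` and `resolve_pair` and the layout of the resolution record are likewise hypothetical.

```python
def collapse(changes):
    """Collapse an interval's changes so each cell keeps only its final value.

    `changes` is an ordered list of (key, column, value) tuples.
    """
    collapsed = {}
    for key, column, value in changes:
        collapsed[(key, column)] = value
    return collapsed


def resolve_pair(node_a, changes_a, node_b, changes_b):
    """Merge the collapsed changes of two nodes and document each collision."""
    a = collapse(changes_a)
    b = collapse(changes_b)
    merged = {**a, **b}
    resolution_records = []
    for cell in a.keys() & b.keys():  # cells changed on both nodes
        # Assumed rule for this sketch only: the higher node identifier wins.
        if node_a > node_b:
            winner, value = node_a, a[cell]
        else:
            winner, value = node_b, b[cell]
        merged[cell] = value
        # Extra record that must itself be transmitted to the other nodes.
        resolution_records.append({"cell": cell, "winner": winner, "value": value})
    return merged, resolution_records


merged, records = resolve_pair(
    100, [(1, "name", "a"), (1, "name", "b")],
    102, [(1, "name", "c"), (2, "qty", 5)],
)
```

Note that each detected collision adds a record to `resolution_records` in addition to the ordinary change records, so the volume of replication traffic grows with the number of collisions.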
Although this method is popular and used commercially, it does not handle closed replication topologies such as rings of nodes or topologies where nodes might replicate with multiple partners in a mesh or grid arrangement, or might replicate opportunistically to different partners available at different times. Also, conflict resolution involves generating additional change records which must be propagated to all nodes. In some topologies, each collision can generate large bursts of collision resolution records. Some commercial replication implementations try to avoid this problem by having special collision damping code which tries to suppress the excess records.
Additional problems in replication arise from constraints placed on multiple rows. For example, a uniqueness constraint may require that each cell in a particular column contain a unique value. Another example is a constraint on the result of an operation over the set of values in a column, such as a requirement that the sum of the values contained in a column be greater than zero. While it is easy to enforce constraints when all changes are made on one node, it becomes possible to violate a constraint when changes are made at different nodes and a violation is detected only when the changes propagate by replication to a common node.
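The following sketch illustrates how changes that each satisfy a constraint locally can violate it once merged. The table contents, the column names, and the helper names `is_unique` and `sum_is_positive` are assumptions made for the example.

```python
def is_unique(rows, column):
    """True if no two rows share a value in `column`."""
    values = [row[column] for row in rows.values()]
    return len(values) == len(set(values))


def sum_is_positive(rows, column):
    """True if the sum of the values in `column` is greater than zero."""
    return sum(row[column] for row in rows.values()) > 0


# Uniqueness: each node inserts one row; each local copy is valid on its own.
node_a_rows = {(100, 1): {"email": "x@example.com"}}
node_b_rows = {(102, 1): {"email": "x@example.com"}}
merged_rows = {**node_a_rows, **node_b_rows}

# Sum constraint: both nodes start from the same two rows (sum is 110).
# Node A sets row 1 to 0 and node B sets row 2 to -55.
local_a = {1: {"amount": 0}, 2: {"amount": 50}}     # sum 50, valid locally
local_b = {1: {"amount": 60}, 2: {"amount": -55}}   # sum 5, valid locally
merged_amounts = {1: {"amount": 0}, 2: {"amount": -55}}  # both changes applied
```

Each node accepts its own change because the constraint holds against the local copy, and the violation appears only in `merged_rows` and `merged_amounts`, that is, only after replication brings the changes together on a common node.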
U.S. Pat. No. 5,924,094 discloses a distributed database system in which each site (server) transmits its changes to all other servers using local time stamps to order changes. Local time stamps suffer from the problem that different servers have different times and that the time on a server can be changed, so that a later modification might receive an earlier time stamp than the previous modification. When a server has a time far in the future, a record that it modifies becomes poisoned and can never be changed. The time-synchronization algorithm cannot be guaranteed to work. The system does not provide any way of resolving collisions that occur because changes to the same record are presented to different servers between replication times. It also uses a fixed replication arrangement predetermined by a central intelligence, does not allow communication among servers connected in rings or loops, requires bi-directional communication between replicating pairs, and needs a root for each activity.
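The time stamp problems noted above can be illustrated with a generic rule that keeps the change carrying the later time stamp. This is a simplified sketch of time stamp ordering in general, not a reproduction of the method of the cited patent; the function name `last_writer_wins` and the clock values are assumptions.

```python
def last_writer_wins(stored, incoming):
    """Keep whichever (time stamp, value) pair carries the later time stamp."""
    return incoming if incoming[0] > stored[0] else stored


# Server A's clock reads 1000 when it makes the first modification.
record = (1000, "first")

# Server B makes a later modification, but its clock runs behind (reads 900),
# so the later modification receives the earlier time stamp and is discarded.
record_after_skew = last_writer_wins(record, (900, "second"))

# A server whose clock is far in the future stamps the record ...
poisoned = last_writer_wins(record, (10**12, "from the future"))

# ... and every subsequent change stamped by a correct clock is rejected.
still_poisoned = last_writer_wins(poisoned, (1001, "correction"))
```

In the first case the modification that was actually made later is lost; in the second, the record keeps the value stamped with the future time until some clock catches up with that time stamp.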
U.S. Pat. Nos. 5,937,414 and 5,737,601 disclose procedural replication techniques in which all nodes are known in advance. Procedural replication involves migrating the commands that perform an insert, edit, or delete to all known nodes. Procedural replication does not record the state of the columns before the change.