The term “database” refers to a collection of data that is multidimensional in the sense that internal links between its entries make the information accessible from a variety of perspectives. This is in contrast to a traditional file system, sometimes called a “flat file,” which is a one-dimensional storage system, meaning that it presents its information from a single point of view. Whereas a flat file containing information about composers and their compositions might provide a list of compositions arranged by composer, a database might present all the works by a single composer, all the composers who wrote a particular type of music, and perhaps the composers who wrote variations of another composer's work.
Abstraction can be used to hide internal complexities. Database management systems (DBMS) hide the complexities of a database's internal structure, allowing the user of the database to imagine that the information stored in the database is arranged in a more useful format. In particular, a DBMS contains routines that translate commands stated in terms of a conceptual view of the database into the actions required by the actual storage system. This conceptual view of the database is called a “database model.”
In the case of a relational database model, the conceptual view of the database is that of a collection of tables consisting of rows and columns. For example, information about a company's employees might be viewed as a table containing a row for each employee and columns labeled name, address, employee identification number, and so on. In turn, the DBMS would contain routines that allow the application software to select certain entries from a particular row of the table or perhaps to report the range of values found in the salary column—even though the information is not actually stored in rows and columns.
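As a runnable sketch of this conceptual view (the section names no particular DBMS, so SQLite is used here purely as a stand-in, and the table contents are hypothetical), application software can select entries from a particular row or report the range of values in the salary column without any knowledge of the physical storage layout:

```python
import sqlite3

# An in-memory SQLite database standing in for an arbitrary relational DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, emp_id INTEGER, salary NUMERIC)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", 101, 9500), ("Bob", 102, 8700)],
)

# Select certain entries from a particular row of the table...
row = conn.execute(
    "SELECT name, salary FROM employees WHERE emp_id = 101"
).fetchone()

# ...or report the range of values found in the salary column.
lo, hi = conn.execute("SELECT MIN(salary), MAX(salary) FROM employees").fetchone()
```

The DBMS translates each of these commands, stated in terms of rows and columns, into whatever actions its actual storage system requires.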
In a relational database model, data is portrayed as being stored in rectangular tables, called “relations,” which are similar to the format in which information is displayed by spreadsheet programs. A row in a relation is called a “tuple.” Each tuple contains information about a particular entity. Columns in a relation are referred to as “attributes” because each entry in a column describes some characteristic, or attribute, of the entity represented by the corresponding tuple.
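The relation/tuple/attribute terminology can be sketched directly in Python using named tuples (the relation name and attribute names here are hypothetical, not taken from any particular schema):

```python
from collections import namedtuple

# A hypothetical "employees" relation: the field names are the attributes,
# and each Employee instance is one tuple (row) of the relation.
Employee = namedtuple("Employee", ["name", "address", "emp_id"])

employees = [
    Employee("Alice", "12 Oak St", 101),
    Employee("Bob", "34 Elm Ave", 102),
]

# Each entry in a column describes one characteristic (attribute) of the
# entity represented by the corresponding tuple.
first_id = employees[0].emp_id
```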
Some DBMSs use integrity constraints to prevent the entry of invalid data into the base tables of a database. Integrity constraints can be defined to enforce the business rules associated with the information in a database. If the result of executing a data manipulation language (DML) statement violates an integrity constraint, then the DBMS may roll back the statement and return an error.
For example, an integrity constraint might be defined for the salary column of an “employees” table. This integrity constraint might enforce the rule that no row in this table can contain a numeric value greater than 10,000 in this column. If an INSERT or UPDATE statement attempts to violate this integrity constraint, then the DBMS may roll back the statement and return an informative error message.
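A minimal runnable illustration of this behavior, again using SQLite as a stand-in DBMS (the section does not name one), with the 10,000 salary rule expressed as a CHECK constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint enforces the business rule that no row may
# contain a salary greater than 10,000.
conn.execute(
    "CREATE TABLE employees ("
    "  name   TEXT,"
    "  salary NUMERIC CHECK (salary <= 10000))"
)
conn.execute("INSERT INTO employees VALUES ('Alice', 9500)")  # satisfies the rule

error = None
try:
    # This INSERT attempts to violate the integrity constraint.
    conn.execute("INSERT INTO employees VALUES ('Bob', 12000)")
except sqlite3.IntegrityError as exc:
    # The DBMS rolls back the offending statement and returns an error.
    error = exc
```

After the failed INSERT, only the valid row remains in the table.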
A unique key integrity constraint requires that every value in a column or set of columns (the key) be unique; that is, no two rows of a table may have duplicate values in the specified column or set of columns. The columns included in the definition of a unique key constraint are called the unique key. In this sense, “key” refers only to the column or set of columns used in the definition of the integrity constraint. If the unique key consists of more than one column, that group of columns is said to be a composite unique key. For example, a customer table might have a unique key constraint defined on a composite unique key consisting of “area” and “phone” columns. Such a unique key constraint would allow the entry of a given area code, or a given telephone number, any number of times, but the combination of a given area code and given telephone number could not be duplicated in the table, preventing the unintentional duplication of a telephone number.
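The composite unique key on the hypothetical (area, phone) pair can be demonstrated concretely (SQLite again serving as a stand-in DBMS; the customer names and numbers are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE (area, phone) defines a composite unique key: either column
# alone may repeat, but the combination may not.
conn.execute(
    "CREATE TABLE customers ("
    "  name  TEXT,"
    "  area  TEXT,"
    "  phone TEXT,"
    "  UNIQUE (area, phone))"
)
conn.execute("INSERT INTO customers VALUES ('Alice', '415', '555-0101')")
conn.execute("INSERT INTO customers VALUES ('Bob',   '415', '555-0202')")  # repeated area: allowed
conn.execute("INSERT INTO customers VALUES ('Carol', '510', '555-0101')")  # repeated phone: allowed

error = None
try:
    # A duplicate (area, phone) combination violates the composite unique key.
    conn.execute("INSERT INTO customers VALUES ('Dave', '415', '555-0101')")
except sqlite3.IntegrityError as exc:
    error = exc
```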
A problem may sometimes arise when replicating data from a source database to a target database, specifically when update operations that were performed against the source database are replayed against the target database. The problem occurs when such an update operation affected multiple rows of a particular table. During an update of a table with a primary or unique key constraint, transient duplicates may occur. These duplicate key violations are transient in nature and are resolved by the time the update statement is completed. On the source system, the transient duplicates are handled automatically during the execution of the single statement. A row-level replication system, however, captures each row change individually, and these row changes are applied to the target database one at a time. The execution on the target database may therefore be unaware that the row changes originally constituted a single statement on the source system.
An example of a statement that may produce transient key violations is the update statement “update table T1 set k=k+1”, where k is an integer column with a unique constraint. Assuming that the values in column k are (1, 2, 3, 4), the column will contain values (2, 2, 3, 4) after the first row is updated. The double occurrence of “2” in column k is a transient key violation, since after the next row update column k will contain values (2, 3, 3, 4). The third row update will cause the values in column k to be (2, 3, 4, 4), and the final row update will leave column k with values (2, 3, 4, 5), with no duplicate key violations remaining. Nevertheless, if the DBMS checks for constraint violations after each row update, and does not defer constraint checking until commit time, then the DBMS may produce an error indicating that the first row update caused a unique key constraint violation.
One attempt at working around the foregoing problem involves modifying the source database schema itself to include directives that instruct the DBMS to delay the enforcement of constraints until commit time. At commit time, any transient duplicates will have been resolved and therefore will not cause a violation when constraints are checked. However, some database schemas do not allow the enforcement of constraints to be delayed. If a user wanted to defer constraint enforcement under such circumstances, then, for each table in the source database that had a unique key constraint, the user could create a copy of that table lacking the constraints and then perform source-to-target replication on the table copies rather than the originals. In another approach, the user could remove the constraint on the table at the target database. These approaches are unduly complex and wasteful of time and/or storage space. Furthermore, these approaches sometimes cannot be taken if applications that use the database expect to refer to data structures having certain properties.
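The constraint-free table-copy workaround can be sketched as follows (SQLite again serves as a stand-in DBMS, and the copy-table name is hypothetical). Because the copy lacks the unique constraint, the row-by-row apply succeeds in the original ascending order even while transient duplicates exist, at the cost of maintaining a second copy of the data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (k INTEGER UNIQUE)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,), (4,)])

# Workaround: a copy of the table that lacks the unique constraint, so
# transient duplicates during row-by-row replication cause no error.
conn.execute("CREATE TABLE t1_copy (k INTEGER)")  # no UNIQUE constraint
conn.execute("INSERT INTO t1_copy SELECT k FROM t1")

# Apply each captured row change individually, addressing rows by rowid
# (as a row-level replication system would, rather than by value).
# After the first change the values are transiently (2, 2, 3, 4), but
# with no constraint on the copy, no violation is reported.
for rid in (1, 2, 3, 4):
    conn.execute("UPDATE t1_copy SET k = k + 1 WHERE rowid = ?", (rid,))
```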
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.