A database is any collection of information organized for rapid search and retrieval. A database stored in a computer-readable medium is commonly modeled as a collection of one or more tables. Each table, in turn, is modeled as a collection of one or more records (also commonly referred to as a “row”), and each record as a collection of one or more fields (also commonly referred to as a “column”). In a conventional table, all records comprise the same number and type of fields, and in the same order. A relational database consists of tables that are “related” to each other through common fields. The most common way to establish a relationship between two tables is to include one or more fields in each table that hold “key” information. A “primary key” field uniquely identifies a record, and commonly is just a number unrelated to other data in the record. A “foreign key” field is an identifier in a record that establishes a relationship with a primary key in another table. For example, employee records might have an “employee” table containing a “department_id” field that references data located in a “department” table's “dept_id” field. In this example, the dept_id field uniquely identifies each department, while the department_id field identifies the department in which an employee works. Thus, in this example, dept_id would be a primary key, and department_id would be the foreign key that establishes the relationship between the employee table and the department table. The integrity of the table relationship depends on the foreign key referencing a valid primary key. Most modern database management systems allow users to designate foreign key fields when tables are created, and subsequently reject operations that would result in an invalid foreign key reference. Thus, all foreign key values must have equivalent primary key values that already exist in the other table.
As is well known in the art, it is sometimes advantageous to replicate databases on other computers or servers, or even on the same computer or server. It is common, though, for a database to hold thousands of tables and millions of records, and replicating these databases can be a difficult and cumbersome task. Several software tools exist in the art to assist a database administrator with the task of replicating (also called “propagating”) databases among servers, including DpropR—a database utility developed by IBM. On the most basic level, DpropR extracts data from a first “source” database on a first server and pushes the data to a second “target” database on a second server. DpropR removes “foreign keys” from the second or target database. When supporting two-way propagation, where data is propagated in both directions between the two databases, DpropR also removes “foreign keys” from the first or source database. An improvement to DropR (see discussion of U.S. patent application Ser. No. 10/855,736 below) allows a database administrator to control the order of propagation to preserve foreign key integrity, and thereby eliminates the need to remove foreign keys.
In order to propagate databases between servers, DpropR or other database propagation utilities use “subscription sets” to determine which tables should be propagated, and to which servers. A subscription set generally comprises a list of tables that a database administrator or user treats as a single unit for purposes of database propagation. A database administrator specifies in advance what tables are included in a subscription set, and can create more than one subscription set if needed or desired.
Subscription sets are themselves typically stored in relational tables: a “subscription” table and a “member” table. A subscription table comprises records identifying a subscription set, the server having the source database to be copied, and the target database to which the source database should be copied. The subscription set may also contain other useful information, such as a value indicating the last time the source database was copied, and a value indicating a frequency for copying the source table.
A member table comprises records identifying a source table, the subscription set to which the source table belongs, and the name of the target table to which the source table should be copied. Each subscription set identified in the member table must have a corresponding record in the subscription table. Note, though, that a source table can appear in more than one subscription set.
U.S. patent application Ser. No. 10/855,736 (the '736 application) discloses a method and apparatus for propagating relational database tables that allow a database administrator to control the order of propagation, and thereby preserve foreign key integrity. The '736 application's improvement to DpropR or other database propagation utilities adds an additional field to each member table and subscription set called ORDER. The additional field enables an administrator or other user to assign a rank to each subscription set and to each table within a subscription set, and then use the rankings to determine the order in which to propagate database tables. The rankings ensure that tables with a primary key referenced by a foreign key are propagated before the table with the foreign key is propagated, avoiding an invalid foreign key reference.
One embodiment of the improvement disclosed in application Ser. No. 10/855,736 uses two programs: CAPTURE and APPLY. CAPTURE monitors a database for changes and causes APPLY to propagate source tables of the subscription sets in the designated order. Something that U.S. patent application Ser. No. 10/855,736 explicitly does not address is the situation of “cyclic foreign keys.” Cyclic foreign keys occur when a field in TABLE1 references a field in TABLE2, which in turn references a field in TABLE1. Illustrated below as:
TABLE1→TABLE2→TABLE1
Of course, cyclic references can be more complex, involving more than two tables:
TABLE1→TABLE2→TABLE3→TABLE1
An example of TABLE1 and TABLE2 with a cyclic relation are shown in FIG. 1 and FIG. 2. TABLE1 in FIG. 1 has three columns: T1_COL1, T1_COL2 and T1_COL3. The first column, T1_COL1, contains primary keys. The second column, T1_COL2 contains additional data, and the third column, T1_COL3 contains foreign keys referencing a primary key in the first column (T2_COL1) of TABLE2. Likewise, TABLE2 in FIG. 2 has three columns: T2_COL1, T2_COL2 and T2_COL3. The first column, T2_COL1, contains primary keys. The second column, T2_COL2 contains additional data, and the third column, T2_COL3 contains foreign keys referencing a primary key in the first column (T1_COL1) of TABLE1. Propagating either TABLE1 or TABLE2 to another server without propagating the other table results in an invalid foreign key reference in the third column.
Simply propagating tables in a specific order does not work with cyclic references. One of the tables has to propagate first, which inevitably results in an invalid foreign key reference unless the foreign references are manually removed from the target table to be propagated. A need exists for a system and method to automatically propagate tables containing cyclic references, without permanently removing any foreign keys.
These and other objects of the invention will be apparent to those skilled in the art from the following detailed description of a preferred embodiment of the invention.