The invention relates generally to computer database systems and more particularly to referential constraint processing during database load operations. The subject matter of the invention is generally related to the following jointly owned and co-pending patent application: “Cascade Delete Processing” by Christopher Y. Blaicher, Kerry C. Tenberg and Randol K. Bright (Ser. No. 10/999,999) which is incorporated herein by reference in its entirety.
Virtually all modern DataBase Management Systems (“DBMS”) provide mechanisms that permit users to constrain the value of one database entity based on the value or existence of another database entity. One common constraint type is the referential constraint. Referential constraints require that a value referred to by one database entity (e.g., a row in a first table) is associated with an existing entity in the database (e.g., another row in the same or different table). In the context of the Structured Query Language (“SQL”), referential constraints are implemented through the use of Foreign Keys (“FK”), wherein a database entity's FK value must equate to the Primary Key (“PK”) value of another, existing, database entity.
In general, constraint processing is preformed during database update and load operations and may be handled in accordance with one of three ways or policies. In the first, deletion of a referenced entity is prohibited. This policy often referred to as the “Reject Violating Modifications” policy. In the second, if a referenced entity is deleted or determined to be invalid then all entities that reference it are also deleted (or marked invalid). This policy is often referred to as the “Cascading” policy. In the third, FK values referencing a deleted or invalid PK value are set to NULL. This policy is often referred to as the “Set-Null” policy.
In the context of a relational DBMS, FIG. 1 shows prior art load-time referential constraint processing operation 100 as it relates to loading two related tables—the first table a “parent” table and the second table a “child” table to the first table through a referential constraint relationship. Parent table data is loaded (block 105) and the table's PK index is built or loaded (block 110). Next, child table data is loaded (block 115) and the table's FK index is built or loaded (block 120). It will be recognized by those of ordinary skill in the art that data (i.e., rows) marked for deletion but stored externally are not typically loaded during the acts of blocks 105 and 115. Accordingly, PK and FK indexes do not incorporate references to “deleted” row data. Once the tables (data and indexes) are loaded, referential constraint processing for the child table may be performed (blocks 125-145).
Constraint processing begins by obtaining a first row of the child table and identifying the row's FK as it relates to the parent table (block 125). The FK so obtained is used to probe the parent's PK index (block 130). If the parent's PK index does not have an entry corresponding to the child's FK value (the “No” prong of diamond 135), the FK fails to satisfy the referential integrity check and the child's row is marked for deletion (block 140). If the parent's PK index does not have an entry corresponding to the child's FK value (the “Yes” prong of diamond 135), the FK satisfies the referential integrity check. If child data remains to be processed in accordance with blocks 125-140 (the “No” prong of diamond 145), processing continues at block 125 where the “next” row of data from the child table is obtained. If no more child data remains to be processed (the “Yes” prong of diamond 145), the load operation is completed by removing child data rows marked for deletion in accordance with blocks 125-145 (block 150). Mathematically, the time required to perform load-time referential constraint processing in accordance with FIG. 1 can be expressed as follows:T(load)=T(parent)+T(child)+[T(probe)×N],   EQ. 1where T(load) represents the total load time, T(parent) the time to load the parent table (data and PK index), T(child) the time to load the child table (data and FK index), T(probe) the time required to probe the parent's PK index and N represents the number of probes into the parent table's PK index required.
It will be recognized by those of ordinary skill in the art that the act of probing (block 130) can consume a significant amount of time. One reason for this is that indexes are typically implemented using B-tree structures and, more typically, B+ tree structures. For large tables, the very act of sequentially retrieving (probing) a large number of key values can become a significant portion of the total time needed to load the targeted tables. Thus, to provide improved load-time characteristics of database management systems it would be beneficial to provide techniques (methods and devices) to significantly reduce the time required to load and referentially verify database entities.