A constraint is a rule that a database management system (DBMS) typically enforces. A constraint can be used to enforce business rules and are often used to optimize query performance. There are different types of constraints. Two types of constraints in particular are check constraints and referential integrity constraints.
A check constraint sets restrictions on data values that can be added to a column of a specific table. A check constraint can be defined on certain data to ensure that the data adheres to certain business rules. A check constraint ensures that data that does not conform to prescribed business rules is not mistakenly inserted or updated in a database. For example, given a check constraint defined as (salary=basepay+bonus), the database can check whether the value in the salary column for a given employee is equal to the sum of the values in the basepay and the bonus column. A referential constraint is a logical rule about values in one or more columns in one or more tables—i.e., a constraint that is defined between the foreign key and the primary key, the foreign key being in a child table and the primary key being in the parent table. in corresponding tables. The referential integrity constraint would prevent adding a row to the child table where the values in the foreign key column or columns do not exist in a row in the parent table.
Other than for enforcing business rules, database administrators are encouraged to define check constraints and referential integrity constraints because database management systems typically use such constraints to optimize performance of queries. For example, a redundant join elimination can occur based on the following scenario. Assume view V1 is defined as follows where pk and fk are the primary key and the foreign key, respectively, in a referential integrity relationship:
create view V1 as (select F.*, D.* from F, D where F.pk = D.fk)Query:select F.* from V1 where F.x ≧ ‘abc’
The query above is equivalent to: select F.* from F, D where F.pk=D.fk and F.x≧‘abc’. Internally, the query can be optimized to eliminate the join as follows: select F.* from F where F.x≧‘abc’. Here, information from the table D is not required to perform the query. In addition, based on the referential integrity relationship, the DBMS may know that there will not be any additional rows added or deleted as a result of the join of the table F to the table D. In other words, every row in the F table will join one and only one row in the D table. Accordingly, eliminating the join of D from the query will give exactly the same result with or without the join.
Check constraints and referential integrity constraints, however, are sometimes expensive to maintain, particularly when loading or deleting large amounts of data. With cross table constraint validation, when data is inserted, updated or deleted in one table, reading or updating data in the other table participating in the defined constraint is typically required. Such actions are particularly expensive especially if constraint checking involves looking at data across partitions in a partitioned environment. In a partitioned environment data needs to be sent across partitions to do the validation and this communication can be expensive.
In most database environments, the movement of data into the production system target tables needs to be done as quickly as possible to minimize the impact on the users of the target tables. The loading of data into the database tables that are used by the database users to satisfy the applications may be performed during a data load window that is typically during off peak database usage time. In order to minimize data loading time (i.e., the length of the data load window), it is necessary to find ways of speeding up the process. In a data warehouse environment, very often, bulk updates are made by first preparing the data in staging tables and then transferring the data from the staging tables to the (production system) target tables during a final load phase. It is usually critical to keep the final load phase as short as possible to allow applications have access to the newly loaded data that is consistent with minimum down time.
Accordingly, what is needed is a system and method for optimizing the performance of bulk loads or updates during critical phases of the load process, e.g., during loading of data into production system target tables. The present invention addresses such a need.