In a data warehouse environment, tables need to be refreshed periodically with new data arriving from client systems. The new data may contain changes to existing records, i.e., rows in tables, of the database and/or new records that need to be inserted.
A data manipulation operation is defined as an operation, which modifies a data set. Examples of data manipulation operations in Structured Query Language (SQL) include UPDATE, INSERT, DELETE, and MERGE. In the context of our invention, we consider those forms of data manipulation operations where a source data set is compared with a destination data set in order to generate modifications to the latter. This can be achieved today through UPDATE, INSERT, DELETE, and MERGE statements. All these statements modify a single target data set. Such statements have been used with, for example, the Oracle 9i database system.
Another feature, provided by the SQL statement MERGE, combines a conditional INSERT, UPDATE and DELETE commands in a single atomic statement to merge data from a source to a destination. The INSERT, UPDATE, DELETE commands in the context of MERGE command are considered conditional in that (a) if a record in the new data corresponds to an item that already exists in the destination, then an UPDATE and possibly DELETE operations are performed on the item; and (b) if a record in the new data does not already exist in the destination, then an INSERT operation is performed to add a corresponding record to the destination.
Database application such as data warehouses often require data from a source structure to be merged into multiple destination structures. FIG. 10 illustrates a typical plan for a database system that merges data from a source table 1010 into multiple destination tables within the database system. The multiple destination tables are illustrated by a first destination table 1020 and a second destination table 1025. To perform the MERGE operations, a first source scan 1012 is performed on the source table 1010, and a first destination scan 1022 is performed on the first destination table 1020. The first source scan 1012 and first destination scan 1022 may be completed at time T0. Once the scans are performed, a first MERGE operation 1030 is performed to merge data from the source table 1010 into the first destination table 1020. The first MERGE operation 1030 determines, for each row being merged into the destination table, whether the row corresponds to a row that is already in the destination table.
To perform the second MERGE operation 1040, a second source scan 1014 is performed on source table 1010. A second destination scan 1024 is also performed on second destination table 1025. The second source scan 1014 and the second destination scan 1024 are completed at time T1. Once the scans are completed, the second MERGE operation 1040 is performed.
The plan of FIG. 10 illustrates the manner in which successive MERGE operations between a source data structure and other destination data structures are typically performed. Each MERGE operation requires a scan of the source data structure. This can be problematic when the source data structure is large, or otherwise be sufficiently complex to require an expensive and lengthy process to be scanned. As a result, when the source data structure is subjected to multiple MERGE operations, the individual MERGE operation can consume significant computational resources for a lengthy period of time.