The present invention relates generally to database operations and management and, more specifically, to a MERGE DELETE statement for database queries.
In a data warehouse environment, tables need to be refreshed periodically with new data arriving from client systems. The new data may contain changes to existing records, i.e., rows in tables, of the warehouse and/or new records that need to be inserted.
Historically, the update and insert operations associated with a data refresh event were expressed according to one of two approaches: either (1) as a sequence of DMLs (INSERT and UPDATE operations) or (2) as PL/SQL loops that determine, for each record, whether to insert or update data. Both approaches face performance obstacles. The first approach requires four table scans and two table joins. That is, each of the source and destination table is scanned and the two tables joined for each of two query clauses [e.g., (1) WHERE CdeSt IN; and (2) WHERE Cdest NOT IN]. The second approach operates on a perrecord basis.
The Oracle 9i database system introduced a database server feature that addresses needs associated with data Extraction, Transformation, and Loading (ETL), which are often encountered in the context of data warehousing. That feature is the SQL statement MERGE, which combines the sequence of conditional INSERT and UPDATE commands in a single atomic statement to merge data from a source to a destination or target (sometimes referred to as Upsert functionality). The INSERT and UPDATE commands are considered conditional in that (a) if a record in the new data corresponds to an item that already exists in the destination, then an UPDATE operation is performed on the item; and (b) if a record in the new data does not already exist in the destination, then an INSERT operation is performed to add a corresponding record to the destination.
The following is an example of the MERGE statement, in the context of a periodic update to a fact (destination) table, SALES_FACT, based on sales data coming from on-line systems (source). Further, when a new store is opened, data from the new store (source) needs to be inserted into the SALES_FACT table.
MERGE INTO SALES_FACT D
USING SALES_JUL01 S
ON (D.TIME_ID=S.TIME_ID
AND D.STORE_ID=S.STORE_ID
AND D.REGION_ID=S.REGION_ID)
WHEN MATCHED THEN
UPDATE
SET d_parts=d_parts+s_parts
d_sales_amt=d_sales_amt+s_sales_amt,
d_tax amt=d_tax_amt+s_tax amt,
d_discount_amt=d_discount_amt+s_discount_amt
WHEN NOT MATCHED THEN
INSERT (D TIME_ID, D.STORE_ID, D.REGION_ID, D.PARTS_ID, D.SALES_AMT, D.TAX_AMT, D.DISCOUNT)
VALUES ( S.TIME_ID, S.STORE_ID, S.REGION_ID, S.PARTS_ID, S.SALES_AMT, S.TAX_AMT, S.DISCOUNT).
Using the MERGE command, the conditional INSERT or UPDATE is processed based on a single SQL statement. However, multiple table scans are required to issue the appropriate DML commands against the destination data, which is then stored persistently, such as on disk.
Often, it is desirable to cleanse the destination containers (e.g., data tables) of unneeded records as part of the overall process of populating or updating them using the MERGE statement. In order to perform a cleansing of the destination tables; a separate DELETE statement is executed after the MERGE statement has finished executing. The separate DELETE statement requires at least one additional scan of the destination table to check the condition of each relevant row and to remove rows which meet the condition. In addition, the separate DELETE operation requires additional probes into shared memory as well as additional disk I/O operations.
Based on the foregoing, there is a clear need for an improved technique for cleansing a data table in conjunction with a data merging process.