Many businesses and other large organizations today use relational database management systems known as on-line transaction processing (OLTP) systems to execute and keep track of business transactions. For example, a company that sells products or services over the Internet may use an OLTP system to record pricing information about each product for sale, billing and shipping information for each purchaser, and sales information for each order made by a purchaser. Other examples of businesses that use OLTP systems include airlines, banks, mail-order companies, supermarkets, and manufacturers.
The data generated and recorded in OLTP systems are valuable to most businesses, because the businesses can aggregate and analyze the data to ascertain the product sales for a particular month, forecast changing trends in product popularity and identify profitable or unprofitable product lines, or otherwise evaluate the businesses' affairs. Aggregating and analyzing this data, however, is computationally expensive and, if performed on the OLTP system itself, would decrease the performance of the OLTP system. Accordingly, it has become common for businesses with OLTP systems to set up a separate computer system, generally known as a “data warehouse,” for the purpose of collecting, aggregating, and analyzing the information contained in the OLTP databases. Data warehouses can grow very large, ranging from gigabytes to many terabytes of data (trillions of bytes). The task of moving data from its original source in OLTP systems to the data warehouse is commonly referred to as data extraction, transport, and loading (ETL).
Conventional data extraction, transport, and loading mechanisms are cumbersome. In a typical approach, database administrators generally dump the entire contents of the tables in the OLTP system into flat files, transport the flat files to a staging system, and then load the data in the flat files into the data warehouse. In this approach, the amount of data extracted, transported, and loaded is as immense as the amount of data in the OLTP system, even though only a small percentage of the data on the OLTP system is actually new. Accordingly, there has been much interest in devising ways to reduce the amount of data extracted, transported, and loaded by capturing only the changed data to the database tables of the OLTP system.
Lately, there has been interest in an asynchronous change capture approach for extracting, transporting, and loading data from an OLTP system to a data warehouse. In one asynchronous approach, change data in the form of inserted, updated, and deleted records are captured in recovery logs that are generated by the OLTP system in the normal course of operation. These recovery logs are shipped to the data warehouse, where the change data stored in the recovery logs can be extracted into change tables on the data warehouse without affecting the performance of the OLTP system. Each change table corresponds a source table on the OLTP system and can be used to track historical business data and provide a history of the system by identifying which records in the corresponding source table were updated.
However, there is still a need for high-performance data extracting, transport, and loading on the data warehouse side. For example, in one implementation of an asynchronous change capture needed about 8 hours and 11 minutes to extract changes from a recovery log into 12 different change tables.