Due to the increased amounts of data being stored and processed today, operational databases are constructed, categorized, and formatted in a manner conducive for maximum throughput, access time, and storage capacity. Unfortunately, the raw data found in these operational databases often exist as rows and columns of numbers and code which appears bewildering and incomprehensible to business analysts and decision makers. Furthermore, the scope and vastness of the raw data stored in modern databases renders it harder to analyze. Hence, applications were developed in an effort to help interpret, analyze, and compile the data so that it may be readily and easily understood by a business analyst. This is accomplished by mapping, sorting, and summarizing the raw data before it is presented for display. Thereby, individuals can now interpret the data and make key decisions based thereon.
Extracting raw data from one or more operational databases and transforming it into useful information is the function of data "warehouses" and data "marts." In data warehouses and data marts, the data is structured to satisfy decision support roles rather than operational needs. Before the data is loaded into the data warehouse or data mart, the corresponding source data from an operational database is filtered to remove extraneous and erroneous records; cryptic and conflicting codes are resolved; raw data is translated into something more meaningful; and summary data that is useful for decision support, trend analysis or other end-user needs is pre-calculated. In the end, the data warehouse is comprised of an analytical database containing data useful for decision support. A data mart is similar to a data warehouse, except that it contains a subset of corporate data for a single aspect of business, such as finance, sales, inventory, or human resources. With data warehouses and data marts, useful information is retained at the disposal of the decision makers.
One of the problems associated with implementing data warehouses and data marts relates to keeping them synchronized with the changes occurring in the operational databases. Given the size and scope of the operational databases and given that there might exist numerous operational databases and many different types of datamarts, it is a monumental task for incorporating, synchronizing, and updating the changes made to an operational database so that they are appropriately reflected in the data warehouses and data marts. Furthermore, new data may be entered from many different discrete or overlapping sources. In addition, old data may be updated on a continuous or regular basis.
In the past, data warehouses and data marts were updated and maintained by reprocessing and transforming the entire data set, even though only a small portion of the operational database was actually changed. This brute-force approach of reprocessing each and every data entry is tedious, time-consuming, and highly inefficient. Alternatively, some prior art system first performed a scanning operation. The entire data set was scanned to determine whether the data is new, updated, or current. Based on the scan results, the data in the data warehouses and data marts were changed accordingly. Although this approach is slightly faster, it nevertheless is still relatively time-consuming to examine and process each and every data item. It also significantly slows down the operational database.
It is often crucial that the most recent information be made available to key individuals so that they can render informed decisions as promptly as possible. Thus, there is a need for a more efficient and faster method for responding to changes made upon an operational database and incorporating these changes to the respective data warehouses and data marts. The present invention offers a solution whereby only the changes to the operational database are captured and propagated to the data warehouses and data marts. This is made possible by the creation of a novel "Change Capture" database (CCDB) used in conjunction with a "Change Data Capture" (CDC) process to facilitate the capturing of changes and the updating and propagation of these changes to the appropriate data warehouses and data marts.