Modern businesses rely upon a myriad of operational systems that generate data. Examples of operational systems may include order generation systems, invoicing systems, billing systems and accounting systems. It is often desirable to move data generated by an operational system for later analysis. For example, it may be desirable to move data for transactions generated in a transaction system into a system where the data can be analyzed. At some later point in time, this data may be analyzed to examine customer trends, preferences, revenue generated by category, or other relevant information. Data visualization tools such as charting and plotting may be employed to provide additional insight into the content of the data. Systems that are utilized to analyze and evaluate data generated from operational systems are often referred to as OLAP (“Online Analytical Processing”) systems, business warehouse (“BW”) systems and/or business intelligence (“BI”) systems.
The process of performing the transfer of data from an operational system to an OLAP or BW system is often referred to as an extraction process. The term “extraction” describes the concept of retrieving data from an operational system and causing the storage of the extracted data in an OLAP or BW system. An extraction system may be deployed which, upon the generation of data in an operational system, automatically transfers the generated data from an operational system to the OLAP system. The extraction process may also perform some rudimentary transformations on the data before it is stored in the OLAP system, in order that, for example, the data is in a format suitable for processing and storage by an OLAP system. An extraction system may be part of an operational system such as a framework implemented within an operational system, or may be a separate system.
An extraction system may include a software system that operates in tandem with an operational system to perform extraction of data generated by the operational system. As just referenced, an extraction system may be separate from the operational system, or may be combined with it. Typically, an extraction system may include management functions for defining parameters such as which operational system is to be the subject of the data extraction, which data should be extracted, and how often the data extraction process should be performed.
An extraction process may perform a number of evaluations or transformations on the data generated by an operational system. The terms "transformation" and "evaluation" refer to the fact that the extraction system may process the data generated by the operational system so that it can be stored in the BW system in a convenient format. These transformations may include such processing as aggregating, combining, simplifying, filtering, converting and any other processing of the underlying data.
Evaluation or transformation of data extracted from an operational system is often necessitated by the types of analysis that will later be performed on the data stored in an OLAP system. Often, for example, it will be desirable to analyze data in an OLAP system by querying the OLAP system utilizing any number of convenient parameters. For example, it may be desirable to examine all sales orders generated for the month of July. However, the data generated by the operational system, although it may indicate the month of each sales order, may not include a data item that aggregates all data by month. Thus, it may be convenient to store sales data in a BW system in a form that is already aggregated by month.
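By way of illustration only, the monthly aggregation described above might be sketched as follows. The record layout (order identifier, order date, amount), the sample values, and the function name `aggregate_by_month` are assumptions made for this example and are not drawn from any particular operational or BW system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational records: (order_id, order_date, amount).
# The values are invented purely for illustration.
orders = [
    ("SO-1", date(2023, 7, 3), 120.0),
    ("SO-2", date(2023, 7, 19), 80.0),
    ("SO-3", date(2023, 8, 2), 50.0),
]


def aggregate_by_month(records):
    """Sum order amounts per (year, month) so that monthly queries
    do not need to scan individual orders."""
    totals = defaultdict(float)
    for _order_id, order_date, amount in records:
        totals[(order_date.year, order_date.month)] += amount
    return dict(totals)


# The aggregated form that might be stored in the BW system.
monthly_totals = aggregate_by_month(orders)
```

In this sketch, a query for all sales orders in July reduces to a single lookup on the key `(2023, 7)` rather than a scan of every order.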
Although it might appear to be a relatively straightforward task to extract data generated by an operational system to a business warehouse system, there are a number of problems that may arise. The mapping between data structures in an operational system and an OLAP system is a natural source of errors because the mapping has to be defined explicitly. Due to the evaluation process described above, many errors may occur when data is transformed and stored in an OLAP system. Data generated by an operational system is often generated in a complex structured format that must be correctly interpreted by an extraction process. Errors may arise in correctly interpreting the format of the data as well as in ensuring that the data arrives in pristine form in the BW system. Two example types of errors that may arise are the failure to transfer a data item from an operational system to an OLAP system and the generation of duplicate or redundant copies of a particular data item in an OLAP system. A third type of error relates to the accuracy or correctness with which data from an operational system is replicated in an OLAP system.
These three types of errors that may occur in data extraction from an operational system to an OLAP system may thus be characterized as concerning existence, uniqueness and correctness. With an existence error, a data element generated by an operational system is simply not transferred to an OLAP system (i.e., it fails to exist in the OLAP system). With a uniqueness error, data may be replicated or duplicated erroneously in an OLAP system (i.e., multiple copies of the same data element may be stored in the OLAP system). With a correctness error, a data element is stored in an OLAP system erroneously (i.e., the data element has been mutated from its original form or content). Still another type of error may occur when data that is not intended to be extracted from an operational system is, in fact, extracted.
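The error categories above may be made concrete with a minimal sketch that compares the contents of an operational system against what was stored in an OLAP system. The function name `check_extraction`, the key/value representation of data elements, and the sample data are all hypothetical and are used only to illustrate the categories; the sketch is not a description of any particular extraction system.

```python
from collections import Counter


def check_extraction(source, extracted, intended_keys):
    """Classify extraction errors.

    source        -- dict mapping key -> value in the operational system
    extracted     -- list of (key, value) pairs found in the OLAP system
    intended_keys -- set of keys that were meant to be extracted
    """
    counts = Counter(key for key, _value in extracted)

    # Existence: an intended element never arrived.
    missing = sorted(k for k in intended_keys if k not in counts)
    # Uniqueness: an element was stored more than once.
    duplicated = sorted(k for k, n in counts.items() if n > 1)
    # Correctness: an intended element arrived with altered content.
    incorrect = sorted(
        {k for k, v in extracted if k in intended_keys and source.get(k) != v}
    )
    # Unintended extraction: an element arrived that should not have.
    unintended = sorted(k for k in counts if k not in intended_keys)

    return {
        "existence": missing,
        "uniqueness": duplicated,
        "correctness": incorrect,
        "unintended": unintended,
    }


# Invented sample data: "B" is lost, "A" is duplicated,
# "C" is altered, and "D" is extracted although not intended.
source = {"A": 100, "B": 200, "C": 300, "D": 400}
intended = {"A", "B", "C"}
extracted = [("A", 100), ("A", 100), ("C", 305), ("D", 400)]

report = check_extraction(source, extracted, intended)
```

Each list in `report` holds the keys affected by one error category, so an empty report across all four categories would indicate that, for the keys examined, the extraction was complete, unique, accurate and limited to the intended data.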
Thus, there is a possibility for errors in the operation of extraction systems and/or processes, and such problems may be exacerbated by the heterogeneous nature of data generated by many operational systems, as well as the heterogeneous nature of the format of the extracted data itself. Consequently, the utility of such extraction systems may be reduced, and some benefits of the available data may be reduced or lost as well.