The present invention relates to database systems, and in particular, to transactional database systems and reporting database systems.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Business intelligence (BI) systems provide companies with extensive functionality for gathering, analyzing and providing access to their data. Data is collected from multiple heterogeneous sources within a company, and possibly from additional external sources, to create an integrated set of data that serves as a comprehensive base of knowledge and supports effective reporting.
Current state-of-the-art architectures of BI systems rely on a centralized data warehouse (DW) or multiple decentralized data marts to store the integrated data set. The process of collecting data from the transactional systems and transporting it into dedicated storage is called extraction, transformation and loading (ETL). ETL “is by far the most complicated process to be designed and developed in any BI project.” [See L. T. Moss and S. Atre, Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications at page 229 (Addison-Wesley, 2003).] According to Ankorion, the ETL process is traditionally run periodically on a weekly or monthly basis. [See I. Ankorion, Change Data Capture—Efficient ETL for Real-Time BI, DM Review Magazine (January 2005).] It is usually run as a batch job during low system load windows, because transforming and cleansing source data, which is likely to be of poor quality, consumes a large amount of resources. Consequently, data in the BI system is not always up to date, which can pose problems for companies that have to react to issues in real time, e.g. in the banking business.
According to Liang and Yu, not all data is necessarily replicated into the BI system, but only data of interest. [See W. Liang and J. X. Yu, Revisit on View Maintenance in Data Warehouses, in WAIM '01: Proceedings of the Second International Conference on Advances in Web-Age Information Management at pages 203-211 (Springer-Verlag, London, UK, 2001).] Furthermore, data is usually aggregated to achieve higher data access performance. [See K. Becker and D. D. A. Ruiz, An Aggregate-Aware Retargeting Algorithm for Multiple Fact Data Warehouses, in Yahiko Kambayashi and Mukesh K. Mohania (Wolfram Wöß, editor), DaWaK, volume 3181 of Lecture Notes in Computer Science (LNCS) at pages 118-128 (Springer-Verlag, Spain, September 2004).] In this case, aggregation levels have to be predefined. This results in two problems. First, information may be queried that has not been replicated into the BI system. Second, the system may not be able to produce certain levels of detail for a report if those levels were not foreseen at the time when the aggregation levels were defined. In such a scenario, ad-hoc reports (specific reports that are created and customized by the users themselves) are not entirely possible, because the knowledge base is not complete but is only a filtered version of the data stored in the source systems.
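The second problem can be illustrated with a minimal sketch. The row layout, the sample figures and the helper name `aggregate` are hypothetical and chosen only for illustration; they do not describe any particular BI product. The sketch assumes a warehouse that keeps totals per region only, and shows that a report by region and product can then be answered only from the source rows.

```python
from collections import defaultdict

# Hypothetical transactional rows: (region, product, day, amount).
transactions = [
    ("EMEA", "laptop", "2005-01-03", 1200),
    ("EMEA", "laptop", "2005-01-04", 1100),
    ("EMEA", "phone", "2005-01-03", 300),
    ("APAC", "phone", "2005-01-03", 350),
]


def aggregate(rows, key_indexes):
    """Sum the amount column, grouped by the given column positions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]
    return dict(totals)


# Aggregation level fixed at design time: totals per region only.
# After loading, the warehouse no longer knows which product was sold.
warehouse = aggregate(transactions, key_indexes=(0,))

# An ad-hoc report per (region, product) cannot be derived from
# `warehouse`; it has to go back to the detailed source rows.
product_report = aggregate(transactions, key_indexes=(0, 1))

print(warehouse)
print(product_report)
```

The region totals in `warehouse` cannot be split back into per-product figures, which is the loss of detail described above.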
While OLTP (on-line transactional processing) systems store up-to-date data, efficient reporting on top of these systems is not practicable for performance reasons. OLAP (on-line analytical processing) systems provide sophisticated reporting capabilities, but usually do not use up-to-date data: common reporting architectures rely on complex, resource-intensive ETL (extraction, transformation and loading) processes that replicate OLTP data into read-optimized data structures in batch jobs during periods of low system load.
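The staleness that results from batch replication can be sketched as follows. The class names `OltpStore` and `ReportingStore` and the sample postings are hypothetical, and the batch job is reduced to a single copy step; the sketch is meant only to show that a report reflects the state at the last load rather than the current transactional state.

```python
class OltpStore:
    """Hypothetical transactional store holding current account balances."""

    def __init__(self):
        self.balances = {}

    def post(self, account, amount):
        # Each posting is visible in the transactional data immediately.
        self.balances[account] = self.balances.get(account, 0) + amount


class ReportingStore:
    """Hypothetical read-optimized store refreshed only by a batch load."""

    def __init__(self):
        self.total = 0

    def load(self, oltp):
        # Stand-in for the batch ETL job: copy a precomputed total.
        self.total = sum(oltp.balances.values())


oltp = OltpStore()
reporting = ReportingStore()

oltp.post("A", 100)
reporting.load(oltp)   # batch job runs during a low-load window
oltp.post("B", 50)     # posted after the batch job has finished

stale_total = reporting.total                # state as of the last load
current_total = sum(oltp.balances.values())  # state in the OLTP system

print(stale_total, current_total)
```

Until the next load, the report is missing the posting to account "B", which is the gap between the reporting data and the transactional data described above.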