Application software products in areas such as Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), Human Resources (HR), and Information Technology Service Management (ITSM) typically include built-in reporting and analytics capabilities. This has historically been true both for on-premise applications and for newer cloud applications. These built-in capabilities fall in the areas of operational reporting and key performance indicator (KPI) reporting, whereas business intelligence (BI) and predictive analytics use cases are typically addressed by extracting data into a separate analytics/data warehouse environment. Operational reporting may be based on an abstracted semantic model or may be conducted by constructing queries directly against Online Transaction Processing (OLTP) tables. KPI reporting provides trending capabilities by creating snapshots of the OLTP database for the KPIs and a limited number of dimensions. Each KPI is defined by a formula that is executed on a schedule to produce a daily history of KPI values qualified by the relevant dimensions. The KPI and dimension values feed reports and dashboards and are typically stored in a single table within the same operational database instance.
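The snapshot-based KPI mechanism described above can be sketched as follows. The table names, the schema, and the KPI formula (a count of open incidents per priority) are hypothetical and serve only to illustrate a scheduled formula appending dated, dimension-qualified values to a single snapshot table in the operational database.

```python
import sqlite3
from datetime import date

# Hypothetical OLTP table standing in for the operational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incident (id INTEGER, priority TEXT, state TEXT)")
conn.executemany("INSERT INTO incident VALUES (?, ?, ?)",
                 [(1, "high", "open"), (2, "low", "open"), (3, "high", "closed")])

# Single snapshot table holding daily KPI values qualified by one dimension.
conn.execute("""CREATE TABLE kpi_snapshot (
    snapshot_date TEXT, kpi_name TEXT, dim_priority TEXT, value REAL)""")

def run_kpi_snapshot(conn, snapshot_date):
    """Execute the KPI formula (here: count of open incidents per priority)
    and append one dated row per dimension value to the snapshot table."""
    rows = conn.execute(
        "SELECT priority, COUNT(*) FROM incident "
        "WHERE state = 'open' GROUP BY priority").fetchall()
    conn.executemany(
        "INSERT INTO kpi_snapshot VALUES (?, 'open_incidents', ?, ?)",
        [(snapshot_date, dim, val) for dim, val in rows])

# Running this once per day (e.g. from a scheduler) builds the daily history.
run_kpi_snapshot(conn, date.today().isoformat())
```

Reports and dashboards can then trend `open_incidents` over `snapshot_date` directly from the `kpi_snapshot` table, without touching the live OLTP rows.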
A Business Intelligence (BI) use case typically involves the creation of an atomic data warehouse with a large number of facts and dimensions, together with a semantic model layer of abstraction. Because the facts, metrics, and dimensions are pre-computed, ad hoc analysis can be performed and new questions answered in a highly efficient way. The typical BI use case is addressed by extracting, transforming, and loading data with Extract Transform Load (ETL) tools into a data warehouse designed to meet specific reporting and analysis needs.
Users have struggled with the co-existence of BI, operational reporting, and KPI reporting products within the same deployment. Data conformance between operational reports and data warehouse based reports may be problematic, and these tools often have very different user interfaces, creating a confusing experience for the user. In on-premise deployments, KPI reporting tools have often been retired once a company adopts a data warehouse based BI solution, since the BI solution provides a superset of their capabilities, particularly as application vendors have purchased BI tools (e.g. SAP with Business Objects, Oracle with Siebel Analytics). However, the coexistence challenge remains between operational reporting and BI solutions.
A few existing systems have addressed the coexistence challenge through investments in real time data warehouse capabilities, but those have been the exception rather than the rule given the cost and complexity of implementation. The coexistence challenge is even more acute when cloud applications are considered, as cloud application vendors generally do not have BI tools or solutions. Adoption of cloud application vendors' snapshot-based KPI trending tools is greater, so when those vendors do embark on BI solutions they confront coexistence challenges not only with operational reporting tools but also with KPI reporting tools.
FIG. 1 illustrates a conventional warehouse based reporting system according to the prior art. The conventional warehouse based reporting system includes one or more warehouse sources 102A-N, a raw extract store 104, a staging store 106, a data warehouse 108, a result cache 110, a collection of reports and dashboards 112, and a dimensional model 114 that describes the warehouse data with mappings to the source model. The one or more warehouse sources 102A-N provide the data to be analyzed. Data from the one or more warehouse sources 102A-N is initially extracted and stored in the raw extract store 104. The staging store 106 is where most of the transformations, such as source data change identification, surrogate key generation, surrogate key replacement, and data quality correction, are performed on data obtained from the raw extract store 104. The staging store 106 is connected to the data warehouse 108, which houses the data in a form amenable to analytics, e.g. a star schema. The process of extracting data from the one or more warehouse sources 102A-N, transforming the data, and loading the data into the data warehouse 108 in a form that meets the specific reporting and analysis needs is performed using Extract Transform Load (ETL) tools.
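The extract/stage/load flow just described can be sketched minimally as follows, with in-memory structures standing in for the raw extract store 104, staging store 106, and data warehouse 108; all table and field names are illustrative assumptions, not part of any actual system.

```python
# In-memory stand-ins for the stores of FIG. 1.
raw_extract_store = []   # raw rows as extracted from a source (102A-N)
staging_store = []       # transformed rows carrying surrogate keys
data_warehouse = {"dim_customer": {}, "fact_order": []}  # tiny star schema

def extract(source_rows):
    """Extract: copy source rows into the raw extract store as-is."""
    raw_extract_store.extend(source_rows)

def stage():
    """Staging transforms: data quality correction plus surrogate key
    generation and replacement, performed on the raw extracted rows."""
    dim = data_warehouse["dim_customer"]
    next_key = len(dim) + 1
    for row in raw_extract_store:
        name = row["customer"].strip()          # data quality correction
        if name not in dim:                     # surrogate key generation
            dim[name] = next_key
            next_key += 1
        staging_store.append({"customer_key": dim[name],   # key replacement
                              "amount": row["amount"]})

def load():
    """Load: move staged rows into the warehouse fact table."""
    data_warehouse["fact_order"].extend(staging_store)
    staging_store.clear()

extract([{"customer": " Acme ", "amount": 100},
         {"customer": "Acme", "amount": 50}])
stage()
load()
```

After the run, both source rows resolve to the same surrogate key because the staging step corrected the whitespace variant before key lookup.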
Data warehouse 108 transfers the data to the result cache 110. Result cache 110 is an area of memory that stores the warehouse related data in query blocks such that the query blocks may be reusable. The definitions of facts, measures, and dimensions for an analytical application are captured in a dimensional model 114 that is queried by the reports and dashboards 112. Dimensional model 114 is related to source models by mappings so that the source data can be transformed to a schema described by the dimensional model 114, which is typically a star schema.
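The notion of reusable query blocks can be illustrated with a minimal cache keyed by a normalized query string; the class, the normalization rule, and the query text below are assumptions for illustration only, not a description of any particular product's cache.

```python
# Illustrative sketch of a result cache (cf. result cache 110): results are
# stored as blocks keyed by a normalized query string so that an equivalent
# later query can reuse the stored block instead of hitting the warehouse.
class ResultCache:
    def __init__(self):
        self._blocks = {}   # normalized query -> stored result block
        self.hits = 0

    @staticmethod
    def _normalize(query):
        # Naive normalization: case-fold and collapse whitespace.
        return " ".join(query.lower().split())

    def get_or_compute(self, query, compute):
        key = self._normalize(query)
        if key in self._blocks:
            self.hits += 1              # reuse of a stored query block
        else:
            self._blocks[key] = compute()
        return self._blocks[key]

cache = ResultCache()
query = "SELECT region, SUM(amount) FROM fact_order GROUP BY region"
cache.get_or_compute(query, lambda: [("east", 150)])        # miss: computed
result = cache.get_or_compute(
    "select region, sum(amount)  from fact_order group by region",
    lambda: [])                                             # hit: block reused
```

Real caches normalize at the level of parsed query plans rather than strings, but the reuse principle is the same.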
FIG. 2 illustrates a conventional operational reporting system according to the prior art. The conventional operational reporting (OR) system of FIG. 2 includes an OR source 202, simple transformations 204, a result cache 206, a collection of reports and dashboards 208, and a simple source based data model 210. The OR source 202 provides operational reporting data, which then goes through simple transformations 204 typically meant to map data in a source schema to data structured in a schema amenable to analytics. Result cache 206 stores the operational system related data in query blocks such that the query blocks may be reusable. The reports and dashboards 208 query the source according to the simple source based data model 210.
Data warehouses (e.g. data warehouse 108) typically refresh their stored data once or twice a day, while operational reporting systems access live data. Supporting near real time data access with warehouse infrastructure is possible by making the Extract Transform Load (ETL) pipeline fast enough that the process completes under a specified near real time latency threshold (e.g. 30 minutes) for the tables required for operational reporting. As only a subset of warehouse tables is needed for operational reporting, and hence needs to be updated in near real time, it is quite possible to complete the ETL for those tables within the acceptable latency window. Note, though, that logically related groups of tables need to be updated together to avoid the inconsistencies that selective updates of individual tables would introduce.
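A sketch of this selective, group-consistent refresh follows; the table group definitions, the set of operationally required tables, and the 30-minute threshold are hypothetical placeholders.

```python
import time

# Selective near-real-time refresh: only table groups needed for operational
# reporting are refreshed, each logically related group is refreshed together,
# and the run is checked against a latency threshold. Names are illustrative.
LATENCY_THRESHOLD_SECONDS = 30 * 60   # e.g. a 30-minute near-real-time window

TABLE_GROUPS = {
    "incident_group": ["fact_incident", "dim_incident_status"],
    "order_group": ["fact_order", "dim_customer"],
}
OPERATIONAL_TABLES = {"fact_incident"}   # tables operational reports require

def refresh_table(table):
    pass  # placeholder for the per-table ETL run

def near_real_time_refresh():
    refreshed = []
    start = time.monotonic()
    for group, tables in TABLE_GROUPS.items():
        # A group is refreshed only if operational reporting needs any of its
        # tables, and then ALL its tables are updated together so that
        # logically related tables stay mutually consistent.
        if OPERATIONAL_TABLES & set(tables):
            for table in tables:
                refresh_table(table)
            refreshed.extend(tables)
    elapsed = time.monotonic() - start
    assert elapsed < LATENCY_THRESHOLD_SECONDS, "missed latency window"
    return refreshed
```

Here `order_group` is skipped entirely, while `dim_incident_status` is refreshed alongside `fact_incident` even though only the latter is strictly required, preserving consistency within the group.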
For real time data access, it may not be possible to complete the Extract Transform Load (ETL) within the acceptable latency window even for a small subset of tables, hence an alternate strategy is needed. The alternate strategy may require bypassing certain expensive ETL steps like data quality correction and surrogate key generation/replacement. Capturing data changes in real time may be possible by polling very frequently for changed data using traditional API-based (e.g. ODBC or JDBC) extractors of the kind typically used in ETL systems. A different data gathering strategy may also be employed, in which changes are captured at the source in real time by effectively intercepting every data modification statement issued to the source's data store.
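The polling-based approach can be sketched as an extractor that repeatedly queries the source for rows modified since a watermark; the row structure, the `updated_at` field, and the integer timestamps are illustrative assumptions standing in for what an ODBC/JDBC extractor would read.

```python
# Sketch of polling-based change capture: each poll fetches only rows whose
# modification time exceeds the watermark, then advances the watermark.
source_rows = [
    {"id": 1, "value": "a", "updated_at": 10},
    {"id": 2, "value": "b", "updated_at": 20},
]

watermark = 0  # highest modification time already extracted

def poll_changes():
    """Return rows changed since the last poll and advance the watermark."""
    global watermark
    changed = [r for r in source_rows if r["updated_at"] > watermark]
    if changed:
        watermark = max(r["updated_at"] for r in changed)
    return changed

first = poll_changes()                              # picks up both rows
source_rows[0].update(value="a2", updated_at=30)    # a source modification
second = poll_changes()                             # picks up only the change
```

The statement-interception strategy mentioned above avoids this polling loop entirely by observing each data modification as it is issued, at the cost of requiring a hook at the source's data store.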
For warehouse based reports and operational reports to be comparable, the underlying data models need to be consistent or comparable. Sharing the dimensional model 114 between the warehouse based reporting system and the operational reporting system requires an extra processing step in the operational reporting system to map from the OR source 202 to the dimensional model of the warehouse based reporting system in real or near real time. If this step is skipped for performance reasons and source tables are queried directly, application developer discipline must be applied to ensure that measures, facts, and dimensions with the same wording in the two systems mean the same thing, i.e. are computed in the same way from the source elements.
Even if the model definitions of the warehouse based reporting system and the operational reporting system are the same, and the same data is extracted for the same measures in both systems, data discrepancies are still possible because the two systems fetch data at different frequencies and data in the warehouse system goes through additional transforms. The drawback of fetching data at different frequencies is that the value of a measure (e.g. the number of active incidents) in the operational system is unlikely to match that in the warehouse system, because recent source data changes may have been picked up by the operational reporting system but not by the warehouse based reporting system.