The present invention relates to active data. More particularly, the present invention relates to methods and apparatus for efficiently updating and employing active data in for example business intelligence systems.
Relational databases have long been employed to archive mission-critical data. Relational databases offer reliable storage, with vendors supporting architectures that provide redundancy to ensure uninterrupted service and no loss of data. Relational databases also support the Structured Query Language (SQL) that allows user to store, retrieve and/or modify data in a relatively straight forward manner. Most Relational Database Management System (RDBMS) vendors also support the concept of triggers. A trigger may be thought of as a notification mechanism that is activated when data is changed in the database. Many user applications include the trigger feature, thereby making relational databases a popular choice for data archival among application developers.
Active data, however, poses a significant challenge for software developers, and particularly for relational database developers, due to the sheer volume of data involved. Active data may be defined as data that changes on short or regular interval. The duration of the interval may be as short as a millisecond, for example. The opposite of active data is static data: data that does not change on a regular basis or data that changes with long periodicity. Active data is also referred to as real-time data. Examples of active data may include, for example, the sales transactions in a store, the exchange rate of the US dollar versus other currencies like the Euro or the Canadian dollar, and the like.
The volume involved when dealing with active data can be daunting from both a historical perspective (i.e., the historical records of change in values represented by a variable) and a runtime perspective (i.e., updating the database to keep up with the pace of change). By way of example, consider a typical large company stock on a major trading board. The stock may be held by millions of customers, many of whom depend on the up-to-the-minute stock price data for their trading decisions. Thus, a change in the value of such a stock may require millions of updates to the portfolios of millions of customers per second.
Relational database developers have long relied on data normalization as a technique to optimize the task of updating data for short and frequent transactions, such as the task of updating the aforementioned change in a stock price. For these transactional or operational systems, the goal is to reduce the duplication of data among tables. Upon being normalized, the data of a database is isolated in a large number of tables, but a given data value is kept in only one or a few tables so that fewer tables will require updating when a change occurs.
Using the aforementioned stock price updating example, a relational database developer may normalize the data tables such that only one or two source tables need to be updated when the stock price changes. Using primary key-foreign key specifications, other tables (such as the millions of tables representing the portfolios of individual customers) may be provided with a reference to the appropriate field(s) in the source table(s). Accordingly, there is no need to update the millions of tables representing the portfolios of individual customers when the price of a stock changes, which could happen from second to second. The concepts of primary key and foreign key are extremely well known in the relational database art and will not be further explained here for brevity's sake.
Data normalization, however, vastly reduces the efficiency of another class of applications, known as business intelligence applications or systems. In business intelligence applications (which includes analytical applications and/or reporting applications for the purpose of the present disclosure), the database is employed as a support tool, or a data depository, to answer business questions (e.g., how many widgets were sold by stores A, B, and C on January 10th?). Since data normalization isolates data in a large number of tables to reduce data duplication among tables, such analysis requires traversing the multitudes of primary key-foreign key relationships of a normalized database in order to materialize the data in the various required data tables for analysis.
Such traversal unfortunately are complex and hence takes a large amount of time, and often detrimentally affects the performance of a business intelligence system. For analysts and decision makers who require real-time reporting and/or analysis of active data, the latency involved in traversing the normalized database may be intolerable since the reporting and/or analysis based on the normalized tables may not be able to keep up with the pace of change in the data values.
At the opposite extreme is the de-normalized approach, where the data of the database is put back into fewer tables after normalization to support querying and analysis. Although a given data item may be repeated in many different tables, the user application has to deal with fewer tables during querying and/or analysis, and performance is substantially improved. In addition, asking business questions are easier since the traversals are less complex. The de-normalized approach, however, works poorly with active data since updating is complicated by the fact that a large number of tables needs to be updated for each change in a given data item.
In view of the foregoing, there are desired improved techniques and arrangements for efficiently supporting both the large number of updates required by the active data and the real-time reporting and/or analysis based on the active data.