Historically, there has been a distinction between relational database management systems (RDBMS) and multidimensional database management systems (MDDBMS). Each type of system has evolved somewhat independently from the other and to meet different goals. There are many general differences between such systems, including (1) database schema, (2) processes used to store, access and retrieve such data, and (3) data analysis capabilities, to name a few. Many specific differences in the internal processes associated with relational and multidimensional database systems stem from these general differences.
Relational Database Management Systems
Traditional relational database management systems are configured to support relatively many, short transactions. A transaction is a sequence of operations on objects or data items, often embodied in DML (Data Manipulation Language) commands, that satisfies the following four properties (ACID properties):                Atomicity: all the operations of the transaction are performed, or none of them is performed;        Consistency: a transaction must be correct, by transforming the data from one consistent state to another;        Isolation: when several transactions are executed concurrently, each of them must appear as if it is executed alone;        Durability: despite crashes, the effect of the performed operations is permanent.        
Due to the way in which database updates are performed to data files in some database systems, at any given point in time, a data file in persistent storage may contain some data blocks that (1) have been tentatively modified by uncommitted transactions and/or (2) do not yet reflect updates performed by committed transactions. Thus, an instance recovery operation must be performed after an instance failure to restore a database to the transaction consistent state it possessed just prior to the instance failure. In a transaction consistent state, a database reflects all the changes made by transactions which are committed and none of the changes made by transactions which are not committed.
Consistency Property
Not only must transactions leave the database in a consistent state upon completion, but transactions must also see the database in a consistent state while they are executing. This condition is difficult to satisfy because there may be concurrently executing transactions, each of which may be causing a temporary inconsistency visible only within that single transaction. Consequently, transactions must not be able to see the changes made by concurrently executing transactions until those transactions commit (i.e. terminate and make their changes permanent).
Relational systems make changes to data as commanded, under the presumption that the changes will eventually be committed. If a failure occurs in the system or the transaction cannot be committed, or made permanent, the system will roll back or undo the changes. That is, all modifications performed up to this point will be undone and the system will revert to its previous state at the beginning of the transaction. Hence, lengthy transactions in a relational system become increasingly expensive due to the possible increasing amount of undo information that is managed over the time of the transaction. Furthermore, during a transaction it is typical for the associated table rows to be write locked until committed or rolled back, prohibiting other sessions from writing to, but not reading from, the row during the transaction time.
With a pending consistent transaction, a query on associated data is relative to the start time of the transaction. Thus, during query processing, one can make changes to data without affecting the query. This concept is referred to as “consistent read.” One technique for performing consistent reads is described in U.S. Pat. No. 5,832,521 entitled “Method and Apparatus for Performing Consistent Reads in Multiple-Server Environments”, which is incorporated by reference in its entirety for all purposes as if fully disclosed herein.
One approach to ensuring that database readers do not see changes made by concurrently executing transactions involves presenting snapshots of the database to readers. According to the snapshot approach, every version of a data item supplied to a reader transaction belongs to a “snapshot” of the database. A snapshot includes all updates that were committed to the database as of a given time and any updates made by the reader transaction itself, but no updates made by other transactions that were not committed as of that given time. If no such version of a data item actually exists, the version must be derived from an existing version of the data item. One implementation of a snapshot approach is described in U.S. Pat. No. 5,870,758, entitled “Method and Apparatus for Providing Isolation Levels in a Database System”, which is incorporated by reference in its entirety for all purposes as if fully disclosed herein.
Undo in a Relational Database System
To derive a version of a data item, the derivation mechanism requires an existing version of the data item and information required to derive the desired version of the data item from the existing version of the data item. The information required to derive a particular version of a data item is hereinafter referred to as “derivation data”. In a database system that uses snapshots, the derivation data may include, in addition to an existing version of a data item, transaction state information of the transactions that have updated the data item, and information from undo records associated with the data item.
For example, assume that a data item DATA1 has been updated by three transactions TXA, TXB, and TXC, in that order. TXA and TXB have committed before a time T, and TXC did not commit until time T+1. When a fourth transaction TXD associated with the snapshot time T wishes to read DATA1, the version of DATA1 that reflects all of the updates made to DATA1 (the “current version”) cannot be supplied to TXD because the current version of DATA1 reflects changes made by a transaction TXC that committed after time T. However, the changes made by TXC may be removed from DATA1 to produce a “derived” version of DATA1, which may then be supplied to TXD.
Undo records are records that indicate how to undo the changes that were made by transactions during update operations that correspond to the undo records. A derivation mechanism applies the appropriate undo records to the current version of persistent data to generate the desired, derived version of the data. The derived version of the data is typically stored in cache, from which the database server can read the derived version of the data.
Redo in a Relational Database System
In a database server, a memory area is allocated and one or more processes are started to execute one or more transactions. The combination of background system processes and the processes executing transactions is called a database instance.
A buffer cache resides in a portion of the memory area and holds database information. Buffers in the buffer cache hold copies of data blocks that have been read from data files. The buffers are shared by all user processes concurrently connected to the instance. When a transaction desires to make a change to a data block, a copy of the data block is loaded into a buffer and the change is made to the copy of the data block stored in the database buffer cache in dynamic memory. Afterwards, a database writer writes the modified blocks of data from the database buffer cache to the data files on disk.
The memory area also contains a redo log buffer. A redo log buffer holds information about update operations recently performed by transactions. This information is stored in redo entries. Redo entries contain the information necessary to reconstruct, or redo, changes made by operations such as INSERT, UPDATE, DELETE, CREATE, ALTER, or DROP operations for example. Redo entries are generated for each change made to a copy of a data block stored in the database buffer cache. The redo log buffer is written to an active online redo log file group on disk by a background process. The records in the online redo log file group on disk are referred to as redo logs.
A typical DBMS performs several steps during an instance recovery. First, the DBMS rolls forward, or reapplies to the data files all of the changes recorded in the redo log. Rolling forward proceeds through as many redo log files as necessary to bring the database forward in time to reflect all of the changes made prior to the time of the crash. Rolling forward usually includes applying the changes in online redo log files, and may also include applying changes recorded in archived redo log files (online redo files which are archived before being reused). After rolling forward, the data blocks contain all committed changes as well as any uncommitted changes that were recorded in the redo log prior to the crash.
The use of redo logs during recovery are described in detail in U.S. Pat. No. 5,832,516 entitled “Caching Data In Recoverable Objects”, which is incorporated by reference in its entirety for all purposes as if fully disclosed herein.
Rollback segments include records, e.g., undo records, for undoing uncommitted changes made during the roll-forward operation. As mentioned, in database recovery, the information contained in the rollback segments is used to undo the changes made by transactions that were uncommitted at the time of the crash. The process of undoing changes made by the uncommitted transactions is referred to as “rolling back” the transactions.
Multidimensional Database Management Systems
As mentioned, traditional multidimensional database management systems, sometimes referred to as OLAP systems, are fundamentally different than a RDBMS. A primary feature of multidimensional systems is the ability to perform predictive analytical functions such as forecasting, modeling, allocations and consolidations, and scenario management on multidimensional data. Such systems are built on the presumption of long transactions involving iterations of data change and examination.
A useful MDDBMS guarantees read repeatability to the beginning of a given session and allows for long sessions that incur limited performance degradation. Read repeatability ensures that one only sees the effect of private changes to data, not the effect of changes of others. Consequently, running a MDDBMS in a multi-threaded computing environment has its challenges. Approaches to such challenges are described in U.S. Pat. No. 6,078,994 entitled “System for Maintaining a Shared Cache in a Multi-Threaded Computer Environment” and U.S. Pat. No. 6,324,623 entitled “Computing System for Implementing a Shared Cache”, both of which are incorporated by reference in their entirety for all purposes as if fully disclosed herein.
OLTP and OLAP Operations
One significant difference between relational and multidimensional database systems is the difference in their respective transaction models and operation of associated transaction engines. Generally, relational systems and processes are modeled to provide fast and efficient loading and revising of data, whereas multidimensional systems and processes are modeled to provide fast, complex real-time analytical processing of data. Furthermore, relational systems are modeled to support many short transactions, or sequences of operations, that are almost always committed. That is, the changes made by transactions executed in a relational database system are almost always made permanent to the database.
In contrast, multidimensional systems are modeled to support long transactions that frequently are not committed. This is, the changes made by transactions executed in multidimensional system are usually temporary, only visible to the transaction that is making the changes, and must be removed from the database after the transaction terminates.
Large-scale data transaction processing, which is typically associated with relational systems, is often referred to as OLTP (On-Line Transaction Processing). Real-time analytical processing of data is often referred to as OLAP (On-Line Analytical Processing), and with respect to multidimensional data, MOLAP (Multidimensional On-Line Analytical Processing).
Not only do relational and multidimensional database systems meet different needs, in addition, such systems have historically been configured as separate stand-alone systems. In some instances, such systems are configured with interface data constructs and processes in attempts to emulate the functionality of one type of system in the other type of system. For example, U.S. Pat. No. 6,484,179 describes techniques for organizing multidimensional data in a relational database system in a manner that improves the performance of OLAP operations within the relational database system.
However, in prior approaches, the two disparate systems are not truly integrated. Prior approaches do not provide efficient large-scale analytical processing of transactions on relationally stored data, while maintaining desired properties of such data. For example, prior approaches do not provide the ability to efficiently handle multi-session what-if analysis transactions (e.g., MOLAP transactions) on relationally managed data, while maintaining the ACID (atomicity, consistency, isolation, durability) transaction properties typically associated with relationally managed data. What-if analyses are popular scenarios in data warehouse environments, particularly for assimilation of data in a business decision-making context. A data warehouse is a consolidated view of enterprise data, optimized for reporting and analysis. Basically, a data warehouse is an aggregated, sometimes summarized copy of transaction and non-transaction data specifically structured for dynamic querying, analysis and reporting.
Based on the foregoing, there is a clear need for techniques for handling what-if analysis of data managed in a relational database system.