The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, the approaches described in this section are not admitted to be prior art, merely by inclusion in this section.
Historically, there has been a distinction between relational database management systems (RDBMS) and multidimensional database management systems (MDDBMS). Each type of system has evolved somewhat independently from the other and to meet different goals. There are many general differences between such systems, including (1) database schema, (2) processes used to store, access and retrieve such data, and (3) data analysis capabilities, to name a few. Many specific differences in the internal processes associated with relational and multidimensional database systems stem from these general differences.
One significant difference between relational and multidimensional database systems is the difference in their respective transaction models and operation of associated transaction engines. Generally, relational systems and processes are modeled to provide fast and efficient loading and revising of data, whereas multidimensional systems and processes are modeled to provide fast, complex real-time analytical processing of data. Furthermore, relational systems are modeled to support many short transactions, or sequences of operations, that are almost always committed. That is, the changes made by transactions executed in a relational database system are almost always made permanent to the database. Also, typical relational systems must be able to provide read consistency for relational operations. Specifically, it is expected that a relational operation does not see any changes committed by other users, where the commit time of those changes is after the start time of the relational operation.
In contrast, multidimensional systems are modeled to support long transactions that frequently are not committed. Specifically, the changes made by transactions executed in multidimensional system are usually temporary, only visible to the transaction that is making the changes, and must be removed from the database after the transaction terminates.
Large-scale data transaction processing, which is typically associated with relational systems, is often referred to as OLTP (On-Line Transaction Processing). Real-time analytical processing of data is often referred to as OLAP (On-Line Analytical Processing), and with respect to multidimensional data, MOLAP (Multidimensional On-Line Analytical Processing).
Not only do relational and multidimensional database systems meet different needs, in addition, such systems have historically been configured as separate stand-alone systems. In some instances, such systems are configured with special interfaces that attempt allow the system to emulate the functionality of another type of system. For example, U.S. Pat. No. 6,484,179 describes techniques for organizing multidimensional data in a relational database system in a manner that improves the performance of OLAP operations within the relational database system.
However, in prior approaches, the two disparate systems are not truly integrated. Prior approaches do not provide efficient large-scale analytical processing of transactions on relationally stored data, while maintaining desired properties of such data. For example, prior approaches do not provide the ability to efficiently handle multi-session what-if analysis transactions (e.g., MOLAP transactions) on relationally managed data, while maintaining the ACID (atomicity, consistency, isolation, durability) transaction properties typically associated with relationally managed data. What-if analyses are popular scenarios in data warehouse environments, particularly for assimilation of data in a business decision-making context. A data warehouse is a consolidated view of enterprise data, optimized for reporting and analysis. Basically, a data warehouse is an aggregated, sometimes summarized copy of transaction and non-transaction data specifically structured for dynamic querying, analysis and reporting.
Database servers execute commands that conform to database language. A database language supported by most relational database servers is know as structured query language (“SQL”). A database command written in SQL may include a “MODEL clause. A MODEL clause provides numerous beneficial features, and is particularly useful in the context of integrated relational and multidimensional data. Specifically, use of the MODEL clause allows database developers to treat relational data as multidimensional arrays. Dimensionally aware calculations can then be defined on the arrays. As a result of the foregoing structure, a MODEL clause may be used to cause a database server to resolve formula dependencies automatically, therefore supporting large sets of interlinked calculations in sophisticated applications. In particular, a MODEL clause is useful in the processing of complex calculations that involve inter-row references. Before the SQL language was extended to support the MODEL clause, such complex calculations often demanded the use join and union statements, which are cumbersome and painstaking to develop.
Certain facets of how a database server can handle database commands that include the MODEL clause are described in U.S. patent application Ser. No. 10/898,304 entitled “Using a Multidimensional Database as a Surrogate for a Hash Table in the Context of a Relational Query,⇄ filed on Jul. 23, 2004, which is incorporated by reference into this patent application.
A cursor is a pointer to a data item. Typically, one cursor is associated with each query. The term “cursor isolation” refers to how a cursor runs with respect to other active transactions in the system. Specifically, if a cursor is “isolated,” the cursor runs independent of other transactions. Cursor isolation during an SQL query has not been an issue until the introduction of the SQL MODEL clause. This is because standard SQL queries do not update data, and, therefore, do not require cursor isolation. More particularly, the problem is specific to queries which use a predefined multidimensional database as a surrogate for a hash table. The use of an analytic workspace as a surrogate for a hash table was first proposed in U.S. patent application Ser. No. 10/898,304 entitled “Using a Multidimensional Database as a Surrogate for a Hash Table in the Context of a Relational Query,” filed on Jul. 23, 2004, which is mentioned above. Previously, in the context of relational databases, cursor isolation was not an issue because all data updates occurred in an in-core hash table that was created upon the execution of a query. However, if a multidimensional database is used, updates occur against a permanent data store.
The term “cursor isolation” refers to how a cursor runs with respect to other active transactions in the system. Specifically, if a cursor is “isolated,” the cursor runs independent of other transactions, and therefore cannot see the changes made in other transactions. Similar to other types of database commands, the MODEL clause specifies calculations to be done on data. However, rather than simply creating new columns to hold the results of the calculations, the operations performed by the MODEL directly replace the data that is involved in the operations.
When a query includes a clause that updates data in a multidimensional database, such as a MODEL clause, various problems arise regarding the management and protection of data. Specifically, when one user performs a query and updates data, other users that perform a query on the same data typically see the version of the data that has been updated by the first user. However, this may not be desirable. Additionally, when a user performs multiple queries on the same data, it may be desirable to isolate the data among queries so that a single item of data is not updated by multiple queries. For example, if certain data is updated during a first query, it may be desirable to have the update be transparent to a second query. In other words, the data that is returned or updated by the second query is the version of the data that existed prior to update performed during the first query. In essence, it would be desirable to have an approach for isolating cursors from each other.
Prior art approaches have been developed to provide isolation across users. One such approach involves the use of buffers to cache query results within table functions. Specifically, the output of a query, including any changes made during the query, are written into a buffer that is used only to feed an SQL engine, and then deallocated. Another such approach involves the creation of temporary multidimensional variables with read-through capabilities. However, these approaches only apply to read-only data. Additionally, these approaches are very costly in terms of the computational resources that they consume. Also, such approaches were geared toward cursor isolation in the context of relational databases, and, therefore, only dealt with updates of data in hash tables. Such solutions did not deal with updates of data in analytic workspaces, and did not provide a mechanism for updating data on a cursor by cursor (e.g., query by query) basis. The approaches cannot be used with queries that update data, such as queries that include a MODEL clause.
Therefore, with the advent of the MODEL clause, there is a great need for approaches that can provide cursor isolation in an efficient manner, so that queries associated with separate cursors do not see updates performed by queries associated with other cursors.