The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art, merely by inclusion in this section.
Historically, there has been a distinction between relational database management systems (RDBMS) and multidimensional database management systems (MDDBMS). Each type of system has evolved somewhat independently from the other and to meet different goals. There are many general differences between such systems, including (1) database schema, (2) processes used to store, access and retrieve such data, and (3) data analysis capabilities, to name a few. Many specific differences in the internal processes associated with relational and multidimensional database systems stem from these general differences.
One significant difference between relational and multidimensional database systems is the difference in their respective transaction models and operation of associated transaction engines. Generally, relational systems and processes are modeled to provide fast and efficient loading and revising of data, whereas multidimensional systems and processes are modeled to provide fast, complex real-time analytical processing of data. Furthermore, relational systems are modeled to support many short transactions, or sequences of operations, that are almost always committed. That is, the changes made by transactions executed in a relational database system are almost always made permanent to the database.
In contrast, multidimensional systems are modeled to support long transactions that frequently are not committed. This is, the changes made by transactions executed in multidimensional system are usually temporary, only visible to the transaction that is making the changes, and must be removed from the database after the transaction terminates.
Not only do relational and multidimensional database systems meet different needs, in addition, such systems have historically been configured as separate stand-alone systems. In some instances, such systems are configured to emulate the functionality of one type of system in the other type of system. For example, U.S. Pat. No. 6,484,179 describes techniques for organizing multidimensional data in a relational database system in a manner that improves the performance of OLAP operations within the relational database system.
However, in prior approaches, the two disparate systems are not truly integrated. Prior approaches do not provide efficient large-scale analytical processing of transactions on relationally stored data, while maintaining desired properties of such data. For example, prior approaches do not provide the ability to efficiently handle multi-session what-if analysis transactions (e.g., MOLAP transactions) on relationally managed data, while maintaining the ACID (atomicity, consistency, isolation, durability) transaction properties typically associated with relationally managed data. What-if analyses are popular scenarios in data warehouse environments, particularly for assimilation of data in a business decision-making context. A data warehouse is a consolidated view of enterprise data, optimized for reporting and analysis. Basically, a data warehouse is an aggregated, sometimes summarized copy of transaction and non-transaction data specifically structured for dynamic querying, analysis and reporting.
A MODEL clause in a structured query language (“SQL”) provides numerous beneficial features, and is particularly useful in the context of integrated relational and multidimensional data. Techniques related to the MODEL clause are described in detail U.S. patent application Ser. No. 09/886,839, entitled “PERFORMING SPREADSHEET-LIKE CALCULATIONS IN A DATABASE SYSTEM”, filed on Jun. 20, 2001, the entire contents of which are incorporated herein.
A MODEL clause allows database developers to treat relational data as multidimensional arrays. The data that is acted on by a MODEL clause may be referred to a “model data”. Formulas can then be defined on the arrays. As a result of the foregoing structure, a MODEL clause may resolve formula dependencies automatically, therefore supporting large sets of interlinked formulas in sophisticated applications. In particular, a MODEL clause is useful in the processing of complex calculations that involve inter-row references. The following example of a MODEL clause predicts sales of ‘bike’ in 2000 to be sum of its sales in 1999 and 1998 and sales of car in 2000 to be an average of years 1990 through 1994:
MODEL PARTITION BY (country) DIMENSION BY (prod, year)MEASURES (sales)( sales[‘bike’, 2000] = sales[‘bike’, 1999] + sales[‘bike’, 1998], sales[‘car’, 2000] = avg(sales)[‘car’, for year in (1990,1991,1992, 993,1994)])
Prior to the advent of the MODEL clause, such complex calculations often demanded the use of multiple data transformations, such as join and union statements, which are cumbersome and painstaking to develop.
In order to perform a relational query with a MODEL clause, random access of the data in the OLAP table is required. However, the data in the OLAP table cannot be randomly accessed. One solution is to create a hash table at runtime, and populate the hash table with OLAP table data. Once the data is stored in the hash table, it may be randomly accessed. A relational engine then evaluates the data in the hash table by applying spreadsheet functions. Spreadsheet functions include functions in which calculations, such as summation and averaging, are performed on data residing in rows, columns and/or other dimensions of a database.
However, various costs are associated with the creation of the hash table “on the fly” at run time. Additionally, some aspects of the hash table performance are undesirable. In particular, the creation of the hash table at run time is a resource intensive operation that consumes a great deal of time, memory and disk space. Also, in the event that the source data to be stored in the hash table is in an analytic workspace, the time to retrieve the data, and to build and populate the hash table is even greater.
One solution for decreasing the time consumed in the creation of the hash table at run time is to increase the size parameters of the hash table. However, this solution is not optimal because a great deal of memory must be employed if the parameters of the hash table are increased. Additionally, as a result of over-allocation of memory for a hash table, over-fetching may occur, which is not desirable. Over-fetching occurs when an area of memory is retrieved, but only a portion of the area is actually needed.
Additionally, due to the structure of a hash table, there are certain inefficiencies in the accessing and retrieval of data from the hash table. These inefficiencies can be rectified to a certain extent by optimization procedures to be performed on the hash table. However, the optimization procedures are costly to apply, and painstaking to develop.
In the implementation of the MODEL clause, as well as in the context of implementing other SQL statements and clauses, it would be desirable to have a configuration or mechanism that avoids the costs associated with creating and using a hash table at run time.