In the context of database systems, a “dimension” is a list of values that provide categories for data. A dimension acts as an index for identifying values of a variable. For example, if sales data has a separate sales figure for each month, then the data has a MONTH dimension. That is, the data is organized, or “dimensioned”, by month. A dimension is similar to a key in a relational database. Data that is organized by two or more dimensions is referred to as “multidimensional data”.
Any item of data within a multidimensional variable can be uniquely and completely selected by specifying one member from each of the variable's dimensions. For example, if a sales measure is dimensioned by MONTH, PRODUCT, and MARKET, specifying “January” for the MONTH dimension, “Stereos” for the PRODUCT dimension, and “Eastern Region” for the MARKET dimension uniquely specifies a single value of a measure. A multidimensional variable (e.g., a measure) can be conceptually thought of as an N-dimensional array, where N is the number of dimensions of the variable, and where each value in the array may be accessed by specifying one dimension key value for each dimension (e.g. MDVar(dim1, dim2, dim3, . . . , dimN)). Thus, dimensions offer a concise and intuitive way of organizing and selecting data for retrieval, updating, and performing calculations.
Multidimensional arrays with 2 and 3 dimensions may be depicted visually as grids and cubes, respectively. For convenience, it has become customary to refer to the conceptual multidimensional arrays that correspond to multidimensional variables as “multidimensional cubes” (or simply “cubes”) regardless of how many dimensions they possess. Hence, a multidimensional data cube is a set of n-dimensional data objects. Further, each multidimensional value is said to belong to a “cell” of the cube, where the address of the cell is the set of dimension key values (one per dimension) that correspond to the multidimensional value contained therein.
For the purpose of explanation, the multidimensional value that belongs to a cell shall be referred to as the “cell value” of that cell. Cell values are associated with types of data, i.e., measures, on which a function is executed, such as a summation function, average function, minimum value function, maximum value function, and the like. For example, a cell might contain a value representing a summation of sales in dollars for a particular product in a particular time period in a particular market, where “sales” is the measure.
Multidimensional data may be stored in relational database management systems (“RDBMS”) or in specialized, “multidimensional” database management systems (“MDDBMS”), according to a multidimensional schema. MDDBMSs provide structures and access techniques specifically designed for multidimensional data, and therefore provide relatively efficient storage and access to multidimensional data. However, when stored in specialized MDDBMSs, only applications that are specially built to interact with those systems are able to access and manipulate the data. This is because the commands sent to the MDDBMSs have to conform to the command language supported by the MDDBMS.
When multidimensional data is stored in RDBMSs, all applications that support interaction with relational databases have access to the data. Such database applications communicate with the RDBMS by submitting commands that conform to the database language supported by the RDBMS, the most common of which is the ANSI Structured Query Language (SQL). According to one approach, multidimensional data may be stored in a MDDBMS, but accessed by sending SQL to a RDBMS. In this approach, the RDBMS interacts with the MDDBMS to access the data from the MDDBMS.
Database statements (e.g., SQL queries) often include predicates (e.g., WHERE clause conditions) that limit the domain of records that are returned in response to the query. A query optimizer executing in the RDBMS may attempt to identify predicates that can be logically applied to other parts of the query in order to reduce the amount of data that is returned. However, the relational optimizer is unable to apply such techniques in the multidimensional data space because the optimizer simply does not understand the inherent structure of, and thus the extraction process for, the multidimensional data.
The “domain” of a query or subquery on multidimensional data is defined by the dimensionally-limited subset of data objects with which the query is concerned. For example, from 32 available products in the PRODUCT dimension, a particular query may only be concerned with 5 of those products. Therefore, those 5 products represent the “domain” of the PRODUCT dimension relative to that query.
The status information maintained for a session reflects the domain that is applied to queries and subqueries executed within the session. However, a single session may have multiple queries and/or subqueries, each of which may have different domains for the same dimensions. Unfortunately, a change made to the session status caused by one query (to reflect the domain of that query) may adversely affect a concurrently executing query or subquery within the same session. Thus, concurrently executing multiple queries and subqueries within the same session may adversely affect the database system's ability to maintain read consistency throughout the session.
Based on the foregoing, there is room for improvement in techniques for efficiently and correctly processing queries on multidimensional data through a RDBMS.