A portion of the disclosure of this patent document may contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright © 1999, Microsoft, Inc.
The present invention pertains generally to computer-implemented databases, and more particularly to summaries of data contained in such databases.
Online analytical processing (OLAP) is a key part of most data warehouse and business analysis systems. OLAP services provide for fast analysis of multidimensional information. For this purpose, OLAP services provide for multidimensional access and navigation of data in an intuitive and natural way, providing a global view of data that can be drilled down into particular data of interest. Speed and response time are important attributes of OLAP services that allow users to browse and analyze data online in an efficient manner. Further, OLAP services typically provide analytical tools to rank, aggregate, and calculate lead and lag indicators for the data under analysis.
A fundamental entity that is present in typical OLAP databases is a cube. A cube is a multidimensional representation of a set of data having varying aspects. A cube comprises a set of dimensions and a set of measures. In this context, a dimension is a structural attribute of the cube that is a list of members of a similar type in the user's perception of the data. Typically, there is a hierarchy associated with the dimension. For example, a time dimension can consist of days, weeks, months, and years, while a geography dimension can consist of cities, states/provinces, and countries. Dimension members act as indices for identifying a particular cell or range of cells within a multidimensional array.
A measure is a structural attribute of the cube that comprises a particular type of value that provides detail data for particular members within the dimensions. For example, sale amounts and units sold can be measures of a retail cube having a time dimension and a geography dimension. The measures provide the sale amounts and units sold for a particular geographic region at a particular point in time.
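As an illustration only, the relationship between dimensions, members, and measures can be sketched in Python. The names `Measures`, `cube`, and `cell`, along with the sample members and values, are hypothetical and do not come from the disclosure; the sketch simply assumes a cube with a time dimension and a geography dimension whose cells hold sale amounts and units sold.

```python
from typing import NamedTuple


class Measures(NamedTuple):
    """Measure values stored in one cell of the illustrative cube."""
    sale_amount: float
    units_sold: int


# Hypothetical retail cube: each cell is indexed by one member of the time
# dimension and one member of the geography dimension.
cube = {
    ("1999-01", "Seattle"): Measures(1200.0, 30),
    ("1999-01", "Portland"): Measures(800.0, 20),
    ("1999-02", "Seattle"): Measures(1500.0, 35),
}


def cell(time_member: str, geography_member: str) -> Measures:
    """Use dimension members as indices to locate a single cell."""
    return cube[(time_member, geography_member)]
```

In this sketch, `cell("1999-01", "Seattle")` returns the measures recorded for that city in that month.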
Databases are commonly queried for summaries of data rather than individual data items. For example, a user might want to know sales data for a given period of time without regard to geographical distinctions. These types of queries are efficiently answered through the use of data tools known as aggregations. Aggregations are precomputed summaries of selected data that allow an OLAP system or a relational database to respond quickly to queries by avoiding collecting and aggregating detailed data during query execution. Without aggregations, the system would need to use the detailed data to answer these queries, resulting in potentially substantial processing delays. With aggregations, the system computes and materializes the summaries ahead of time, so that when a query is submitted to the system, the appropriate summary already exists and can be sent to the user much more quickly.
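The difference between answering from detailed data and answering from a precomputed aggregation can be sketched as follows. This is a minimal illustration, not the system described here: the rows, the function names `precompute_sales_by_month` and `query_sales`, and the choice of summation as the aggregate are all assumptions.

```python
from collections import defaultdict

# Hypothetical detailed data: (month, city, sale amount).
detail_rows = [
    ("1999-01", "Seattle", 1200.0),
    ("1999-01", "Portland", 800.0),
    ("1999-02", "Seattle", 1500.0),
]


def precompute_sales_by_month(rows):
    """Summarize sale amounts by month, ignoring geography."""
    totals = defaultdict(float)
    for month, _city, amount in rows:
        totals[month] += amount
    return dict(totals)


# Computed and materialized once, ahead of any query.
sales_by_month = precompute_sales_by_month(detail_rows)


def query_sales(month):
    """Answer a query by lookup, without rescanning the detailed data."""
    return sales_by_month[month]
```

Here the scan over `detail_rows` happens once, before any query arrives; each later call to `query_sales` is a single dictionary lookup.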
Data in an OLAP system can be characterized in terms of its complexity, that is, the number of dimensions used to index the data. Thus, a complex data set is one that has many dimensions. Complex data sets have the advantage of flexibility in that users can submit more possible queries to complex data sets than to simple data sets. Accordingly, it is often desirable to use complex data sets. Increasing the complexity of a data set, however, also increases the number of pre-calculations that are required in order to maintain good performance. This is because more aggregations must be calculated ahead of time to answer the increased number of possible queries.
Because the number of required pre-calculations increases exponentially with increases in the number of dimensions, it is difficult to handle a large number of dimensions using conventional OLAP systems. Generally, conventional approaches involve striking a balance between flexibility and use of computing resources.
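The growth can be illustrated with a simple counting model. The model is an assumption made for illustration and is not stated in the disclosure: each dimension can be summarized at any one of its hierarchy levels or collapsed entirely, and a possible aggregation is one such choice per dimension. The function name `count_possible_aggregations` is hypothetical.

```python
def count_possible_aggregations(levels_per_dimension):
    """Count level combinations under the assumed model.

    Each dimension contributes (levels + 1) choices: one per hierarchy
    level, plus one for collapsing the dimension entirely.
    """
    count = 1
    for levels in levels_per_dimension:
        count *= levels + 1
    return count


# Every added three-level dimension multiplies the count by four.
four_dimensions = count_possible_aggregations([3, 3, 3, 3])
five_dimensions = count_possible_aggregations([3, 3, 3, 3, 3])
```

Under this model, four dimensions with three levels each give 256 combinations, and adding a fifth such dimension raises the count to 1024.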
According to various example implementations of the invention, there is provided an efficient system for analyzing data as if the data were indexed by desired dimensions without actually creating the dimensions. These dimensions, known as virtual dimensions, are defined in relation to existing base dimensions rather than in relation to the underlying detailed data, thereby avoiding the need to perform the precalculations normally associated with creating a dimension. Significant computing effort is conserved as a result.
In one particular method implementation, a selected property value of each of a set of base dimension members is associated with a corresponding base dimension member. A set of distinct property values is determined for the set of base dimension members. For each distinct property value, an aggregate of the base dimension members that have that distinct property value is computed.
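The three steps of this method can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the base dimension (products), the selected property (color), the per-member totals, the use of summation as the aggregate, and the name `build_virtual_dimension` are all hypothetical.

```python
# Hypothetical base dimension: product members, each with a total that is
# assumed to have been computed already for the base dimension.
base_member_totals = {"Shirt A": 100.0, "Shirt B": 250.0, "Hat C": 75.0}

# Step 1: associate the selected property value (color) with each member.
member_color = {"Shirt A": "red", "Shirt B": "blue", "Hat C": "red"}


def build_virtual_dimension(member_property, member_totals):
    """Aggregate base dimension members by distinct property value."""
    # Step 2: determine the set of distinct property values.
    distinct_values = sorted(set(member_property.values()))
    # Step 3: for each distinct value, aggregate the base members that have
    # it. Summation is an assumed aggregate function.
    return {
        value: sum(
            member_totals[member]
            for member, prop in member_property.items()
            if prop == value
        )
        for value in distinct_values
    }


sales_by_color = build_virtual_dimension(member_color, base_member_totals)
```

The sketch reads only the per-member totals of the base dimension and never touches the detailed data, which is the saving a virtual dimension is meant to provide.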
Still other implementations include computer-readable media and apparatuses for performing these methods. The above summary of the present invention is not intended to describe every implementation of the present invention. The figures and the detailed description that follow more particularly exemplify these implementations.