A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright (copyright) 1999, Microsoft, Inc.
The present invention pertains generally to computer-implemented databases, and more particularly to storing records in such databases.
Online analytical processing (OLAP) is a key part of most data warehouse and business analysis systems. OLAP services provide for fast analysis of multidimensional information. For this purpose, OLAP services provide for multidimensional access and navigation of data in an intuitive and natural way, providing a global view of data that can be drilled down into particular data of interest. Speed and response time are important attributes of OLAP services that allow users to browse and analyze data online in an efficient manner. Further, OLAP services typically provide analytical tools to rank, aggregate, and calculate lead and lag indicators for the data under analysis.
In this context, a dimension is a structural attribute of a cube that is a list of members of a similar type in the user""s perception of the data. Typically, there is a hierarchy associated with the dimension. For example, a time dimension can consist of days, weeks, months, and years, while a geography dimension can consist of cities, states/provinces, and countries. Dimension members act as indices for identifying a particular cell or range of cells within a multidimensional array. Each cell contains a value, also referred to as a measurement.
One issue regarding the design of multidimensional databases is how to represent the cells in the multidimensional space. One potential design choice is to represent the multidimensional space as an array of cells, with the size of the array determined by the multiplication of the number of points in each dimension. A significant problem with this approach is that the size of the database grows exponentially as the number of dimensions and the size of each dimension increases. This leads to a rapid depletion of the physical resources such as persistent storage and RAM required to implement the database. This phenomenon is known as data explosion for multidimensional databases.
In addition, much of the space is wasted in the above-mentioned approach. Data in multidimensional databases tends to be sparse, that is, not every cell is expected to have a value associated with it. For example, consider a Store dimension having a hierarchy of Country, State, and City specifying the location of a store, and a Product dimension having a product identification and a product count measure. No store in the in the database will be expected to stock every possible product, and in fact any one store may only stock 20% of the available products. In this situation, most of the cells in the multidimensional space would contain no data, thus wasting much of the space allocated to the database.
A second issue relates to locating cells in the multidimensional space. It is desirable to be able to locate cells quickly in order to provide acceptable system throughput. Representing the cells as a multidimensional array provides for rapid access to the cells, but has the data explosion problem mentioned above.
A third issue relates to the capability to perform aggregations on the multidimensional data. Databases are commonly queried for aggregations (e.g. summaries, minimums, maximums, counts etc.) of detail data rather than individual data items. For example, a user might want to know sales data for a given period of time without regard to geographical distinctions. These types of queries are efficiently answered through aggregations. Aggregations are precomputed summaries of selected detail data that allow an OLAP system or a relational database to respond quickly to queries by avoiding collecting and aggregating detailed data during query execution. Without aggregations, the system would need to scan all of the rows containing the detailed data to answer these queries, resulting in potentially substantial processing delays. With aggregations, the system computes and materializes aggregations ahead of time so that when the query is submitted to the system, the appropriate summary already exists and can be sent to the user much more quickly. Calculating these aggregations, however, can be costly, both in terms of processing time and in terms of disk space consumed.
A third issue relates to the stability of the members of a dimension hierarchy. Stability refers to the propensity for members to move from one point in the dimension hierarchy to another. Some types of dimensions are very stable. For example, hierarchies in the time dimension are very stable, as there is no need to move a month from one year to the next.
However, other hierarchies tend to be much less stable, and members frequently move from point to point in the dimension hierarchy. As an example, consider a customer dimension having a hierarchy of Country, State, City, Customer, where a customer is located in a particular city of a particular state within a particular country. It is quite likely that at some point in time, a customer will move from one city to another, possibly in a different state, and perhaps to a different country.
In previous systems, the movement of a member from one point in a hierarchy to another point results in the entire OLAP database having to be rebuilt to reflect the new hierarchy. Completely rebuilding the database typically takes a large amount of time and system resources, especially for OLAP databases with large numbers of detail records.
Thus there is a need for a system that stores cell data for a multidimensional database in an efficient manner. Additionally, there is a need for such a system that allows members to move from one point to another without causing the database to be rebuilt. Moreover, there is a need for such a system that provides the ability to locate cells rapidly and efficiently. Finally, there is a need for such a system that is able to perform aggregations in an efficient manner.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
The systems and methods described herein create and maintain cell data records in an OLAP database system. One aspect of the system is that cell data records are created that contain a system path. The system path is comprised of one or more dimension paths that define the location of a cell in a multidimensional database. The dimension path can be a flexible dimension path or it can be a rigid dimension path. Flexible dimension paths map a unique member id to a rigid dimension path maintained outside of the system path. This allows a member to move from one location to another in the dimension hierarchy without changing the system path, thereby avoiding a rebuild of the OLAP database. Rigid dimension paths map directly to a member of a level in the dimension hierarchy. The format used for the dimension paths provides an efficient mechanism for locating the cell, and in addition, can be indexed easily to allow rapid location of cell data.
A further aspect of the system is that the format of the system path provides an efficient mechanism for creating aggregations. Flexible dimension paths that are in the dimension levels that are to be aggregated are converted to rigid dimension paths. The rigid dimension paths have their corresponding member index set to a null value in the dimension path of each record. The records are then scanned for a match to a system path representing the aggregation. Those that match have their measure data included in the aggregation.
The present invention describes systems, clients, servers, methods, and computer-readable media of varying scope. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.