The present invention pertains generally to computer-implemented databases, and more particularly to storing records in such databases.
Online analytical processing (OLAP) is an integral part of most data warehouse and business analysis systems. OLAP services provide for fast analysis of multidimensional information. For this purpose, OLAP services provide for multidimensional access and navigation of the data in an intuitive and natural way, providing a global view of data that can be xe2x80x9cdrilled downxe2x80x9d into particular data of interest. Speed and response time are important attributes of OLAP services that allow users to browse and analyze data online in an efficient manner. Further, OLAP services typically provide analytical tools to rank, aggregate, and calculate lead and lag indicators for the data under analysis.
In OLAP, information is viewed conceptually as cubes, consisting of dimensions, levels, and measures. In this context, a dimension is a structural attribute of a cube that is a list of members of a similar type in the user""s perception of the data. Typically, there are hierarchy levels associated with each dimension. For example, a time dimension may have hierarchical levels consisting of days, weeks, months, and years, while a geography dimension may have levels of cities, states/provinces, and countries. Dimension members act as indices for identifying a particular cell or range of cells within a multidimensional array. Each cell contains a value, also referred to as a measure, or measurement.
One issue regarding the design of multidimensional databases is how to represent the cells in the multidimensional space. One potential design choice is to represent the multidimensional space as an array of cells, with the size of the array determined by the multiplication of the number of points in each dimension. A significant problem with this approach is that the size of the database grows exponentially as the number and size of the dimensions increase. This leads to a rapid depletion of the physical resources such as persistent storage and RAM required to implement the database. This phenomenon is known as data explosion for multidimensional databases.
Additionally, space is wasted in the above-mentioned approach, as data in multidimensional databases tends to be sparse. That is, not every cell is expected to have a value associated with it. For example, consider a Store dimension having a hierarchy of Country, State, and City specifying the location of a store, and a Product dimension having a product identification and a product count measure. No store in the database will be expected to stock every possible product, and in fact any one store may only stock a small percentage of the available products. In this situation, most of the cells in the multidimensional space would contain no data, thus wasting much of the space allocated to the database.
A second issue relates to locating cells in the multidimensional space. It is desirable to be able to locate cells quickly in order to provide acceptable system throughput. Representing the cells as a multidimensional array provides for rapid access to the cells, but has the data explosion problem mentioned above.
A third issue relates to the capability to perform aggregations on the multidimensional data. Databases are commonly queried for aggregations (e.g. summaries, minimums, maximums, counts, etc.) of detail data rather than individual data items. For example, a user might want to know sales data for a given period of time without regard to geographical distinctions. These types of queries are efficiently answered through aggregations. Aggregations are precomputed summaries of selected detail data that allow an OLAP system or a relational database to respond quickly to queries by avoiding collecting and aggregating detailed data during query execution. Without aggregations, the system needs to scan all of the rows containing the detailed data to answer these queries, resulting in potentially substantial processing delays. With aggregations, the system computes and materializes aggregations ahead of time so that when the query is submitted to the system, the appropriate summary already exists and can be sent to the user much more quickly. Calculating these aggregations, however, can be costly, both in terms of processing time and in terms of disk space consumed.
A fourth issue relates to the stability of the members of a dimension hierarchy level. Stability refers to the propensity for members to move from one point in the dimension hierarchy to another. Some types of dimensions are very stable. For example, hierarchies in the time dimension are very stable, as there is no need to move a month from one year to the next.
Other hierarchies, however, tend to be much less stable, and members frequently move from point to point in the dimension hierarchy. As an example, consider a customer dimension having a hierarchy of Country, State, City, Customer, where a customer is located in a particular city of a particular state within a particular country. It is quite likely that at some point in time, a customer will move from one city to another, possibly in a different state, and perhaps to a different country.
In previous systems, the movement of a member from one point in a hierarchy to another point results in the entire OLAP database having to be rebuilt to reflect the new hierarchy. Completely rebuilding the database typically takes a large amount of time and system resources, especially for OLAP databases with large numbers of detail records.
The present invention is directed at addressing the above-mentioned shortcomings, disadvantages and problems, and will be understood by reading and studying the following specification.
The systems and methods described herein create and maintain cell data records in an OLAP database system. One aspect of the system is that cell data records are created containing a compressed system path. The system path is comprised of one or more compressed dimension paths that define the location of a cell in a multidimensional database. The dimension path may be a flexible or rigid dimension path, compressed or uncompressed. Flexible dimension paths map a unique member id to a rigid dimension path maintained outside of the system path. This allows a member to move from one location to another in the dimension hierarchy without changing the system path, thereby avoiding a rebuild of the OLAP database. Rigid dimension paths map directly to a member of a level in the dimension hierarchy. The format used for the dimension paths provides an efficient mechanism for locating the cell, and in addition, can be indexed easily to allow rapid location of cell data. Another aspect is that after the system paths have been created the system paths are loaded in segments and then compressed in binary format. The bits used to store each particular level member are constant, allowing random access of the data.
A further aspect of the system is that the format of the system path provides an efficient mechanism for creating aggregations. One aspect of the system converts flexible dimension paths that are in the dimension levels to be aggregated to rigid dimension paths before aggregrating. The rigid dimension paths have their corresponding member index set to a null value in the dimension path of each record. The records are then scanned for a match to a system path representing the aggregation. Those that match have their measure data included in the aggregation.
The present invention describes systems, clients, servers, methods, and computer-readable media of varying scope. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.