The present invention pertains generally to computer-implemented databases, and more particularly to compressing records in such databases.
Online analytical processing (OLAP) is an integral part of most data warehouse and business analysis systems. OLAP services provide for fast analysis of multidimensional information. For this purpose, OLAP services provide for multidimensional access and navigation of the data in an intuitive and natural way, providing a global view of data that can be “drilled down” into particular data of interest. Speed and response time are important attributes of OLAP services that allow users to browse and analyze data online in an efficient manner. Further, OLAP services typically provide analytical tools to rank, aggregate, and calculate lead and lag indicators for the data under analysis.
In OLAP, information is viewed conceptually as cubes, consisting of dimensions, levels, and measures. In this context, a dimension is a structural attribute of a cube that is a list of members of a similar type in the user's perception of the data. Typically, there are hierarchy levels associated with each dimension. For example, a time dimension may have hierarchical levels consisting of days, weeks, months, and years, while a geography dimension may have levels of cities, states/provinces, and countries. Dimension members act as indices for identifying a particular cell or range of cells within a multidimensional array. Each cell contains a value, also referred to as a measure, or measurement.
One issue regarding the design of multidimensional databases is how to store the cell information in the multidimensional space. One potential design choice is to represent the multidimensional space as an array of cells, with the size of the array determined by the multiplication of the number of points in each dimension. A significant problem with this approach is that the size of the database grows exponentially as the number and size of the dimensions increase. This leads to a rapid depletion of the physical resources such as persistent storage and RAM required to implement the database. This phenomenon is known as data explosion for multidimensional databases.
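The growth described above can be illustrated with a short sketch. The function name and the dimension sizes are hypothetical and chosen only for illustration; the only point taken from the text is that the array size is the product of the number of points in each dimension.

```python
import math
from typing import Sequence


def dense_cell_count(dimension_sizes: Sequence[int]) -> int:
    """Return the number of cells in a dense array covering every
    combination of dimension members (the product of the sizes)."""
    return math.prod(dimension_sizes)


# Hypothetical cube: 365 days x 1000 products x 50 stores.
three_dims = dense_cell_count([365, 1000, 50])

# Adding one more dimension of 100 members multiplies the cell count by 100.
four_dims = dense_cell_count([365, 1000, 50, 100])

print(three_dims, four_dims)
```

Each additional dimension multiplies the total, which is why a dense representation quickly exhausts persistent storage and RAM.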
Additionally, space is wasted in the above-mentioned approach, as data in multidimensional databases tends to be sparse. That is, not every cell is expected to have a value or measure associated with it. For example, consider a Store dimension having a hierarchy of Country, State, and City specifying the location of a store, and a Product dimension having a product identification and a product count measure. No store in the database will be expected to stock every possible product, and in fact any one store may only stock a small percentage of the available products. In this situation, most of the cells in the multidimensional space would contain no data, thus wasting much of the space allocated to the database.
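The sparsity argument can be made concrete with a small sketch. The store names, product names, and the dictionary-based layout are assumptions made for illustration only and are not a storage format described in this specification.

```python
from typing import Dict, Tuple

# Hypothetical members of a Store dimension and a Product dimension.
stores = ["Seattle", "Portland", "Boise"]
products = ["apples", "pears", "plums", "figs"]

# Only cells that actually hold a product count are recorded.
# Keys are (store, product) pairs; values are the measure.
populated: Dict[Tuple[str, str], int] = {
    ("Seattle", "apples"): 40,
    ("Portland", "pears"): 12,
    ("Boise", "figs"): 7,
}


def fill_ratio(cells: Dict[Tuple[str, str], int],
               n_stores: int, n_products: int) -> float:
    """Fraction of the dense Store x Product space that holds a value."""
    return len(cells) / (n_stores * n_products)


ratio = fill_ratio(populated, len(stores), len(products))
print(ratio)
```

In this hypothetical case a dense array would allocate twelve cells to hold three values; the remaining cells carry no data.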
Another issue relates to the capability to perform aggregations on the multidimensional data. Databases are commonly queried for aggregations (e.g. summaries, minimums, maximums, counts, etc.) of detail data rather than individual data items. For example, a user might want to know sales data for a given period of time without regard to geographical distinctions. These types of queries are efficiently answered through aggregations. Aggregations are precomputed summaries of selected detail data that allow an OLAP system or a relational database to respond quickly to queries by avoiding collecting and aggregating detailed data during query execution. Without aggregations, the system needs to scan all of the rows containing the detailed data to answer these queries, resulting in potentially substantial processing delays. With aggregations, the system computes and materializes aggregations ahead of time so that when the query is submitted to the system, the appropriate summary already exists and can be sent to the user much more quickly. Calculating these aggregations, however, can be costly, both in terms of processing time and in terms of disk space consumed.
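The difference between scanning detail rows at query time and reading a precomputed aggregation can be sketched as follows. The row layout, the period labels, and the function names are hypothetical; the sketch only illustrates the general idea of materializing a summary ahead of time.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical detail rows: (period, city, sales amount).
DetailRow = Tuple[str, str, int]

detail_rows: List[DetailRow] = [
    ("Q1", "Seattle", 100),
    ("Q1", "Portland", 50),
    ("Q2", "Seattle", 70),
    ("Q2", "Portland", 30),
]


def sales_by_scan(rows: List[DetailRow], period: str) -> int:
    """Answer the query by scanning every detail row at query time."""
    return sum(amount for p, _city, amount in rows if p == period)


def build_aggregation(rows: List[DetailRow]) -> Dict[str, int]:
    """Precompute total sales per period, ignoring geography."""
    totals: Dict[str, int] = defaultdict(int)
    for period, _city, amount in rows:
        totals[period] += amount
    return dict(totals)


# Materialized once, ahead of time.
sales_by_period = build_aggregation(detail_rows)

# At query time the answer is a single lookup rather than a scan.
print(sales_by_period["Q1"], sales_by_scan(detail_rows, "Q1"))
```

The lookup avoids the scan, but building and storing `sales_by_period` is the cost in processing time and disk space noted above.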
The present invention is directed at addressing the above-mentioned shortcomings, disadvantages and problems, and will be understood by reading and studying the following specification.
The systems, methods, and apparatus described herein create and maintain cell data records in an OLAP database system. The measure data fields located within cell data records are loaded from a data store, and a determination is made as to whether to compress the measure data fields. One aspect of the system loads the measure data in segments, and all subsequent processing of the measure data is performed on a segment-by-segment basis. If the measure data are to be compressed, a size of a space to store the measure data in a compressed format is determined. The size determination may be made utilizing many different operations. The size may be determined based on the range of values contained within the measure data fields. Additionally, the size may be determined based on the minimum/maximum values of the measure data. Another aspect of the system determines whether the values within the measure data field are constant, that is, whether the measures within the measure data field are all the same. If the values are constant, then the value of the measure data is stored within a header, and the measure data are not compressed or stored in the data store. If the values are not constant, the determined size of the space is stored in a header that can be accessed at a later time. The measure data are then compressed and stored in a data store in binary format. Storing the measure data in a fixed field size allows the data to be randomly accessed. Additionally, the compression operation provides an efficient mechanism for creating aggregations.
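The operations summarized above can be sketched for a single segment as follows. This is a minimal illustration under stated assumptions and not the implementation of the invention: the measures are assumed to be integers, the field size is assumed to be the number of bits needed to cover the range between the minimum and maximum values, each value is assumed to be stored as an offset from the minimum, and the names `SegmentHeader`, `compress_segment`, and `read_value` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SegmentHeader:
    count: int        # number of measures in the segment
    constant: bool    # True if every measure has the same value
    base: int         # minimum value (also the constant value, if constant)
    bit_width: int    # fixed field size in bits; 0 for a constant segment


def compress_segment(values: List[int]) -> Tuple[SegmentHeader, bytes]:
    """Compress one segment of integer measures into fixed-width fields."""
    if not values:
        raise ValueError("segment must contain at least one measure")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Constant segment: the value lives in the header, no payload is stored.
        return SegmentHeader(len(values), True, lo, 0), b""
    # Field size derived from the range of values (assumed rule).
    width = (hi - lo).bit_length()
    packed = 0
    for i, value in enumerate(values):
        packed |= (value - lo) << (i * width)
    n_bytes = (len(values) * width + 7) // 8
    header = SegmentHeader(len(values), False, lo, width)
    return header, packed.to_bytes(n_bytes, "little")


def read_value(header: SegmentHeader, payload: bytes, index: int) -> int:
    """Randomly access one measure using only the header and its index."""
    if not 0 <= index < header.count:
        raise IndexError("index outside segment")
    if header.constant:
        return header.base
    packed = int.from_bytes(payload, "little")
    mask = (1 << header.bit_width) - 1
    return header.base + ((packed >> (index * header.bit_width)) & mask)


# Hypothetical segment: range 100..107 needs 3 bits per measure.
header, payload = compress_segment([100, 103, 101, 107])
print(header.bit_width, len(payload), read_value(header, payload, 3))
```

Because every field in a segment has the same width, the position of the measure at a given index is computed directly from the index and the width stored in the header, without decoding the preceding measures. For simplicity the sketch converts the whole payload to an integer on each read; a practical reader would fetch only the bytes that contain the requested field.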
The present invention describes systems, clients, servers, methods, and computer-readable media of varying scope. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.