On-Line Analytical Processing (OLAP) provides the capability to perform analysis, projections, and reporting on any number of dimensions, and provides views of the data that are otherwise not easy to observe. This “slice and dice” and “drill-up/down” of the data is critical to businesses and organizations for intelligent decision support.
Several forms of OLAP have been described: Multi-Dimensional OLAP (MOLAP), Relational OLAP (ROLAP), Desktop OLAP (DOLAP), and Integrated OLAP (IOLAP). Multi-Dimensional OLAP uses a multi-dimensional schema to store a number of pre-compiled projections that can be accessed very quickly and that support very complex analysis of the data. The main disadvantage of Multi-Dimensional OLAP is that updates take the warehouse down for a long period, sometimes longer than the desired up time (sometimes longer than 24 hours!). Moreover, for sparse cubes (which is commonly the case), most of the cells in the multi-dimensional schema contain no data, and thus a significant amount of storage is wasted.
The Relational OLAP uses the Relational DBMS as the basis for OLAP and generates the data projections on the fly. Depending upon the size of the data warehouse, projection queries may take several hours or even days.
The Desktop OLAP uses tools for creating small “cubes” (projections) that reside on, and are displayed on, the desktop of a personal computer (PC). This form of OLAP is targeted at specialized, small vertical projections of the data, which are created and updated as in Multi-Dimensional OLAP, but because of their tiny size, they can be updated efficiently.
The Integrated OLAP is a marketing scheme and is the combination of one or more of the above.
All forms of OLAP have the following three attributes in common: they offer decision support solutions; they permit the multi-dimensional viewing of data; and they allow a user to rapidly identify data (“drilling”), or to isolate subsets of data (“slicing” and “dicing”). The fundamental difference between the forms of OLAP lies in how much space and precalculation they trade for speed of retrieval, as MOLAP and DOLAP do. ROLAP systems make the same trade when they add “summary tables” and special-purpose indexes to increase performance. In other words, these approaches trade space, pre-calculated views, and indexes for speed, at the expense of update (down) time.
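The trade-off described above can be illustrated with a minimal sketch, assuming a hypothetical `sales` fact table held in an in-memory SQLite database. The table names, columns, and rows are invented for illustration and do not come from any particular OLAP product. The sketch contrasts a group-by computed on the fly with the same result read from a pre-computed summary table.

```python
import sqlite3

def build_demo():
    """Create a hypothetical fact table plus one pre-computed summary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("S1", "pen", 10), ("S1", "ink", 5), ("S2", "pen", 7), ("S2", "pen", 3)],
    )
    # Summary table: extra space spent once so that later reads are cheap.
    conn.execute(
        "CREATE TABLE sales_by_store AS "
        "SELECT store, SUM(amount) AS total FROM sales GROUP BY store"
    )
    return conn

def on_the_fly(conn):
    """Aggregate the fact table at query time (no extra space, slower reads)."""
    return conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()

def from_summary(conn):
    """Read the pre-computed aggregate (fast, but only as fresh as its last rebuild)."""
    return conn.execute(
        "SELECT store, total FROM sales_by_store ORDER BY store"
    ).fetchall()
```

Both functions return the same rows until the fact table changes. After a new row is inserted into `sales`, `from_summary` keeps returning the old totals until the summary table is rebuilt, which is the update cost that the pre-computation incurs.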
The data cube operator (J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. In Proc. of the 12th ICDE, pages 152–159, New Orleans, February 1996. IEEE) provided a formal framework for specifying the computation of one or more aggregate functions for all possible combinations of grouping attributes. The inherent difficulty with the cube operator is its size for both computation and storage. The number of all possible “group-bys” (subsets of data made in accordance with user provided instructions) increases exponentially with the number of the cube's dimensions ($2^d$) and a naive store of the cube behaves in a similar way.
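As a rough illustration of why the cube grows exponentially, the following sketch computes a SUM aggregate for every subset of the grouping attributes of a small fact table. It is a naive enumeration written for clarity, not the algorithm of the cited paper, and the function name `data_cube`, the sample rows, and the column names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: three dimensions and one measure.
SAMPLE_ROWS = [
    {"store": "S1", "product": "pen", "year": 2001, "amount": 10},
    {"store": "S1", "product": "ink", "year": 2001, "amount": 5},
    {"store": "S2", "product": "pen", "year": 2002, "amount": 7},
]

def data_cube(rows, dims, measure):
    """Return {grouping_attrs: {group_key: sum}} for every subset of dims.

    With d dimensions there are 2**d subsets, hence 2**d group-bys.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for attrs in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                # The empty subset yields the key (), i.e. the grand total.
                key = tuple(row[a] for a in attrs)
                totals[key] += row[measure]
            cube[attrs] = dict(totals)
    return cube
```

Calling `data_cube(SAMPLE_ROWS, ("store", "product", "year"), "amount")` produces 8 group-bys for 3 dimensions. Each additional dimension doubles that number, and every group-by requires its own pass over the data in this naive form.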
The problem of size is exacerbated by the fact that new applications include an increasing number of dimensions. Moreover, in many new applications, dimensions naturally form hierarchies, with the user being interested in querying the underlying data at any level of the hierarchy. In this case, the number of group-bys that need to be computed increases even more, and reaches $2^H$ (where $H$ is the total number of hierarchy levels) for the case of full data cubes. More often in such cases, however, only $\prod_{i=1}^{d} \left(1 + \mathrm{levels}(\mathrm{dim}_i)\right)$ group-bys (where $\mathrm{levels}(\mathrm{dim}_i)$ is the number of hierarchy levels that the i-th dimension contains) are computed, and more specifically those group-bys that contain at most one hierarchy level from each dimension. In this case, the produced cube is called the “concatenated rollup cube”. Thus, new applications cause an explosive increase in the size of the cube, and present a major problem. All methods proposed in the literature try to deal with the space problem, either by pre-computing a subset of the possible group-bys (E. Baralis, S. Paraboschi, and E. Teniente. Materialized View Selection in a Multidimensional Database. In Proc. of VLDB Conf., pages 156–165, Athens, Greece, August 1997; H. Gupta. Selections of Views to Materialize in a Data Warehouse. In Proc. of ICDT Conf., pages 98–112, Delphi, January 1997; H. Gupta, V. Harinarayan, A. Rajaraman, and J. Ullman. Index Selection for OLAP. In Proc. of ICDE Conf., pages 208–219, Birmingham, UK, April 1997; A. Shukla, P. M. Deshpande, and J. F. Naughton. Materialized View Selection for Multidimensional Datasets. In Proc. of the 24th VLDB Conf., pages 488–499, New York City, N.Y., August 1998), by estimating the values of the group-bys using approximation techniques (J. Shanmugasundaram, U. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. In Proc. of the Intl. Conf. on Knowledge Discovery and Data Mining (KDD99), 1999; J. S. Vitter, M. Wang, and B. Iyer. Data Cube Approximation and Histograms via Wavelets. In Proc. of the 7th Intl. Conf. Information and Knowledge Management (CIKM'98), 1998), or by pre-computing only those group-bys that have a minimum support (membership) of at least N tuples (K. Beyer and R. Ramakrishnan. Bottom-Up Computation of Sparse and Iceberg CUBEs. In Proc. of the ACM SIGMOD Conf., pages 359–370, Philadelphia, Pa., USA, 1999). The larger the value for N, the smaller the number of group-bys that satisfy the minimum support condition and, therefore, the smaller the size of the resulting sub-cubes.
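The two group-by counts and the minimum support condition can be made concrete with a short sketch. The function names and the sample hierarchy sizes are assumptions chosen for illustration. The `iceberg_groups` helper applies only the support threshold to a single group-by, and is not the bottom-up algorithm of the cited paper.

```python
from collections import Counter
from math import prod

def full_cube_groupbys(levels):
    """Group-bys in the full cube: 2**H, with H the total number of hierarchy levels."""
    return 2 ** sum(levels)

def concatenated_rollup_groupbys(levels):
    """Group-bys with at most one hierarchy level per dimension: prod(1 + levels_i)."""
    return prod(1 + n for n in levels)

def iceberg_groups(rows, attrs, min_support):
    """Keep only the groups of one group-by that contain at least min_support tuples."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return {key: n for key, n in counts.items() if n >= min_support}
```

For three dimensions with 3, 2, and 4 hierarchy levels, `full_cube_groupbys([3, 2, 4])` gives 2^9 = 512 while `concatenated_rollup_groupbys([3, 2, 4])` gives 4 × 3 × 5 = 60. When every dimension has a single level, both formulas reduce to 2^d. Raising `min_support` in `iceberg_groups` can only shrink the set of surviving groups, which is the effect of a larger N described above.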