This invention relates in general to database management systems performed by computers, and in particular, to database management systems that support on-line analytical processing (OLAP).
Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
RDBMS software has typically been used with databases having traditional data types that are easily structured into tables. However, RDBMS products do have limitations with respect to providing users with specific views of data. Thus, xe2x80x9cfront-endsxe2x80x9d have been developed for RDBMS products so that data retrieved from the RDBMS can be aggregated, summarized, consolidated, summed, viewed, and analyzed. However, even these xe2x80x9cfront-endsxe2x80x9d do not easily provide the ability to consolidate, view, and analyze data in the manner of xe2x80x9cmulti-dimensional data analysis.xe2x80x9d This type of functionality is also known as on-line analytical processing (OLAP).
A thorough state of the art description of OLAP technology is provided by Erick Thompson, xe2x80x9cOLAP Solutionsxe2x80x94Building Multidimensional Information Systemsxe2x80x9d, Wiley Computer Publishing, 1997.
OLAP generally provides numerous, speculative xe2x80x9cwhat-ifxe2x80x9d and/or xe2x80x9cwhyxe2x80x9d data model scenarios executed by a computer. Within these scenarios, the values of key variables or parameters are changed, often repeatedly, to reflect potential variances in measured data. Additional data is then synthesized through animation of the data model. This often includes the consolidation of projected and actual data according to more than one consolidation path or dimension.
Data consolidation or aggregation is the process of synthesizing data into essential knowledge. The highest level in a data consolidation path is referred to as that data""s dimension. A given data dimension represents a specific perspective of the data included in its associated consolidation path. There are typically a number of different dimensions from which a given pool of data can be analyzed. This plural perspective, or Multi-Dimensional Conceptual View, appears to be the way most business persons naturally view their enterprise. Each of these perspectives is considered to be a complementary data dimension. Simultaneous analysis of multiple data dimensions is referred to as multi-dimensional data analysis.
OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated data supporting end user analytical and navigational activities including:
calculations and modeling applied across dimensions, through hierarchies and/or across members;
trend analysis over sequential time periods;
slicing subsets for on-screen viewing;
drill-down to deeper levels of consolidation;
reach-through to underlying detail data; and
rotation to new dimensional comparisons in the viewing area.
A key feature of any multidimensional OLAP system is the ability to define a dataset in terms of multiple dimensions. Loosely speaking, dimensions may be thought of as major perspectives, entities, factors, or components. So a multidimensional dataset may be thought of as multi-key or multi-attribute data set. The benefit of handling multiple dimensions is in being able to represent the full richness of a dataset in a single model or cube.
A further key feature of any multidimensional OLAP system is the hierarchical nature of the dimensions. Any dimension such as xe2x80x9ctimexe2x80x9d, xe2x80x9cproductsxe2x80x9d, and xe2x80x9cstoresxe2x80x9d can have multiple levels of granularity. For example, the xe2x80x9ctimexe2x80x9d dimension may have a day-level granularity, a week-level granularity, a month-level granularity , a quarter-level granularity, and so on. This is indispensable for working with larger data sets that invariably need to be aggregated, analyzed, and viewed across multiple levels of aggregation.
A further key feature of any dimensional OLAP system is the ability to attach formulas to members of dimensions. Because a single member of one dimension, say the xe2x80x9cSalesxe2x80x9d member from a variables dimension, interacts with every single member from every other dimension, a single-dimensional formula has a powerful application range frequently doing the same work that would otherwise take thousands of spreadsheet formulas. Also more multi-dimensional formulas work equally well in all dimensions. Multi dimensional formulas dramatically simplify the process of defining aggregation and analysis calculations.
The dimensions are used to build up the hierarchy information which is the basis for the later analysis of the data. With the hierarchy information the user has the capability to see aggregated business information on higher levels. It is also possible to drill down from some higher level information to the lower level once to examine individual aspects of differences within the data.
Typically the multi-dimensional data is organized and represented in a form of a (hyper-)cube, wherein the various dimensions of the data are reflected as cube dimensions.
As the available amount of raw data and the number of available variable types, which potentially could be used as dimensions for multidimensional database tables, is so large, one of the most fundamental problems within OLAP technology is to select the xe2x80x9ccorrectxe2x80x9d variable types to be used as OLAP cube dimensions. As a rule of thumb, a multi-dimensional database table should not exceed about eight dimensions. Choosing an inappropriate dimension therefore reduces the freedom of selecting other variable types as dimensions of a multidimensional database table. During the setup phase, the structure of the multidimensional database tables are defined implementing the internal representations of the OLAP cube. Identifying the xe2x80x9ccorrectxe2x80x9d variable types to be used as dimensions of the multidimensional database tables has a dramatic influence on the success of OLAP technology.
First of all, a selection of certain variable types provides a significant reduction of the large amount of business data which is to be analyzed and used to populate the multidimensional database tables.
Second, the nature and the number of the selected variable types define the size of OLAP cubes in terms of the amount of required storage.
The following example suggests the large amount of storage required to build the OLAP multidimensional data structures to store the input data and the derived, computed data. The size of the multidimensional database table is a function of the number of dimensions (defined by the variable types) and the number of elements in each dimension. The maximum possible size of such a multidimensional database table is obtained by multiplying together the number of elements in each dimension. For example, when
Dimension 1 has a 50 members
Dimension 2 has 100 members,
Dimension 3 has 12 members,
Dimension 4 has 1000 members, and
Dimension 5 has 12000 members,
The total number of potential members is 50*100*12*1000*12000 or 720,000,000,000. If just one more dimension, that is a further variable type, with only 10 members were added to this multidimensional database table, the above maximum potential size of the multidimensional database table would be multiplied by a factor of 10. This phenomenon is known as database explosion.
As a result, OLAP databases are very large and require significant resources to load and to calculate. To avoid this OLAP database explosion, designers must constrain the number of dimensions in each multidimensional database table. The mean storage requirements to store an OLAP cube with about 8 dimensions is in the range of 5 to 15 gigabytes of data. Thus, any xe2x80x9cwrongxe2x80x9d dimension introduced to the multidimensional database table could result in an intractable model. Vice versa, xe2x80x9cwrongxe2x80x9d dimensions and too many dimensions increase processing time significantly, which is a major drawback within OLAP technology wherein many analysis steps are performed interactively with a user. Moreover, the quality of the analysis provided by OLAP technology is significantly improved if the xe2x80x9cCorrectxe2x80x9d variable types are selected; avoiding variable types within the multidimensional database tables with minor importance to the measure data also leads to improved analysis quality.
Consequently, an object of the invention is to provide a technology for automatically determining the variable dimensions of multi-dimensional database tables forming a representation of OLAP cubes, and providing a starting point for OLAP processing.
The present invention relates to means and a computerized method for automatically determining one or more variable dimensions for a multi-dimensional database table, whereby the multi-dimensional database table also comprises at least one pre-defined measure dimension for storing values of one or more measure types.
For the purpose of determining variable dimensions, a multitude of input records are treated as implicit, yet unknown functional relationships between the measure types (as dependent variables) and the variable types (as independent variables). A regression function is calculated and used as a prediction model for the (dependent) measure types based on the (independent) variable types utilizing the multitude of input records. The most significant variable types contributing to this prediction model are then selected as variable dimensions for the multidimensional database table, i.e. as the OLAP cube dimensions.
The present invention solves a fundamental problem within OLAP technology, namely the problem of selecting the xe2x80x9ccorrectxe2x80x9d variable types to be used as variable dimensions of a multidimensional database table also defining the so-called OLAP cube dimensions. At present, human experts are required to perform this selection process based on extensive experience and an intuitive feel for the data to be treated by OLAP technology. The current invention proposes an automatic and computerized method for solving this problem.
Selection of the xe2x80x9ccorrectxe2x80x9d variable types provides a significant reduction of the large amount of business data which is to be analyzed and used to populate the multidimensional database tables. Selection of xe2x80x9cwrongxe2x80x9d variable dimensions is avoided. Both aspects ensure that the extremely complex multidimensional database tables remain tractable in terms of required storage and processing time. Moreover, the quality of the OLAP analysis is significantly improved by selecting xe2x80x9ccorrectxe2x80x9d variable types; conversely, avoiding variable types within the multidimensional database tables with minor importance to the measure data is also an important aspect of improved analysis quality.