1.0 Field of the Invention
This invention relates generally to online analytical processing (OLAP) systems and more particularly to designing aggregates based on access patterns in the dimensions of a dimensional model in an OLAP system.
2.0 Description of the Related Art
Business analysts analyze data to determine the state of their business. Business data typically comprises sales, product and financial data, and new business data is typically received for various time periods. A time period may be a day, week, month, quarter or year.
Dimensions are collections of related attributes of the data values of the business, for example, product, market, time, channel, scenario and customer. To understand their businesses, business analysts frequently work with data which is aggregated across dimensions. To provide this information, low-level data, at the transaction level or lower, is aggregated across various business dimensions. This gives analysts the ability to explore business information in context, for example, sales by product by customer by time, or defects by product by manufacturing plant by time.
In an OLAP system, dimensional models allow business analysts to interactively explore information across multiple viewpoints at multiple levels of aggregation. A dimensional model typically has many dimensions. The business data is typically aggregated across the various dimensions to provide different views of the data at different levels of aggregation. The data may be aggregated over various periods of time, by geography, by teams and by product, depending on the type and organization of the business. Aggregated data is commonly referred to as an aggregate. For example, an aggregate may be the sales data for the month of July for a specified product.
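By way of illustration only, the following minimal sketch shows how an aggregate such as the sales data for the month of July for a specified product may be derived from transaction-level data. The rows, the dimension names and the helper function aggregate are hypothetical and are not taken from any particular OLAP system.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (month, product, market, sales).
transactions = [
    ("July", "widget", "east", 120.0),
    ("July", "widget", "west", 80.0),
    ("July", "gadget", "east", 50.0),
    ("August", "widget", "east", 70.0),
]


def aggregate(rows, key_indexes):
    """Sum the sales measure, grouping by the dimensions at key_indexes.

    Every dimension not listed in key_indexes is aggregated away.
    """
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[-1]  # the last column is the sales measure
    return dict(totals)


# Sales by month by product: the market dimension is aggregated away.
by_month_product = aggregate(transactions, (0, 1))

# Sales by month only: a higher level of aggregation over the same data.
by_month = aggregate(transactions, (0,))
```

In this sketch, the entry of by_month_product keyed by ("July", "widget") corresponds to the aggregate described above, and by_month holds a coarser view of the same underlying data.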
In some OLAP systems, the business analyst queries the data for a desired set of aggregates, and the OLAP system computes the aggregates. However, computing the aggregates takes time, and the system response may be slow. To improve performance, aggregates are typically pre-computed and stored. Typically, not all possible aggregates for the dimensional model can be built and stored because the amount of storage space is limited in a computer system, and also because the amount of time to build the aggregates is limited.
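To illustrate why not all possible aggregates can be built and stored, the following sketch counts the level combinations of a small model. It assumes, for simplicity, that each dimension contributes one independently chosen level; the dimension and level names are hypothetical, and the count includes the base (unaggregated) combination.

```python
from math import prod

# Hypothetical levels per dimension; "all" means the dimension is
# fully aggregated away.
levels = {
    "time": ["day", "week", "month", "quarter", "year", "all"],
    "product": ["sku", "brand", "category", "all"],
    "market": ["store", "city", "region", "all"],
    "customer": ["customer", "segment", "all"],
}


def count_level_combinations(levels_by_dimension):
    """Return the number of ways to pick one level from each dimension."""
    return prod(len(names) for names in levels_by_dimension.values())


total = count_level_combinations(levels)  # 6 * 4 * 4 * 3
```

Under these assumptions, four small dimensions already yield 288 level combinations, and each additional dimension multiplies that count, so both the storage space and the build time grow quickly with the size of the model.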
OLAP technologies have been developed to provide the aggregates that business analysts use to carry out analysis across multiple points of view or dimensions. These technologies face two competing issues: (a) pre-computing as many aggregates as possible so that analysts can carry out highly interactive analysis and navigation without waiting for the aggregates to be computed; and (b) limiting the amount of time and space used to pre-compute and store the aggregates, because systems typically do not have sufficient storage space for all pre-computed aggregates for all dimensions, and do not have unlimited amounts of time to build all the aggregates for all dimensions.
Various approaches have been taken to address these competing issues, (a) and (b) above, to provide business analysts with sufficient aggregates to facilitate broad and deep information exploration and analysis. One approach limits the scope of the dimensional model to those aggregates that can be reasonably generated. Another approach does not limit the scope of the model, and identifies a subset of the entire set of possible aggregates to be pre-computed.
A model-based approach to identifying a subset of aggregates to pre-compute is based on analysis of the characteristics of the dimensional model and the data to determine which aggregates have the most utility and will contribute most to performance improvements. However, the model-based approach may not identify some aggregates that are frequently accessed or may identify aggregates that are infrequently accessed.
A query-based approach to generating pre-computed aggregates is based on the analysis of actual queries submitted by users. The query-based approach will discover aggregates which are most relevant to the set of queries being analyzed. However, the query-based approach is limited in scope to identifying aggregates in accordance with previous queries. When the queries access different aggregates from those in the previous queries, the pre-computed aggregates may become less useful, and the system becomes less responsive because new aggregates need to be computed.
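The limitation of the query-based approach may be illustrated by the following minimal sketch, which selects the most frequently requested level combinations from a query log. The log contents, the budget parameter and the function select_aggregates are hypothetical and serve only as an example; an actual system may weigh queries differently.

```python
from collections import Counter

# Hypothetical query log: each entry is the level combination a past
# query requested.
query_log = [
    ("month", "product"),
    ("month", "product"),
    ("quarter", "market"),
    ("month", "product"),
    ("quarter", "market"),
    ("year", "customer"),
]


def select_aggregates(log, budget):
    """Pick the `budget` most frequently requested level combinations."""
    counts = Counter(log)
    return [combo for combo, _ in counts.most_common(budget)]


# Pre-compute only as many aggregates as the budget allows.
chosen = select_aggregates(query_log, budget=2)

# A later query that asks for a combination absent from the log is not
# covered, so its aggregate has to be computed when the query arrives.
new_query = ("week", "channel")
covered = new_query in chosen
```

Here the selection reflects only the combinations seen in the previous queries, so the new query is not served by any pre-computed aggregate.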
Therefore, there is a need for an improved technique to identify aggregates which are frequently accessed and which also provide increased utility to improve business analysis.