A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright © 1999, Microsoft, Inc.
The present invention pertains generally to computer-implemented databases, and more particularly to summaries of data contained in such databases.
Online analytical processing (OLAP) is a key part of most data warehouse and business analysis systems. OLAP services provide for fast analysis of multidimensional information. For this purpose, OLAP services provide for multidimensional access and navigation of data in an intuitive and natural way, providing a global view of data that can be drilled down into particular data of interest. Speed and response time are important attributes of OLAP services that allow users to browse and analyze data online in an efficient manner. Further, OLAP services typically provide analytical tools to rank, aggregate, and calculate lead and lag indicators for the data under analysis.
In this context, a dimension is a structural attribute of a cube that is a list of members of a similar type in the user's perception of the data. For example, a time dimension can consist of days, weeks, months, and years, while a geography dimension can consist of cities, states/provinces, and countries. Dimensions act as indices for identifying values within a multidimensional array.
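As an illustration of dimensions acting as indices, the following minimal sketch stores values in a two-dimensional array and locates a cell by its dimension members. The member names, the sales figures, and the `lookup` helper are hypothetical and are not taken from this disclosure.

```python
# Hypothetical dimension members; the lists are illustrative only.
time_members = ["1998", "1999"]
geography_members = ["Seattle", "Portland", "Vancouver"]

# Two-dimensional array of sales values: rows follow the time dimension,
# columns follow the geography dimension.
sales = [
    [100, 80, 60],   # 1998
    [120, 90, 70],   # 1999
]


def lookup(time_member, geography_member):
    """Locate one cell by translating each dimension member to an array index."""
    t = time_members.index(time_member)
    g = geography_members.index(geography_member)
    return sales[t][g]
```

For example, `lookup("1999", "Portland")` resolves to row 1, column 1 of the array.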
Databases are commonly queried for summaries of data rather than individual data items. For example, a user might want to know sales data for a given period of time without regard to geographical distinctions. These types of queries are efficiently answered through the use of data tools known as aggregations. Aggregations are precomputed summaries of selected data that allow an OLAP system or a relational database to respond quickly to queries by avoiding collecting and aggregating detailed data during query execution. Without aggregations, the system would need to use the detailed data to answer these queries, resulting in potentially substantial processing delays. With aggregations, the system computes and materializes aggregations ahead of time so that when the query is submitted to the system, the appropriate summary already exists and can be sent to the user much more quickly.
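The difference between answering from detailed data and answering from a materialized aggregation can be sketched as follows. The detail rows, the `build_aggregation` function, and the `query_sales` function are hypothetical names chosen for illustration; the sketch assumes a single summary of sales by year with geography collapsed.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, city, sales amount).
detail_rows = [
    ("1998", "Seattle", 100),
    ("1998", "Portland", 80),
    ("1999", "Seattle", 120),
    ("1999", "Portland", 90),
]


def build_aggregation(rows):
    """Precompute total sales per year, ignoring the geography dimension."""
    totals = defaultdict(int)
    for year, _city, amount in rows:
        totals[year] += amount
    return dict(totals)


# Materialized ahead of time, before any query arrives.
sales_by_year = build_aggregation(detail_rows)


def query_sales(year):
    """Answer the query from the stored summary without scanning detail rows."""
    return sales_by_year[year]
```

At query time, `query_sales("1999")` is a single dictionary lookup, whereas the same answer computed without the aggregation would require a pass over every detail row.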
Calculating these aggregations, however, can be costly, both in terms of processing time and in terms of disk space consumed. Several conventional OLAP systems calculate all possible summaries of the data and suffer from substantial inefficiencies when working with large databases having many dimensions. Some other conventional OLAP systems allow the user to select specific pre-calculated aggregations, avoiding the delays associated with calculating all possible aggregations. Selecting an optimal set of aggregations for a given set of queries, however, is a complicated task that most end users would find difficult at best. Still other OLAP systems do not create any aggregations at all. While this approach is workable for small data volumes, it is not efficient for use with large data volumes. Certain other OLAP systems implement algorithms for selecting aggregations, but fail to adequately consider the costs of creating and maintaining the aggregations.
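To give a rough sense of why calculating every possible summary scales poorly with the number of dimensions, the sketch below counts possible aggregations under a simplifying assumption that is not stated in this disclosure: each aggregation picks, for every dimension, either one of that dimension's levels or an "all" level that collapses the dimension. The function name `count_possible_aggregations` is hypothetical.

```python
from math import prod


def count_possible_aggregations(levels_per_dimension):
    """Count level combinations, assuming each dimension contributes one of
    its levels or a collapsed "all" level (hence the + 1)."""
    return prod(n + 1 for n in levels_per_dimension)


# Two dimensions (4 time levels, 3 geography levels).
small_cube = count_possible_aggregations([4, 3])

# Ten dimensions with 4 levels each.
large_cube = count_possible_aggregations([4] * 10)
```

Under this assumption the count grows multiplicatively with each added dimension, which is consistent with the inefficiencies noted above for databases having many dimensions.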
Accordingly, a need continues to exist for a system that designs aggregation sets so as to make efficient use of computing resources.
According to various example implementations of the invention, there is provided an efficient system for selecting aggregations for use with a database, as described herein below. In particular, the invention provides, among other things, for the maintaining of benefit/cost ratings for possible aggregations. These ratings are used in determining which aggregations should be selected. Lists of candidate and selected aggregations are also maintained, and aggregations are moved between these lists based on their benefit/cost ratings. These ratings are adjusted as aggregations are moved between the lists.
According to one particular implementation, a set of aggregations is selected from a plurality of possible aggregations to answer a set of queries, by maintaining benefit/cost ratings for at least some of the possible aggregations. The benefit/cost rating for an aggregation is determined as a function of performance improvement attributable to the aggregation and of a computer resource cost associated with using the aggregation. A candidate set of candidate aggregations and a selected set of selected aggregations are also maintained. The benefit/cost rating for at least one of the possible aggregations is adjusted in response to moving a possible aggregation between the candidate and selected sets.
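The interplay between ratings and the two sets can be sketched as a greedy loop. This is a minimal sketch under stated assumptions, not the claimed method: benefit is taken to be the number of rows a query no longer has to scan, cost is taken to be the aggregation's estimated row count, and selection stops at a storage budget. The `Aggregation` class, the functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregation:
    name: str
    dims: frozenset  # dimensions the aggregation retains
    size: int        # estimated row count (assumed cost measure)


def answer_cost(query, selected, fact_size):
    """Rows scanned using the smallest selected aggregation that covers the
    query, falling back to the detailed data."""
    sizes = [a.size for a in selected if query <= a.dims]
    return min(sizes, default=fact_size)


def rating(agg, queries, selected, fact_size):
    """Assumed benefit/cost rating: rows saved over all queries, divided by
    the aggregation's size."""
    benefit = 0
    for q in queries:
        current = answer_cost(q, selected, fact_size)
        if q <= agg.dims and agg.size < current:
            benefit += current - agg.size
    return benefit / agg.size


def select_aggregations(possible, queries, fact_size, max_total_size):
    candidates, selected, used = list(possible), [], 0
    while candidates:
        # Ratings are recomputed after every move, because selecting one
        # aggregation changes what the remaining candidates can still save.
        ratings = {a: rating(a, queries, selected, fact_size) for a in candidates}
        best = max(candidates, key=lambda a: ratings[a])
        if ratings[best] <= 0 or used + best.size > max_total_size:
            break
        candidates.remove(best)   # move from the candidate set ...
        selected.append(best)     # ... to the selected set
        used += best.size
    return selected


# Hypothetical example data.
by_year = Aggregation("by_year", frozenset({"time"}), 10)
by_city = Aggregation("by_city", frozenset({"geo"}), 50)
by_year_city = Aggregation("by_year_city", frozenset({"time", "geo"}), 400)
queries = [frozenset({"time"}), frozenset({"geo"}), frozenset({"time", "geo"})]

chosen = select_aggregations(
    [by_year, by_city, by_year_city], queries, fact_size=1000, max_total_size=100
)
```

In this example the rating of `by_year_city` falls each time a smaller aggregation is selected, since fewer queries remain for which it offers an improvement. That recomputation is the step corresponding to adjusting ratings when an aggregation moves between the sets.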
Another implementation is directed to a method involving determining benefit/cost ratings for at least some of the possible aggregations as a function of sizes of the aggregations. A candidate set of candidate aggregations and a selected set of selected aggregations are maintained and sorted in order of benefit/cost ratings. For each query in the set of queries, a best aggregation and a second best aggregation for answering the query are determined as a function of the sizes of the selected aggregations. At least one possible aggregation is moved between the candidate and selected sets. The benefit/cost rating for at least one of the possible aggregations is adjusted as a function of a size difference between at least one best aggregation and at least one second best aggregation.
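One way to read the role of the best and second best aggregations is sketched below. The sketch assumes that an aggregation can answer a query when it retains every dimension the query needs, that a smaller aggregation is a better one, and that the detailed data is always available as a fallback. The `Agg` type and both functions are hypothetical, and the adjustment formula is an assumption rather than the formula of this disclosure.

```python
from typing import FrozenSet, NamedTuple


class Agg(NamedTuple):
    name: str
    dims: FrozenSet[str]  # dimensions retained
    size: int             # estimated row count


def best_and_second(query, selected, fact):
    """Return the smallest and second smallest sources able to answer the
    query. The fact table is assumed to be at least as large as any
    aggregation, so it is appended last."""
    usable = sorted((a for a in selected if query <= a.dims), key=lambda a: a.size)
    usable.append(fact)
    best = usable[0]
    second = usable[1] if len(usable) > 1 else fact
    return best, second


def loss_if_removed(agg, queries, selected, fact):
    """Extra rows scanned if agg were moved back to the candidate set: for
    each query where agg is best, the size difference to the second best."""
    loss = 0
    for q in queries:
        best, second = best_and_second(q, selected, fact)
        if best == agg:
            loss += second.size - best.size
    return loss


# Hypothetical example data.
fact = Agg("fact", frozenset({"time", "geo"}), 1000)
by_year = Agg("by_year", frozenset({"time"}), 10)
by_year_city = Agg("by_year_city", frozenset({"time", "geo"}), 400)
selected = [by_year, by_year_city]
queries = [frozenset({"time"}), frozenset({"time", "geo"})]

loss_by_year = loss_if_removed(by_year, queries, selected, fact)
loss_by_year_city = loss_if_removed(by_year_city, queries, selected, fact)
```

Dividing each loss by the aggregation's size would give an adjusted rating for a selected aggregation under these assumptions. An aggregation whose second best alternative is nearly as small contributes little and becomes a natural candidate for removal.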
Still other implementations include computer-readable media and apparatuses for performing these methods. The above summary of the present invention is not intended to describe every implementation of the present invention. The figures and the detailed description that follow more particularly exemplify these implementations.