A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright (copyright) 1999, Microsoft, Inc.
The present invention pertains generally to computer-implemented databases, and more particularly to an adaptive caching and data retrieval mechanism for multi-dimensional databases.
Online analytical processing (OLAP) is a key part of most data warehouse and business analysis systems. OLAP services provide for fast analysis of multidimensional information. For this purpose, OLAP services provide for multidimensional access and navigation of data in an intuitive and natural way, providing a global view of data that can be drilled down into particular data of interest. Speed and response time are important attributes of OLAP services that allow users to browse and analyze data online in an efficient manner. Furthermore, OLAP services typically provide analytical tools to rank, aggregate, and calculate lead and lag indicators for the data under analysis.
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures. A multi-dimensional structure is arranged so that every data item is located and accessed based on the intersection of the dimension members which define that item. The design of the server and the structure of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships.
A fundamental entity that is present in typical OLAP databases is a cube. A cube is a multidimensional representation of a set of data having varying aspects. A cube comprises a set of dimensions and a set of measures. In this context, a dimension is a structural attribute of the cube that is a list of members of a similar type in the user""s perception of the data. Typically, there is a hierarchy associated with the dimension. For example, a time dimension can consist of days, weeks, months, and years, while a geography dimension can consist of cities, states/provinces, and countries. Dimension members act as indices for identifying a particular cell or range of cells within a multidimensional array. A measure is a structural attribute of the cube that comprises a particular type of value that provides detail data for particular members within the dimensions. For example, sale amounts and units sold can be measures of a retail cube having a time dimension and a geography dimension. The measures provide the sale amounts and units sold for a particular geographic region at a particular point in time. For example, consider the following query:
select
time.members on columns,
geography.members on rows,
from sales.
In this query, time and geography are dimensions of the underlying database while sales in a measure. Conventional OLAP databases return a data set that does not include any calculated members that are within the range of the data set.
A calculated member is a member of a dimension whose value is determined from other members"" values (e.g., by application of a mathematical or logical operation). Calculated members may be part of the OLAP server database or may have been specified by the user during an interactive session. This allows the user to customize the dimension tree by combining cube data, arithmetic operations, numbers and/or functions.
OLAP services are conventionally provided using a client-server model. An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures. An OLAP client interfaces with the OLAP server, thereby providing OLAP services to external application programs. For example, an OLAP client may provide OLAP services to a variety of external application such as a data mining application, a data warehousing application, a data analysis application, a reporting application etc.
It is well known that retrieving data from a database server is costly with regards to memory consumption, input/output and network latency. Conventional client-server database systems have caching mechanisms that seek to minimize these costly server requests. There is a need in the art, however, for an adaptive caching mechanism that aggressively retrieves data in order to increase the probability that cached data will satisfy future database requests, thereby reducing the number of roundtrips to the OLAP server, yet weighs this benefit with the costs of high-volume database queries.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification. Systems, clients, servers, methods, and computer-readable media of varying scope are described in which.
The invention is directed toward a data retrieval software module that implements an aggressive caching mechanism that retrieves additional data with the expectation of satisfying future queries. The amount of additional data retrieved is based upon a cost benefit analysis that considers the costs of further retrieval versus the benefits of having the data ready for such future requests. The caching mechanism uses one or more probability schemes to predict which groups of data cells that will satisfy future queries and applies cost benefit policies to determine what granularity of data cells to retrieve.
More specifically, according to the invention upon receiving a query a database client determines a set of member sub-hierarchies for each dimension specified by the query that are suitable for satisfying the query. The client analyzes the costs and benefits of retrieving each member sub-hierarchies from the multidimensional database of the database server and selects one of the member sub-hierarchies for each dimension as a function of the analysis. Finally, the database client retrieves the selected member sub-hierarchies from the multidimensional database.
The present invention describes systems, clients, servers, methods, and computer-readable media of varying scope. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.