Database systems are often designed to maintain a relatively large amount of information about a variety of entities, events, or occurrences (referred to generally as occurrences), and these occurrences may be described by a variety of characteristics. Even database systems that do not yet contain large amounts of information are often designed to be scalable such that the database systems can be adapted to accommodate ever-increasing amounts of data. Some tables are so large, due to the fact that they include every occurrence and every characteristic of every occurrence, that they may be impossible to analyze if there are not enough resources to store and process significant portions of these tables. Even if sufficient resources are available, storing and processing significant portions of these large tables can be quite costly. As a result, when occurrences have many characteristics or are otherwise related to a variety of information, many database systems separate such information about the occurrences into multiple tables.
Database systems often group tables based on categories of characteristics. Much of the information may be descriptive information about entities, categories, or classes of information (referred to generally as categories) involved in the occurrences. The description of these underlying categories may change infrequently compared to the other tables that record or measure the occurrences themselves. Dimension tables are tables that contain descriptive information about occurrences that are referenced by or may be referenced by other table(s). The other table(s) include column(s) that reference row(s) of the dimension table(s), and each referencing column identifies what is referred to as a dimension of column(s) that occur in dimension table(s). Data that is organized into two or more dimensions is referred to herein as a multidimensional dataset.
Fact tables are the other tables that measure the occurrences related to the categories. In other words, fact tables store facts or measurable quantitative data, and this measurable data may be involved with or otherwise fall under the categories. By referencing the dimension tables, the fact tables do not need to duplicate all of the information contained in the dimension tables. Generally, because fact tables may include multiple occurrence(s) that reference the same category, fact tables are usually larger than dimension tables. Also, because fact tables measure the occurrences rather than recording the definitions, the fact tables are usually updated more frequently than dimension tables. An organization of multidimensional data into fact table(s) and dimension table(s) is referred to as a star schema.
Queries that operate on data stored in tables that belong to a star schema are referred to as star queries. Star queries often request information from a fact table with filters that are based on characteristics listed in the dimension tables. For example, a star query may request all sales records that involved customers between the ages of 25 and 30. Although the fact table may include all sales records and identify the customers involved in those sales, the fact table likely does not list the respective ages of those customers. Therefore, evaluation of the star query requires a determination of which fact table records identify customers that fall within the requested ages. Such a determination may consume significant amounts of resources for large fact tables and multiple filters.
Some analytical applications initially present an aggregated view of multidimensional data at a particular level. In order to generate the view, an underlying OLAP system typically scans one or more fact tables, which may comprise several million records, to return few rows of data. A request to drill down to a different aggregated view of the multidimensional dataset may trigger a subsequent scan of one or more relatively large tables, which may be computationally expensive. One approach to preserve compute-resources during run-time is to pre-compute aggregated views. The number of possible aggregations, however, may be prohibitively large to continuously calculate and update due to the large possible combinations of dimension granularities. Therefore, this approach may not be feasible where the multidimensional datasets include large amounts of data.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.