1. Field of the Invention
The present invention is directed to database management systems and in particular to the ways in which the execution of database queries can be optimized.
2. Background Information
A. Data Warehouses
Conventional database systems were developed in the 1970's to deal with applications in which workloads were dominated by writes. Consider a bank in which a teller initiates high-priority database transactions. Each transaction's goal is to reflect customer withdrawals and deposits quickly and accurately in the database. Applications of this nature are sometimes characterized by the term On-Line Transaction Processing (OLTP). OLTP is concerned more with making writes fast than with expediting reads. These applications typically need access to the most-current data. Row stores are a response to this need.
In contrast, there is another class of applications in which the workload is dominated by read operations. These applications tend to occur in decision-support settings. Consider a buyer for a large retail chain. Such a user is concerned with looking for patterns and trends in historical data in order to forecast what items might sell well in the future. Applications of this type are sometimes characterized by the term On-Line Analytical Processing (OLAP). OLAP is concerned more with read speed than with write speed. Data warehouses, in general, are a response to this need. Column stores (described in a later section) are a special class of hyper-fast data-warehouse system.
In many cases, OLAP does not need the most-current data. A typical deployment may load the current day's data at night, making the database for OLAP as much as twenty-four hours out of date. Since the applications are more concerned with historical trends, this is acceptable in a lot of cases.
Data-warehouse systems have been developed to deal with OLAP applications. Data warehouses are typically very big. It is common for a data warehouse to involve 10 to 100 terabytes of data. In order to make managing these massive data sets more tractable, data warehouse systems have taken advantage of the typical data relationships that are found in these applications. The next few sections describe these relationships.
Table Roles in Data Warehouses
Typically there is one very big table that accumulates the history of the important events in the application. New events are frequently appended to this table, which is commonly called the fact table. As an example, consider a data warehouse for a large retail chain. The fact table may contain one record for each item that is bought at one of the many stores. In other words, each scanning operation at a point-of-sale terminal would generate one record in the fact table.
Each purchase can be described along many dimensions. For example, a given purchase can be described in terms of the store at which it was bought. If queries seek, say, all purchases that were made in New York, information about each transaction's location has to be stored. However, rather than store this and other such descriptive information with each record in the fact table—and thereby make the fact table huge—such information is typically factored out into separate tables called dimension tables. The primary key of each dimension table—i.e., the attribute (or, occasionally, set thereof) that the database management system prevents from having the same value in more than one of that dimension table's records—is embedded in the fact table. We call the embedded primary key a foreign key in the context of the fact table. Thus, answering questions about purchases in a given state would involve a join of the fact table with the dimension table called Store on the corresponding primary-key/foreign-key attributes.
Schemas that have this primary-key/foreign-key relationship between one fact table and many dimension tables are referred to as star schemas since such a schema can be represented graphically by a central fact-table node connected to surrounding dimension-table nodes by edges that represent a primary-key/foreign-key joins. In a star schema, the dimension tables are not connected to each other directly. The embedded key relationship between the fact table and a dimension table is necessarily an n-to-one relationship. So the join of the fact table with a dimension table has only as many rows as there are rows in the fact table.
Sometimes, a dimension table can be further qualified by additional dimension tables. A good example of this is a dimension table called “State” that has fifty rows and that has information like the state's population, and another dimension table, called “County,” that includes one row for each county for each state. The County table would include the State table's primary key as one of its (foreign-key) attributes. The County table may contain further descriptors, like mean family income. This would be handled in much the same way as the fact-to-dimension relationships described above, i.e., by embedding foreign keys in the obvious way. This generalization of a star schema is called a snowflake schema because of the clear geometric analogy.
Another way to view this employs a graph representation of the schema. We can think of the foreign-key-to-primary-key relationship expressed in a schema S as describing a directed graph G=(V, E) in which the vertices V are S's relations and the edges E are of the form (Ri, Rj), where Ri contains a foreign key and Rj contains the corresponding primary key. Ri is the source vertex, and Rj is the sink vertex. A schema that can be depicted as a graph in which one vertex F has no incoming edge and in which each other node has only a single incoming edge (i.e., tree graph) is called a snowflake schema. If the longest path in such a schema graph is of length one, then we call the schema a star schema. This kind of graph-based schema representation will come up again later.
B. Column Store Database Architectures
Although the applicability of the invention to be described below is not limited to any particular physical database layout, some of its advantages will be particularly manifest when it is applied to databases implemented as “column stores.” A column store differs from the more traditional row-store architecture in that data in a column store are stored on disk clustered by column rather than by row.
FIG. 1 illustrates this distinction. A logical database design usually comprises one or more conceptual tables such as FIG. 1's table 20. A table is characterized by a set of attributes, usually depicted as column headings, for which each record in the table has respective (albeit potentially empty) values. In table 20, for instance, the attributes are Emp# (employee number), Dept (department), Room, Phone, and Age. In conventional, horizontal storage arrangements, values of a given record's successive attributes are typically stored physically adjacent to one another, as FIG. 1's depiction of row store 22 suggests. Although some databases will partition a table horizontally into multiple files, the drawing depicts the illustrated logical table's contents as contained in a single file, in which the values are stored in the following order: row 1 (Emp#), row 1 (Dept), row 1 (Room), row 1 (Phone), row 1 (Age), row 2 (Emp#), etc. In other words, the table is stored in a manner that reflects visiting the table's contents by reading each row from left to right and then moving to the next row. The order of rows can be arbitrary, or it can be based on a given set of attributes, in which case the table is effectively sorted before it is stored onto disk.
Horizontal data storage has been the traditional physical-storage approach, in part because it makes initial storage of an individual record relatively fast; to store a given record usually requires access only to, say, a single disk block. But there are a great many databases in which reading occurs much more frequently than writing. For such databases, it is often better for the physical layout to be vertical, as FIG. 1's column store 24 depicts: successive storage locations are occupied by different records' values of the same attribute. Unlike row store 22, which would typically store the entire table in a single file, a column store 24 is usually implemented by storing each column in one or more separate files. Thus, there is a file for the Emp# column, with values appearing in the order first row, second row, third row, etc., a file for the Dept column, with values also appearing in the order first row, second row, third row, etc., and so on.
One reason why a vertical storage arrangement is preferable for data reading is that fetching the results of a query requires access to only enough, say, disk blocks to contain the values of the attributes of interest; there is no need to access enough disk blocks to contain all of the attributes of all of the records that meet the query criteria.
C. Sort Orders
A key physical-database-design choice for both row stores and column stores is the order in which rows or columns are stored. For example, FIG. 1's table 20 may be ordered in a row store on attribute Room, in which case the first row in the table (Room=101) would be followed by the third row (Room=105) followed by the second row (Room=203). The choice of sort order strongly influences both query and update performance. For example, if table 20 is physically sorted on Room, then queries that include predicates on Room can be evaluated without scanning the entire table, either with a binary search, or more commonly, by using a sparse index that typically is placed over the sort attribute. A common choice of sort order for a table in a row store is that of its primary key, since this facilitates insertions and updates that must be accompanied by checks for violations of key constraints.
D. Query Optimization
A query is a request for data. All DBMSs translate queries into algorithmic access plans that get compiled or interpreted into code that returns the data specified by the query. The process of translating a query into an access plan is called query optimization, and the most common paradigm for query optimization is cost-based query optimization.
Logical-Plan Generation
A query optimizer first transforms a (typically, SQL) query into a logical plan. A logical plan is a tree of logical operations, where a logical operation is a specification of a class of physical operations that all have the same semantics. Each of the logical operations can be realized with potentially different algorithms.
Often, a query optimizer will precede its cost-based step with a query-rewriting step. Query rewriting consists of applying equivalence-preserving transformations to the query expression to arrive at a new expression that is likely to be cheaper to evaluate. The classic example of such a rewrite is pushing predicates below joins in the query expression tree. Applying the predicate first reduces the join inputs' sizes.
An example language for logical plans is the relational algebra, which includes logical operators: join, project, select etc. These operators are “logical” because each is agnostic about the algorithm used to implement it. (e.g., the logical operation join can be implemented with, say, a hash join or sort-merge join).
Another common query-rewriting strategy is to transform a nested query into an equivalent join query, as is usually possible. The importance of this transformation is that it prepares the query for the next step by putting it into a form for which a wealth of further optimizations exist. Join processing has been widely studied, so most optimizers are capable of doing it well.
A Generic Logical-Plan Language
The invention to be described below can be applied to any query optimizer that generates a logical plan distinct from the physical plan. Languages for expressing logical plans are varied, but all share some manifestation of the following key operations, which we will refer to in explaining an embodiment of the invention:                a. Join(p) represents a class of physical plan operators that pair records from separate tables that together satisfy the correlating predicate, p. Example plan operators in this class include hash join, sort-merge join and indexed nested-loop join. Typically, join is expressed as a binary operator and joins of n tables are expressed with n−1 binary joins.        b. Select(p) represents a class of physical plan operators that retrieve (from, e.g., disk) table records that satisfy the predicate p. Example plan operators in this class include index scan and file scan.        c. Aggregate (GrpAtts, AggAtt, AggFun) represents a class of physical plan operators that group records from a relation and then apply some aggregate function to produce one record from each group. The parameter GrpAtts is the set of grouping attributes from the input relation such that records with the same values for all of these attributes are in the same group. The parameter AggAtt is the attribute over which aggregation takes place. AggFun names the aggregate operation (e.g., MAX, MIN, SUM, AVG or COUNT) applied to each group. Example operators in this class include hash aggregation and single-pass aggregation.        d. Sort (AttList, OrderList) represents a class of physical plan operators that sort the records of a relation. The parameter AttList is a list of n attributes, Att1, . . . , Attn from the relation on which sorting takes place. (Att1 is the primary sort order, Att2 is the secondary sort order, etc.) OrderList is a list of n qualifiers that indicate whether the sort over the corresponding attribute is in ascending (ASC) or descending (DESC) order.        
A typical logical plan produced by any query optimizer is an operator tree that, as the FIG. 2 example shows, has the following characteristics:                1) Select (σ) operators at the bottom: A common query-plan-evaluation strategy is to “push selections” as close to the plans' leaves as possible so as to minimize the amount of data that must be processed by the rest of the plan.        2) Join () operators in the middle: Once the input relations to joins are restricted with select operators, the n−1 binary joins required to join n tables are performed in some order on the restricted relations.        3) Aggregate (G) and Sort (S) operators at the top: Once the join result for the query has been produced, the result can be grouped and aggregated and, if necessary, sorted.Physical-Plan Generation        
A logical plan is then transformed into a physical plan by mapping each logical operator to physical plan operator that is used to implement it. The process of logical-to-physical-plan mapping is typically based on cost-based decisions and is briefly described next.
Given a query q, cost-based query optimization follows the following three steps to produce an access plan for q:                1. The optimizer applies some heuristics to generate a set of candidate access plans, each of which could be used to evaluate q.        2. A “cost model” is used to estimate the cost of each candidate access plan. A cost model is a set of formulas that rely on statistical information about data contained in the database to predict the cost of an access plan operation. Typical statistics that determine cost include the cardinality of a relation, the range and distribution of values for a given attribute, and the size of indexes over attributes on a table. Costs are sometimes measured in estimated time to evaluate the query, or in a more coarse-grained fashion (such as the estimated number of disk I/O's required to evaluate the query).        3. Given the set of candidate access plans and their estimated costs, the query optimizer chooses the least expensive among them to evaluate.        
For row-store systems, query optimization typically focuses on two concerns that are reflected in the candidate access plans considered for a given query:                1. Determining the indexes to use to evaluate the query, and        2. Determining the order in which to evaluate the joins, and the join algorithm (hash-join, sort-merge, index nested loops etc.) for each.Join Processing        
For queries with joins of at most two tables, index selection effectively determines the candidate access plan, and for queries with more-complex joins, index selection determines the sub-plans that generate inputs to the joins in candidate access plans.
Now, data-warehouse products are capable of creating and using materialized views. Materialized views are typically used to store materialized aggregates or rollups on popular dimensions. Although some materialized views contain computed entries, such as averages, some contain only entries that can be found in the database's logical tables. We refer to materialized views of the latter type as “projections.” For a column store that stores overlapping projections in varying sort orders, the primary concern of query optimization (analogous to concern #1 for row stores) is choosing the projections used to evaluate the query. Therefore, the sort orders in which columns are stored heavily influence the cost of evaluating the query.
For a query with more than two joins, the optimizer must determine the best order for evaluating them. Join processing in relational databases proceeds by examining possible join orders and comparing their expected costs relative to a cost model. We call this step join enumeration. A join of n tables is processed by applying a binary join operator n−1 times. Thus, a join-evaluation order can be represented as a binary tree in which each interior node is a join operator, and each leaf node is a base relation. The left argument to the join is called the outer relation, and the right argument is called the inner relation to reflect the role that they play in the natural looping structure of join algorithms.
The most expensive phase of query optimization is the join-enumeration phase. Because join operators are binary, there are O (n!*Cn) different join orders that could be considered, given n tables to be joined. Here, Cn refers to the nth Catalan number, which is equivalent to 2n!/((n+1)!*n!) for n>=0. Note that O (n!*Cn) is an exceptionally fast-growing function, as evidenced by the following table of values of both factors for selected values of n:
nn!Cn1115241410362880486215871782912002674440201216451004088320001767263190
It is impractical to assess that many plans for all but trivial values of n, so a query optimizer must somehow prune the space of join orderings that are considered. A good query optimizer is therefore one that can so prune this search space as to ensure that the remaining plan set includes good orderings. Given a cost model that is reasonably accurate, a query optimizer can exhaustively search that plan set to find good plans.
There is a fundamental tradeoff between the time spent in producing an optimized query plan and the quality of the result. For complex queries, the search space that could be explored by the optimizer is potentially huge. Exhaustively enumerating this entire space may take so long as to be prohibitive. Therefore, it is common for optimizers to limit the search space in some way. In fact, the IBM DB2 optimizer (“IBM Universal Database for Linux, Unix, and Windows,” Product Manuals, http://www-306.ibm.com/software/data/db2/udb/support/manualsv7.html) currently gives the user a choice over ten different optimization levels. Choosing a lower level restricts the amount of effort that the optimizer will expend. By choosing an optimizer level the user can decide how thoroughly the space of possible plans should be explored. This choice may be based on the perceived expense of the query or on how many times the query will be executed.
The state of the art in space-pruning for join enumeration can be summarized as follows:                1. Left-Deep Trees Only: A left-deep join tree (illustrated in FIG. 3) is one whose right branches always consist of base relations (i.e., named relations stored on disk). Therefore, restricting to just left-deep trees the search space of join orderings considered is a reasonable way to prune the join search space; a left-deep join plan ensures that intermediate join results do not have to be materialized (i.e., written to disk). Instead, they can be pipelined directly into the next join operator.        Even within the space of left-deep plans, the number of possible join orderings is large. Suppose that a given query involves a join of three tables A, B, and C. This query could be evaluated as any of the following left-deep trees: (A join B) join C, (B join A) join C, (A join C) join B, (C join A) join B, (B join C) join A, and (C join B) join A. The complexity of join enumeration is reduced from O(Cn) to O(n!) by restricting plans under consideration to those that are left-deep. While such a restriction reduces complexity significantly, it still leaves a very large number of join orderings to consider.        2. Dynamic Programming: Dynamic programming is a standard algorithm-design technique invented by Richard Bellman in 1953 for solving certain problems that exhibit overlapping subproblems (i.e., subproblems that can be reused several times in solving the problem as a whole).        The effect of dynamic programming is to reduce the size of the examined search space by ignoring plans that, by virtue of their similarity to already considered plans, are known not to be optimal. As an example, consider the following left-deep join plans that could be considered in processing a join of four relations A, B, C and D:                    (1) ((A JOIN B) JOIN C) JOIN D            (2) ((B JOIN A) JOIN C) JOIN D            (3) ((A JOIN B) JOIN D) JOIN C            (4) ((B JOIN A) JOIN D) JOIN C                        Suppose than an optimizer evaluates plans (1) and (2) and uses the cost model to determine that plan (1) is less expensive than plan (2). In this event, it must be the case that (A JOIN B) is less expensive to evaluate than (B JOIN A), because this is the only difference between the two plans. Given this inference, there is no need to evaluate plan (4), because plan (3) must be less expensive to evaluate. Thus, the effect of dynamic programming is to dynamically prune the search space to remove plans from consideration for which non-optimality can be inferred from intermediate results of the search.        Applying dynamic-programming techniques to left-deep join enumeration reduces search complexity from O(n!) to O(2n) (i.e., to exponential complexity). IBM's System R Optimizer (P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price, “Access Path Selection in a Relational Database management System,” in Proceedings of the A CM SIGMOD Conference on the Management of Data, Boston, Mass., May, 1979) was the first to use dynamic programming in the context of left-deep join trees. Dynamic programming is used here as an efficient way to enumerate and explore query costs. Again, this reduces complexity significantly, but it still leaves a large number of join orderings to consider, as the following table listing 2n for selected values of n demonstrates:        
n2n115321010241532768201048576
Common wisdom is that, with dynamic programming and the left-deep restriction on the search space, a query optimizer can exhaustively examine join orderings for queries consisting of twelve to fifteen tables within a reasonable time frame. For any query that targets more tables than this, the optimizer will need to time out before the search space has been examined exhaustively and to select the best plan seen before it timed out. Thus, the greater the number of tables in the query beyond, say, fifteen, the closer the query optimizer comes to being random in its join-order selection. In short, a serious limitation of state-of-the-art query optimization lies in its inability to scale to the large numbers of tables that queries for data-warehousing applications commonly target.
An alternative join-enumeration strategy involves considering all join plans rather than just those that are left-deep. Join plans that are not just left-deep or right-deep are “bushy” join plans, which FIG. 4 illustrates. A bushy join plan allows any inner (right) relation to be a non-base relation. These trees are often said to “allow composite inners,” in distinction to left-deep plans for which all inners are base tables.
Ono and Lohman (“Measuring the Complexity of Join Enumeration in Query Optimization,” Proceedings of the 16th VLDB Conference, Brisbane, Australia, Aug. 13-16, 1990) showed that pruning non-optimal plans dynamically by using a strategy similar to dynamic programming reduces the space of bushy plans that needs to be considered from O(n!*Cn) to O(3n). For large numbers of relations, this space is very large; the number of possible bushy join plans for seven tables is 665,280, for example. This suggests that any attempt to evaluate such a plan space should employ a very lightweight approach.
Still, such plans are sometimes better than any of the left-deep plans. Such cases can arise when several of the joins are highly selective. Consider a query that joins four large tables A, B, C, and D. Suppose that the join predicates are A.X=B.X, C.Y=D.Y, and B.Z=C.Z, that the first two are very selective (produce small results), and that the third is not. Then a bushy join plan is likely superior to any left-deep plan, because it will properly take advantage of the smaller intermediate results to make the top join very cheap.
Vance and Maier (“Rapid Bush Join-Order Optimization with Cartesian Products,” Bennet Vance and David Maier, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, Jun. 4-6, 1996) considered relaxing the left-deep-join-tree assumption by allowing bushy plans in the search space. They suggested an efficient method for enumerating bushy plans.