SQL/MX has a state of the art compiler that is designed to handle complex DSS queries, in addition to simpler OLTP queries. The SQL/MX compiler has a unique architecture that makes it one of today's most capable and extensible database compilers in the industry. In addition to designing a compiler with a large number of advanced features, the designers of the first MX compiler paid special attention to the extensibility of the product. Among U.S. Patents that pertain to SQL/MX optimizers are: U.S. Pat. Nos. 5,819,255; 5,822,747; 6,205,441; and 6,438,741, which are commended to the reader for additional background information.
A SQL/MX compiler consists of several phases to generate an efficient execution plan. First, a query is passed to a parser where syntactic checking is performed and an initial query tree is built. Next, a binder performs semantic checks and binds query variables to database objects. This is followed by a normalizer phase where subquery transformation and other unconditional query transformations take place. The normalizer transforms the query into a canonical tree form before passing it to an optimizer to determine the execution strategy.
In this system, the optimizer uses what is known as a Cascades search engine as the platform for the optimization process. The Cascades search engine is described in U.S. Pat. Nos. 5,819,255 and 5,822,747. The Cascades search engine is a multi-pass, rule-based, cost-based optimization engine. The optimization search space is determined by the set of transformation and implementation rules used by the optimizer. Rules are applied recursively to the initial normalized tree transforming it into semantically equivalent query trees. The transitive closure of the rules applications defines the optimization search space. The optimizer output is a single plan with the lowest cost among all traversed plans in the search space, based on the optimizer's cost model.
As will be appreciated by the skilled artisan, the main advantage of this optimizer lies in the fact that it is rule driven. One can change the search space or search algorithm by simply adding, removing, or changing rules. This offers a great deal of extensibility. Adding a new optimization feature could be as easy as adding a new rule. For example, assume we want to add the feature of eliminating unnecessary Group Bys. This can be achieved by adding a new transformation rule that applies to Group Bys and checks for the condition for elimination (grouping on a unique column). If the condition is satisfied, the rule will eliminate the Group By from its output (a MapValueId on top of the child of the Group By). This rule would apply automatically to both the original Group Bys and the Group Bys generated as an output of other rules.
The main weakness of the Cascades optimizer, however, is performance. The Cascades optimizer uses a complete set of rules (commutativity and left-shift rules) to exhaustively enumerate all the possible join orders. Although the Cascades approach uses the principle of optimality (implemented via Cascades memo, groups and contexts) to significantly reduce the complexity of the exhaustive search algorithm, the complexity remains exponential even when the search space is limited to zigzag and left linear trees. The explosion of the explored search space evidently manifests itself as a compile-time explosion.
The initial optimizer design relied on cost-based pruning and lower bound limit as the potential mechanism to control the search space (this is the “bound” in “branch and bound”). The goal is to use a cost limit, based on the cost of the cheapest plan computed so far, to prune parts of the search space that have a lower bound above the cost limit. Although the technique was helpful in reducing compile time, the pruning rate is far less than what is required to control the exponentially increasing search space.
Compiling a complex query within a short period of time is, by itself, not the challenge. The real challenge is to compile it within a reasonable period of time, yet produce a plan with quality comparable to that generated by the expensive exhaustive search.
Reducing compile time and improving plan quality are the two ever-competing goals for any SQL compiler. More often than not an attempt to improve one of the two will have a negative effect on the other. Hence, a discussion of compiler performance is only relevant in the context of plan quality.
Accordingly, this invention arose out of concerns associated with providing improved compiler performance and plan quality.