1. Field of the Invention
The present invention is related to query optimization with algebraic rules.
2. Description of the Related Art
A database system is one in which data is stored, retrieved, and processed. One type of Database Management System (DBMS) is a Relational Database Management System. Data records in a RDBMS in a computer are maintained in tables, which are a collection of rows (records) all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. Tables in the database are searched using, for example, a Structured Query Language (SQL), which specifies search operations or predicates to perform on columns of tables in the database to qualify rows in the database tables that satisfy the search conditions.
Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
Queries that go against a database are often taken through an optimizer within the DBMS. The optimizer is a compile time component that looks for an optimal access plan for the execution engine that retrieves the data. An access plan includes a set of access paths that are selected by the optimizer to evaluate a particular SQL statement. The access plan specifies the order of operations to resolve the execution plan, the implementation techniques (such as JOIN), and the access path for each table referenced in the statement. For example, there may be an access plan to a specific table in the database, as well as, an access plan of the order and technique in which multiple tables may be joined together to produce the desired query result. The access path represents a technique that is selected by the optimizer for retrieving data from a specific table.
In a RDBMS, the tables may have auxiliary structures, such as indexes, that may be used to access specific rows of data. For example, an access plan might make use of an index to retrieve the required data or the access plan may directly scan the table or a combination of the two. These decisions are typically based on a determination of the cost of the alternatives by estimating the number of rows of interest and consequently the cost of retrieving the appropriate blocks or pages from storage (e.g., disk) as well as the amount of computer processing units (CPU) to be utilized. Often statistics about the data that are pre-computed are used to help with the estimation process.
Another common methodology of influencing the access plan is to internally “rewrite” the query making use of other auxiliary information. Constraints that are defined and enforced might be used to eliminate joins or used to introduce predicates or derive statistics. For more information on this, see, Q. Cheng, J. Gryz, F. Koo, C. Leung, L. Liu, X. Qian and B. Schiefer. “Implementation of Two Semantic Query Optimization Techniques in DB2 UDB,” In Proceedings of the 25th VLDB, pages 396–403, Kobe, Japan, 1991; J. Gryz, B. Schiefer, J. Zheng, and C. Zuzarte. “Discovery and Application of Check Constraints in DB2,” In Proceedings of ICDE, Heidelberg, Germany, 2001; and, P. Godfrey, J. Gryz, C. Zuzarte. “Exploiting Constraint-like Data Characterizations in Query Optimization,” In Proceedings of the 2001 ACM SIGMOD International Conference on the Management of Data, pages 582–592, Santa Barbara, Calif., 2001; each of which is incorporated by reference herein in its entirety.
Pre-computed materialized data might be used to replace a portion of the query. For more information on this; see, M. Zaharioudakis, R. Cochrane, G. Lapis, H. Pirahesh, M. Urata. “Answering Complex SQL Queries Using Automatic Summary Tables,” In Proceeding of the 2000 ACM SIGMOD international conference on the management of data, pages 105–116, Dallas, Tex., 2000.
In P. Godfrey, J. Gryz, C. Zuzarte. “Exploiting Constraint-Like Data Characterizations in Query Optimization,” In Proceedings of the 2001 ACM SIGMOD International Conference on the Management of Data, pages 582–592, Santa Barbara, Calif., 2001; the concept of soft constraints is discussed, including absolute soft constraints and statistical soft constraints. Additionally, the notion to use pre-computed materialized query tables that store exception data is introduced. The materialized query tables that store exception data may also be referred to as “outlier tables” or “outlier MQTs.” Using this materialized query table, newly introduced predicates (“new predicates” or “new range predicates”) are generated to help with the access plan of the underlying table, while the exception or outlier data is picked up from the materialized query table and added to the result (using a UNION ALL approach).
An algebraic rule is a relationship between two columns in a relational database table that may be expressed by algebraic expression. A set of algebraic rules that involve columns from one table may be implemented as integrity constraints or soft constraints to optimize queries. However, an algebraic rule that involves columns from more than one table is not used in such optimizations. Furthermore, statistical soft constraints are not used for rewriting queries.
Although existing query optimization techniques are useful, there is a need in the art for improved query optimization.