A relational database (RDBMS) consists of a set of tables. SQL queries are posed against the table set to obtain information from the database. SQL is a declarative language, which only specified what data is to be retrieved and does not specify how to retrieve it. The Query Optimizer is the component of a relational database that determines the strategy for how the data requested by the query should be retrieved (this is called the query execution plan). Modern relational databases use a cost-based query optimizer, which determines the possible execution plans, assigns a cost value to each plan and picks the one with the least cost. The cost is measure of the resources, such as CPU, IO and memory required to execute the query using the given plan and hence lower the cost, faster the query would execute.
Access Structures such as Indexes and Materialized views are two key mechanisms for speeding up data retrieval in a relational database. Without such structures, the database must resort to scanning the entire table (known as full-table scan) to retrieve the required set of rows. Hence having efficient access structures is very important for query execution performance. The optimizer looks at all available access structures to determine what the best execution plan is. An index is a structure that allows the database to quickly retrieve rows that satisfy a given condition. An index is built on one of more columns of a table—these columns are known as the index key. There are several different types of indexes, most common being the B*Tree Index and the Bitmap Index. A B*Tree index is primarily useful when the query requests a specific value or range of values of the index key and when there are only a few rows to be retrieved. A Bitmap Index is useful when the query involves several conditions on the index keys, connected by Boolean (AND/OR) operators.
A materialized view (MV) is used to pre-compute and store the results of a query into the database and can be used to answer the same or similar queries. MVs improve query performance because they save redundant computation and also because significantly less data may be needed to retrieve the result. Since a materialized view stores results in a database table, indexes can also be created on a materialized view to further speed up access to its data.
For a given table, several different indexes can be created in order to improve query performance. For instance for a table T with two columns a and b, the following indexes could exist: (a), (b), (a,b), (b,a). Note that due to the nature of an index structure, the ordering of columns in the index key is significant. As the number of columns increases, so do the choices for indexes to build. Indexes occupy storage and take up resources to build. Further they should be maintained (updated) when the data in the table changes. So it is not feasible to create all possible (or even a large number of) indexes in a database.
The basic problem that conventional methods are unable to solve is determining the best set of indexes to create to speed up the execution of a set of SQL queries (known as SQL workload), within a given storage limitation. Because indexes consume resources of storage and computational cost for build and maintenance, a few indexes that satisfy a multiplicity of queries should be created.
Given a set of available indexes, the final choice of execution plan of a query is made by the query optimizer using its cost model. There may be several ways in which the database can use an index to access a table. Sometimes, the optimizer cannot use an index because of complex internal rules that get violated. Also, the optimizer itself has a lot of choices in coming up with an execution plan and every RDBMS typically employs heuristics that restrict the available choices. In other words, is very complex, if not impossible for an end-user to determine reliably how the optimizer would use any index and which of the several possible plans would ultimately have the lowest cost.
The broader problem is that conventional tools are unable to determine the ideal combination of access structures, including both materialized views and index for a given SQL workload, or the SQL Access Structure Tuning problem. This problem has typically been attempted to be solved in three ways. One is for an experienced Database Administrator (DBA) creates indexes to tune the SQL statements. This type of tuning is based on knowledge of the general query optimization process and relies of rules-of-thumb and knowledge of application access patterns. This type of analysis has the following problems. The analysis relies heavily on experience of the DBA. A novice DBA performing this task can wreak serious havoc on the system performance by not properly considering all the costs. The number of queries that can be manually tuned is limited.
Another conventional way is for a software tool to sit on top of the RDBMS and use a cost model to weigh different index choices. Example: Oracle Expert. This class of tools has the following problems. The cost model used by the tool tries to imitate the optimizer but cannot really guarantee that the resulting indexes would be used when the query is run. Because the tool is not integrated into the optimizer, it has to incorporate rules for how an optimizer uses indexes. Since these rules can change over time, the tool can soon become out-of-sync and needs to be modified, typically with every new version of the RDBMS software.
Another conventional tool sits on top of the RDBMS and uses some external utility provided by the RDBMS to evaluate various index structures, such as Microsoft Index Tuning Wizard for example. This class of tools has the following problems. The information accessible to this tool is limited to that provided by the external utility. For example, it may not have access to the internal cost estimates of the optimizer. It is not possible for the tool's analysis to be interspersed with the optimization process, since the optimizer is but a black box to this tool. For instance, the optimizer may try various intermediate transformations and we may need to evaluate different index strategies for each of these transformations.