A function-based index is not just an index on a typical database column. A function-based index is an index on a result of a function. The inputs to the function may come from one or more columns. For example, a function like to_upper(A) takes input from a single column while a function like equality(A,B,C) takes input from three columns. A function-based index computes a value for a function and then indexes off the computed value.
A query may include expressions to simplify query writing. For example, a database may store a text string representing a person's name in different formats. The name may be stored in upper case, lower case, mixed upper and lower, and so on. Rather than a query writer anticipating all the storage formats, a query writer may include an expression like to_upper(A) in their query to match a name like SMITH. However, the column to_upper(A) doesn't really exist. Members of the “column” may only exist during the execution of the query. Thus, data concerning the “column” may be difficult to acquire.
Database administrators constantly seek to improve database performance. One way to improve performance is to tune queries to improve query response time. Query response time may be improved by different techniques including, for example, optimizing queries, creating indexes, creating summaries, creating materialized views, rewriting queries to access summaries and/or materialized views, index tuning, and so on. Appropriate use of indexes can improve access paths to data and thus can improve query performance. For example, an appropriate index can enhance SQL (structured query language) statement performance by reducing the need for full table scans. However, indexes may be costly. Creating an index consumes processor cycles, storing an index consumes memory, and maintaining (e.g., updating) an index consumes processor cycles. Thus, a database administrator (dbadmin) may carefully consider which indexes to create, store, and/or maintain.
Making decisions concerning indexes is complicated by the realities of the database environment. For example, dbadmins may have limited resources (e.g., memory, time, processor cycles) available. Therefore a dbadmin may not be able to create an unlimited or even large number of indexes. By way of illustration, the dbadmin may not have enough memory to store all possible indexes that would improve query performance. Similarly, the dbadmin may not have enough processor cycles to maintain all possible indexes that would improve query performance. Thus, a dbadmin may be forced to select indexes to create and maintain.
Automated tools may help the dbadmin select indexes. Conventionally, an automated tool may generate candidate indexes from which a dbadmin may select. These automated tools may consider constraints including memory consumed, processor cycles consumed, query performance, and so on. The automated tools may make decisions based on statistics gathered concerning columns related to a query. In some cases the statistics may be gathered using dynamic sampling. Thus, less than an entire column may be used to generate statistics upon which an automated tool may make a decision. The decisions may also be based on statistics gathered concerning indexes used to resolve a query. Once again these statistics may be gathered using dynamic sampling and thus less than an entire index may be used to generated statistics.
Some statistics are straightforward to collect. For example, the number of columns in a table, the number of rows in a table, the number of times a value appears in a column, and so on are straightforward to collect. However, other statistics may not be so straightforward. For example, when a query includes an expression (e.g., to_upper( ), equals, in_range(start,end)), then the data upon which the statistics are based may only exist during the execution of the query when the expression is evaluated. These types of queries may be referred to as function-based queries or queries that include expressions. Since function-based queries may depend on columns that do not actually exist, function-based queries may complicate determining which indexes to employ to support a workload.
By way of illustration, a query may include an expression concerning the equality of members in two columns. Statistics about each of the two columns may be available. However, statistics about the virtual column—the equality of corresponding members of the two columns—may not exist. Yet it is these statistics that facilitate determining whether to create and/or maintain an index that supports the query. Similarly, an index may exist for each of the two columns. Once again, statistics may be collected for these existing indexes. Since the column represented by the function result does not exist, it is also likely that an index for the non-existent column does not exist. Thus, it may be difficult to acquire statistics for the non-existent index. This contributed to conventional tools not supporting index tuning for function-based indexes.