Information systems, and in particular database systems, typically organize and manage data in a manner that enhances the usefulness of the data. Indexes are valuable tools for reducing the amount of time required to retrieve data in information systems. In the context of database systems, an index is a mechanism that specifies the location of information in the database. A database index is conceptually similar to an index found at the end of a book, in that both kinds of indexes comprise an ordered list of information accompanied with the location of the information. In database systems, values in one or more columns of a table are stored in an index, which is maintained separately from the actual database table. Queries can then be processed against the index to retrieve the data more quickly than it is retrieved by searching the entire table.
The searching efficiency gained by using an index must be weighed against the overhead incurred by maintaining the index current. The negative costs are to storage and increased time for inserts, updates and delete operations. The positive cost is to queries which can take advantage of the index to minimize the number of rows accessed. Therefore, the net effect of some indexes is to increase efficiency, while the net effect of others is to decrease efficiency. The value of an index is largely determined by how the data is used.
The process of selecting and/or generating a set of indexes for data based upon characteristics of the data itself and/or characteristics of queries processed against the data is referred to as "index tuning." Since data may already have associated indexes, index tuning can involve making no changes to indexes, modifying existing indexes, creating new indexes or even deleting existing indexes that are unused.
An important issue in performing index tuning is determining which data is to be analyzed. Sometimes, index tuning is performed on all data in a database, regardless of which data might best benefit from index tuning. In the context of a database where data is organized in tables, this typically involves evaluating all tables in the database to identify the tables that should be indexed differently than they are currently indexed. This approach does have some drawbacks. First, evaluating every table in a database can consume a large amount of system resources. Second, evaluating every table in a database can take a long time, which is often unacceptable in active database systems. Thus, performing index tuning on all data in a database can interfere with other users and/or tasks being performed on the database system. For example, other processes may not be able to access particular data (e.g. a particular data table) while index tuning is being performed on the particular data.
Another approach for performing index tuning involves an operator, who in the context of a database system is typically referred to as a database administrator (DBA), manually selecting particular data on which index tuning is to be performed. Typically a DBA evaluates statistical information for a database to determine which data might most benefit from index tuning. In other words, which data will provide the greatest increase in performance when indexed differently than it is currently indexed, if at all. The DBA then initiates index tuning on the particular data.
One benefit of this approach is that index tuning is performed only on the data selected by the DBA, which can significantly reduce the amount of system resources and time required to perform index tuning. This can be particularly beneficial in situations where it is desirable to minimize the effects of index tuning on other processes.
Despite the advantages this approach provides over performing index tuning on all data in a database, this approach does have some drawbacks. In particular, a DBA must decide which data is to be processed with index tuning which, for large databases, can be very labor intensive. Moreover, the DBA may not make the best decisions about which data is to be index tuned. For example, the DBA may select data for index tuning that does not provide the most performance benefit (or no benefit at all) or not select other data that would provide a performance benefit. Consequently, although this approach may reduce the amount of system resources required to perform index tuning compared to the approach of evaluating all data in a database, it can be very labor intensive and may not provide optimal indexing because of errors attributable to the DBA's decisions.
Based on the need to perform index tuning and the limitations in the prior approaches for determining which data should be index tuned, an approach for automatically determining data that will benefit the most from index tuning that avoids the limitations in the approaches previously described is highly desirable.