Computer database systems manage the storage and retrieval of data in a database. A database comprises a set of tables of data along with information about relations between the tables. Tables represent relations over the data. Each table comprises a set of records of data stored in one or more data fields. The records of a table are also referred to as rows, and the data fields of records in a table are also referred to as columns.
A database server processes data manipulation statements or queries, for example, to retrieve, insert, delete, and update data in a database. Queries are defined by a query language supported by the database system. To enhance performance in processing queries, database servers use indexes to help access data in a database more efficiently. Although all possible indexes for a database can be constructed, the number of possible indexes over a database can be very large and can significantly increase the memory requirements, including storage requirements, for the database. Also, many of the possible indexes may provide no or minimal performance advantage considering the data in the database, the organization of the data in the database, and the usage of the database as represented by a workload of queries executed against the database. Typical database systems therefore use only a subset of possible indexes for accessing databases.
A physical database design tool or database administrator typically selects an index configuration or set of indexes for use by a database system from among possible combinations of indexes for a database. The effectiveness of any index configuration depends, for example, on the data in the database, on the query optimizer accessing the database, and on the workload of queries to be executed against the database. As a database system evolves through database updates and modifications and through changes in workloads of queries, for example, new index configurations should be considered and selected to help maintain a desired level of performance of the database system.
One typical design tool selects index configurations based on semantic information such as uniqueness, reference constraints, and rudimentary statistics such as small versus big tables for example. As this design tool fails to account for query workloads, however, the effectiveness of the selected index configuration is limited.
Another typical design tool uses a knowledge-based or rules-based approach and accounts for query workloads in selecting index configurations. As this design tool uses its own model of index usage by database query optimizers, however, the effectiveness of the selected index configuration is limited by inaccuracies of the model. Furthermore, constructing a model and maintaining a level of consistency between the model and the evolving query optimizer can be very complex and can incur substantial cost in time and memory.
To avoid asynchrony between the design tool and the query optimizer, other design tools use cost estimates determined by the query optimizer to compare the effectiveness of different index configurations. Such design tools are desirable as they account for both query workloads and index usage by query optimizers. Because the space of possible indexes over a database can be very large, however, searching among such indexes can incur substantial cost in time and memory. The problem of selecting an index configuration can be further compounded by the presence of multi-column indexes.