1. Technical Field
This invention generally relates to computer systems, and more specifically relates to database apparatus and methods.
2. Background Art
Database systems have been developed that allow a computer to store a large amount of information in a way that allows a user to search for and retrieve specific information in the database. For example, an insurance company may have a database that includes all of its policy holders and their current account information, including payment history, premium amount, policy number, policy type, exclusions to coverage, etc. A database system allows the insurance company to retrieve the account information for a single policy holder among the thousands and perhaps millions of policy holders in its database.
Retrieval of information from a database is typically done using queries. A query usually specifies conditions that apply to one or more columns of the database, and may specify relatively complex logical operations on multiple columns. The database is searched for records that satisfy the query, and those records are returned as the query result.
One way to improve the performance of a query is to use an index. An index is a data structure that allows more efficiently locating data in a table. A query optimizer may evaluate a query and determine that building an index for a table will help the query run faster. One problem that exists is the proliferation of indexes over time. In large database systems, each table may have dozens of indexes associated with the table that have been generated over time by the query optimizer to run different queries on the table. In the prior art, there is a timestamp in the index that allows determining the last time an index was used. If the index has not been used for some predetermined time threshold, the index may be discarded. Note, however, that an index may provide statistics to the query optimizer that aid in processing a query, without the index being used to process the query. Because the last used timestamp of an index is not updated when an index is used to provide statistics, but is only updated when an index is used to run a query, the last used timestamp does not truly indicate the last time the query optimizer might have used the index to provide statistics. Deleting an index based on the last used timestamp thus may result in deleting an index that the query optimizer uses often to provide statistics in determining how to execute a query.
Another problem with the prior art is the selection of a time threshold for discarding indexes. Is an index too old if it has not been used for a month? Or is three months, or six months a better threshold? The selection of a last used threshold is somewhat arbitrary. In addition, the last used threshold does not indicate how the index was used. For example, an index that includes four columns may be used often in running a query against only one of the four columns. In this case, the index is used often, but is an inefficient index for executing the query. The query optimizer would do better to build a new index for the one column in the query, and discard the old index that includes four columns. However, there is no way to know which portions of an index are used. Without a way to track how components within an index are used, the database industry will continue to suffer from inefficient methods for determining when to build and index and when to discard an index.