1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to an automated system for identifying and dropping marginal database indexes.
2. Description of Related Art
Relational DataBase Management Systems (RDBMS) using a Structured Query Language (SQL) interface are well known in the art for use in data warehouses. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
In an RDBMS, all data is externally structured into tables. A table in a relational database is two dimensional, consisting of rows and columns. Each column has a name, typically describing the type of data held in that column. As new data is added, more rows are inserted into the table. A user query selects some rows of the table by specifying clauses that qualify the rows to be retrieved based on the values in one or more of the columns.
The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages such as C, COBOL, etc. Operators are provided in SQL that allow the user to manipulate the data, wherein each operator performs functions on one or more tables and produces a new table as a result. The power of SQL lies on its ability to link information from multiple tables or views together to perform complex sets of procedures with a single statement.
The SQL interface allows users to formulate relational operations on the tables. One of the most common SQL queries executed by the RDBMS is the SELECT statement. In the SQL standard, the SELECT statement generally comprises the format: “SELECT <clause> FROM <clause> WHERE <clause> GROUP BY <clause> HAVING <clause> ORDER BY <clause>.” The clauses generally must follow this sequence, but only the SELECT and FROM clauses are required.
Generally, the result of a SELECT statement is a subset of data retrieved by the RDBMS from one or more existing tables stored in the relational database, wherein the FROM clause identifies the name of the table or tables from which data is being selected. The subset of data is treated as a new table, termed the result table.
Indexes are beneficial to the execution of SQL statements because of their ability to improve query performance. One drawback to indexes is the additional disk space they require and the resources needed to maintain them during SQL updates and bulk load operations. Users must consider these tradeoffs before deciding to initially create a new index.
Over the course of time, the factors involved in this initial decision may change and users may wish to drop one or more existing indexes. Disk space can become scarce and users may wish to drop indexes in an effort to reclaim the space they occupy. Update and load operations may become more frequent over time leading to excessive index maintenance costs. In such cases, users would like to identify the set of existing indexes (if any) that can be dropped without significantly impacting query performance. Essentially, users are always looking for the minimal set of indexes that can deliver the required query performance.
Query performance is often dependent on physical database design, which is an essential step to implementing a high performance data warehouse. During this process, users make choices regarding the physical characteristics of their relational tables and columns. The process of making these choices manually is difficult and mistaken prone for even experienced users. For this reason, vendors often provide tools or “wizards” that examine a user's workload and make index recommendations.
The current focus of such tools is on the creation of new indexes to improve workload performance. Current tools are also capable of identifying unused indexes and recommending their removal. However, one important capability that has been overlooked is the identification and removal of “marginal” indexes whose benefit to query performance is positive but insignificant especially when other factors are considered.
Although a given index may be chosen as an optimal access path, it is not uncommon for alternative access paths to exist that would deliver similar performance. Because the optimization process does not reveal its sub-optimal plans, users may not be aware that a smaller set of defined indexes may be able to deliver similar workload performance with only a minor degradation.
Techniques have been developed for solving this problem. However, all known physical database design processes for indexes concentrate on the selection of new indexes or the removal of completely unused indexes. No known solutions focus on removing marginal indexes or helping users to identify the minimal set of required indexes.
Consequently, there is still a need in the art for additional optimization techniques. The present invention satisfies this need.