1. Technical Field
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for monitoring and managing database queries for improving performance.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
Information stored on a computer system is often organized in a structure called a database. A database is a grouping of related structures called ‘tables,’ which in turn are organized in rows of individual data elements. The rows are often referred to as ‘records,’ and the individual data elements are referred to as ‘fields.’ In this specification generally, therefore, an aggregation of fields is referred to as a ‘data structure’ or a ‘record,’ and an aggregation of records is referred to as a ‘table.’ An aggregation of related tables is called a ‘database.’
A computer system typically operates according to computer program instructions in computer programs. A computer program that supports access to information in a database is typically called a database management system or a ‘DBMS.’ A DBMS is responsible for helping other computer programs access, manipulate, and save information in a database.
A DBMS typically supports access and management tools to aid users, developers, and other programs in accessing information in a database. One such tool is the structured query language, ‘SQL.’ SQL is query language for requesting information from a database. Although there is a standard of the American National Standards Institute (‘ANSI’) for SQL, as a practical matter, most versions of SQL tend to include many extensions. Here is an example of a database query expressed in SQL:                select*from stores, transactions        where stores.location=“Minnesota”        and stores.storeID=transactions.storeID        
This SQL query accesses information in a database by selecting records from two tables of the database, one table named ‘stores’ and another table named ‘transactions.’ The records selected are those having value “Minnesota” in their store location fields and transactions for the stores in Minnesota. In retrieving the data for this SQL query, an SQL engine will first retrieve records from the stores table and then retrieve records from the transaction table. Records that satisfy the query requirements then are merged in a ‘join.’
Database indexes are used to speed up queries. However, users frequently lack the skill to build appropriate indexes, and/or there are too many disparate queries for optimal indexes to be built for all permutations of columns. While there are tools to recommend optimal indexes, those are only known after the fact in many environments where ad hoc queries are put together.
In one example, a database may have a first index on a first column c1 and a second column c2. The first index allows rapid indexing for queries including columns c1 and c2, typically providing an entry into the database table, for example, a record number. Such an index, often called a composite index, is often implemented as a tree, having in this example a node which could branch to only certain values of c1, followed by further nodes and branches to finally select a desired value of c1, followed by nodes and branches which select a desired value of c2 which also has the desired value of c1.
In this same example, a database may also have a second index on a second column c2 and a third column c3. The second index allows rapid indexing for queries including columns c2 and c3, typically providing an entry into the database table, for example, a record number. In this example a node which could branch to only certain values of c2, followed by further nodes and branches to finally select a desired value of c2, followed by nodes and branches which select a desired value of c3 which also has the desired value of c2.
If a query on c1 and c3 is made, there is no index on c1 and c3, making the query execute more slowly. It would be advantageous to have an index over c1 and c3 at the time of the query, but none was previously created and may not be worth creating. In particular, for ad hoc queries, such queries may be unexpected and/or too infrequent to predict.
Thus there remains a demand for an optimizer to run a query quickly even when only suboptimal indexes exist. One method of piecing together information from multiple indexes is bitmapping—however that has serious shortcomings including the startup time to build bitmaps and the fact that bitmaps once built can become stale if the underlying data changes.