1. Technical Field
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for monitoring and managing database queries for improving performance.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
Information stored on a computer system is often organized in a structure called a database. A database is a grouping of related structures called ‘tables,’ which in turn are organized in rows of individual data elements. The rows are often referred to as ‘records,’ and the individual data elements are referred to as ‘fields.’ In this specification generally, therefore, an aggregation of fields is referred to as a ‘data structure’ or a ‘record,’ and an aggregation of records is referred to as a ‘table.’ An aggregation of related tables is called a ‘database.’
A computer system typically operates according to computer program instructions in computer programs. A computer program that supports access to information in a database is typically called a database management system or a ‘DBMS.’ A DBMS is responsible for helping other computer programs access, manipulate, and save information in a database.
A DBMS typically supports access and management tools to aid users, developers, and other programs in accessing information in a database. One such tool is the structured query language, ‘SQL.’ SQL is a query language for requesting information from a database. Although there is a standard of the American National Standards Institute (‘ANSI’) for SQL, as a practical matter, most versions of SQL tend to include many extensions. Here is an example of a database query expressed in SQL:
        select * from stores, transactions
        where stores.location=“Minnesota”
        and stores.storeID=transactions.storeID
This SQL query accesses information in a database by selecting records from two tables of the database, one table named ‘stores’ and another table named ‘transactions.’ The records selected are the store records having the value “Minnesota” in their location fields, together with the transaction records for those stores. In retrieving the data for this SQL query, an SQL engine will first retrieve records from the stores table and then retrieve records from the transactions table. Records that satisfy the query requirements are then merged in a ‘join.’
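The query above can be exercised against a small in-memory database. The following is a minimal sketch only, using Python's built-in sqlite3 module; the column definitions, the sample rows, the added ordering clause, and the function name run_example_query are assumptions made for illustration and are not part of the specification. The string literal is written with single quotes, as standard SQL expects.

```python
import sqlite3


def run_example_query():
    """Build two tiny tables and run the example join (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stores (storeID INTEGER PRIMARY KEY, location TEXT);
        CREATE TABLE transactions (
            txID INTEGER PRIMARY KEY, storeID INTEGER, amount REAL
        );
        INSERT INTO stores VALUES (1, 'Minnesota'), (2, 'Iowa'), (3, 'Minnesota');
        INSERT INTO transactions VALUES
            (10, 1, 5.0), (11, 2, 7.5), (12, 3, 2.25), (13, 1, 9.0);
        """
    )
    # Same predicate as the example query; ORDER BY is added only so the
    # result order is deterministic.
    rows = conn.execute(
        """
        SELECT * FROM stores, transactions
        WHERE stores.location = 'Minnesota'
          AND stores.storeID = transactions.storeID
        ORDER BY transactions.txID
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_example_query():
        print(row)
```

Each returned row is the concatenation of one stores record and one matching transactions record, which is the merged result the text calls a join.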
In many systems, an SQL query is parsed, a logical plan is created, and at least one, and often several, physical plans are created for carrying out the logical plan. The physical plans all arrive at the same correct output, but can take greatly varying times to arrive at that output, depending on which plan is selected for execution. The best plan to execute is usually the plan having the lowest expected cost, typically selected by the query optimizer.
In database query processing, the algorithms used to implement a query are determined by the ‘best’ plan, which the query optimizer selects using statistics over the underlying tables and columns. This is called the cost-based model and is the de facto standard for databases.
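As a rough illustration of the cost-based model, the sketch below estimates how many rows an equality filter will return from column statistics and then picks the cheaper of two candidate join plans. Everything in it is an assumption made for illustration: the ColumnStats type, the uniform-distribution estimate, the two plan names, and the cost formulas are textbook simplifications, not the cost model of any particular DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    row_count: int        # rows in the table
    distinct_values: int  # distinct values in the filtered column


def estimate_equality_rows(stats):
    """Estimate rows matching `column = constant`, assuming uniform values."""
    return max(1, round(stats.row_count / stats.distinct_values))


def choose_plan(filter_stats, inner_rows):
    """Return the plan with the lowest estimated cost, plus all estimates."""
    outer_rows = estimate_equality_rows(filter_stats)
    # Hypothetical cost formulas, in arbitrary units of work.
    estimated = {
        "nested_loop": outer_rows * inner_rows,
        "hash_join": 3 * outer_rows + inner_rows,
    }
    return min(estimated, key=estimated.get), estimated


if __name__ == "__main__":
    # 1,000 stores spread over 50 locations, joined to 100,000 transactions.
    print(choose_plan(ColumnStats(row_count=1000, distinct_values=50), 100_000))
```

The selected plan depends entirely on the statistics supplied: with 50 distinct locations the sketch prefers the hash join, while statistics claiming 1,000 distinct locations would tip it toward the nested loop.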
One problem with this mechanism is that the plan is chosen on the basis of its expected cost rather than its actual cost. In practice, this selection process sometimes chooses a very inferior plan, primarily because the available statistics fail to match reality at execution time. The resulting long-running queries can be a major source of user frustration, troubleshooting, and support costs.
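The effect of statistics that fail to match reality can be made concrete with a small calculation. The sketch below is illustrative only: the two plan names and their cost formulas are hypothetical simplifications, and the function names are invented for this example. It picks a plan using an estimated row count and then evaluates that plan against the row count actually encountered.

```python
# Hypothetical cost formulas in arbitrary units; real optimizers use far
# richer models.
COSTS = {
    "nested_loop": lambda outer, inner: outer * inner,
    "hash_join": lambda outer, inner: 3 * outer + inner,
}


def cheapest(outer_rows, inner_rows):
    """Name of the plan with the lowest cost for the given row counts."""
    return min(COSTS, key=lambda name: COSTS[name](outer_rows, inner_rows))


def misestimate_penalty(estimated_outer, actual_outer, inner_rows):
    """Compare the plan chosen from estimates with the plan that was ideal."""
    chosen = cheapest(estimated_outer, inner_rows)
    ideal = cheapest(actual_outer, inner_rows)
    chosen_cost = COSTS[chosen](actual_outer, inner_rows)
    ideal_cost = COSTS[ideal](actual_outer, inner_rows)
    return chosen, ideal, chosen_cost / ideal_cost


if __name__ == "__main__":
    # The optimizer expects 1 matching store; 500 actually match.
    print(misestimate_penalty(estimated_outer=1, actual_outer=500,
                              inner_rows=100_000))
```

Under these assumed formulas, an estimate of one matching row where 500 actually match leads to a plan several hundred times more expensive than the alternative.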
Current solutions that attempt to correct plans follow two main approaches. In the first approach, performance feedback from the actual execution of the query is used for the next run of the query. While this does have some applications, it does nothing to correct the problem query currently running. The second approach is to embed self-modifying techniques within the query executable itself, primarily the ability to alter the join order during execution under limited conditions. This second approach also has some applications. Unfortunately, it introduces optimization and runtime overhead into all queries in order to catch the few that have problems. Also, its solution space is quite limited. Methods having a different, low-overhead approach would be most beneficial.
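The limitation of the first approach can be sketched in a few lines. The following is a hypothetical illustration, not a description of any existing product: the FeedbackStats class, its methods, and the query key are invented names. Row counts observed during one execution are recorded and only become available to the optimizer on a later run.

```python
class FeedbackStats:
    """Hypothetical store of row counts observed during earlier executions."""

    def __init__(self):
        self._observed = {}

    def record(self, query_key, actual_rows):
        # Called after a query has run and its real row count is known.
        self._observed[query_key] = actual_rows

    def estimate(self, query_key, optimizer_guess):
        # Prefer an observed value; otherwise fall back to the optimizer's guess.
        return self._observed.get(query_key, optimizer_guess)


def run_twice():
    feedback = FeedbackStats()
    key = "stores.location = 'Minnesota'"
    first = feedback.estimate(key, optimizer_guess=1)   # first run: guess only
    feedback.record(key, actual_rows=500)               # learned during run one
    second = feedback.estimate(key, optimizer_guess=1)  # next run benefits
    return first, second


if __name__ == "__main__":
    print(run_twice())
```

The first execution still plans with the poor estimate; only the second execution sees the corrected value, which is why feedback of this kind cannot help the query that is currently running.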
There are also techniques to detect long-running queries, but they primarily provide only a reporting capability and, in some cases, the ability to simply kill the query. Correcting the problem is still left to user intervention. Improved methods and systems for handling such problems would be beneficial.
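A detect-and-kill monitor of the kind described above might look like the following sketch, which uses the progress-handler hook of Python's built-in sqlite3 module. The function name run_with_time_budget, the time budget values, and the deliberately slow recursive query are assumptions chosen for illustration. The sketch can only report or abort; it makes no attempt to correct the plan.

```python
import sqlite3
import time


def run_with_time_budget(sql, budget_seconds):
    """Run `sql`, aborting it if it exceeds the time budget (sketch only)."""
    conn = sqlite3.connect(":memory:")
    deadline = time.monotonic() + budget_seconds

    def over_budget():
        # A non-zero return value asks SQLite to abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Invoke the handler roughly every 1,000 SQLite virtual machine steps.
    conn.set_progress_handler(over_budget, 1000)
    try:
        rows = conn.execute(sql).fetchall()
        return "finished", rows
    except sqlite3.DatabaseError:
        # Simplification: any database error is reported as a kill here.
        return "killed", None
    finally:
        conn.close()


# A deliberately long-running query used only to trigger the monitor.
SLOW_QUERY = """
    WITH RECURSIVE counter(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM counter WHERE n < 50000000
    )
    SELECT count(*) FROM counter
"""

if __name__ == "__main__":
    print(run_with_time_budget("SELECT 1", budget_seconds=5.0))
    print(run_with_time_budget(SLOW_QUERY, budget_seconds=0.05)[0])
```

Once the query is killed, the work it has already done is discarded and the user must still diagnose and resubmit it, which is the gap the text identifies.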