Data in a relational database is organized in related two dimensional tables of columns and rows. A user accesses the data by defining a query request that is constructed in a query language, such as structured query language (SQL). FIG. 1 is a flow diagram 100 showing how a SQL query request 102 is processed by a relational database management system (RDBMS) 104. A SQL query request generally includes a sequence of SQL operations. A SQL operation is the smallest brick of a RDBMS and can be, for example, “reading data from disk” or “joining data from several locations.” The RDBMS generates an execution plan for the SQL query request. The execution plan is a sequence of SQL operations that fulfills the objectives of the SQL query when processed on a database 114. There may be various execution plans that can fulfill the objectives of the SQL query request. Each execution plan, however, may have a different cost. The cost can, for instance, be evaluated in terms of computer processing unit (CPU) time required to execute the SQL query.
Most relational database management systems, such as DB2® from IBM® Corporation, generate statistics data 107. The statistics data contains information about the occurrence of values in columns and is used by a query optimizer 105 to determine an optimal execution plan 106 for the SQL query 102. The optimal execution plan is also a sequence of SQL operations, such as SQL operation i 108, SQL operation j 110, SQL operation k 112, etc.
The optimal execution plan is the execution plan by which the SQL query statement can be processed on the database with the minimum costs. Only statistics data which is up-to-date ensures that the selected execution plan for processing the SQL statement on the database is the optimal execution plan. Since data in a database may frequently change, the statistics data has to be updated regularly in order to reflect the changes to the data.
In existing systems, such as the DB2® database system for z/OS from IBM® Corporation, statistics data are updated by the database system in response to commands inputted by a database administrator. The database administrator may update the data on a regular basis, for example, within a maintenance window.
Generally, when the command to update statistics is issued, all statistics data is updated regardless of whether there has been a change to the underlying data that may impact the used statistics. This brute force method, however, may be inefficient and may require a significant amount of processing power. For this reason, the maintenance windows are usually scheduled for off-peak hours. As a consequence, a database may be accessed for a substantial period of time with execution plans developed from outdated statistics data, which could lead to less efficient access of data.
Other systems, for example, the DB2® universal database (UDB) system for Linux, UNIX, and Windows (LUW), contain a so-called learning optimizer (LEO), which updates the relevant statistics data after the execution of each SQL statement. This ensures that the statistics data is always up-to-date. Continuously updating the statistics data, however, requires substantial system resources. Therefore, the employment of a LEO might decrease database performance considerably.
For the reasons given above, neither manually updating statistics data, nor automatically updating statistics data as provided by a LEO, is ideal for database performance. In the following, patent documents that disclose more sophisticated ways of determining when to update statistics data are discussed briefly.
U.S. Pat. No. 6,366,901 discloses a system for automatically maintaining database statistics. According to the invention, for each table for which statistics are collected, a counter is maintained of the number of updates that are made to the columns for which statistics are being collected. When the counter reaches a modification threshold, the system automatically updates the statistics associated with the table. The disadvantage of this invention is that only the number of updates is counted. It may well be that the data changes, but the optimal execution plan determined from outdated statistics data remains the same. The statistics data may therefore be updated without any need and as a consequence, system resources would be wasted.
U.S. Pat. No. 6,389,430 discloses a system for collecting database object statistics by a database management system. The system selects one or more objects and correspondingly one or more object fields and then creates a base set of statistics data for the selected object fields. Each table in a database contains a log record. Each log record is a record of the changes that were made to its corresponding table, and can be used to rebuild a table in the event of a problem. The system reads active log records and updates the base set of statistics when a change is read in the active log records. The system begins extracting log records when an end-of-file of the active log records is reached, and updates the updated statistics data based on the extracted log records. The disadvantage of this invention is that the decision to update is based upon the log record. Changes made to a database may not necessarily require an update of the statistics data. Hence, triggering a statistics update based on log records may waste system resources.
Accordingly, there is a need for a method and system through which the update of statistics data can be triggered without wasting system resources.