The collection of statistics regarding the status of the data within the data sets is an important task within databases. The statistics provide insight into the internal workings of the database. The statistics about the data sets may be retrieved using the RUNSTATS utility within DB2.
The generation of access paths to access data within the database using Structured Query Language (SQL) statements is also an important task within databases. Therefore, the database DB2 provides the commands BIND and REBIND, which allow generating access paths to data. BIND and REBIND use information about the structure and statistics of the data sets.
Reorganization of the database, in particular of data sets, is crucial to obtain good access time. During operation of the database, the database system relocates data within the data sets in response to internal processing controls in order to guarantee faster access to the data. Relocation of data occurs preferably in accordance with logical criteria so that data that belongs together is also physically stored together. However, after longer periods of operation, a good organization of the data is no longer guaranteed because of a multitude of relocation operations. Because of the multitude of relocation operations due to standard operation, a state of disorganization is created on the database that prevents fast access times to the data. In order to be able to guarantee fast access times to the data, the data must be reorganized once a certain degree of disorganization has been reached. The database may provide a utility for this purpose, too. For DB2 this utility is called REORG.
In DB2, the aforementioned utilities are REORG, and RUNSTATS. Other utilities are also available for maintaining data sets. To calculate access paths, DB2 provides the commands BIND and REBIND.
In addition, the statistics determined by RUNSTATS are used by the DB2 utility OPTIMIZER, which may be used to provide improved indexes to access the data.
It is the goal of database users to have databases available 24 hours a day, 7 days a week. A 24/7 level of database availability in mainframes means constant good response times (for example, by real time reorganization and by real time statistics collection for constant optimization of data access), constant avoidance of errors (by real time reactions to critical situations), rapid recovery times in the event of faults (for example, by real time back up operations), as well as only minor interference of applications due to maintenance work. The prerequisite hereto, however, is that monitoring and maintaining of the database must take place virtually in real time. In addition, contention and look-ups due to REORG and RUNSTATS operations should be kept at a minimum.
To maintain the database, it is known to provide access paths using the commands and utilities REORG plus RUNSTATS plus REBIND in combination. The drawback however is that with data sets which have only changed little since the last REBIND, the resulting access paths do not change in most cases, i.e., they do not provide faster access to the data in most cases. It has been found that over 90% of all RUNSTATS results are worthless, as no gain in access time may be achieved.
It is also known to obtain information about the level of disorganization within data sets running RUNSTATS on a regular basis. As running RUNSTATS needs lots of processing capabilities, it should be avoided to run unnecessary RUNSTATS. In addition, running RUNSTATS invalidates existing SQL in Dynamic Statement Caches, which have been established by DB2 during executing dynamic SQL. The Dynamic Statement Cache stores access path information to data for dynamic SQL when it is executed the first time. In case the same SQL is executed again, the already existing access path may be used. When running RUNSTATS, the information within the Dynamic Statement Cache may become invalid, as the stored access paths may become invalid. Therefore, unnecessary RUNSTATS should also be avoided.
According to developments in the DB2 world, a Real-Time Statistics (RTS) utility has been introduced to DB2. RTS gathers information about all objects in a DB2 system, but RTS are not a replacement for RUNSTATS. RUNSTATS gathers information and populates the DB2 Catalog, providing information for the DB2 Optimizer and REORG. This affects all applications as the Optimizer determines access paths for all SQL processing. RTS populates tables created for RTS that are not used by the DB2 Optimizer. RTS are not used as anything more than utility trigger values.
RTS were introduced by IBM to try to solve utility problems. Using RTS to determine when to run a utility such as a REORG, COPY or RUNSTATS, can eliminate unnecessary utility runs and application downtime. Once activated, RTS gathers statistics constantly for all objects in the DB2 system. This adds almost no processing overhead to DB2.
The statistics of RTS are in current systems evaluated by complicated algorithms to determine when to run RUNSTATS or REORG. These algorithms facilitate information about percentage of changed rows, pages, percentage of growth of tables, number of mass deletes, any occurrence of REORGs or LOADS without RUNSTATS since the last RUNSTATS. However, obtaining this information requires processing time. In addition, this information may not lead to good access paths after RUNSTATS and REBIND. The problem with this information is that the decision whether to run RUNSTATS might not lead to improved access paths after REBIND.
For small tables, the above-mentioned information leads to running RUNSTATS often, as the required threshold values are reached quickly. However, in particular for small tables, only small improvements may be observed, if any.
It has been observed that the majority of RUNSTATS operations do not improve access to data. In contrast, as RUNSTATS and REORG and following REBIND operations cause lock-ups and contention of the data sets and the DB2 Catalog, the effect is that data may be retrieved on average slower than without providing RUNSTATS, REORG and REBIND.