1. Field of the Invention
The invention relates to database management systems (DBMS). More particularly, it relates to database utilities for predicting the time required to execute database utility commands.
2. Description of the Related Art
Database administration tasks are frequently executed by running database utility commands or jobs. Often, these utilities can be run only when the database is inaccessible to users. The allowable length of this period of inaccessibility, sometimes referred to as the batch window, has declined substantially as business requirements for longer periods of data availability (such as the goal of 24 by 7 operation) have become stronger. Other database administration utilities do not require that data be inaccessible to users when the utility command is executed, but database administrators wish to schedule such utilities so that their execution has the least possible impact on user transactions on the database, where such impact arises from contention for resources such as CPU, memory, and disk access.
Thus, a database administrator is faced with the problem of executing as many utility commands or jobs as possible given time and resource constraints. For example, assume that a database administrator is limited to a 4-hour period during which he or she can reorganize and make copies of databases. During that time, the administrator must decide which databases to copy and which to reorganize so that all of the copy and reorganization tasks will be completed within the 4-hour period. To make this decision, the database administrator will want to know how long each utility command will take, so that he or she can schedule the utility commands with some confidence that they will complete in the required batch window.
One conventional approach for estimating the amount of time that a particular utility command will take to execute is to break down the utility command into its constituent sub-utility commands (such as reading or writing database records), apply rule-of-thumb formulas based upon the average time needed to complete such sub-utility commands, and compute a time estimate based upon the rule-of-thumb formulas tailored with relevant information about the database. Such an approach is aided by products such as IBM® Corporation's DB2® ESTIMATOR, which, if provided with the database administrator's input as to the size of the database, number of columns, number of indexes, etc., will estimate the time required to execute a specific utility command.
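By way of illustration only, the rule-of-thumb approach described above might be sketched as follows. The sketch assumes a hypothetical reorganization utility made up of three sub-utility commands (reading rows, writing rows, and rebuilding index entries). The names `TableStats` and `estimate_reorg_seconds` and all of the average-cost constants are assumptions introduced for this example; they are not taken from DB2 ESTIMATOR or any other product.

```python
from dataclasses import dataclass

# Hypothetical average costs per sub-utility command, in seconds.
# These figures are illustrative assumptions, not measured values.
AVG_SECONDS_PER_ROW_READ = 0.00002
AVG_SECONDS_PER_ROW_WRITE = 0.00005
AVG_SECONDS_PER_INDEX_ENTRY = 0.00003


@dataclass
class TableStats:
    """Parameters the administrator must supply by hand."""
    rows: int
    indexes: int


def estimate_reorg_seconds(stats: TableStats) -> float:
    """Rule-of-thumb estimate: sum of (operation count x average cost)."""
    read_time = stats.rows * AVG_SECONDS_PER_ROW_READ
    write_time = stats.rows * AVG_SECONDS_PER_ROW_WRITE
    # Every row contributes one entry to each index that is rebuilt.
    index_time = stats.rows * stats.indexes * AVG_SECONDS_PER_INDEX_ENTRY
    return read_time + write_time + index_time


if __name__ == "__main__":
    stats = TableStats(rows=1_000_000, indexes=2)
    print(f"Estimated elapsed time: {estimate_reorg_seconds(stats):.1f} s")
```

Such an estimate depends entirely on the parameters supplied by the administrator and on fixed average costs, so it does not reflect other work being performed on the same computer system at the time the utility command is executed.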
One problem with this approach is that it is very labor-intensive on the part of the database administrator, since he or she must supply the required parameters to the rule-of-thumb formula, such as the number of rows in the table, number of columns, number of indexes, length of index keys, etc. It also does not account for the natural variability of a computer system in which the elapsed execution time of a utility command can be affected by other work being performed on the same computer system.
A second problem with this approach is that the resultant estimate might not be accurate enough to ensure that the utility commands will complete in the required time. Not all of the sources of variability are captured by the rule-of-thumb formula technique described above. The formulas themselves are approximations of real conditions, and the exact numbers that the administrator must supply, such as size of the database, may not be known precisely enough.
A second approach is to use a simple measure, such as the total size of the object to be operated on, as a relative measure of elapsed execution time. However, such an approach is actually a very simplified version of the first approach described above, and although it is less labor-intensive to use, it suffers from many of the same problems.
A more precise method and technique for predicting the time required to execute utility commands on a database is needed. That is, there is a need to accurately capture as many sources of variability as possible based upon real conditions affecting the database and the nature of the database itself, thereby increasing the accuracy of the prediction while avoiding labor-intensive input from a system administrator.