1. Field of the Invention
The present invention relates to the use of database tables in computers. More particularly, the invention concerns an apparatus, article of manufacture, and process for loading, reorganizing, or recovering a data structure and simultaneously collecting various statistics concerning the data structure. Such data structures may comprise, for example, tables and/or indices.
2. Description of the Related Art
Database applications constitute one of the most popular applications of computers today. Engineers have designed many different systems and programs to store, query, and modify information stored in database tables. Intricate manipulation of table data is performed using a database language such as "SQL". Higher level operations are typically achieved by executing various high level database commands, in a database system language such as the "DB2" product of International Business Machines Corp. Some of these high level operations include "Load", "Image Copy", "Reorg", and "Recover" operations, for example.
Typically, a database management system performs a "Load" operation to compile data from various sources, and ultimately assemble the data into a new or existing table. A "Reorg" operation copies all rows from of an existing table, stores the rows separately, empties the table's contents, and then re-loads the copied rows back into the table in a desired manner. For example, Reorg procedures are useful to eliminate unwanted space in a table, and to change the order of rows in a table to optimize the table. A "Recover" operation serves to restore a table using a backup copy. Used when a table fails for some reason, a Recovery operation reads table pages from a backup copy, and writes the pages to the failed table. A "Recover Index" operation restores an index using data from an indexed table. More particularly, the Recover Index operation replaces a failed index by scanning the indexed table, and extracting all entries from columns designated as "key" columns. The Recover Index stores these entries in the recovered index, along with the location of each entry in the underlying table.
Another useful DB2 function is the "Runstats" function, which reads a stored data structure and collects various statistics concerning the data structure. It reads each row of a data structure, such as a table or index, from a storage location such as a magnetic data storage drive, analyzes the data, and then provides an operator with various facts about the data itself as well as the way the data is stored within the computer system. As an example, these facts may include the number of rows in the data structure ("cardinality"), the estimated number of distinct values possible for each column of the data structure ("column cardinality"), the percentage of rows that are in clustering order ("cluster ratio"), and the percentage of space in the data structure occupied by data ("percent active"). Statistics such as these are stored in a "data dictionary" or other suitable data store.
The statistics from Runstats are useful for a number of different purposes. First, these statistics enable the database system to optimize access to the data to satisfy a query expressed in a database query language such as "SQL". For example, if the results to a query can be obtained either by scanning an entire table or by reading the table through an index, the database management system ("DBMS") might decide how to process the query based on the statistics relating to the size of the table. If the table is small, for example, it may be more efficient to scan the table rather than to traverse the index.
The Runstats operation is also useful to allow database administrators ("DBAs") to access the status of the database. For example, observing a low value for a table's clustering ratio, a DBA may conclude that the table should be reorganized. Reorganization will resequence the rows in the table to restore their optimal order.
Another use of the Runstats function is to take account of the new characteristics of a data structure after a Load, Reorg, Recover Index, or similar operation. These operations substantially change the characteristics of the data itself, or the manner in which the data is stored. Therefore, a DBA will often perform a Runstats operation immediately after a Load, Reorg, or Recover Index operation.
In addition to DB2, a variety of different programming languages and hardware environments employ statistics-gathering functions similar to Runstats. For example, Oracle-based systems use a similar function to gather statistical information about their data structures.
Despite these benefits, statistics-gathering operations (such as Runstats) may be too slow for certain applications or users. The performance time of these operations can be substantial because of the time required to read the entire contents of an existing data structure. The contents of a data structure may be sizable. Furthermore, reading of the data structure may be slow if the data structure is stored on a hardware device with a relatively slow input/output rate. Thus, when an operation such as Runstats is performed immediately after a time-consuming Load, Reorg, or Recover Index operation, the two operations together may simply take too long for some users. Furthermore, the two operations together incur excessive processor and input/output resources since each operation separately signs the data, for its own purposes. Thus, in many cases, known techniques for gathering statistics about data structures in a DBMS are not completely adequate.