In the prior art, a well-known process that is executed periodically for DB2 installations is a CHECK DATA operation to check the referential integrity of a database. Referential integrity is described in general terms at pages 143 -145 of the textbook "Fundamentals of Database Systems" by Elmasri & Navathe (1989). Generally speaking, this entails ensuring that each "foreign key" maps to exactly one "primary key" and that the primary key is a valid one. As an example, suppose that the following data tables exist in a DB2 database of information about a company:
(a) a Department table containing multiple rows or records, one per department in the company, each record having a Department identifier (e.g., Accounting, Maintenance, Operations, etc.) as well as other information about the department in question. The Department identifier serves as the primary key for the Department table; PA1 (b) an Employee table containing multiple rows or records, each representative of a specific employee identified by a unique employee number (which serves as the primary key for the Employee table). Assume that each row or record in the Employee table also includes a department identifier indicating the department to which the employee in question is assigned.
In the above hypothetical Employee table, each row or record has both (i) a primary key, namely the value in the employee number field, e.g., "12345, " which, within the employee table, is unique to that employee record; and (ii) a foreign key, i.e., the value in the department identifier field, e.g., "Accounting," which refers to the primary key of a specific row or record in the Department table. The foreign key "Accounting" may be associated with more than one employee record, because more than one employee may be assigned to the Accounting department. Nor is the foreign key "Accounting" unique to the Employee table itself, because other tables containing information about the company might refer to the Department table. The foreign keys for a given DB2 table space are defined in the DB2 catalog and can be determined by a routine lookup process.
Some additional nomenclature is introduced here to aid in understanding the terminology used. A principal term used here is that of a "data set." Generally speaking, a data set is a collection of data that is referred to in an operating system environment (e.g., the well-known MVS operating system environment) by a single name, in much the same way as a word processing file might be given a single directory name for easy retrieval of the data in the file (even though the data might phyically be stored in a variety of locations on a disk). Examples of MVS data sets include DB2 indexes and DB2 table spaces. Typically, maintenance operations in the DB2 environment involve three steps: First, reading the data to be reorganized (e.g., from a table space or index), often from a variety of different physical locations identified by a data set name. The step of data reading is typically referred to as an UNLOAD process that involves physically copying the data to some other memory or other storage. The second step is that of sorting or otherwise ordering the data to conform to the desired ordering and performing any other desired processing. Finally, the third step is that of rewriting the sorted data to storage (table space or index) designated with the same data set name. The third step is typically referred to as a RELOAD process.
A conventional CHECK DATA operation is illustrated in general terms in FIG. 1 and on page 22 of IBM's "DB2 Version 2 Release 2 Utilities Guide," Document No. GG24-3390-00, July 1989. Foreign keys from one or more DB2 table spaces 120 are UNLOADed in a first processing phase 105, from the data records themselves and/or from foreign-key indexes if they exist. A work or intermediate data set 110 is created from the output of the first processing phase 105. In a second processing phase 115, the contents of the work data set 110 are methodically cross-checked against one or more primary key index(es) 100 in question, and appropriate diagnostic message(s) 125 are displayed.
A significant disadvantage of the foregoing prior-art approach to the CHECK DATA operation is that the various table-space data records and/or foreign-key indexes are UNLOADed one after another, and subsequently the primary-key indexes are read and compared by "probing" each primary key index separately for each foreign key value. The elapsed time required for the operation thus includes the sum of the elapsed times required for each of these input/output (I/O) operations. This is especially undesirable in installations which attempt to operate 24 hours a day, seven days a week; it may be highly undesirable to disable a data set, in effect, for the length of time required for a multi-phase CHECK DATA operation. Moreover, such an approach plainly requires additional I/O steps, which can have tangible financial costs associated therewith. In some actual systems, storage space may be at a premium and thus the use of storage to create a working or intermediate data set may be undesirable.