A well known database software program is DATABASE 2 (DB2) database software distributed by IBM Corporation. As is known in the art, DB2 operates as a subsystem in a computer system operating under the IBM MVS operating system software. In a DB2 environment, user data resides in DB2 tables which are in tablespaces. A tablespace is, for example, a portion of storage space in a direct access storage device (DASD) such as a disk drive. As is known in the art, address space and dataspace refer to virtual storage in an IBM MVS operating system, address space being capable of executing instructions while dataspace is a subset of address space and is limited to storing data.
For exemplary purposes, illustrated below is an order_entry table that would be stored in a tablespace. The order_entry table contains columns: customer_number; product_code; order_number; buyer_name; and ship_to_zip.
customer_number product_code order_number buyer_name ship_to_zip 1111111111 0010 1234500001 John Doe 60606 1111111111 0040 1234500002 Jane Doe 70707 3333333333 0020 1234500003 Bill Smith 90909 2222222222 0030 1234500004 Fred Smith 80808
While the above order_entry table shows four rows, the table could have millions of rows for all the orders of a company, for example 4 million rows. The order_entry table also has, for example, three index keys and two foreign keys. An index key is an identifier for a particular row of a table while a foreign key also identifies a row but in addition is used for referential integrity as described below. For example, in the order_entry table, one index key could be based on order_number, another index key based on buyer_name and a third index key based on ship_to_zip.
As is known in the art, an index key for a particular table indicates a row identification (RID) and a selected value for the row (e.g., the index key value). The index key can be used to generate an index for the table which facilitates subsequent searches for particular data in the table. For example, the order_entry table would have three indexes (e.g., one for each index key), each index being stored in an indexspace. Similar to a tablespace, an indexspace is, for example, a designated portion of a DASD. Thus, if a user was looking for rows that contain a particular buyer name in the order_entry table, the database management system could query the buyer index for the table to identify all occurrences of the buyer name without reading the entire table to locate the rows.
As is known in the art, each table in a database may be either a parent table, a child table or both. A child table is related to a parent table via the foreign key value or values contained in columns of the child table. For example, a foreign key value can appear multiple times in a child table (e.g., multiple rows in a child table can have the same foreign key, such as the customer_number and product_code entries in the order-entry table) but each foreign key must be associated with a unique key in a parent table of the child table. Referential integrity ensures that every foreign key value is valid (e.g., has a corresponding primary key in a parent table). Thus, referential integrity (RI) means that a value in the column of a row in the table is valid when this value also exists in an index of another table. A row should not be in a table if it violates a constraint. As the order_entry table illustrated above has two foreign keys, it has for example, a RI constraint on customer_number and product_code. As is known in the art, when a user of a DB2 database management system creates a table, the user also defines the constraints for the table (e.g., the user can define the relational integrity criteria).
Illustrated below is a product table and a customer table (e.g., the parent tables for the foreign keys in the order_entry table).
 Product Table product_code product_description retail_price 00010 laptop pc 1000.00 00020 desktop pc 1100.00 00030 office pc 1200.00 00040 lan pc 3500.00 00050 home pc 999.99
The product table shows five rows, although the table could have thousands of rows for all of the different products of a company. The product table has a unique index on the column product_code, which is illustrated in ascending order. The values in the column product_code are each unique since there is only one product code assigned to each product and thus in this table, a product code would not be included more than once. Accordingly, an index for the product table would include the key (e.g., the stored value in the product_code column) and a RID. The product table index would reside in a DB2 indexspace.
The customer table illustrated below shows four rows, although this table could also have thousands of rows for all of the customers of a company. The customer table has a unique index on the column customer_number, illustrated in ascending order. The values in the column customer_number are each unique since there is only one customer number assigned to each customer name and thus a customer number would not be included in this table more than once. Accordingly, an index for the customer table would include the key (e.g., the value of the column customer_number) and a RID. The customer index would also reside in a DB2 indexspace.
 Customer Table customer_number buyer_name customer_address 1111111111 John Doe State A 2222222222 Fred Smith State B 3333333333 Bill Smith State C 4444444444 Steve Jones State D
As shown by the above tables, all of the rows in the order_entry table are valid because the foreign key values in the column product_code exist in the index of the product table and the values in the column customer_number exist in the index of the customer table.
Conventional database management systems, such as DB2, provide the user with the ability to identify specific conditions that a row must meet before it can be added to a table. These conditions are referred to as "constraints" because they constrain the values that a row may include. Constraints include, for example, check constraints and referential integrity constraints. Check constraints include, for example, qualifying criteria for a particular value, such as a zip code value being in the range of 00000 to 99999.
Constraint checking is required at various times. For example, when a copy is made of a loaded database table or when a database table is recovered after a failure, DB2 will not allow access to the table until constraint enforcement is performed. Constraint checking is generally performed by a check utility, e.g., an utility designed to perform constraint checking. Examples of conventional check utilities are CHECK DATA by International Business Machines of Armonk, N.Y. and CHECKPLUS by BMC Software Co. of Houston, Tex.
FIG. 1A illustrates a prior art method for checking check constraints and referential integrity (RI) constraints. As described below, constraint checking for referential integrity performed by conventional check utilities involves execution of a sort task. A sort task places foreign key values in a collated arrangement to facilitate comparison of the foreign key values with the values in the corresponding parent index.
The operation of a conventional check utility involves reading the tablespace to extract foreign keys, passing the foreign keys to a sort operation for collating and then passing the collated foreign keys back to the check utility which reads the appropriate parent index and checks the foreign key values for validation. Reading a tablespace and a parent index in a MVS mainframe environment, required when a sort task is used for constraint checking, involves I/O operations. Each I/O operation could take as long as 35 ms. In contrast, a cpu in a MVS mainframe environment could execute 35 million to 65 million instructions per second. Therefore, utilization of I/O operations in constraint checking is a significant factor in the time required to complete the checking.
As shown in FIG. 1A, a conventional check utility would be initialized and would read the tablespace to be checked. The reading of the tablespace involves I/O operations and could be accomplished, for example, by a reader subroutine invoked by the check utility. The check utility extracts the foreign key values. The extracted foreign keys would be sorted by the sort utility to place the foreign key values in collated order for comparison with the parent index values. As known in the art, the sort utility operation also performs I/O operations.
Thus, as shown in FIG. 1A, in step 1010 a reader task reads a tablespace (e.g., reads the input file to be sorted) and in step 1020 extracts a foreign key value for each record (e.g., row) in the tablespace. If there is more than one foreign key for the record, then each of the foreign keys is extracted by the reader task. In step 1030, each foreign key value is passed to a sort task (e.g., the address of the record to be sorted is passed from the reader task to the sort utility via a memory to memory operation).
In step 1040, each foreign key value passed to the sort utility is operated upon by the sort function of the sort utility to collate the foreign key values in a conventional manner. For example, the sort operation can place the foreign key values in ascending or descending order. When the reader task reaches the end of the input file and has no more keys to pass, the sort utility completes the sorting operation (e.g., completion of any overhead operations that may be required as the actual collating operation has already been completed). As described in more detail below, the sorted foreign keys are passed to an E35 subroutine.
In step 1050, each collated foreign key value is passed by the E35 subroutine to the check utility for constraint checking. The check utility will, in step 1060, read the parent index associated with the foreign key value to be enforced and then compare the foreign key value against the parent index, as is known in the art.
Another alternative use of conventional sort utilities is to have multiple sort tasks initiated as well as multiple reader tasks to read a row and extract a foreign key and send the extracted key to the appropriate pipe for the multiple sort tasks. The pipe is, for example, the input to the E15 of the appropriate sort task, also as described in more detail below. Each of the multiple sort tasks can be assigned to a portion of the tablespace (e.g., if the tablespace is partitioned). When a block of the pipe is full (e.g., when the maximum number of keys that can be stored in the block is attained), the appropriate E15 is posted by the reader task for transfer of the block of keys.
An advantage of the above approach is less MVS overhead because for each WAIT that the E15 does (e.g., waiting to be posted by the reader task that a block of foreign keys is ready), the MVS operating system has to execute thousands of instructions. Similarly, for each POST by a reader task, the MVS operating system has to execute thousands of MVS instructions to process the POST. By using blocks, instead of performing these thousands of instructions for each row, the WAIT and POST operations only have to be performed for each block, which can each contain hundreds of keys. Each block that is posted to the E15 provides the records to be passed to the sort utility.
FIG. 1B illustrates an exemplary flowchart for the operation of a prior art sort function, such as would be used by a conventional check utility as described in FIG. 1A. In step 1100, a reader task passes a block of foreign keys to an E15 input of the sort utility, for example in a manner known in the art. In step 1110, if there are no foreign keys to pass, then the process continues at point A, described below with regard to FIG. 1D. If there are foreign keys in the block to pass to the sort operation, then in step 1120 the E15 passes a foreign key to the sort operation.
In step 1130, the foreign key is compared against the content of a current array, the current array containing the foreign keys that have already been received and collated. The initial current array would be empty. The sort utility includes, for example, temporary storage (e.g., a buffer) to store the current array of sorted foreign keys. The sort utility may process thousands of arrays storing millions of foreign keys. In step 1140, the proper sequence for placement of the foreign key in the current array is determined (e.g., comparing the foreign key against the previously received foreign keys determines where to place the foreign key in the current array). The current array is managed in step 1150 to prepare the array for insertion of the foreign key into the proper location (e.g., to manage the pointers in the array for the insertion of the foreign key). The foreign key is inserted into the current array in step 1160.
In step 1170 it is determined if the current array is full. If the array is full, in step 1180 the current array is written to an intermediate work file and the next array is started. For example, the current array is stored in an intermediate work file in DASD, as is each subsequent array when it becomes full. In step 1190, a string is built for each array stored in an intermediate file so that each array can be located and read at a later time. After step 1190, or if the array is not full at step 1170, in step 1200 the process returns to step 1110 to process the next foreign key. At point A in FIG. 1D, if there are no more foreign keys to pass in step 1110, then in step 1210 the intermediate file is read to locate each array that has been stored. In step 1230, each array in the intermediate file is merged into a single string to arrange all of the sorted foreign keys from each array into a single collated sequence. In step 1230, each collated foreign key is passed to the E35 subroutine for constraint checking, for example as described in FIG. 1A.
Many of the steps identified in FIGS. 1B-1D require I/O operations. For example, managing the current array, placing the foreign key values in the proper sequence in the current array, writing the current array into an intermediate file, reading the intermediate file and merging the arrays stored in the intermediate file all require I/O operations, usually to process millions of foreign keys used in a conventional relational database system. Performing such numerous I/Os, as required when sorting is done, adds significant processing time to the constraint checking operation. According to an exemplary embodiment of the present invention, the above-described sorting operation is eliminated when constraint checking is performed.
Thus, the prior art approach to constraint checking reads each row of the tablespace, extracts the foreign keys involved in the referential integrity constraint check, sorts the extracted key values, and compares the sorted foreign keys against the parent index. In addition, to the extent multiple database tables are subject to constraint checking and some of the tables have the same parent index, the parent index is read each time constraint checking is performed. Moreover, conventional constraint checking always perform a sort operation and provides no mechanism for determining if the use of a sort utility provides the optimal method for performing constraint checking. The use of a sort utility involves numerous I/O operations that can significantly slow down the constraint checking process. Accordingly, eliminating the sort operation in constraint checking can significantly reduce the time required to perform constraint checking.