A well known database software program is DATABASE 2 (DB2) database software distributed by IBM Corporation. As is known in the art, DB2 operates as a subsystem in a computer system operating under the IBM MVS operating system software. In a DB2 environment, user data resides in DB2 tables which are in tablespaces. A tablespace is, for example, a portion of storage space in a direct access storage device (DASD) such as a disk drive. For exemplary purposes, illustrated below is an order_entry table that would be stored in a tablespace. The order_entry table contains columns: customer_number; product_code; order_number; buyer_name; and ship_to_zip.
Order_Entry Table customer_number product_code order_number buyer_name ship_to_zip 1111111111 0010 1234500001 John Doe 60606 1111111111 0040 1234500002 Jane Doe 70707 3333333333 0020 1234500003 Bill Smith 90909 2222222222 0030 1234500004 Fred Smith 80808
While the above Order_Entry table shows four rows, the table could have millions of rows for all the orders of a company, for example 4 million rows. The order_entry table also has, for example, three index keys and two foreign keys. An index key is an identifier for a particular row of a table while a foreign key also identifies a row but is also used for referential integrity as described below. For example, in the order_entry table, one index key could be based on Order_Number, another index key based on buyer_name and a third index key based on ship_to_zip. As is known in the art, an index key for a particular table indicates a row identification (RID) and a selected value for the row (e.g., the index key value).
The index key can be used to generate an index for the table which facilitates subsequent searches for particular data in the table. For example, the Order_Entry table would have three indexes (e.g., one for each index key), each index being stored in an indexspace. Similar to a tablespace, an indexspace is, for example, a designated portion of a DASD. Thus, if a user was looking for rows that contain a particular buyer name in the Order Entry_table, the database management system could query the buyer index for the table to identify all occurrences of the buyer name without reading the entire table to locate the rows.
DB2 administrators analyze performance characteristics for application programs that access a database table in an attempt to find the optimum index structure for fast access to the database table. The values to be used as an index must be carefully selected because each index results in overhead for the database system. For example, each transaction in a database table, such as an add or delete, requires that each index for the table also be updated. Thus, it is desirable that the number of indexes for a table be minimized to enhance the performance of application programs. The values to be used as an index for a database table are selected based on, for example, data accessed most frequently by users of the table, generally on-line transaction users. Index keys generally are not based on foreign keys, as foreign keys are used primarily for validation purposes (e.g., constraint enforcement).
As is known in the art, each table in a database may be either a parent table, a child table or both. A child table is related to a parent table via the foreign key value or values contained in columns of the child table. For example, a foreign key value can appear multiple times in a child table (e.g., multiple rows in a child table can have the same foreign key, such as the customer_number and product_code entries in the order_entry table) but each foreign key must be associated with a unique key in a parent table of the child table.
Referential integrity ensures that every foreign key value is valid (e.g., has a corresponding primary key in a parent table). Thus, referential integrity (RI) means that a value in the column of a row in the table is valid when this value also exists in an index of another table. A row should not be in a table if it violates a constraint. As the order_entry table illustrated above has two foreign keys, it has a RI constraint on customer_number and product_code. As is known in the art, when a user of a DB2 database management system creates a table, the user also defines the constraints for the table (e.g., the user can define the relational integrity criteria). Illustrated below are an exemplary product table and an exemplary customer table (e.g., the parent tables for the foreign keys in the order_entry table).
 Product Table product_code product_description retail_price 00010 laptop pc 1000.00 00020 desktop pc 1100.00 00030 office pc 1200.00 00040 lan pc 3500.00 00050 home pc 999.99
The product table show five rows, although the table could have thousands of rows for all of the different products of a company. The product table has, for example, an index based on the column product_code, which values are illustrated in ascending order. The values in the column product_code are each unique since there is only one product code assigned to each product and thus in this table, a product code would not be included more than once. Accordingly, an index for the product table would include the key value (e.g., the stored value in the product_code column) and a RID. The product table index would reside in a DB2 indexspace.
The customer table illustrated below shows four rows, although this table could also have thousands of rows for all of the customers of a company. The customer table has, for example, an index based on the column customer_number, which values are illustrated in ascending order. The values in the column customer_number are each unique since there is only one customer_number assigned to each customer name and thus a customer_number would not be included in this table more than once. Accordingly, an index for the customer table would include the key value (e.g., the value of the column customer_number) and a RID. The customer index would also reside in a DB2 indexspace.
 Customer Table customer_number buyer_name customer_address 1111111111 John Doe State A 2222222222 Fred Smith State B 3333333333 Bill Smith State C 4444444444 Steve Jones State D
As shown by the above tables, all of the rows in the Order_Entry table are valid (e.g., there are no referential integrity constraint violations) because the foreign key values in the column product_code of the Order_Entry table also exist in the product table and the values in the column customer_number of the Order_Entry table also exist in the customer table.
Conventional database management systems, such as DB2, provide the user with the ability to identify specific conditions that a row must meet before it can be added to a table. These conditions are referred to as "constraints" because they constrain the values that a row may include. Constraints include, for example, check constraints and referential integrity constraints. Check constraints include, for example, qualifying criteria for a particular value, such as a zip code value (e.g., the ship_to_zip value in the Order_Entry table) being in the range of 00000 to 99999. As discussed above, referential integrity constraints ensure that a value in a row of a table is valid when the value also exists in an index of another table.
Constraint enforcement can be performed prior to loading of data into a database table or after data has already been loaded into a database table. An example of performing constraint enforcement prior to loading data into a database table is provided in co-pending application Ser. no. 09/058,754 filed Apr. 10, 1998, owned by the Assignee of the present application and which is hereby expressly incorporated by reference. If constraint enforcement is performed after loading data into a database table, for example as part of a recovery operation following a hardware of software failure, the constraint enforcement is generally performed by a CHECK utility, such as CHECK DATA by IBM Corp., CHECK PLUS by BMC Software and FASTCHECK by Platinum technology, inc.
Conventional CHECK utilities ensure that data in the table do not violate any constraints that have been established for the table. Constraints can be established at the time the table is generated. For example, constraints can be defined when the table is originally created in the database system and are stored in the DB2 catalog, which can be subsequently queried by a CHECK utility to identify the constraint information.
To perform constraint enforcement, a conventional CHECK utility would, for example, be initialized and identify any applicable constraints for the table to be checked by reading the DB2 catalog, as is known in the art. The CHECK utility would, for example, then read each row of the database table and check for check constraint violations and/or referential integrity constraint violations.
Constraints may be violated for numerous reasons, not all of which require that the row containing the error be deleted. For example, a user might want to correct the error rather than delete the row. An option with some CHECK utilities, such as CHECK PLUS by BMC Software and FASTCHECK by Platinum technology is that if a constraint violation is identified, and thus a row of the database table contains an error, the CHECK utility will produce a DB2 SQL DELETE statement for each row containing a constraint violation. These SQL DELETE statements will be written out to a sequential file which the customer can then use to selectively delete particular rows that are in violation of the constraint(s). For example, in the Order_Entry table illustrated above, a data entry error in the customer_number would not render the order invalid but rather may reflect an error that can be corrected, thus allowing continued processing of the order (e.g., it may be more desirable to fix the error than delete the row).
Other conventional CHECK utilities, such as the IBM CHECK DATA utility, do not generate SQL DELETE statements but do include an option to delete all of the rows in error. A disadvantage of this approach, however, is that the user has no way of individually selecting the particular error rows to be deleted.
The SQL DELETE statements generated by conventional CHECK utilities are based on the foreign key value that violates the constraint. For example, when the SQL DELETE statement is generated by the CHECK utility, the statement uses the key value causing the constraint violation. For example, if the customer number 99999 was read by a CHECK utility while checking the Order_Entry table illustrated above and the customer number 99999 violated a referential integrity constraint (e.g., it was not in a parent index for the parent customer table), then the CHECK utility would generate a SQL DELETE statement using the foreign key value that violated the constraint (i.e., customer number=99999).
Thus, when the SQL DELETE statement is generated using a key value that is not indexed for the table and is executed, the entire database table must be read for each DELETE statement to determine if the foreign key value identified in the SQL DELETE statement is contained in any row of the database table. As mentioned previously, generally foreign key values are not used as an index for a database table and thus no index values are available for use by the DELETE statements utilizing a foreign key value to facilitate deletion of rows containing errors. Therefore, as each DELETE statement generated by the CHECK utility deletes any row that matches the foreign key that is in error, and if the database table contains millions of rows, each of the millions of rows would have to be read for each executed DELETE statement. If even only two rows contain the key value used in the DELETE statement, all of the rows of the table would have to be read. Thus, significant processing time can be consumed as part of the constraint enforcement process due to the need to read an entire database table multiple times.
When a constraint violation is identified, for example due to a DASD failure where a database table is recovered and the recovery process has induced some errors, a user would probably not want to delete the row in error. Instead, the user would want to fix the row, particularly where the data was valid before the failure. Similarly, a LOAD operation without constraint enforcement could be used to load new rows containing customer orders into an Order Entry table. If any constraint violations are identified in the newly loaded data, the user would want to correct the errors, not delete new orders. Conventional CHECK utilities do not, however, provide a tool to correct the rows in error other than deletion of the rows.
For example, the generation of SQL DELETE statements by conventional CHECK utilities do not provide a mechanism to facilitate correction of selected rows containing a constraint violation. The generation of the DELETE statements only provides the option of deleting the row. Correction of the row requires that the user generate a script (e.g., SQL code) to make any desired corrections for each row in error. For example, to correct a row identified as containing a constraint violation, the user of the database system must develop a corrective action plan. Usually, the user will manually code and test SQL statements to fix the rows in error. If there are, for example, 1,000 rows in error, the user would have to produce 1,000 SQL statements to correct the rows as each row requires its own SQL statement. The generation of numerous SQL statement is often plagued by errors introduced during the correction process. Further, most database management system users have numerous DB2 tables to maintain, for example 1,000 DB2 tables, each table having multiple columns of varying data types that may contain an error, thus further complicating the row correction process. Therefore, there is a need to improve the error correction process for rows containing a constraint violation.
In addition, the mechanism used by conventional CHECK utilities used to identify a row containing a constraint violation, typically the row identification (RID), has limited value to a user. The user would prefer to know the data value in error, for example a customer number or an order number which is a value that is meaningful to the user, instead of a page number and row number that can be used by DB2. In contrast to the limited and DB2-oriented information provided by conventional CHECK utilities, the database user wants as much as information as possible rather than merely the location of the row-the user is more interested in the entire contents of the row including the columns in error.