A DB2 database on an IBM mainframe server is a relational database management system. In a relational database, data is stored in one or more tables, each containing a specific number of rows. The rows of a table are divided into columns. Rows can be retrieved and manipulated at the column level. The language for DB2 data retrieval and manipulation is SQL (Structured Query Language). Instructions to the database for either retrieval or manipulation are hereinafter referred to as SQL statements or more generally, as queries. The major SQL operations commands are SELECT, INSERT, UPDATE, and DELETE. The performance of SQL statements can be enhanced through the use of indexes in accessing the rows in the DB2 tables more efficiently. An index is an ordered set of pointers to the data in the table i.e., it orders the values of columns in a table. Indexes are stored separately from the data. The decision on whether and how to use an existing index for a given SQL statement that retrieves/manipulates table data is made internally by the IBM DB2 optimizer. This decision is called the access path of an SQL statement. The access path and the performance of an SQL statement are directly related to index design. In large-scale production environments, thousands of tables may exist. Normally, index design is a manual process that includes estimates rather than actual production activity. The manual process of index design in such an environment is both inefficient and slow.
An SQL statement is internally subdivided into one or more query blocks by the IBM DB2 optimizer. Every query block has its own access path with one or more tables and may use one or more indexes, e.g., the query SELECT*FROM TAB1 UNION SELECT*FROM TAB2 consists of 2 query blocks SELECT*FROM TAB1 and SELECT*FROM TAB2. Other examples of query blocks are VIEW or SUBSELECT MATERIALIZATION. The query blocks and their selected access paths of SQL statements can be analyzed using the EXPLAIN interface of DB2.
Statistical data about the tables and indexes are a major influence on the access path of an SQL statement selected using the IBM DB2 optimizer. The DB2 catalog is an internal set of tables of data about a DB2 database including information on tables, columns, indexes, key columns, and SQL statements.
Buffer pools, also known as virtual buffer pools, are areas of virtual storage in which DB2 temporarily stores pages of data from tables and indexes. When an SQL statement requests rows of a table, the DB2 data manager retrieves the pages containing the rows of the table from the DB2 buffer pool manager. The number of pages retrieved for a table or index to satisfy an SQL statement is the “getpages” for the table or index.
The getpages measured in the buffers of the DB2 database server are always the same for an SQL statement and a given set of indexes on the tables referenced by the SQL statement, regardless of the state of the DB2 database server and the machine. In this regard, getpages are different from disk I/Os which may vary for the same SQL statement at different times.
Predicates are located in the WHERE, HAVING and ON clauses of SQL statements and describe attributes of table data. To reduce the table getpage activity, the IBM DB2 optimizer internally uses predicate filter factors to determine which index is best. Each such index can serve one or more columns of a table. A filter factor is a floating value between 0 and 1 that describes the proportion of rows in a table for which the predicate is true. It is supplied by the IBM DB2 optimizer EXPLAIN interface and stays the same regardless of the indexes deployed, i.e., filter factors are static for the predicates of a given query. A predicate with a small filter factor, i.e., close to zero is very effective because it selects only a few rows out of all rows in a table. A predicate with a high filter factor, i.e., close to 1, is very ineffective since it encompasses most rows in the table.
The following list shows the input data to the present invention. The abbreviations on the left hand side will be used herein:    Q SQL statement    T Table    B Query block    P Predicate    FF Filter factor of a predicate    Getp(T,Q) Number of getpages (equates to number of pages read) for table T for all invocations of SQL statement Q (regardless whether Q is a SELECT, UPDATE, INSERT or DELETE statement)    Getp(I,Q) Number of getpages (equates to number of pages read) for index I for all invocations of SQL statement Q (regardless whether Q is a SELECT, UPDATE, INSERT or DELETE statement)    Update(T,Q) Number of pages changed in table T by all invocations of an UPDATE statement Q    Insert(T,Q) Number of pages changed in table T by all invocations of an INSERT statement Q    Delete(T,Q) Number of pages changed in table T by all invocations of a DELETE statement Q    Col(T,Q,B)={C1, . . . , Cn} is the set of columns within table T and SQL statement Q and query block B    Pred(T,Q,B)={P1, . . . , Pn} is the set of predicates in SQL statement Q and query block B with a column in table T
All indexing n, such as Pn, should be understood as a generic placeholder that differs for each table, index and query block.