In a relational database system, individual records of data are stored in tables. Each table includes fields, or columns, of information and individual records are stored as rows, with a data entry in each column. Data may also be null within a column, which indicates no data was stored for that column. For example, in an employment database, there may be a table EMPLOYEES which includes columns such as employee ID, lastname, firstname, address, city, state and so forth. One record, including data in the several columns, would typically be entered in the EMPLOYEE table for each employee. Similarly, other tables may be established in accordance with the logical schema of the database. For example, a table SALARY HISTORY may include columns of salary history information, with each record being identified by an employee ID. Another table, MANAGERS, may include columns identifying information specific to managers and may also include the employee ID field. Relations between the tables would be established by reference to the columns which appear in multiple tables such as employee ID in the above example.
Operations on the data in a database are performed using a database access language. Structured Query Language (SQL) is a standard language for relational database management systems. SQL statements enable a user to perform database management operations, such as table or index creation, as well as to manipulate and extract the data contained in the database. For a given database system, the multiple SQL statements which are successfully executed by the database (garbage statements i.e. those including typographical errors, can be submitted but should not be considered part of the workload) comprise a workload of the system.
A major task in database management systems is in locating specific records within individual tables in response to the queries submitted to the database. Typically, the query statements identify one or more columns which are used as keys in the search process. The key is used to restrict the rows.in the table which the database system must evaluate to determine if they satisfy the query. For example, the EMPLOYEES table described in the above example may be searched for all employees who live in Boston, Mass. To that end, the system may scan through every entry in the EMPLOYEES table, specifically searching the city and state columns, to locate the desired records. In many instances, however, the table may span multiple pages and scanning the entire table may require multiple time consuming input/output (I/O) operations.
To reduce the time required to locate particular records, indexes may be established to map particular column values to records. Indexes provide more direct access to individual records, thus reducing search time, including costly I/O operations, to locate the records. Indexes, however, come with a cost of added storage space and an increase in time required to insert, delete or update records. Whenever a record is inserted or deleted, the system not only processes the record in the table, but also any indexes which include columns of the table. Further, updating a record can require relocation of the corresponding entry in one or more indexes.
An index tuning system looks at the logical schema and workload of a database, including specifics of tables and queries, to establish indexes which provide for efficient operation of the database system. One such index tuning system is presented in U.S. Pat. No. 5,404,510 in which the indexes are identified based on a detailed analysis of the workload with weight given to the importance of specific requests.
Typically, when an index tuning mechanism is implemented on a database, the database is tuned with respect to one or more xe2x80x9ctargetxe2x80x9d tables included within the database. One drawback of such a tuning system is that a workload associated with a given database often includes many statements which are not relevant to index tuning efforts. That is, many statements may reference non-target tables or may not affect data retrieval efforts within the database. As a result, the tuning mechanism wastes resources analyzing statements which are not relevant to the index tuning efforts.
In accordance with a workload reduction mechanism for index tuning, a method and apparatus is provided that selects relevant database query statements from a database workload, standardizes a format of the relevant database query statements and reduces the formatted, relevant database query statements into a representative set of database query statements. These statements can then be provided to an index tuning mechanism. With such an arrangement, the index tuning mechanism is able to tune the system in a shorter period of time and use less storage space while analyzing fewer statements.
In accordance with a further aspect of a workload reduction mechanism for index tunning, a system is provided that includes a database workload source which provides input to a workload filter. The workload filter selects relevant database query statements from the workload source and provides input to a formatting system. The formatting system standardizes a format of the relevant database query statements. The system also includes a hash table that includes a hash of each representative database query statement selected from the relevant database query statements.
Also provided is a computer program product to provide a workload reduction mechanism for index tuning.