1. Identification of Copyright
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
2. The Field of the Invention
The present invention relates generally to computer-implemented database systems. More specifically, the present invention relates to selectively preparing a customized report of query explain data for one or more query statements.
3. The Relevant Technology
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database system which uses relational techniques for storing and retrieving data. Relational databases are organized into tables consisting of rows (tuples) and columns of data. A database typically includes many tables, and each table includes multiple rows and columns. The tables are conventionally stored in direct access storage devices (DASD), such as magnetic or optical disk drives, for semi-permanent storage.
Generally, users communicate with an RDBMS using a Structured Query Language (SQL) interface. The SQL interface allows users to create, manipulate, and query a database by formulating relational operations on the tables, either interactively, in batch files, or embedded in host languages such as C and COBOL. SQL has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
The SQL standard provides that each RDBMS should respond to a particular query in the same way, regardless of the underlying database. However, the method that the RDBMS actually uses to find the requested information in the database is left to the RDBMS. Typically, there is more than one method that can be used by the RDBMS to access the requested data. The RDBMS, therefore, attempts to select the method that minimizes the computer time and resources (i.e. cost) for executing the query.
The RDBMS determines how to execute the SQL statements. The set of steps created by the RDBMS for executing the SQL statements is commonly referred to as the "access path." In other words, the access path is a sequence of operations used by the RDBMS to obtain the data requested by the SQL query. Depending on the access path, an SQL statement might search an entire table space, or, alternatively, it might use an index. The access path is the key to determining how well an SQL statement performs. The description of the access path is stored in a plan table, which typically stores the access path data for a plurality of SQL statements.
In addition to determining the access path, some databases estimate the cost for executing each SQL statement. The estimated costs are typically stored in a statement table (as in the case of DB2.RTM. for OS/390.RTM.) or a similar table within the database. Like the plan table, the statement table stores the estimated statement costs for a plurality of SQL statements.
Databases also typically include statistics for such database objects as table spaces, indexes, tables, and columns. For example, in the case of a table, the statistical data may include the number of pages that contain rows of the table, the number of rows and columns in the table, as well as various other statistical data. The statistics are typically derived from the RDBMS "catalog," which is an object that describes the entire database.
The above-described access path data, statement cost data, and object statistics data (referred to collectively as "query explain data") assist the user in analyzing and improving the performance of SQL statements. For example, a query with a higher-than-average statement cost might alert the user to inefficiencies in the access path. By viewing the access path data, the user can selectively make changes to the query and/or the database, such as by adding an index in order to avoid a table space scan. The object statistics data similarly assists the user by describing the structure and organization of the database.
Unfortunately, an analysis of SQL query performance is currently too time-consuming and complex for the average user. In conventional systems, the user must manually locate the relevant query explain data in a plurality of tables, i.e. the plan table, the statement table, and the catalog, which can be a tedious and time-consuming task.
Moreover, each of the tables typically stores information corresponding to many different statements and objects. Consequently, the tables are often very large, making it difficult to locate the desired data. Likewise, the tables are often cryptic and hard to understand, even for database experts. For example, the plan table typically includes sixty or more columns and hundreds or thousands of rows. The access path data is stored in a tabular format which, although easily understood by the RDBMS, is often too complicated to be effectively analyzed.
Accordingly, what is needed is a system, method, and article of manufacture in which a user can selectively prepare a report including the access path data, statement cost data, and object statistics data for one or more query statements. What is also needed is a system, method, and article of manufacture in which a user can choose among a plurality of query statements to include in the report. What is also needed is system, method, and article of manufacture in which the user can selectively include details of access path steps and database objects relevant to the selected query statements. What is also needed is a system, method, and, article of manufacture for combining the selected query explain data into a single report that is readily understandable to a user.