1. Field of the Invention
This invention relates to software application development tools and system management tools, and more particularly relates to a system, method and tool for estimating the execution costs of an application designed to access a database, and for estimating the system performance when executing the application.
2. Description of the Related Art
In a relational database, information is kept in logical tables comprised of rows and columns. Underlying this logical structure is the physical structure used to hold the data and the indexes used to access the data.
A utility is used to maintain the data in the database. The REORG utility is used for reorganizing the physical data on the disk. It reorganizes a table space to improve access performance and reorganizes indexes so that they are more efficiently clustered. The BACKUP utility is used for making a copy of the data so that the data can be recovered if it is lost such as after a disaster. The COPY utility creates an image copy of a table space or a data set within the table space. There are two types of image copies. A FULL IMAGE COPY is a copy of all pages in a table space or data set. An INCREMENTAL IMAGE COPY is a copy only of pages that have been modified since the last use of the COPY utility. The LOAD utility loads data into one or more tables in a table space or partition. It may also be used to replace the contents of a single partition or an entire table space. The RUNSTATS utility scans a table space or indexes to gather information about utilization of space and efficiency of indexes.
A Structured Query Language (SQL) statement allows a user (an application or an end user) to access the logical data without regard to physical structure. In data access languages that predated SQL, the data access language included directions to the database manager on how to obtain the data. With SQL statements, the data access language indicates only what data is required, and leaves it to the optimizer function of the database manager to figure out how to find the data.
A transaction is made up of one or more SQL statements and any given SQL statement, within the transaction, may be executed any number of times. In an actual transaction, the SQL statements may be interleaved with application logic, and there may be a specific sequence of execution of the SQL statements. Overhead costs, such as thread create, terminate, authorization checking, and commits, are associated with transactions.
An application is made up of transactions interleaved with application logic. All of the transactions contend for the CPU and the I/O resources.
In specifying a full definition of a table in a relational database, such as in the IBM DB2 relational database, the user and/or application must specify the database name; the table space or storage group defined; the table description in terms of column names, their data types, and other associated information such as data length; as specified in IBM Database 2 Version 2 SQL Reference Release 2, 1989, Chapter 6. "Statements," pages 95-246, as incorporated herein by reference.
In specifying a full definition of a SQL statement, a user and/or application must specify the columns selected; the tables accessed; the join columns and predicates; all other predicates with specific values and how they are combined, such as through boolean operators; the columns in GROUP BY and ORDER BY clauses; as well as any columns updated, as shown in Chapter 5, "Queries" of the above referenced document, as incorporated herein by reference.
In specifying a full definition of a transaction, the fully defined SQL statements are fully defined in a programming language such as Cobol, C++, etc., as shown in Appendix C of IBM Database2 Version 2 Release 1, Application Programming Guide, 1988. The full definition of the transaction is embodied in program source code that includes the fully defined SQL statements as well as the other language statements that make up the program logic and structure.
When a database manager processes an SQL statement on behalf of an application, the database manager will need to use CPU resources and possibly perform I/O operations to return the results of the SQL query back to the requestor. For any given SQL statement, this may involve scanning all the rows of the database tables or using an index to directly access the data. The amount of CPU and I/O necessary will vary depending on various factors. Some of these factors are under the control of the requestor (an application or end user) while others are internal to the database manager and out of the control of the requester.
For applications that are designed to access a database, design decisions affecting the logical structure of the table impact how the SQL statement is coded as well as the performance of the SQL statement. Decisions affecting the physical structure of the table do not affect the SQL statement, but still may affect the performance of the application accessing the data.
Developing an application for accessing a database can be a very expensive process. Before incurring the development cost, it is important to know in the early design phase whether the performance of the finally designed application will meet performance requirements. During the development process, it is important to be able to estimate the performance of alternative designs to determine which design is optimal. Also, after an application is in use, it is important to estimate how modifications to the system may affect its performance.
Methods are known (IBM Database2 Version 2 Administration Guide Volume 1, Release 2, September 1989, pages 2-24 to 2-28, ) for estimating the amount of disk space required to hold a table and indexes based upon the number of rows in the table and the type of data.
Methods are known for estimating the CPU and I/O time for SQL statements. ("DB2 Cost Formula," Shibamiya, A., Yeung, M. Y., IBM Technical Disclosure Bulletin, Volume 34, Number 12, May 1992, pages 389-394).
Tools are known to estimate the SQL statement cost. However, these estimates are based on user inputs regarding database internals that the user is unlikely to have any real knowledge of, such as the number of columns used in an internal sort.
Tools are known to estimate the database cost of an existing application which was designed using a specific design methodology. Unless an application has been designed using this methodology, no estimation is possible.
Tools are known to estimate the cost based upon the costs of other, possibly similar, applications that have been measured. This type of tool may not take into account that the database manager may execute two similar SQL statements in different ways. Tools are known that have a graphical user interface (GUI) that allows a user to build complete executable SQL statements and to execute those SQL statements against the database. However, these tools require more input than is necessary for estimating the cost of executing the SQL statement, and are, therefore, more complex.