Today's enterprises are widely deploying commercial relational database systems as back-ends for storing and retrieving large amounts of data. One of the important tasks of a database administrator (DBA) is to ensure that good performance is achieved over time. The ability of a DBA to make informed decisions that impact performance depends heavily on being able to understand the nature of the workload (queries and updates faced by the system). While most commercial relational database systems have tools for logging queries and updates that run against the server, existing aids for summarizing, analyzing, and exchanging the workload within and across organizations are inadequate. In particular, the exchange of workload information is cumbersome due to the lack of a common schema upon which to base the exchange of workload information as well as the complexity of workload information.
The ability to efficiently leverage information about the workload of a relational database can assist a DBA in finding under-performing queries, analyzing resource usage, and evaluating the quality of a given query optimizer. To identify an under-performing query, the DBA can search out queries that take a long time to execute and/or spend a significant amount of time scanning the base tables. Or the DBA can identify queries that are I/O intensive and tune the disk layout of the database to better accommodate the identified queries. In addition, a DBA may be interested in detecting users who execute the most queries or identify databases that are accessed most often. The DBA can evaluate the quality of a query optimizer by comparing actual cost to that predicted by the optimizer. Although the workload contains a wealth of information that is useful to a DBA, the exchange of such information is difficult and due to the complexity of existing database workload analysis techniques, most DBAs are limited to performing preprogrammed or “canned” reports on their databases.
For example, Paradyn is a performance measurement tool for parallel and distributed programs. It is designed to scale to long running programs (hours or days) and large (thousand node) systems. It can provide performance data down to the procedure or statement level. Paradyn supports dynamic instrumentation and uses a structure search methodology to automate finding performance bottlenecks. While Paradyn can provide meaningful information to the interested DBA, the querying model over the gathered information is technically advanced and not susceptible to ad hoc querying by a relatively unskilled user. Other commercially available performance analysis tools provide canned reports but offer little flexibility to an end user. These tools include PreciseSoft for Oracle, Centerfield and BMC for IBM AS/400, and Platinum for SQL server database systems.