The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices, and computer systems may be found in many different settings. Computer systems typically include a combination of hardware, such as semiconductors and circuit boards, and software, also known as computer programs.
Fundamentally, computer systems are used for the storage, manipulation, and analysis of data, which may be anything from complicated financial information to simple baking recipes. It is no surprise, then, that the overall value or worth of a computer system depends largely upon how well the computer system stores, manipulates, and analyzes data. One mechanism for managing data is called a database management system (DBMS), which may also be called a database system or simply a database.
Many different types of databases are known, but the most common is usually called a relational database (RDB), which organizes data in tables that have rows, which represent individual entries or records in the database, and columns, which define what is stored in each entry or record. Each table has a unique name within the database and each column has a unique name within the particular table. The database also has an index, which is a data structure that informs the database management system of the location of a certain row in a table given an indexed column value, analogous to a book index informing the reader on which page a given word appears.
To be useful, the data stored in databases must be capable of being retrieved in an efficient manner. The most common way to retrieve data from a database is through statements called database queries, which may originate from user interfaces, application programs, or remote systems, such as clients or peers. A query is an expression evaluated by the database management system. As one might imagine, queries range from being very simple to very complex. Although the query requires the return of a particular data set in response, the method of query execution is typically not specified by the query. Thus, after the database management system receives a query, the database management system interprets the query and determines what internal steps are necessary to satisfy the query. These internal steps may include an identification of the table or tables specified in the query, the row or rows selected in the query, and other information such as whether to use an existing index, whether to build a temporary index, whether to use a temporary file to execute a sort, and/or the order in which the tables are to be joined together to satisfy the query.
When taken together, these internal steps are referred to as an execution plan, an access plan, or just a plan. The access plan is typically created by a software component of the database management system that is often called a query optimizer. The query optimizer may be part of the database management system or separate from, but in communication with, the database management system. When a query optimizer creates an access plan for a given query, the access plan is often saved by the database management system in the program object, e.g., the application program, that requested the query. The access plan may also be saved in an SQL (Structured Query Language) package or an access plan cache. Then, when the user or program object repeats the query, which is a common occurrence, the database management system can find and reutilize the associated saved access plan instead of undergoing the expensive and time-consuming process of recreating the access plan. Thus, reusing access plans increases the performance of queries when performed by the database management system.
Many different access plans may be created for any one query, each of which returns the required data set, yet the different access plans may provide widely different performance. Thus, especially for large databases, the access plan selected by the database management system needs to provide the required data at a reasonable cost in terms of time and hardware resources. Hence, the query optimizer often creates multiple prospective access plans and then chooses the best, or least expensive one, to execute.
Because of the performance sensitivity of queries and access plans, the database management system must provide a process to track or capture the database activity that is taking place in the system, in order to allow for later analysis and remedy of performance problems. Such processes are known as “monitors.” Monitored data can include, but are not limited to, information related to execution of the queries against the database. All of the information captured via these monitors may be stored in either a log file or in another storage medium that allows for easy access to the data to perform any appropriate analysis. The results of these monitors may be then analyzed to determine if the system is operating in an optimal manner. Queries that are not making the best use of the system resources can be identified for further analysis or tuning.
Monitors can capture their information in many ways. For example, the monitor may capture information while the query is active (called runtime monitoring) or may perform its capture from a separate process and extract the information about a query from its access plan. Both of these methods require that the system expend resources formulating the information into a form that can be easily extracted and stored. On a system with a large database with frequent and complicated queries, these resources can quickly become excessive and degrade the overall performance of the system. In addition to the aforementioned problem of resources utilized in collecting the monitored information, the sheer volume of information collected also can make analysis difficult and time consuming.
Various techniques have been tried to lesson the amount of information collected, in order to attempt both easing performance degradation and making analysis easier. For example, the amount of stored information can be adjusted, such as not logging duplicate records about an identical query, in order to prevent information about each occurrence of the identical query from being added to the log file. Further, the storage medium of the system itself can be manipulated to optimize access to the log file to take advantage of any inherent strengths of the I/O (Input/Output) system to speed writes into the log file.
Unfortunately, even with the aforementioned techniques, customers frequently find themselves reading through a large amount of logged information in an attempt to determine the source of performance degradation of a query. For example, customers commonly change a setting or configuration of their database, which is then forgotten, but which results in performance degradation. The customers may then experience great difficulty in attempting to read the logged information to determine the source of the problem.
Thus, there is a need for a technique configured to reduce the overhead associated with maintaining log information and extracting log information regarding queries in a database environment.