1. Field of Invention
The present invention relates in general to the digital data processing field. More particularly, the present invention relates to a mechanism for intelligently managing an open query cursor built from a query access plan (AP) by a query optimizer of a database management system (DBMS).
2. Background Art
In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises at least one central processing unit (CPU) and supporting hardware, such as communications buses and memory, necessary to store, retrieve and transfer information. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which comprise a computer program and direct the operation of the other system components.
The overall speed of a computer system is typically improved by increasing parallelism, and specifically, by employing multiple CPUs (also referred to as processors). The modest cost of individual processors packaged on integrated circuit chips has made multiprocessor systems practical, although such multiple processors add more layers of complexity to a system.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, using software having enhanced function, along with faster hardware.
The overall value or worth of a computer system depends largely upon how well the computer system stores, manipulates, and analyzes data. One mechanism for managing data is called a database management system (DBMS).
At a most basic level, a database stores data as a series of logical tables. Each table is made up of rows and columns. Each table has a unique name within the database and each column has a unique name within the particular table. Different statements called queries allow the user or an application program to obtain data from the database. As one might imagine, queries range from being very simple to very complex.
When a database receives a query, the database interprets the query and determines what internal steps are necessary to satisfy the query. These internal steps may include identification of the table or tables specified in the query, the row or rows selected in the query, and other information such as whether to use an existing index, whether to build a temporary index, whether to use a temporary file to execute a sort, and/or the order in which the tables are to be joined together to satisfy the query. When taken together, these internal steps are typically referred to as an access plan (AP), although they are sometimes referred to as an execution plan.
The access plan is typically created by a software component that is often called a query optimizer. For a single given query, the current state of the art for database performance is as close to optimal as can be achieved. Further database performance gains can only be found in more intelligent reuse techniques among multiple queries.
One reuse technique relates to access plan reuse. Consequently, when a query optimizer creates an access plan for a given query, the access plan is often saved by the database management system in an access plan cache. The access plan may also be saved in an SQL (Structured Query Language) package or in a program object, e.g., the application program that requested the query. Then, when the user or program object repeats the query, the database can reutilize the saved access plan instead of undergoing the time-consuming process of recreating it.
Query access plan reuse is well known in the art. For example, U.S. Pat. No. 7,133,861 B2, entitled “QUERY ACCESS PLAN REUSE”, issued on Nov. 7, 2006 to Day et al., and assigned to IBM Corporation, discloses a method, apparatus and system that determine when to rebuild a query access plan.
Another reuse technique relates to the reuse of an open query cursor, which, as explained below, is the query executable structure built from a query access plan. Open query cursor reuse is a very different technology than query access plan reuse.
Query cursors are associated with multirow queries. Query cursors also can be associated with single row queries. A multirow query is a query that returns more than one row and its execution is handled in two stages. First, the program starts the query, although no data is returned immediately. Then, the program requests the rows of data one at a time.
These operations are performed using a special data object called a “cursor”. A cursor is an executable data structure that represents the state of a query, either single row or multirow, and is built from a query access plan. The following list shows the general sequence of program operations:
1. The program declares the cursor and its associated SELECT statement, which merely allocates storage to hold the cursor.
2. The program opens the cursor, which starts the execution of the associated SELECT statement and detects any errors in it.
3. The program fetches a row of data into host variables and processes it.
4. The program closes the cursor after the last row is fetched.
5. When the cursor is no longer needed, the program frees the cursor to deallocate the resources it uses.
These operations may, for example, be performed with SQL statements named DECLARE, OPEN, FETCH, CLOSE, and FREE. Thus, an open query cursor, which is sometimes referred to as an open data path (ODP), is the object built when a user or application program issues an SQL OPEN statement, and is the live object which returns rows when a user or application program issues an SQL FETCH statement. Although SQL is a common interface, other interfaces such as QUERY (IBM iSeries), DDL, XML, etc. may be used to perform the requisite operations.
In the current state of the art, a query cursor within a particular job/process/thread may be reused if that query is rerun within the same job/process/thread. However, there is currently no means to reuse a query cursor for a different job. Thus, system resources are wasted in building and destroying the same cursors for multiple jobs, as illustrated by the following two problem scenarios.
In a first illustrative problem scenario, one thousand users at a typical call center logon their computer system at 8:00 am. They all run an application with twenty-five common queries. Therefore, at 8:00 am, the computer system must perform “full opens” on twenty-five thousand queries (i.e., 1000×25 queries). At a CPU cost of about ½ second per a full open, a considerable CPU cost spike of 208 CPU minutes occurs at 8:00 am. A similar CPU cost spike may be felt at other times during the day, such as after lunch when all of the users again logon their computer system.
A second illustrative problem scenario occurs, for example, in the context of an online web retail application. Multiple users throughout the day run a handful of the same queries, such as lookup product X. Each of the users must pay a full open hit when they sign on, as the cursors are built, when the query is run, when the query is closed, and when the ODP is destroyed. No users benefit from the similar full opens of other users that occurred previously. Consequently, CPU time is wasted by building the same query cursor over and over for different users.
Therefore, a need exists for an enhanced mechanism for intelligent re-use of an open query cursor.