A database management system (DBMS) is a computer program which stores, retrieves, and deletes data from a database in response to user requests. Requests for information from a database are made in the form of a query (also referred to as a “statement” or “command”), which is a stylized question. A DBMS may process requests from locally executed applications or from one or more remote clients by way of a data network. From a technical standpoint, DBMSs can differ widely. The terms relational, network, flat, and hierarchical all refer to the way a DBMS organizes information internally. The internal organization can affect how quickly and flexibly information can be extracted from a database.
A relational database management system (RDBMS) is a particular type of DBMS that operates on a relational database. An RDBMS stores data in the form of related tables. RDBMSs are powerful because they require few assumptions about how data is related or how data will be extracted from the relational database. As a result, the same database can be viewed in many different ways. An important feature of relational systems is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table. Almost all full-scale database systems are RDBMSs. Small database systems, however, use other designs that provide less flexibility in posing queries.
DBMS queries may be written in various database query languages. For example, a well known database query language that is supported by a wide variety of leading DBMS products, such as DB2® from IBM® and Microsoft® Access, for example, is Structured Query Language (SQL). SQL is typically used in association with relational DBMSs. SQL comprises two primary components, namely, a Data Definition Language (DDL) and a Data Manipulation Language (DML). The DDL comprises statements for creating database tables and for creating indices which facilitate access to those tables, for example. The DML, on the other hand, comprises statements for creating, inserting, selecting and deleting rows in tables. In the case of SQL, the present description focuses on the DML.
Most DBMSs have a statement cache that is used to optimize the processing of database query language statements (which, in the case of SQL, make up the DML). As known by those skilled in the art, a statement cache stores the N most recently executed database query language statements and the N “access plans” corresponding to these statements (where N is an integer). An access plan describes the operations that must be performed by the DBMS to effect a particular statement (e.g. open file X; scan the file for search pattern Y, etc.) and may be proprietary. A query optimizer (or simply “optimizer”) reads a new database query language statement and subsequently generates an access plan for that statement. Both the statement and its corresponding access plan are then stored in the statement cache. Typically, the query optimizer is implemented as a software module which is operationally integrated with the DBMS. In the case of an SQL-compatible DBMS (i.e. a DBMS capable of processing SQL statements), the statement cache may be referred to as an “SQL statement cache”.
During database query language statement optimization, incoming statements are typically compared to previously received statements stored in the statement cache. When an incoming statement is determined to be functionally equivalent to a stored statement, the DBMS may load and execute the previously-computed access plan associated with the stored statement rather than computing an access plan afresh. This reuse of access plans tends to improve DBMS efficiency because the amount of processing that is required to identify and load an existing access plan is typically significantly less than the processing required to compute an access plan afresh.
To determine whether an incoming database query language statement is functionally equivalent to a stored statement, known DBMSs typically perform a textual comparison. That is, a new statement is only considered to be functionally equivalent to a previously received statement if it is textually identical to that statement (disregarding insignificant differences such as different spacing). Accordingly, when an incoming database query language statement differs from a previous statement by even one significant character, it will be assumed that the new statement is not functionally equivalent to the previous statement, and a fresh access plan will be computed. Potential efficiency gains may go unrealized as a result.
One situation in which this problem arises is in the processing of database query language statements containing literals. A literal is a hard coded value, such as “abc” or 8.0, in a statement (in the case of SQL, these are referred to as SQL literals). If a new database query language statement differing from a previous statement only in the value of one or more literals is received, the cached access plan associated with the previous statement will not be reused despite the fact that it may be functionally similar to the access plan that will be computed anew. Disadvantageously, potential DBMS efficiency gains may not be realized in this situation.
Accordingly, a solution that addresses, at least in part, these and other shortcomings is desirable.