Database queries are statements used for directing a database management system (DBMS) to manage (access, store, sort, etc) data stored in a database. An example of a known and popular type of database query is an SQL (Structured Query Language) query that is a type of database query. SQL conforms to a specific or defined database access language commonly used to access relational databases. In a typical use of a relational database, many clients may issue SQL statements to a relational database management system (RDBMS) that in turn compiles the query and executes the compiled query against the relational database. Clients are applications that are written by users to operate on a specific data processing system and interact with the database that is network connected to the specific data processing system. The client may be referred to as the application and that these two terms may be used throughout this document to mean the same entity.
To execute an SQL query statement, the RDBMS must process (that is, compile) the SQL statement issued by the client by parsing the SQL query statement and translating a high level access represented in the statement into an access plan. The access plan is an executable set of instructions that accesses (generally manages) the underlying representation of the objects which are stored in the relational database. This processing of the SQL query statement, referred to as ‘compiling’, may also involve advanced techniques of optimization to choose an optimal access plan. This optimal access plan enables improved execution time when the RDBMS executes the SQL query statement against the relational database, providing improved database performance for the client.
‘Dynamic SQL’ is a type of SQL query statement issued by the client that is not compiled until the client issues a database query. Dynamic SQL query statement is often used for ad hoc database queries. For example, in a database used to track sales of individual products a dynamic SQL query may be invoked by the client in which the dynamic SQL query requests a list of the top ten products sold by sales region from the RDBMS. Depending upon a nature of the dynamic SQL query and on the characteristics of the underlying objects stored in the relational database, the time required to compile this query can be significantly high and thus is a major drawback that reduces database responsiveness as perceived by the client.
Furthermore, if an identical dynamic SQL query is issued later by the client or by another client, the previously generated access plan has to be regenerated each time the RDBMS receives the same dynamic SQL query. Given the potentially high cost of compilation incurred by the RBMDS, this arrangement presents another significant drawback.
To address these issues, relational database management systems, for example DB2 (a product sold by IBM Ltd.), have introduced ‘dynamic SQL caches’ in which previously issued SQL query statements and their compiled access plans are maintained in an ‘in-memory’ cache. In-memory cache is a cache or some sort of memory (persistent or not persistent, such as RAM, hard drive, floppy drive, etc). The use of in-memory cache allows recall of the previously generated and used access plans at a later time. When the client issues a new SQL query statement to the RDBMS, the SQL cache is accessed by the RDBMS. If the RDBMS identifies that the newly received SQL query statement is found in this cache, the entire compilation process can be skipped since the corresponding access plan is available in memory and is ready for execution by the RDBMS, thereby reducing the amount of overall time required to process the new SQL query statement.
In the case of DB2, the SQL cache is accessible by all clients that are accessing (or are ‘connected to’) a particular database (via the RDBMS), so that any SQL statement may be compiled once on behalf of one client by the RDBMS. The cached access plan is then available for any use by other future clients to be executed by the RDBMS without compilation.
In RDBMSs, and in DB2 in particular, an access plan chosen during compilation is not solely a function of the input SQL query statement text. The compilation takes into account a number of client settings (including special register values) that can alter semantics of the SQL query statement or in some other way impact the compiled access plan. This set of client settings is known as the ‘environment’. To ensure that the SQL cache correctly returns the access plan for any client issuing an SQL query, the lookup in the cache is based both on the SQL statement issued by the client and the environment (client settings).
FIG. 1 shows a basic logical structure of an example of a conventional SQL cache 100. A ‘statement entry’ 102 is a control block that comprises the statement text and is the parent control block for one or more ‘environment entries’ 104. The ‘environment entry’ 104 is a control block that comprises the ‘environment’ and is the parent control block for an access plan 106 to be executed and a list 108 of required privileges on objects referenced in the SQL statement. In this diagram and in subsequent diagrams the ‘environment’ is shown as a single field (e.g. E1 104) while representing a set of values comprising the environment.
RDBMSs may also be designed to allow a database administrator to restrict access to various database objects to specific users or groups of users. These restrictions are typically implemented as object privileges that are granted to users or groups of users. In order for an SQL statement to be executed, the RDBMS must first ensure that the user issuing the SQL statement possesses all required privileges on all objects referenced (directly or indirectly) by the statement.
An effective method for handling such object privileges (and one implemented in the DB2 product) is to store the list 108 of required object privileges together with the access plan 106 in the SQL cache 100. Upon finding a ‘matching’ SQL statement in the SQL cache 100 for a given request, the RDBMS processes the list 108 of required privileges that are stored together with the statement 102 and the access plan 106 in the SQL cache 100, The RDBMS then verifies that the user issuing the query possesses all required privileges.
If the user does possess the required privileges, the access plan 106 will be executed. Otherwise, an authorization error is returned to the application indicating that the user does not possess the appropriate privileges. In this manner, the compilation process need not concern itself with whether the user possesses the required privileges. Rather, the privilege check is a post-compilation check of the list of required privileges. Consequently, it is not necessary to repeat a compilation for each different user issuing the same SQL query. Only the required privileges must be checked each time there is an attempt to use a cached access plan.
Using this approach of keeping the privilege checking outside of the compilation process, the SQL and access plan caching is most effective as it can reduce compilations for any application and for any user issuing SQL statements.
Many RDBMSs also support certain object types that support name ‘overloading’ in an SQL statement. That is, a database query may have overloaded object types that the RDBMS may support. That is, there may be more than one object (of the same object type) that share the same object name. In this case, the RDBMS must ‘resolve’ the specific object from the context in which it is referenced in the SQL statement. In the DB2 product, functions, methods, and procedures are such objects, and thus for convenience the generic term ‘routine’ will be used to refer to any ‘overloadable object’ supported by a specific RDBMS.
Extensibility implies making the relational DBMS more object oriented and adding user-defined datatypes (UDTs), user-defined functions (UDFs), user-defined access methods, and usually changing an extensible optimizer. UDTs comprise complex datatypes that may encapsulate complex internal structures and attributes. UDFs define the methods by which applications can create, manipulate, and access the data stored in these new datatypes. Users and applications only call the UDFs and do not need to understand their internal structure. UDFs also support the notion of overloading, which refers to the concept of using the same name for different routines (actually called methods or member functions).
The user can define different UDFs with the same name, but each UDF is applicable to a specific UDT. For example, a function called sum( . . . ) applied to integer datatypes would perform conventional addition, while sum( . . . ) applied to spatial datatypes would perform vector addition. User-defined access methods define the techniques by which the UDTs are accessed and indexed. For instance, through user-defined access methods, R-trees for spatial datatypes can be implemented. An extensible optimizer provides ways to assign costs to UDFs and user-defined access methods, consequently, the DBMS can determine the best way to access the data stored in the UDTs.
A single SQL statement may comprise multiple routine references. When compiling such a statement, the compilation must correctly resolve each of the referenced ‘routines’ in the statement. These routines will be referenced as ‘instances’. As is well known with the software concept of ‘overloading’, when there are many instances of the same name, depending on the context (most typically the routine arguments) some of the instances may not be applicable. Some of the others may be applicable, or ‘qualifying’ matches, but of these there are criteria that establish which is the ‘best match’ of the routine instances. The choice of this ‘best match’ is known as routine resolution.
If an RDBMS does not enforce user ‘privileges’ on routines, then routine resolution does not impact the caching methods of the prior art. If multiple users issue the same SQL statement, with the same environment settings, any routines referenced in the statement will always resolve to the same best match, regardless of which user compiles the query. Therefore, access plan caching can proceed as before.
Some relational database management systems (such as DB2 Version 8) enforce database privileges on routines. A user (or group of users) may or may not have the privilege to access (i.e. execute) the routine. In this case, when compiling an SQL statement that comprises a reference to one or more routines, the routine resolution that occurs during compilation must take into account the user's privileges. This may impact the resolution process in such a way that the ‘best match’ instance of the routine is not chosen in routine resolution because a user does not have the required privileges. In this case, a choice of a different ‘qualifying’ routine instance is made by sequentially checking if the user has the required privilege on the ‘next best match’ until an authorized match is found or no other routines remain (in which case an error will occur).
With this combination of routine-name overloading and routine privileges, the prior art for caching of dynamic SQL is inadequate for caching compiled SQL statements that comprise routine references. This can best be seen by a simple example. Consider two routines both of name F1; these will be referenced as F1.a and F1.b. Consider two users, user1 and user2; user1 is granted the privilege to execute F1.b, but not F1.a; user2 is granted the privilege to execute both F1.a and F1.b. 
The prior art relies on the evaluation of whether a user held required privileges as a validation step that could be performed after compilation (resulting in a successful validation or an error being returned to the application).
In this example, user1 and user2 issue the identical query “SECECT F1(C1) from X.T1”, both with the same environment. In this example it is assumed that (given the data type of the column C1) F1.a would be the best match, and F1.b is a qualifying match, but not the best match. For the sake of the example, assume that both users have all required privileges to select from X.T1.
If user1 issues the query first (when the cache is not yet populated for this statement), then as part of compilation the routine resolution attempts to resolve the F1(C1) reference to F1.a. However, since user1 is not authorized to execute F1.a, the routine resolution resolves to the ‘next best’ match F1.b. In this case, since user1 is authorized to execute F1.b, then F1.b will be chosen as the instance of F1 to use. An access plan will be generated that uses F1.b. 
If this access plan is cached in the SQL cache using prior art methods, then the privilege required to execute F1.b cache is stored together with the access plan. If user2 subsequently issues the statement, the RDBMS will find a match in the cache, i.e. it will find a matching statement text and environment. The RDBMS will then evaluate that user2 is authorized to execute F1.b; consequently, user2 will execute the access plan generated from the compilation performed on behalf of user1. However, this results in an incorrect decision by the caching logic, since user2 is authorized to execute F1.a, which is the ‘best match’. The caching logic should execute an access plan that reflects a routine resolution for F1(C1) that chooses F1.a but chooses F1.b instead.
Conversely if user2 issued the query first (when the cache is not yet populated for this statement), the compilation will drive a routine resolution that will select F1.a, as it is the best match and user2 is authorized to execute it. The access plan will be cached reflecting a selection of F1.a and the list of required privileges will comprise F1.a. If user1 then issues the same statement (with the same environment), a match in the cache will be found based on the statement text and the environment. However, when evaluating whether user1 has the required privileges, the RDBMS will determine that user1 does not have the privilege required to execute F1.a. Consequently, the RDBMS will return an error to the application, even though the request of user1 should not have failed. User1 should have been able to execute successfully by compiling the statement again and resolving to F1.b. 
One way to correct this problem is to ensure that when routine resolution is processing a statement, one user is not permitted to be matched with another user's access plan in the cache. This is effectively the same as including a ‘user identification’ as part of the environment. The RDBMS would ensure that a cache lookup on behalf of user2 would not use an access plan stored on behalf of user1, and vice versa. However, this is clearly not an ideal solution. A database system may comprise many users; most of them might even possess the exact same routine privileges. In fact, in a given installation, access to all routines may be granted to PUBLIC (anyone). In that case, if user identification was comprised as part of the environment a separate compilation would result for each different user query. Each compilation would generate an identical access plan that would be cached under its own different environment. This is not an optimal use of space in the SQL cache and requires each user to incur the cost of the SQL statement compilation.
Accordingly, a solution that addresses, at least in part, this and other shortcomings is desired. The need for such a system has heretofore remained unsatisfied.