This invention relates to relational database management systems (RDBMS) and more particularly to a method for caching dynamic SQL statements (and their sections) at different nodes in a multiple node database while maintaining cache and application integrity across all nodes.
Structured Query Language (SQL) is the database access language most commonly used to access relational databases (such as the DB2 product sold by IBM Canada Ltd.) in an open, heterogeneous environment. Although this disclosure refers to the DB2 relational database product sold by IBM, individuals skilled in the art will recognize that the caching of database access statements is applicable to any relational database management system (RDBMS).
Within this specification including the claims, the following terms will be used:
Access Plan An access plan is the method(s) chosen by the SQL Compiler to satisfy an application request as stated in the form of an SQL statement.
Agent A process used by a RDBMS to provide services for an application request.
Node A node is a physical entity (eg. a processor and memory) that is used to process application requests to the RDBMS and contains some or all of the database. A serial version of the RDBMS contains at most 1 node, while a parallel version of the RDBMS can contain 1 or more nodes.
Package A package is associated with an application and contains the information required by the RDBMS for all SQL statements defined in that application. The information in a package consists of a collection of sections and the compilation environment settings (e.g. compile or binding options) used to compile any static SQL statements; some of these same settings are also used as the default environment for any dynamic SQL statements compiled by the application during execution.
Section A section contains all the information required by the RDBMS to execute the chosen access plan for an SQL statement. A section is the compiled version of the access plan chosen by the SQL compiler.
Section Entry A section entry contains information about a specific section as well as the SQL statement corresponding to that section.
There are two basic types of SQL statements, static and dynamic. In using static SQL the user embeds SQL requests for data in an application program. An SQL precompiler removes these statements from the application program and replaces them with function calls whose parameters indicate a specific section entry for the package corresponding to the current source file. The removed SQL statement is then sent to DB2 for compilation. Compiling (also known as preparing) a SQL statement is the process by which the DB2 SQL compiler chooses and builds an access plan to efficiently resolve the SQL statement. The access plan is saved in its executable format, a section, in the system catalogues.
The parsing of the statement and building of the access plan can be relatively long and complicated. Compilation of static SQL improves run time performance by building the access plan before the application is executed.
Dynamic SQL is generally used for ad hoc SQL requests. For example, in a database used to track sales of individual products, a dynamic SQL query may be invoked to list the top ten products sold, by sales region. Depending upon the nature of a dynamic SQL request, the time required to parse it and create an access plan to satisfy the user request can be significant. Further, if the dynamic SQL request is repeated later in the application by the same agent or perhaps by a different agent, a new access plan must be created in each instance. Thus, the creation of an identical access plan may often have to be repeated, thereby impacting performance of the application.
A dynamic SQL request originates on the node running an application and may require data from one or more other nodes (the remote nodes). In such a scenario, the prior art solution is to generate the section to obtain the data from the remote node(s) and send the section to the remote node(s) for execution. The inventors are not aware of any solution proposed to date which considers allowing caching of the same statement across multiple nodes, thus not requiring the section to be executed to be shipped. Maintaining multiple iterations on multiple nodes results in less communications traffic and faster response times for remote parts of a dynamic SQL request due to faster startup times.
In accordance with the present invention there is provided a database system comprising:
a) a plurality of nodes;
b) a plurality of databases stored on the nodes; and
c) a global cache stored on at least one node, the global cache being accessible to a plurality of applications in the database system.
In accordance with the present invention there is also provided a method of maintaining cache and application integrity across a relational database system comprising a plurality of nodes and a plurality of databases shared on the nodes, the method comprising the steps of: (a) providing, for at least some of the databases, a global cache including information to enable agents to access the databases, (b) originating a request through a coordinating node, the coordinating node sending information on the variation to be executed to remote nodes; (c) having each remote node receive the request and check the global cache on the respective remote node for a current variation; and (d) if the a remote node does not have the current variation, the remote node requesting the coordinating node to send the current version of the variation.