1. Field of the Invention
The present invention relates generally to information processing environments and, more particularly, to improved methods for fast cloning of prepared statement objects in a data processing system, such as a database management system (DBMS).
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level.
DBMS systems have long since moved from a centralized mainframe environment to a de-centralized or distributed environment. Today, one generally finds database systems implemented as one or more PC “client” systems, for instance, connected via a network to one or more server-based database systems (e.g., application server and/or SQL database server). Commercial examples of these “client/server” systems include Powersoft® clients connected to one or more Sybase® Adaptive Server® Enterprise database servers. Both Powersoft® and Sybase® Adaptive Server® Enterprise are available from Sybase, Inc. of Dublin, Calif. The general construction and operation of database management systems, including “client/server” relational database systems, is well known in the art. See e.g., Date, C., “An Introduction to Database Systems, Seventh Edition”, Addison Wesley, 2000, the disclosure of which is hereby incorporated by reference.
In recent years, this distributed environment has shifted from a standard two-tier client/server environment to a three-tier client/server architecture. This newer client/server architecture introduces three well-defined and separate processes, each typically running on a different platform. A “first tier” provides the user interface, which runs on the user's computer (i.e., the client). The first tier (or client) for many three-tier systems is accessing the second-tier application server through the Internet, typically using a Web browser, such as Netscape Navigator or Microsoft Internet Explorer. Next, a “second tier” provides the functional modules that actually process data. This middle tier typically runs on a server, often called an “application server”. A “third tier” furnishes a database management system (DBMS) that stores the data required by the middle tier. This tier may run on a second server called the “database server” which communicates with the application server.
A simple, traditional database application typically creates one connection to a database that is used for each session of an application. However, many database applications (e.g., Web-based database applications) may need to open and close a new connection several times during the application's use. In this type of environment, connection pooling is often used for increased efficiency. Connection pooling involves maintaining a pool (“connection pool”) of open database connections and managing connection sharing across different client requests to improve system performance and to reduce the number of idle connections. In response to each connection request, the connection pool first determines if there is an idle connection in the pool. If an idle connection is in the pool, the connection pool returns that connection instead of making a new connection to the database. In a multiple-tier client/server application environment, an application server (or a multi-threaded database client) process frequently maintains this type of connection pool for increased efficiency in accessing the database server.
An application server or a client process may also maintain a set of “prepared statements” that are frequently executed against the database server. A prepared statement is an object that represents a precompiled SQL statement. A prepared statement object can be used to efficiently execute the SQL statement against a database multiple times. In a database client process (e.g., a process on an application server) which keeps a pool of open database connections and frequently executes database prepared statements against a database server, the CPU utilization of the application server and the database server can be high as a result of the repeated creation of prepared statement objects. The usual solution is for the database client process (e.g., on the application server) to retain, for each pooled connection, a private cache of prepared statement objects so that with luck, when a statement is to be executed on a given connection, a suitable prepared statement object can be found in the connection's private cache, and re-used. For example, the prepared statement caching technique is described in U.S. Pat. No. 6,115,703 to Bireley, et al., the disclosure of which is hereby incorporated by reference for all purposes.
A disadvantage of this prepared statement caching approach is that it may utilize a significant amount of memory. For example, if there are M number of pooled connections, and N distinct statements that might be executed, the memory utilization of the above approach is proportional to M×N. Where either the number of pooled connections (M) or the number of distinct statements (N) that might be executed is large (or if both are large), this can result in excessive memory utilization within the database client process. It should be noted that for purposes of the following discussion the database client process may include a multi-threaded client connected directly to the database server as well as the more typical situation involving a middle-tier application server or Web server which maintains a pool of connections to the database server.
One current approach for addressing these memory utilization problems is to define a per-connection prepared statement cache size (L, where L<N), such that at most L prepared statement objects will be retained in each connection's private cache. However, one result of this approach is that some statements are not in the cache and must be re-prepared when they are to be executed. Thus, the primary goal of reducing CPU utilization may not be achieved as effectively as was at first envisaged (particularly if the ratio L/N is too high). This turns the problem into a classical space/time tradeoff.
What is needed is an alternative to prepared statement caching that has reduced memory utilization. Ideally, the solution should simultaneously reduce both CPU and memory utilization within a database client process, as compared with the usual approach of prepared statement caching. The present invention provides a solution for these and other needs.